Unveiling Patterns in Maranhão's 2023 ENEM Results Through Unsupervised Machine Learning

Authors

  • Anderson Amorim Alves, Programa de Pós-Graduação em Engenharia da Computação e Sistemas (PECS) / Universidade Estadual do Maranhão (UEMA), https://orcid.org/0009-0005-9785-8365
  • Omar Andres Carmona Cortes, Departamento de Computação (DComp) / Instituto Federal do Maranhão (IFMA), https://orcid.org/0000-0002-5805-2490

DOI:

https://doi.org/10.5753/rbie.2026.6215

Keywords:

Educational Data Mining, Unsupervised Machine Learning, Pattern Discovery, Association Rules, ENEM

Abstract

This investigation examines the performance of students from Maranhão, Brazil, in the 2023 edition of the National High School Exam (ENEM) using unsupervised machine learning to uncover latent patterns in large-scale educational data. Leveraging the CRISP-DM framework on microdata from millions of ENEM participants, we applied Recursive Feature Elimination (RFE) with a Random Forest classifier to select key socioeconomic variables, followed by association rule mining using the FP-Growth algorithm across multiple experimental configurations. The results reveal strong associations between low academic performance and factors such as parental education and occupation, lack of household technology (e.g., computers and washing machines), and gender. These findings demonstrate the utility of unsupervised learning for descriptive educational analytics, providing practical insights for targeted policy-making, resource allocation, and regional equity monitoring. This research addresses the underrepresentation of Maranhão in the educational data mining literature and proposes a scalable analytical framework applicable to other developing regions. Despite limitations in data completeness, the approach offers a replicable model for using artificial intelligence to inform public education strategies in socially vulnerable areas.
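
To make the association-rule step more concrete, the sketch below computes support, confidence, and lift for rules that point to a low-score outcome. It is an illustration only, not the authors' implementation. The records and labels (`no_computer`, `parent_edu_low`, `score_low`, and so on) are invented. The thresholds and the `mine_rules` helper are assumptions. A brute-force enumeration stands in for FP-Growth, which finds the same frequent itemsets far more efficiently on real data. The RFE feature-selection step is omitted.

```python
from itertools import combinations

# Hypothetical toy records: each set holds categorical attributes of one
# participant. The labels are invented for illustration only; they are not
# ENEM microdata fields or results from the study.
records = [
    {"no_computer", "parent_edu_low", "score_low"},
    {"no_computer", "parent_edu_low", "score_low"},
    {"no_computer", "parent_edu_high", "score_low"},
    {"has_computer", "parent_edu_high", "score_high"},
    {"has_computer", "parent_edu_low", "score_high"},
    {"has_computer", "parent_edu_high", "score_high"},
]


def support(itemset, data):
    """Fraction of records that contain every item in `itemset`."""
    return sum(itemset <= row for row in data) / len(data)


def mine_rules(data, target, min_support=0.3, min_confidence=0.8, max_len=2):
    """Brute-force search for rules `antecedent -> target`.

    FP-Growth reaches the same frequent itemsets without enumerating every
    candidate; exhaustive enumeration is only practical on toy data.
    """
    items = sorted(set().union(*data) - {target})
    target_support = support({target}, data)
    rules = []
    for size in range(1, max_len + 1):
        for antecedent in combinations(items, size):
            antecedent = frozenset(antecedent)
            joint = support(antecedent | {target}, data)
            if joint < min_support:
                continue  # rule is too rare to report
            confidence = joint / support(antecedent, data)
            if confidence >= min_confidence:
                # lift > 1: the antecedent makes the target more likely
                lift = confidence / target_support
                rules.append((sorted(antecedent), joint, confidence, lift))
    return rules


rules = mine_rules(records, target="score_low")
for antecedent, sup, conf, lift in rules:
    print(f"{antecedent} -> score_low  "
          f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

On this toy data, the rules that pass both thresholds are the ones whose antecedent contains `no_computer`. Such rules describe co-occurrence in the data and do not by themselves establish a causal effect.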

Published

2026-04-02

How to Cite

ALVES, A. A.; CORTES, O. A. C. Unveiling Patterns in Maranhão’s 2023 ENEM Results Through Unsupervised Machine Learning. Revista Brasileira de Informática na Educação, [S. l.], v. 34, p. 404–428, 2026. DOI: 10.5753/rbie.2026.6215. Available at: https://journals-sol.sbc.org.br/index.php/rbie/article/view/6215. Accessed: 3 Apr. 2026.

Section

Articles