Predicting the risk of university dropout using machine learning: Characteristics and dynamics of dropout in Engineering courses at UFRJ-Macaé

Authors

P. A. Chagas, J. S. Gomide, L. E. A. dos S. Santana

DOI:

https://doi.org/10.5753/rbie.2025.5196

Keywords:

Evasion, University, Engineering, Academic performance, Machine learning

Abstract

University dropout has been widely investigated because of the losses in financial resources and labor it causes, so understanding its dynamics is important for identifying solutions and preventive strategies. This study uses data science and machine learning techniques to predict the risk of university dropout and to analyze the characteristics and dynamics of dropout in Engineering courses at the Federal University of Rio de Janeiro (UFRJ) in the city of Macaé, Brazil. Following the CRISP-DM methodology, Logistic Regression, Decision Tree, and eXtreme Gradient Boosting (XGBoost) models were trained and tuned through Bayesian optimization. The results show that students' academic performance in their first period is sufficient to identify the students in the test set who dropped out, with an AUC of 0.80. The exploratory data analysis also revealed that more students dropped out than graduated, with 75% of dropouts occurring in the first three semesters. Furthermore, a strong correlation was observed between low academic performance and the risk of dropping out, particularly in Calculus and Physics. This work highlights how a university's data can be explored to identify patterns and trends in student behavior, and how machine learning can be used as a statistical tool to extract valuable information from large volumes of data, helping to improve the quality and accessibility of higher education.
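
For readers unfamiliar with the AUC metric quoted in the abstract, the following is a minimal sketch of how a ROC AUC can be computed from predicted risk scores. It is not the authors' code: the function name `roc_auc` and the toy labels and scores are hypothetical and chosen only for illustration. The AUC is read here as the probability that a randomly chosen student who dropped out receives a higher risk score than a randomly chosen student who did not, with ties counted as half.

```python
from itertools import product


def roc_auc(labels, scores):
    """Return the ROC AUC via the pairwise (Mann-Whitney) formulation.

    labels: iterable of 0/1 values, where 1 marks the positive class
            (here, a student who dropped out).
    scores: iterable of predicted risk scores, higher = more at risk.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    # Compare every positive score with every negative score.
    for p, n in product(positives, negatives):
        if p > n:
            wins += 1.0   # positive ranked above negative
        elif p == n:
            wins += 0.5   # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data, NOT taken from the study: 1 = dropped out, 0 = did not.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_risk = [0.9, 0.8, 0.4, 0.3, 0.5, 0.2, 0.1, 0.7]

auc = roc_auc(toy_labels, toy_risk)
print(f"AUC on toy data: {auc:.4f}")  # prints "AUC on toy data: 0.9375"
```

Under this reading, an AUC of 0.80 means that in roughly four out of five such pairs the model ranks the student who dropped out above the one who did not. The pairwise loop is quadratic in the number of students, which is fine for illustration; a rank-based implementation would be preferable for large datasets.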

Published

2025-09-17

How to Cite

CHAGAS, P. A.; GOMIDE, J. S.; SANTANA, L. E. A. dos S. Predicting the risk of university dropout using machine learning: Characteristics and dynamics of dropout in Engineering courses at UFRJ-Macaé. Brazilian Journal of Computers in Education, [S. l.], v. 33, p. 1226–1247, 2025. DOI: 10.5753/rbie.2025.5196. Available at: https://journals-sol.sbc.org.br/index.php/rbie/article/view/5196. Accessed: 30 Jan. 2026.

Section

Articles