Exploring Feature Reduction for Dropout Predicting in Higher Education in Brazil

Authors

A. Menolli, G. M. Dionísio, A. S. da P. Floriano, T. A. Coleti

DOI:

https://doi.org/10.5753/rbie.2025.4619

Keywords:

Dropout, Feature Selection, Machine Learning, Higher Education System

Abstract

Dropout in higher education is a widespread and complex phenomenon, and its causes can differ significantly from one context to another. This makes it challenging to develop machine learning models and to identify the key features for diverse contexts. In this paper, we propose a process based on feature selection to create and evaluate machine learning models for predicting dropout in higher education. The process outlines the essential steps for developing a model in any context and emphasizes identifying the most critical features. We conducted a study across five distinct contexts within Brazilian higher education, focusing on face-to-face courses, and used this process to identify the most important features for predicting dropout. The results indicate that the correlation between a student's enrollment duration and the percentage of the course completed is the primary predictor of dropout. However, they also show that context plays a fundamental role in predicting dropout. Moreover, in every scenario explored, it was possible to build a more accurate model with a reduced set of features than the original model.
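
To make the idea of feature reduction concrete, the following is a minimal sketch of one common filter-style selection step: ranking features by their mutual information with the dropout label and keeping the top k. The abstract does not say which selection techniques the authors used, so the choice of mutual information is an assumption. The feature names (`low_progress`, `works`, `evening_shift`) and the eight toy records are hypothetical and exist only for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def top_k_features(rows, labels, k):
    """Score every feature against the label and return the k best names plus all scores."""
    names = list(rows[0].keys())
    scores = {
        name: mutual_information([row[name] for row in rows], labels)
        for name in names
    }
    ranked = sorted(names, key=lambda name: scores[name], reverse=True)
    return ranked[:k], scores


# Hypothetical toy records (NOT data from the paper): 1 = dropped out, 0 = stayed.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
low_progress = [1, 1, 1, 1, 0, 0, 0, 0]   # assumed flag: little of the course completed
works = [1, 1, 1, 0, 1, 0, 0, 0]          # assumed flag: student has a job
evening_shift = [1, 0, 1, 0, 1, 0, 1, 0]  # assumed flag: unrelated to the label here

rows = [
    {"low_progress": lp, "works": w, "evening_shift": e}
    for lp, w, e in zip(low_progress, works, evening_shift)
]

selected, scores = top_k_features(rows, labels, k=2)
print(selected)
for name in selected:
    print(f"{name}: {scores[name]:.3f} bits")
```

In a fuller workflow, one would train a model on all features and another on only the selected ones, then compare them on held-out data. The ranking should be computed on the training split alone so that the evaluation is not contaminated.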

Author Biography

André Menolli, Northern Paraná State University / State University of Londrina

Published

2025-04-02

How to Cite

MENOLLI, A.; DIONÍSIO, G. M.; FLORIANO, A. S. da P.; COLETI, T. A. Exploring Feature Reduction for Dropout Predicting in Higher Education in Brazil. Revista Brasileira de Informática na Educação, [S. l.], v. 33, p. 106–129, 2025. DOI: 10.5753/rbie.2025.4619. Available at: https://journals-sol.sbc.org.br/index.php/rbie/article/view/4619. Accessed: 5 Dec. 2025.

Section

Articles