Artificial Intelligence Models for Predicting School Dropout: A systematic review

DOI:

https://doi.org/10.5753/reic.2025.5523

Keywords:

Distance learning, MOOCs, School dropout, Artificial intelligence, Machine learning, Systematic review

Abstract

Massive Open Online Courses (MOOCs) have emerged as accessible platforms for virtual education. In Brazil, the Ministry of Education (MEC) reported a 474% increase in MOOC enrollments between 2011 and 2021, indicating a growing trend among public and private institutions. However, high dropout rates remain a significant challenge. This study conducts a systematic review of the literature from 2018 to 2023 to investigate the application of artificial intelligence (AI) techniques for predicting student dropout in MOOCs. The results show that Random Forest (RF), Gaussian Naive Bayes (GNB), and Long Short-Term Memory (LSTM) algorithms are the most commonly used for prediction. The KDD CUP 2015 dataset and custom datasets were frequently employed. Precision, Recall, and F1-score were the primary evaluation metrics. The study identified several challenges, including limited and diverse data, the complexity of measuring model effectiveness, and the influence of external factors on student performance. While Random Forest and Naive Bayes algorithms are popular choices, there is a need for more diverse datasets and a deeper understanding of external factors, such as educational policies, to improve prediction accuracy.
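As an illustration of the evaluation metrics named in the abstract, the following minimal sketch computes Precision, Recall, and F1-score for a binary dropout classifier from true and predicted labels. The function name, the label encoding (1 = dropped out, 0 = completed), and the sample labels are assumptions made for this example; they are not taken from the reviewed studies.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the positive class.

    Hypothetical helper for illustration; `positive` marks the
    "dropped out" label under the encoding assumed here.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # correctly flagged dropouts
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # completers flagged as dropouts
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # dropouts that were missed

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Made-up labels for eight students: 1 = dropped out, 0 = completed.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

With these sample labels there are three true positives, one false positive, and one false negative, so all three metrics evaluate to 0.75. Because dropout datasets are often imbalanced, these class-specific metrics are generally more informative than plain accuracy.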

Published

2025-04-17

How to Cite

Magalhães, P. V. S., Sousa, R. A. da S., Santos, A. C. B., Aguiar, F. J. M., & Milfont, R. T. P. (2025). Artificial Intelligence Models for Predicting School Dropout: A systematic review. Electronic Journal of Undergraduate Research on Computing, 23(1), 48–54. https://doi.org/10.5753/reic.2025.5523

Section

Full Papers