Investigation of Student Dropout through Data Mining and Machine Learning: A Systematic Mapping

Authors

J. A. de Jesus, R. P. de Gusmão

DOI:

https://doi.org/10.5753/rbie.2024.3466

Keywords:

Prediction, Classification, Student dropout, Machine learning, Data mining

Abstract

Student dropout in schools and universities is a recurrent problem in education: it harms the student's learning and generates financial costs for educational institutions, whether public or private. Studies using data mining (DM) and machine learning (ML) techniques to investigate problems in education are on the rise, and student dropout is one such problem. These techniques make it possible to identify patterns in individuals or groups who may drop out of their studies. This article aims to systematically map state-of-the-art articles on the application of DM and ML to data classification in studies on school dropout. The search was carried out in five bibliographic databases (ACM Digital Library, IEEE Xplore, Scopus, ScienceDirect, and Web of Science) and returned a total of 336 primary studies. After applying the inclusion and exclusion criteria, 71 relevant studies remained. Data extracted from these studies show that studies involving higher education students and the face-to-face modality are the most recurrent, that 2020 was the year with the most publications, and that the algorithms most frequently used to build the classification models are based on decision trees.
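
As a rough illustration of what a decision-tree-based classifier does at its simplest, the sketch below fits a single-split tree (a decision stump) on one numeric feature. The toy records, the attendance feature, and the function names are assumptions made for illustration; they are not data or methods taken from the mapped studies.

```python
from typing import List, Optional, Tuple

# Hypothetical toy records: (attendance rate, dropped_out).
# Invented for illustration only; not data from any mapped study.
RECORDS: List[Tuple[float, bool]] = [
    (0.95, False), (0.90, False), (0.85, False), (0.80, False),
    (0.75, False), (0.60, True), (0.55, True), (0.50, True),
]


def fit_stump(records: List[Tuple[float, bool]]) -> Tuple[Optional[float], int]:
    """Find the threshold for the rule 'predict dropout if value < threshold'
    that misclassifies the fewest records. Returns (threshold, errors)."""
    values = sorted({value for value, _ in records})
    # Candidate splits: midpoints between consecutive distinct feature values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_threshold: Optional[float] = None
    best_errors = len(records) + 1
    for threshold in candidates:
        errors = sum((value < threshold) != label for value, label in records)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold, best_errors


def predict(threshold: float, value: float) -> bool:
    """Apply the learned single-split rule to one feature value."""
    return value < threshold


if __name__ == "__main__":
    threshold, errors = fit_stump(RECORDS)
    print(f"split at {threshold:.3f} with {errors} training errors")
```

A full decision tree repeats this split search recursively on each resulting subset and across many features; the algorithms reported in the mapped studies also differ in split criteria and pruning, which this sketch does not cover.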

Published

2024-03-10

How to Cite

JESUS, J. A. de; GUSMÃO, R. P. de. Investigation of Student Dropout through Data Mining and Machine Learning: A Systematic Mapping. Brazilian Journal of Computers in Education, [S. l.], v. 32, 2024. DOI: 10.5753/rbie.2024.3466. Available at: https://journals-sol.sbc.org.br/index.php/rbie/article/view/3466. Accessed: 4 Jul. 2024.

Section

Articles