Data Mining and Machine Learning techniques applied to student dropout: a systematic literature mapping




Systematic literature mapping, Student dropout, School dropout, Data mining, Machine learning


This work presents a systematic literature mapping on student dropout that sought to answer the following research question: What tools, machine learning techniques, inducing factors, open databases, and algorithm evaluation metrics have been used to identify the possible causes of student dropout? The mapping protocol was developed following the guidelines of Petersen et al. (2008) and Kitchenham (2004) and comprised, among other elements, the definition of research questions, selection criteria, search strings, and search sources. Among the results, R was the most widely used tool, classification was the most prominent machine learning technique, and the main works in the area focused on factors related to individual student characteristics. In addition, 15 open databases were identified. Regarding algorithm evaluation metrics, Recall, Accuracy, and Precision were the most prominent. The results of this mapping provide a comprehensive view of the state of the art in research on student dropout, including the most popular tools and techniques and the most investigated inducing factors. Researchers can use these results to direct their efforts toward building models that combine the three types of inducing factors and toward making open databases available.
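The three most prominent metrics can be illustrated with a short sketch. It is not taken from any of the mapped studies: it assumes a hypothetical binary dropout classifier where label 1 marks a student who dropped out (the positive class) and 0 a student who was retained, and the function names and sample labels are invented for illustration.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN), assuming binary labels where 1 = dropout (positive class)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1  # dropout correctly flagged
        elif t == 0 and p == 1:
            fp += 1  # retained student wrongly flagged as dropout
        elif t == 1 and p == 0:
            fn += 1  # dropout the model missed
        else:
            tn += 1  # retained student correctly left unflagged
    return tp, fp, fn, tn


def dropout_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Compute Accuracy, Precision, and Recall for the positive (dropout) class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    # Accuracy: share of all students classified correctly
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of the students flagged as dropouts, share who actually dropped out
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of the students who actually dropped out, share the model flagged
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


if __name__ == "__main__":
    # Hypothetical ground truth and predictions for eight students
    actual = [1, 0, 1, 1, 0, 0, 1, 0]
    predicted = [1, 0, 0, 1, 1, 1, 1, 0]
    print(dropout_metrics(actual, predicted))
    # prints {'accuracy': 0.625, 'precision': 0.6, 'recall': 0.75}
```

In dropout prediction the classes are often imbalanced, so Accuracy alone can look high even when few dropouts are detected; reporting Recall and Precision alongside it shows how many at-risk students are caught and how many alerts are false.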




Barros, T. M., Silva, I., & Guedes, L. A. (2019). Determination of Dropout Student Profile Based on Correspondence Analysis Technique. IEEE Latin America Transactions, 17(9), 1517–1523.

Bonaccorso, G. (2017). Machine learning algorithms. Packt Publishing Ltd.

Costa, E., Baker, R. S., Amorim, L., Magalhães, J., & Marinho, T. (2013). Mineração de dados educacionais: conceitos, técnicas, ferramentas e aplicações. Jornada de Atualização em Informática na Educação, 1(1), 1–29.

de Campos, A., Galafassi, C., Bastiani, E., Paz, F. J., Campos, R. L., Wives, L. K., Cazella, S. C., Reategui, E. B., & Barone, D. A. C. (2020). Mineração de Dados Educacionais e Learning Analytics no contexto educacional brasileiro: um mapeamento sistemático. Informática na educação: teoria & prática, 23(3).

Deepika, K., & Sathyanarayana, N. (2018). Analyze and predicting the student academic performance using data mining tools. 2018 Second International Conference on Intelligent Computing and Control Systems (ICICCS), 76–81.

Dombrovskaia, L., José, P., & Rodríguez, P. (2020). Prediction of student's retention in first year of engineering program at a technological Chilean university. 2020 39th International Conference of the Chilean Computer Science Society (SCCC), 1–4.

Elhorst, J. P. (2014). Matlab software for spatial panels. International Regional Science Review, 37(3), 389–405.

Faria, S. M. S. M. L. d. (2014). Educational data mining e learning analytics na melhoria do ensino online (Doctoral thesis).

Freitas, F. A. d. S., Vasconcelos, F. F., Peixoto, S. A., Hassan, M. M., Dewan, M., Albuquerque, V. H. C. d., et al. (2020). IoT System for School Dropout Prediction Using Machine Learning Techniques Based on Socioeconomic Data. Electronics, 9(10), 1613.

Fuzeto, R., & Braga, R. (2016). Um mapeamento sistemático em progresso sobre internet das coisas e educação à distância. Anais dos Workshops do Congresso Brasileiro de Informática na Educação, 5(1), 1334.

German, D. M., Adams, B., & Hassan, A. E. (2013). The evolution of the R software ecosystem. 2013 17th European Conference on Software Maintenance and Reengineering, 243–252.

Grandini, M., Bagli, E., & Visani, G. (2020). Metrics for multi-class classification: an overview. arXiv preprint arXiv:2008.05756.

Ignacio, L. F. F. (2021). Aprendizado de máquina: da teoria à aplicação.

Kitchenham, B. (2004). Procedures for performing systematic reviews. Keele, UK: Keele University, 33(2004), 1–26.

Lima, J., Alves, P., Pereira, M., & Almeida, S. (2018). Using academic analytics to predict dropout risk in engineering courses. 17th European Conference on e-Learning ECEL 2018, 316–321.

Lobo, M. B. d. C. M. (2012). Panorama da evasão no ensino superior brasileiro: aspectos gerais das causas e soluções.

Marques, L. T. (2020). Mateo: uma abordagem de descoberta de conhecimento para desvendar as causas da evasão escolar.

Marques, L. T., De Castro, A. F., Marques, B. T., Silva, J. C. P., & Queiroz, P. G. G. (2019). Mineração de dados auxiliando na descoberta das causas da evasão escolar: Um Mapeamento Sistemático da Literatura. Revista Novas Tecnologias na Educação, 17(3), 194–203.

Ministério da Educação (MEC). (2019). Censo da Educação Superior.

Muschelli, J. (2020). ROC and AUC with a binary predictor: a potentially misleading metric. Journal of Classification, 37(3), 696–708.

Nagy, M., & Molontay, R. (2018). Predicting dropout in higher education based on secondary school performance. 2018 IEEE 22nd International Conference on Intelligent Engineering Systems (INES), 000389–000394.

Naik, A., & Samant, L. (2016). Correlation review of classification algorithm using data mining tool: WEKA, Rapidminer, Tanagra, Orange and Knime. Procedia Computer Science, 85, 662–668.

Pašić, Đ., & Kučak, D. (2020). Machine learning model for detecting high school students as candidates for drop-out from a study program. 2020 43rd International Convention on Information, Communication and Electronic Technology (MIPRO), 1140–1144.

Petersen, K., Feldt, R., Mujtaba, S., & Mattsson, M. (2008). Systematic mapping studies in software engineering. 12th International Conference on Evaluation and Assessment in Software Engineering (EASE) 12, 1–10.

Pinto, S. C. (2021). Os custos da evasão de discentes das universidades brasileiras na modalidade de ensino presencial: uma perspectiva de custos contábeis e custos econômicos.

Ratra, R., & Gulia, P. (2020). Experimental evaluation of open source data mining tools (WEKA and Orange). International Journal of Engineering Trends and Technology, 68(8), 30–35.

Sari, E. Y., Sunyoto, A., et al. (2019). Optimization of Weight Backpropagation with Particle Swarm Optimization for Student Dropout Prediction. 2019 4th International Conference on Information Technology, Information Systems and Electrical Engineering (ICITISEE), 423–428.

Silva, I., & Moody, G. B. (2014). An open-source toolbox for analysing and processing PhysioNet databases in MATLAB and Octave. Journal of Open Research Software, 2(1).

Sofyan, Y., & Kurniawan, H. (2009). Teknik Analisis Statistik terlengkap dengan Software SPSS. Salemba Infotek, Jakarta.

Sousa, L. R. d., Carvalho, V. O. d., Penteado, B. E., & Affonso, F. J. (2021). A systematic mapping on the use of data mining for the face-to-face school dropout problem. Proceedings.

Stančin, I., & Jović, A. (2019). An overview and comparison of free Python libraries for data mining and big data analysis. 2019 42nd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), 977–982.

Tinto, V. (1975). Dropout from higher education: A theoretical synthesis of recent research. Review of Educational Research, 45(1), 89–125.

Valente, A., Holanda, M., Mariano, A. M., Furuta, R., & Da Silva, D. (2022). Analysis of Academic Databases for Literature Review in the Computer Science Education Field. 2022 IEEE Frontiers in Education Conference (FIE), 1–7.

Wen, J., Li, S., Lin, Z., Hu, Y., & Huang, C. (2012). Systematic literature review of machine learning based software development effort estimation models. Information and Software Technology, 54(1), 41–59.



How to Cite

NASCIMENTO, F. F. do; DANTAS, L. C. de O.; CASTRO, A. F. de; QUEIROZ, P. G. G. Data Mining and Machine Learning techniques applied to student dropout: a systematic literature mapping. Brazilian Journal of Computers in Education, [S. l.], v. 32, p. 270–294, 2024. DOI: 10.5753/rbie.2024.3296. Accessed: 4 Jul. 2024.


