Improving the prediction of school dropout with the support of the semi-supervised learning approach
DOI:
https://doi.org/10.5753/isys.2023.2852
Keywords:
School Dropout, Machine Learning, Semi-supervised Learning, Educational Data Mining
Abstract
School dropout is a phenomenon influenced by many variables. This research used Machine Learning techniques, in particular a semi-supervised learning strategy, to predict the risk of dropout in undergraduate courses at a Brazilian higher education institution. Experiments were conducted in two phases: the first applied Feature Selection techniques, and the second applied a semi-supervised learning strategy to improve the performance metrics by increasing the number of instances of students labeled as Graduated. The main result is a model that classifies dropout with 90% accuracy and 86% Macro-F1.
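The abstract does not say which semi-supervised algorithm was used, so the following is only a minimal sketch of one common strategy, self-training (pseudo-labelling), under stated assumptions. A nearest-centroid classifier stands in for whatever model the authors trained, and the two features (scaled grade average and attendance), the `threshold` and `sharpness` parameters, and all data points are hypothetical. It illustrates how confident predictions on unlabeled students can enlarge the labeled set, including the Graduated class.

```python
from math import exp, sqrt

def centroids(X, y):
    """Mean feature vector per class label."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
        counts[yi] = counts.get(yi, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def euclidean(a, b):
    return sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def predict_proba(cents, x, sharpness=5.0):
    """Softmax over negative scaled distances to each class centroid."""
    weights = {c: exp(-sharpness * euclidean(x, m)) for c, m in cents.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

def self_train(X_lab, y_lab, X_unlab, threshold=0.8, max_rounds=10):
    """Repeatedly move confidently predicted unlabeled points into the labeled set."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        cents = centroids(X_lab, y_lab)  # refit on the current labeled set
        remaining, added = [], 0
        for x in pool:
            proba = predict_proba(cents, x)
            label = max(proba, key=proba.get)
            if proba[label] >= threshold:
                X_lab.append(x)          # accept the pseudo-label
                y_lab.append(label)
                added += 1
            else:
                remaining.append(x)      # too uncertain, keep unlabeled
        pool = remaining
        if added == 0:                   # nothing new was learned, stop
            break
    return X_lab, y_lab, pool

# Hypothetical toy data: (scaled grade average, scaled attendance).
X_labeled = [(0.2, 0.3), (0.3, 0.2), (0.8, 0.9), (0.9, 0.8)]
y_labeled = ["dropout", "dropout", "graduated", "graduated"]
X_unlabeled = [(0.85, 0.85), (0.25, 0.3), (0.55, 0.55)]

X_final, y_final, pool = self_train(X_labeled, y_labeled, X_unlabeled)
print(y_final.count("graduated"), y_final.count("dropout"), pool)
```

In this toy run the two unlabeled points close to a class centroid receive pseudo-labels, while the ambiguous midpoint stays unlabeled because its predicted probability never reaches the threshold. That threshold is the main design choice: set too low, it lets wrong pseudo-labels reinforce themselves in later rounds.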
License
Copyright (c) 2023 iSys - Brazilian Journal of Information Systems
This work is licensed under a Creative Commons Attribution 4.0 International License.