An Extended Process Mining Framework for the Multi-factor Analysis of Student Trajectories in Higher Education: The Dropout Problem
DOI: https://doi.org/10.5753/jbcs.2026.6559
Keywords: Data science in education, Dropout problem, Higher Education, Process Mining
Abstract
Higher education is the backbone of human capital development and economic growth, yet its high dropout rates remain a global concern, leading to wasted resources and unfulfilled student potential. Understanding dropout requires integrating social, economic, academic, and technical factors across students' trajectories, which are often interrelated in intricate, non-obvious ways. In this context, Process Mining (PM) offers a promising approach by uncovering patterns in students' interactions with academic programs and courses. However, traditional PM methods typically rely on a single perspective of a process, which limits their ability to capture the multi-factor and correlated nature of educational trajectories. To address this gap, this paper proposes an extended PM-based approach with enriched labeling strategies that allow multiple dimensions of students' academic trajectories to be analyzed simultaneously. Furthermore, the article presents a detailed application of the labeling-based method to real data from a Brazilian public university, comprising 437,690 events from eight different programs and including students admitted through the Unified Selection System (SISU). By comparing students' outcomes and paths while considering their enrollment method, course option, and demographic information, we found that admission score, program, high school type, gender, and place of origin are the variables most strongly correlated with more and less successful student outcomes. A deeper analysis of a specific program is also outlined to show how the approach can be customized for particular cases with minor effort while keeping standard input data.
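The idea of enriched labeling can be illustrated with a minimal sketch. It is not the authors' implementation: the event layout, the `|key=value` label format, the helper names `enrich_label` and `directly_follows`, and the toy records are all assumptions made for illustration. The sketch appends selected student attributes to each activity name and then counts directly-follows pairs per student, so that one process map can separate paths by, for example, admission method.

```python
from collections import Counter

def enrich_label(activity, attrs, dims):
    """Append the chosen attribute values to the activity name (hypothetical format)."""
    return activity + "".join(f"|{d}={attrs[d]}" for d in dims)

def directly_follows(events, dims):
    """Count directly-follows pairs of enriched labels, one trace per student."""
    traces = {}
    # Order events by case id and then by time step before building traces.
    for case_id, step, activity, attrs in sorted(events, key=lambda e: (e[0], e[1])):
        traces.setdefault(case_id, []).append(enrich_label(activity, attrs, dims))
    dfg = Counter()
    for trace in traces.values():
        dfg.update(zip(trace, trace[1:]))
    return dfg

# Toy, made-up records: (student id, semester, status, attributes).
events = [
    ("s1", 1, "Enrolled", {"admission": "SISU", "gender": "F"}),
    ("s1", 2, "Approved", {"admission": "SISU", "gender": "F"}),
    ("s1", 3, "Graduated", {"admission": "SISU", "gender": "F"}),
    ("s2", 1, "Enrolled", {"admission": "Other", "gender": "M"}),
    ("s2", 2, "Failed", {"admission": "Other", "gender": "M"}),
    ("s2", 3, "Dropout", {"admission": "Other", "gender": "M"}),
]

dfg = directly_follows(events, ["admission"])
for (src, dst), count in sorted(dfg.items()):
    print(f"{src} -> {dst}: {count}")
```

Passing a different list of dimensions, such as `["admission", "gender"]`, changes the perspective without altering the input data, which is the property the abstract attributes to the approach.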
Copyright (c) 2026 Luiz F. P. Southier, Marcelo Teixeira, Lovania R. Teixeira, Sheila C. Freitas, Denise M. V. Sato, Jair J. Ferronatto, Edson E. Scalabrin

This work is licensed under a Creative Commons Attribution 4.0 International License.

