Educational Data Mining Process applied to Student Performance Prediction: A comparison between Machine Learning and Deep Learning Techniques

Authors

  • Vanessa Faria de Souza UFRGS
  • Tony Carlos Bignardi dos Santos UFRGS

DOI:

https://doi.org/10.5753/rbie.2021.29.0.519

Keywords:

Educational Data Mining, Deep Learning, Machine Learning, Performance Prediction

Abstract

With the growing availability of data, especially in education, specific fields have emerged for extracting relevant information, such as Educational Data Mining (EDM), which integrates numerous techniques that support the capture, processing and analysis of these records. The main technique associated with EDM is Machine Learning (ML), which has been used for decades to process data in different contexts; with technological evolution, however, other techniques have come to the fore, such as Deep Learning (DL), which is based on multilayer artificial neural networks. In this context, this study aims to predict student performance using a public dataset, to compare ML and DL techniques, and to indicate the main predictive attributes of student performance. To this end, a four-step EDM process was implemented: 1) data collection; 2) feature extraction and data cleaning (pre-processing and transformation); 3) analytical processing and algorithms; and 4) analysis and interpretation of results. The results show that the models generated by traditional ML algorithms perform well but fall short of the DL model, which reached an accuracy of 94%, and that attributes related to school activities are more predictive of student performance than demographic and socioeconomic data.
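
As an illustration only, the four steps listed in the abstract can be sketched in plain Python. Everything in this sketch is an assumption made for the example: the toy records, the feature names `prior_grade` and `absences`, and the nearest-centroid classifier, which stands in for the ML and DL models the study actually compares. It does not reproduce the authors' dataset, algorithms, or results.

```python
from statistics import mean

FEATURES = ("prior_grade", "absences")  # hypothetical school-activity attributes


def collect():
    """Step 1: data collection (made-up toy records, not the study's data)."""
    return [
        {"prior_grade": 15, "absences": 2, "passed": 1},
        {"prior_grade": 14, "absences": 0, "passed": 1},
        {"prior_grade": 12, "absences": 4, "passed": 1},
        {"prior_grade": 7, "absences": 10, "passed": 0},
        {"prior_grade": 6, "absences": 14, "passed": 0},
        {"prior_grade": 8, "absences": None, "passed": 0},  # incomplete record
        {"prior_grade": 13, "absences": 1, "passed": 1},
        {"prior_grade": 5, "absences": 12, "passed": 0},
    ]


def preprocess(records):
    """Step 2: drop incomplete records and min-max scale features to [0, 1].

    For brevity the scaling uses all rows; a real study would fit it on the
    training split only.
    """
    clean = [r for r in records if all(r[f] is not None for f in FEATURES)]
    lo = {f: min(r[f] for r in clean) for f in FEATURES}
    hi = {f: max(r[f] for r in clean) for f in FEATURES}
    X = [[(r[f] - lo[f]) / ((hi[f] - lo[f]) or 1) for f in FEATURES] for r in clean]
    y = [r["passed"] for r in clean]
    return X, y


def fit(X, y):
    """Step 3: compute one centroid per class (nearest-centroid stand-in model)."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


def accuracy(y_true, y_pred):
    """Step 4: fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


X, y = preprocess(collect())
train_X, train_y = X[:5], y[:5]   # first five clean rows for training
test_X, test_y = X[5:], y[5:]     # remaining rows held out for evaluation
model = fit(train_X, train_y)
preds = [predict(model, x) for x in test_X]
print("held-out accuracy:", accuracy(test_y, preds))
```

Swapping `fit` and `predict` for another model while keeping `preprocess` and `accuracy` fixed is one way to compare techniques on the same split, which is the kind of comparison the abstract describes.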

Author Biographies

Vanessa Faria de Souza, UFRGS

Doctoral student in the PPGIE (Programa de Pós-Graduação em Informática na Educação, the Graduate Program in Informatics in Education) at the Universidade Federal do Rio Grande do Sul (UFRGS). Holds a master's degree in Informatics from the PPGI (Programa de Pós-Graduação em Informática) at the Universidade Tecnológica Federal do Paraná (UTFPR), in the area of Applied Computing with an emphasis on Software Engineering. I hold a specialization in Inclusive Special Education, with an emphasis on Assistive Technology. I graduated in Information Systems from the Universidade Estadual do Norte do Paraná (2011) and completed a teaching degree (Licenciatura) in Mathematics at UTFPR. I am currently a full-time, exclusive-dedication professor at the Instituto Federal do Rio Grande do Sul, Campus Ibirubá, teaching in the Computer Science program and in the Technical Program in Informatics Integrated with Secondary Education, which I currently coordinate. I have also taught in higher education at the Universidade Estadual do Norte do Paraná (UENP) in the undergraduate Computer Science and Information Systems programs, in the courses Digital Systems, Design and Analysis of Algorithms, Advanced Topics in Computing, Symbolic and Numerical Computing, and Scientific Methodology, as well as at UTFPR. I have also worked as a Mathematics teacher in basic education.

Tony Carlos Bignardi dos Santos, UFRGS

Professor at IFMS, Campus Coxim (2012), and doctoral student in Informatics in Education at UFRGS (2019). Holds a bachelor's degree in Information Systems from UFMS (2007), a postgraduate qualification in Teaching for Professional, Scientific and Technological Education from IFMS (2015), and a master's degree in Computer Architecture from UFMS-FACOM (2016).

Published

2021-06-13

How to Cite

SOUZA, V. F. de; SANTOS, T. C. B. dos. Educational Data Mining Process applied to Student Performance Prediction: A comparison between Machine Learning and Deep Learning Techniques. Brazilian Journal of Computers in Education, [S. l.], v. 29, p. 519–546, 2021. DOI: 10.5753/rbie.2021.29.0.519. Available at: https://journals-sol.sbc.org.br/index.php/rbie/article/view/2975. Accessed on: 18 Oct. 2024.

Section

Articles