Educational Data Mining Process applied to Student Performance Prediction: A comparison between Machine Learning and Deep Learning Techniques
DOI:
https://doi.org/10.5753/rbie.2021.29.0.519
Keywords:
Educational Data Mining, Deep Learning, Machine Learning, Performance Prediction
Abstract
As the availability of data grows, especially in education, dedicated fields have emerged for extracting relevant information from it. One of these is Educational Data Mining (EDM), which brings together numerous techniques that support the capture, processing, and analysis of educational records. The main technique associated with EDM is Machine Learning (ML), which has been used for decades to process data in different contexts; with technological advances, other techniques have gained prominence, such as Deep Learning (DL), which is based on multilayer artificial neural networks. In this context, this study aims to predict student performance using a public dataset, to compare ML and DL techniques, and to indicate the main predictive attributes for student performance. To this end, an EDM process with four steps was implemented: 1) data collection; 2) feature extraction and data cleaning (pre-processing and transformation); 3) analytical processing and algorithms; and 4) analysis and interpretation of results. The models generated by traditional ML algorithms performed well but were outperformed by the DL model, which reached an accuracy of 94%. Attributes related to school activities were also found to be more predictive of student performance than demographic and socioeconomic characteristics.
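The four-step process described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the records are invented (and chosen so that the ranking mirrors the pattern the abstract reports), the attribute names `absences`, `prior_grade`, and `age` are hypothetical, and a one-attribute threshold rule stands in for the ML and DL models actually compared in the study. It reports training accuracy only; a real evaluation would hold out test data.

```python
# Toy sketch of a four-step EDM process. The data, attribute names, and
# the threshold model are illustrative assumptions, not the study's own.

# Step 1: data collection (hand-made records standing in for a public dataset).
RAW = [
    {"absences": "2", "prior_grade": "14", "age": "16", "passed": "yes"},
    {"absences": "10", "prior_grade": "8", "age": "17", "passed": "no"},
    {"absences": "0", "prior_grade": "12", "age": "15", "passed": "yes"},
    {"absences": "6", "prior_grade": "9", "age": "16", "passed": "no"},
    {"absences": "4", "prior_grade": "11", "age": "18", "passed": "yes"},
    {"absences": "", "prior_grade": "13", "age": "17", "passed": "yes"},
    {"absences": "8", "prior_grade": "10", "age": "15", "passed": "yes"},
    {"absences": "12", "prior_grade": "7", "age": "18", "passed": "no"},
]


def preprocess(rows):
    """Step 2: drop incomplete records and cast text fields to numbers."""
    clean = []
    for row in rows:
        if any(value == "" for value in row.values()):
            continue  # data cleaning: skip records with missing values
        features = {k: float(v) for k, v in row.items() if k != "passed"}
        clean.append((features, row["passed"] == "yes"))
    return clean


def accuracy(data, attr, threshold, above_means_pass):
    """Fraction of records where the threshold rule matches the label."""
    hits = 0
    for features, label in data:
        predicted = (features[attr] >= threshold) == above_means_pass
        hits += predicted == label
    return hits / len(data)


def fit_stump(data, attr):
    """Step 3: find the threshold and direction with the best training accuracy."""
    best = (0.0, None, None)
    for threshold in sorted({features[attr] for features, _ in data}):
        for above_means_pass in (True, False):
            score = accuracy(data, attr, threshold, above_means_pass)
            if score > best[0]:
                best = (score, threshold, above_means_pass)
    return best


def rank_attributes(data):
    """Step 4: rank attributes by how well each one alone predicts the label."""
    scores = {attr: fit_stump(data, attr)[0] for attr in data[0][0]}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for attr, score in rank_attributes(preprocess(RAW)):
        print(f"{attr}: {score:.2f}")
```

Ranking attributes by single-attribute accuracy is only one simple way to look at predictive importance; it says nothing about interactions between attributes, which multi-attribute models such as neural networks can exploit.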
License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.