Student Success Prediction: An Analysis of the Demand for a Transfer Learning Approach

Authors

  • Daniel A. Guimarães De Los Reyes, Centro Universitário Ritter dos Reis
  • Everton André Thomas, Centro Universitário Ritter dos Reis
  • Lilian Landvoigt da Rosa, Centro Universitário Ritter dos Reis
  • Wilson P. Gavião Neto, Centro Universitário Ritter dos Reis

DOI:

https://doi.org/10.5753/rbie.2019.27.01.01

Keywords:

Academic success prediction, Learning Analytics, Educational Data Mining, Transfer Learning, Covariate Shift

Abstract

Student interactions with Learning Management Systems (LMS) generate logs that are usually stored, making it possible to reconstruct each student’s activity. Analyzing these data with data mining and/or learning analytics techniques has provided a better understanding of student behavior and of teaching-learning processes. In this context, a number of studies have reported promising results in predicting student performance, which enables proactive actions to prevent academic failure. Data mining techniques usually estimate predictive models from historical data, on the premise that the resulting predictor will be applied in future contexts similar to the past contexts used in its design. Although it is reasonable to assume that the diversity of existing educational contexts is reflected in the data, few studies in Educational Data Mining (EDM) discuss the impact of this premise, and the resulting models may perform poorly under unforeseen educational conditions. This paper presents an empirical analysis that looks for evidence of differences between data from different educational contexts in the task of predicting students’ academic failure. We use logs from more than 3,000 distance higher education students, and the methodology is based on supervised classification, an approach commonly used in prediction tasks. Specifically, we verify whether distinct educational contexts are in fact separable in terms of the data they generate. Although the data scenarios involve activities common to students in the same subject, the experiments indicate an accuracy of up to 83% in separating data from different academic terms. Although empirical, our results point in the same direction as other studies and support the need for transfer learning and/or domain adaptation techniques in the design of predictive models that aim to support proactive actions to prevent student failure.
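
The separability check described in the abstract can be illustrated with a minimal sketch: label each student record with its academic term of origin, train a classifier to guess that label, and read the hold-out accuracy as a rough indicator of how different the two terms are. Everything below is an assumption made for illustration. The features (logins and forum posts), the synthetic Gaussian data, the nearest-centroid classifier, and the function names are hypothetical and are not taken from the paper, whose actual features and classifiers are not stated in the abstract.

```python
import random
from statistics import mean


def make_term(n, login_mean, forum_mean, rng):
    """Synthetic per-student features (logins, forum posts); illustrative only."""
    return [(rng.gauss(login_mean, 5.0), rng.gauss(forum_mean, 2.0)) for _ in range(n)]


def centroid(rows):
    """Feature-wise mean of a list of feature tuples."""
    return tuple(mean(col) for col in zip(*rows))


def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def term_separability(term_a, term_b, rng, test_fraction=0.3):
    """Hold-out accuracy of a nearest-centroid classifier that guesses the term of origin."""
    # The label is the term (0 or 1), not the student's pass/fail outcome.
    data = [(x, 0) for x in term_a] + [(x, 1) for x in term_b]
    rng.shuffle(data)
    n_test = int(len(data) * test_fraction)
    test, train = data[:n_test], data[n_test:]
    c0 = centroid([x for x, y in train if y == 0])
    c1 = centroid([x for x, y in train if y == 1])
    hits = sum((0 if sq_dist(x, c0) <= sq_dist(x, c1) else 1) == y for x, y in test)
    return hits / len(test)


rng = random.Random(0)
# Two terms drawn from the same distribution: accuracy should stay near chance.
same = term_separability(make_term(500, 30, 8, rng), make_term(500, 30, 8, rng), rng)
# Two terms with different activity levels: accuracy should be well above chance.
shifted = term_separability(make_term(500, 30, 8, rng), make_term(500, 45, 12, rng), rng)
print(f"same distribution:    {same:.2f}")
print(f"shifted distribution: {shifted:.2f}")
```

In this sketch an accuracy close to 0.5 suggests that the two terms are hard to tell apart, while an accuracy well above chance suggests that the feature distributions differ between terms, which is the situation where a model trained on one term may not transfer to another.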

Author Biographies

Daniel A. Guimarães De Los Reyes, Centro Universitário Ritter dos Reis

School of Engineering and IT

Everton André Thomas, Centro Universitário Ritter dos Reis

School of Engineering and IT

Lilian Landvoigt da Rosa, Centro Universitário Ritter dos Reis

Master’s Program in Design

Wilson P. Gavião Neto, Centro Universitário Ritter dos Reis

Master’s Program in Design and School of Engineering and IT

Published

2019-01-01

How to Cite

DE LOS REYES, D. A. G.; THOMAS, E. A.; ROSA, L. L. da; GAVIÃO NETO, W. P. Student Success Prediction: An Analysis of the Demand for a Transfer Learning Approach. Brazilian Journal of Computers in Education, [S. l.], v. 27, n. 1, p. 01–25, 2019. DOI: 10.5753/rbie.2019.27.01.01. Available at: https://journals-sol.sbc.org.br/index.php/rbie/article/view/4751. Accessed: 19 Sep. 2024.

Section

Articles