Student Success Prediction: An Analysis of the Demand for a Transfer Learning Approach
DOI:
https://doi.org/10.5753/rbie.2019.27.01.01
Keywords:
Academic success prediction, Learning Analytics, Educational Data Mining, Transfer Learning, Covariate Shift
Abstract
Student interactions with Learning Management Systems (LMS) generate logs that are usually stored, allowing each student's activity to be recovered. Analyzing these data with data mining and/or learning analytics techniques has provided a better understanding of student behavior and of teaching-learning processes. In this context, a number of studies have reported promising results in predicting student performance, which enables proactive actions to prevent academic failure. Data mining techniques usually estimate predictive models from (past) historical data, under the premise that the estimated predictor will be applied in future contexts similar to the (past) contexts used in its design. It is reasonable to assume that the diversity of existing educational contexts is reflected in the data. Even so, few studies in Educational Data Mining (EDM) discuss the impact of this premise, and the resulting models may perform poorly under unforeseen educational conditions. This paper proposes an empirical analysis that looks for evidence of differences between data from different educational contexts in the task of predicting students' academic failure. Logs from more than 3,000 distance higher education students are used. The methodology is based on supervised classification, an approach commonly used in prediction tasks. Specifically, we aim to verify whether distinct educational contexts are in fact separable in terms of the data they generate. Although the data scenarios involve activities common to students in the same subject, the experiments show an accuracy of up to 83% in separating data from different academic terms.
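The separability check described in the abstract can be sketched as a "domain classifier" test. Pool the per-student rows from two academic terms and label each row by its term, not by pass/fail. Then measure cross-validated accuracy. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes numeric per-student features and scikit-learn. The function name `term_separability`, the shallow decision tree, and the Poisson-generated data are all hypothetical choices. Accuracy near the majority-class baseline suggests the terms are indistinguishable. Accuracy well above it suggests covariate shift. The baseline matters because, when terms contribute different numbers of students, chance accuracy is not 50%.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier


def term_separability(X_term_a, X_term_b, n_splits=5, seed=0):
    """Cross-validated accuracy of telling term A rows from term B rows.

    Each row is one student's feature vector. Rows are labelled by the term
    they come from (0 = term A, 1 = term B); the pass/fail label is not used.
    Returns (classifier_accuracy, majority_baseline_accuracy).
    """
    X = np.vstack([X_term_a, X_term_b]).astype(float)
    y = np.concatenate([np.zeros(len(X_term_a)), np.ones(len(X_term_b))])
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # A shallow tree is an arbitrary choice; any classifier could act as
    # the "domain classifier" here.
    clf = DecisionTreeClassifier(max_depth=5, random_state=seed)
    baseline = DummyClassifier(strategy="most_frequent")
    acc = cross_val_score(clf, X, y, cv=cv, scoring="accuracy").mean()
    base = cross_val_score(baseline, X, y, cv=cv, scoring="accuracy").mean()
    return acc, base


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic stand-ins for per-student LMS interaction counts (8 features).
    term_a = rng.poisson(10, size=(300, 8))
    same_dist = rng.poisson(10, size=(300, 8))  # drawn like term A: no shift
    shifted = rng.poisson(16, size=(300, 8))    # higher activity: strong shift
    print("no shift:   acc=%.2f baseline=%.2f" % term_separability(term_a, same_dist))
    print("with shift: acc=%.2f baseline=%.2f" % term_separability(term_a, shifted))
```

Labelling by term turns "are these contexts different?" into an ordinary supervised classification problem. The same tooling used for failure prediction can therefore answer it.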
Although empirical, our results point in the same direction as other studies. They support the need for transfer learning and/or domain adaptation techniques when designing predictive models intended to support proactive actions against student failure.
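Among domain-adaptation options, one of the simplest is importance weighting for covariate shift. It is shown here only as an illustration and is not the method evaluated in the paper. A domain classifier, like the one used to detect the shift, estimates how closely each past-term student resembles the new term. Those estimates become sample weights when training the failure predictor. This follows the discriminative density-ratio idea discussed in the covariate-shift literature (e.g., Bickel, Brückner, & Scheffer, 2007). The sketch assumes scikit-learn, numeric features, and logistic regression for both models. The names `importance_weights` and `fit_weighted_predictor`, the clipping threshold, and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def importance_weights(X_source, X_target, clip=10.0, eps=1e-6):
    """Estimate w(x) ~ p_target(x) / p_source(x) for every source row.

    A domain classifier separates source rows (label 0) from target rows
    (label 1). By Bayes' rule,
        p_target(x) / p_source(x) = [P(1|x) / P(0|x)] * (n_source / n_target),
    so the classifier's odds, rescaled by the sample-size ratio, estimate
    the density ratio. Large weights are clipped to limit variance.
    """
    X = np.vstack([X_source, X_target])
    d = np.concatenate([np.zeros(len(X_source)), np.ones(len(X_target))])
    domain_clf = LogisticRegression(max_iter=1000).fit(X, d)
    # Column 1 of predict_proba is P(label 1 | x), i.e. "looks like the new term".
    p1 = np.clip(domain_clf.predict_proba(X_source)[:, 1], eps, 1 - eps)
    w = (p1 / (1 - p1)) * (len(X_source) / len(X_target))
    return np.clip(w, 0.0, clip)


def fit_weighted_predictor(X_source, y_source, X_target):
    """Fit an outcome classifier on past-term rows, reweighted toward the new term."""
    w = importance_weights(X_source, X_target)
    model = LogisticRegression(max_iter=1000)
    return model.fit(X_source, y_source, sample_weight=w)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_src = rng.normal(0.0, 1.0, size=(2000, 1))  # past term (labelled)
    X_tgt = rng.normal(1.0, 1.0, size=(2000, 1))  # new term (unlabelled)
    y_src = (X_src[:, 0] + rng.normal(0.0, 0.5, 2000) > 0).astype(int)
    w = importance_weights(X_src, X_tgt)
    print("mean weight: %.2f" % w.mean())
    model = fit_weighted_predictor(X_src, y_src, X_tgt)
    print("fraction predicted as class 1 in new term: %.2f" % model.predict(X_tgt).mean())
```

Only unlabelled data from the new term is required, because the weights depend on the features alone. Clipping trades a little bias for lower variance. Without it, a handful of past-term students who look very much like the new term could dominate the fit.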
References
Agudo-Peregrina, Á. F., Iglesias-Pradas, S., Conde-González, M. Á., & Hernández-García, Á. (2014). Can we predict success from log data in VLEs? Classification of interactions for learning analytics and their relation with performance in VLE-supported F2F and online learning. Computers in Human Behavior, 31, 542–550. doi: 10.1016/j.chb.2013.05.031
Baker, R., Isotani, S., & Carvalho, A. (2011). Mineração de dados educacionais: Oportunidades para o Brasil [Educational data mining: Opportunities for Brazil]. Revista Brasileira de Informática na Educação, 19(02), 03. doi: 10.5753/RBIE.2011.19.02.03
Baradwaj, B. K., & Pal, S. (2011). Mining educational data to analyze students’ performance. International Journal of Advanced Computer Science and Applications, 2(6).
Barber, R., & Sharkey, M. (2012). Course correction: Using analytics to predict course success. In Proceedings of the 2nd International Conference on Learning Analytics and Knowledge (pp. 259–262). doi: 10.1145/2330601.2330664
Bickel, S., Brückner, M., & Scheffer, T. (2007). Discriminative learning for differing training and test distributions. In Proceedings of the 24th International Conference on Machine Learning (pp. 81–88). doi: 10.1145/1273496.1273507
Bousbia, N., & Belamri, I. (2014). Which contribution does EDM provide to computer-based learning environments? In Educational Data Mining (pp. 3–28). Springer. doi: 10.1007/978-3-319-02738-8_1
Boyer, S., & Veeramachaneni, K. (2015). Transfer learning for predictive models in massive open online courses. In International Conference on Artificial Intelligence in Education (pp. 54–63). doi: 10.1007/978-3-319-19773-9_6
Cechinel, C., Araujo, R. M., & Detoni, D. (2015). Modelling and prediction of distance learning students failure by using the count of interactions. Brazilian Journal of Computers in Education, 23(03), 1. doi: 10.5753/RBIE.2015.23.03.1
Chatti, M. A., Dyckhoff, A. L., Schroeder, U., & Thüs, H. (2012). A reference model for learning analytics. International Journal of Technology Enhanced Learning, 4(5-6), 318–331. doi: 10.1504/IJTEL.2012.051815
Costa, E. B., Fonseca, B., Santana, M. A., de Araújo, F. F., & Rego, J. (2017). Evaluating the effectiveness of educational data mining techniques for early prediction of students’ academic failure in introductory programming courses. Computers in Human Behavior, 73, 247–256. doi: 10.1016/j.chb.2017.01.047
Daumé III, H., & Marcu, D. (2006). Domain adaptation for statistical classifiers. Journal of Artificial Intelligence Research, 26, 101–126. doi: 10.1613/jair.1872
Dawson, S., Gašević, D., Siemens, G., & Joksimovic, S. (2014). Current state and future trends: A citation network analysis of the learning analytics field. In Proceedings of the Fourth International Conference on Learning Analytics and Knowledge (pp. 231–240). doi: 10.1145/2567574.2567585
Duval, E. (2011). Attention please!: Learning analytics for visualization and recommendation. In Proceedings of the 1st International Conference on Learning Analytics and Knowledge (pp. 9–17). doi: 10.1145/2090116.2090118
Er, E. (2012). Identifying at-risk students using machine learning techniques: A case study with IS 100. International Journal of Machine Learning and Computing, 2(4), 476.
Essa, A., & Ayad, H. (2012). Improving student success using predictive models and data visualisations. Research in Learning Technology, 20(sup1), 19191. doi: 10.3402/rlt.v20i0.19191
Faceli, K., Lorena, A. C., Gama, J., & Carvalho, A. (2011). Inteligência artificial: Uma abordagem de aprendizado de máquina [Artificial intelligence: A machine learning approach]. Rio de Janeiro: LTC.
Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge discovery in databases. AI Magazine, 17(3), 37. doi: 10.1609/aimag.v17i3.1230
Ferguson, R. (2012). Learning analytics: Drivers, developments and challenges. International Journal of Technology Enhanced Learning, 4(5-6), 304–317. doi: 10.1504/IJTEL.2012.051816
Fortenbacher, A., Beuster, L., Elkina, M., Kappe, L., Merceron, A., Pursian, A., . . . Wenzlaff, B. (2013). LeMo: A learning analytics application focussing on user path analysis and interactive visualization. In 2013 IEEE 7th International Conference on Intelligent Data Acquisition and Advanced Computing Systems (IDAACS) (Vol. 2, pp. 748–753). doi: 10.1109/IDAACS.2013.6663025
Gašević, D., Dawson, S., Rogers, T., & Gasevic, D. (2016). Learning analytics should not promote one size fits all: The effects of instructional conditions in predicting academic success. The Internet and Higher Education, 28, 68–84. doi: 10.1016/j.iheduc.2015.10.002
Gottardo, E., Kaestner, C. A. A., & Noronha, R. V. (2014). Estimativa de desempenho acadêmico de estudantes: Análise da aplicação de técnicas de mineração de dados em cursos a distância [Estimating students’ academic performance: An analysis of the application of data mining techniques in distance courses]. Revista Brasileira de Informática na Educação, 22(1).
Han, J., Pei, J., & Kamber, M. (2011). Data mining: Concepts and techniques. Elsevier.
He, H., & Garcia, E. A. (2009). Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering, 21(9), 1263–1284. doi: 10.1109/TKDE.2008.239
Hoang, N. D., Chau, V. T. N., & Phung, N. H. (2016). Combining transfer learning and co-training for student classification in an academic credit system. In 2016 IEEE RIVF International Conference on Computing & Communication Technologies, Research, Innovation, and Vision for the Future (RIVF) (pp. 55–60). doi: 10.1109/RIVF.2016.7800269
Hu, Y.-H., Lo, C.-L., & Shih, S.-P. (2014). Developing early warning systems to predict students’ online learning performance. Computers in Human Behavior, 36, 469–478. doi: 10.1016/j.chb.2014.04.002
Jayaprakash, S. M., Moody, E. W., Lauría, E. J., Regan, J. R., & Baron, J. D. (2014). Early alert of academically at-risk students: An open source analytics initiative. Journal of Learning Analytics, 1(1), 6–47. doi: 10.18608/jla.2014.11.3
Kampff, A. J. C. (2009). Mineração de dados educacionais para geração de alertas em ambientes virtuais de aprendizagem como apoio à prática docente [Educational data mining for generating alerts in virtual learning environments to support teaching practice].
Lagus, J. (2016). Course outcome prediction with transfer learning methods (Master’s thesis, University of Helsinki, Helsinki, Finland). hdl: 10138/165915
Lara, J. A., Lizcano, D., Martínez, M. A., Pazos, J., & Riera, T. (2014). A system for knowledge discovery in e-learning environments within the European Higher Education Area: Application to student data from Open University of Madrid, UDIMA. Computers & Education, 72, 23–36. doi: 10.1016/j.compedu.2013.10.009
Liu, B. (2011). Web data mining: Exploring hyperlinks, contents, and usage data. Springer Berlin Heidelberg.
Lu, J., Behbood, V., Hao, P., Zuo, H., Xue, S., & Zhang, G. (2015). Transfer learning using computational intelligence: A survey. Knowledge-Based Systems, 80, 14–23. doi: 10.1016/j.knosys.2015.01.010
Macfadyen, L. P., & Dawson, S. (2010). Mining LMS data to develop an “early warning system” for educators: A proof of concept. Computers & Education, 54(2), 588–599. doi: 10.1016/j.compedu.2009.09.008
Manhães, L. M. B., Da Cruz, S. M. S., Costa, R. J. M., Zavaleta, J., & Zimbrão, G. (2011). Previsão de estudantes com risco de evasão utilizando técnicas de mineração de dados [Predicting students at risk of dropout using data mining techniques]. In Brazilian Symposium on Computers in Education (Simpósio Brasileiro de Informática na Educação, SBIE) (Vol. 1).
Márquez-Vera, C., Cano, A., Romero, C., & Ventura, S. (2013). Predicting student failure at school using genetic programming and different data mining approaches with high dimensional and imbalanced data. Applied Intelligence, 38(3), 315–330. doi: 10.1007/s10489-012-0374-8
Martin, F. (2014). A simple machine learning method to detect covariate shift [Computer software manual]. Retrieved November 1, 2016.
Moreno-Torres, J., Raeder, T., Alaiz-Rodríguez, R., Chawla, N., & Herrera, F. (2012). A unifying view on dataset shift in classification. Pattern Recognition, 45(1), 521–530. doi: 10.1016/j.patcog.2011.06.019
Pan, S. J., & Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), 1345–1359. doi: 10.1109/TKDE.2009.191
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., . . . others (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct), 2825–2830.
Peña-Ayala, A. (2013). Educational data mining: Applications and trends (Vol. 524). Springer.
Peña-Ayala, A. (2014). Educational data mining: A survey and a data mining-based analysis of recent works. Expert Systems with Applications, 41(4), 1432–1462. doi: 10.1016/j.eswa.2013.08.042
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106. doi: 10.1007/BF00116251
Quiñonero-Candela, J., Sugiyama, M., Schwaighofer, A., & Lawrence, N. D. (2009). Dataset shift in machine learning. The MIT Press.
Ramaswami, M., & Bhaskaran, R. (2009). A study on feature selection techniques in educational data mining. Journal of Computing, 1(1), 7–11.
Raza, H., Prasad, G., & Li, Y. (2015). EWMA model based shift-detection methods for detecting covariate shifts in non-stationary environments. Pattern Recognition, 48(3), 659–669. doi: 10.1016/j.patcog.2014.07.028
Rigo, S. J., Cambruzzi, W., Barbosa, J. L., & Cazella, S. C. (2014). Educational data mining and learning analytics applications in dropout: Opportunities and challenges. Brazilian Journal of Computers in Education, 22(01), 132. doi: 10.5753/rbie.2014.22.01.132
Romero, C., López, M.-I., Luna, J.-M., & Ventura, S. (2013). Predicting students’ final performance from participation in on-line discussion forums. Computers & Education, 68, 458–472. doi: 10.1016/j.compedu.2013.06.009
Santos, J. L., Govaerts, S., Verbert, K., & Duval, E. (2012). Goal-oriented visualizations of activity tracking: A case study with engineering students. In Proceedings of the 2nd International Conference on Learning Analytics and Knowledge (pp. 143–152). doi: 10.1145/2330601.2330639
Siemens, G., & Baker, R. S. (2012). Learning analytics and educational data mining: Towards communication and collaboration. In Proceedings of the 2nd International Conference on Learning Analytics and Knowledge (pp. 252–254). doi: 10.1145/2330601.2330661
Siemens, G., & Long, P. (2011). Penetrating the fog: Analytics in learning and education. EDUCAUSE Review, 46(5), 30.
Simpson, O. (2004). The impact on retention of interventions to support distance learning students. Open Learning: The Journal of Open, Distance and e-Learning, 19(1), 79–95. doi: 10.1080/0268051042000177863
Sugiyama, M., Nakajima, S., Kashima, H., & Buenau, P. V. (2008). Direct importance estimation with model selection and its application to covariate shift adaptation. In Advances in Neural Information Processing Systems (pp. 1433–1440). doi: 10.1007/s10463-008-0197-x
Sun, B., Feng, J., & Saenko, K. (2016). Return of frustratingly easy domain adaptation. In AAAI Conference on Artificial Intelligence (Vol. 6, p. 8). Retrieved from https://arxiv.org/abs/1511.05547
Thammasiri, D., Delen, D., Meesad, P., & Kasap, N. (2014). A critical assessment of imbalanced class distribution problem: The case of predicting freshmen student attrition. Expert Systems with Applications, 41(2), 321–330. doi: 10.1016/j.eswa.2013.07.046
Voß, L., Schatten, C., Mazziotti, C., & Schmidt-Thieme, L. (2015). A transfer learning approach for applying matrix factorization to small ITS datasets. International Educational Data Mining Society.
You, J. W. (2016). Identifying significant indicators using LMS data to predict course achievement in online learning. The Internet and Higher Education, 29, 23–30. doi: 10.1016/j.iheduc.2015.11.003
Zadrozny, B. (2004). Learning and evaluating classifiers under sample selection bias. In Proceedings of the 21st International Conference on Machine Learning (p. 114). doi: 10.1145/1015330.1015425
License
Copyright (c) 2019 Daniel A. Guimarães De Los Reyes, Everton André Thomas, Lilian Landvoigt da Rosa, Wilson P. Gavião Neto
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.