A Genetic Algorithm with Flexible Fitness Function for Feature Selection in Educational Data: Comparative Evaluation
DOI:
https://doi.org/10.5753/jidm.2022.2480
Keywords:
Feature Selection, Genetic Algorithm, Educational Data Mining
Abstract
Educational Data Mining is an interdisciplinary field that applies computational techniques to understand educational phenomena. The databases of educational institutions are usually large and contain many descriptive attributes, which makes the prediction process complex. In addition, the data can be sparse, redundant, irrelevant, and noisy, which can degrade the predictive quality of the models and hurt computational performance. One way to simplify the problem is to identify the least important attributes and omit them from the modeling process, which is the goal of attribute (feature) selection techniques. This work evaluates different feature selection techniques applied to open educational data and paired with a genetic algorithm with a flexible fitness function. The methods and results described herein extend a previously published paper by: (i) describing a larger set of computational experiments; (ii) performing a hypothesis test over different classifiers; and (iii) presenting a more in-depth literature review. The results obtained indicate an improvement in the classification process.
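The abstract does not give implementation details, so the following is only a minimal sketch of what genetic-algorithm feature selection with a pluggable fitness function can look like. It is not the authors' method. The names `evolve` and `toy_fitness`, the operators (tournament selection, one-point crossover, bit-flip mutation, elitism), and every parameter value are assumptions made for illustration. A candidate solution is a bit mask over the attributes, and the fitness function is passed in by the caller, which is one plausible reading of "flexible".

```python
import random
from typing import Callable, List, Tuple

Mask = List[int]  # 1 keeps an attribute, 0 drops it


def evolve(
    n_features: int,                  # must be at least 2 for one-point crossover
    fitness: Callable[[Mask], float], # caller-supplied scoring function (higher is better)
    pop_size: int = 20,
    generations: int = 30,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Tuple[Mask, List[float]]:
    """Return the best mask found and the best fitness after each generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    history: List[float] = []

    def pick() -> Mask:
        # Binary tournament: draw two individuals, keep the fitter one.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        next_pop = [best[:]]  # elitism: the current best always survives
        while len(next_pop) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randint(1, n_features - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        pop = next_pop
        best = max(pop, key=fitness)
        history.append(fitness(best))
    return best, history


# Hypothetical fitness: pretend attributes 0 and 3 are the informative ones
# and charge a small penalty for every attribute kept.
def toy_fitness(mask: Mask) -> float:
    relevant = {0, 3}
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in relevant)
    return hits - 0.1 * sum(mask)


best_mask, fitness_history = evolve(n_features=6, fitness=toy_fitness)
```

In a realistic setting the toy fitness would be replaced by a function that trains a classifier on the selected attributes and returns a validation score, possibly combined with a penalty on the number of attributes kept. Because the search only calls `fitness(mask)`, swapping the criterion does not require changing `evolve`.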