Complexity versus difficulty: An analysis of their correlation in programming questions in online judges

Authors

DOI:

https://doi.org/10.5753/rbie.2024.3587

Keywords:

Online Judges, Questions difficulty, Difficulty metrics, Complexity metrics, Machine Learning

Abstract

Automatic code correction environments are increasingly used in the teaching-learning process of programming disciplines. However, a problem often faced by teachers who use these systems is to determine the difficulty of the questions registered in the environment. This work aims to carry out a correlation analysis between code complexity metrics and the difficulty faced by students, so that it is possible to automatically predict the difficulty level of a question just by knowing its solution model. This study was divided into three stages: i) analysis of Spearman’s correlation between complexity metrics (extracted from the question) and difficulty (extracted from the student’s interaction with the question), ii) prediction of the difficulty class of questions through models machine learning for classification and iii) prediction of difficulty metrics using regression models. Regarding item i), it was observed that 96% of the correlations were weak or non-existent between individual metrics of code complexity and difficulty, 4% of cases of moderate correlation and no cases of strong correlation. For item ii), the highest f1-score obtained was 88%, considering classification with two levels of difficulty (“easy” and “hard”), and a maximum f1-score of 67%, considering classification with three levels (“easy”, “medium” and “hard”). For item iii), the best result obtained was an adjusted correlation coefficient of 63%.

Downloads

Download data is not yet available.

References

Anguita, D., Ghio, A., Ridella, S., & Sterpi, D. (2009). K-Fold Cross Validation for Error Rate Estimate in Support Vector Machines. DMIN, 291–297. [GS Search]

Bonner, S. E. (1994). A model of the effects of audit task complexity. Accounting, organizations and society, 19(3), 213–234. https://doi.org/10.1016/0361-3682(94)90033-7. [GS Search]

Carvalho, L. S. G., Oliveira, D. B. F., & Gadelha, B. (2016). Juiz online como ferramenta de apoio a uma metodologia de ensino hıbrido em programação. Brazilian Symposium on Computers in Education (Simpósio Brasileiro de Informática na Educação-SBIE), 27(1), 140–149. https://doi.org/10.5753/cbie.sbie.2016.140. [GS Search]

Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: synthetic minority over-sampling technique. Journal of artificial intelligence research, 16, 321–357. https://doi.org/10.1613/jair.953. [GS Search]

Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794. https://doi.org/10.1145/2939672.2939785. [GS Search]

Dancey, C., & Reidy, J. (2007). Statistics Without Maths for Psychology. Pearson/Prentice Hall. [GS Search]

Di Bucchianico, A. (2008). Coefficient of determination (R 2). Encyclopedia of statistics in quality and reliability, 1. https://doi.org/10.1002/9780470061572.eqr173. [GS Search]

Effenberger, T., Cechák, J., & Pelánek, R. (2019). Difficulty and Complexity of Introductory Programming Problems. [GS Search]

Elnaffar, S. (2016). Using software metrics to predict the difficulty of code writing questions. 2016 ieee global engineering education conference (educon), 513–518. https://doi.org/10.1109/EDUCON.2016.7474601. [GS Search]

Francisco, R. E., Ambrósio, A. P. L., Junior, C. X. P., & Fernandes, M. A. (2018). Juiz online no ensino de CS1-lições aprendidas e proposta de uma ferramenta. Revista Brasileira de Informática na Educação, 26(03), 163. https://doi.org/10.48550/arXiv.2008.05756. [GS Search]

Grandini, M., Bagli, E., & Visani, G. (2020). Metrics for multi-class classification: An overview. https://doi.org/10.48550/arXiv.2008.05756. [GS Search]

INEP. (2017). Relatório Síntese de Área – Ciência da Computação (Bacharelado/Licenciatura) (rel. técn.) (Instituto Nacional de Estudos e Pesquisas Educacionais Anísio Teixeira. [Link]

Lima, M. A. P., Carvalho, L. S. G., Oliveira, E. H. T., Oliveira, D. B. F., & Pereira, F. D. (2021). Uso de atributos de código para classificação da facilidade de questões de codificação. Anais do Simpósio Brasileiro de Educação em Computação, 113–122. https://doi.org/10.5753/educomp.2021.14477. [GS Search]

Liu, P., & Li, Z. (2012). Task complexity: A review and conceptualization framework. International Journal of Industrial Ergonomics, 42(6), 553–568. https://doi.org/10.1016/j.ergon.2012.09.001. [GS Search]

Llana, L., Martin-Martin, E., & Pareja-Flores, C. (2012). FLOP, a free laboratory of programming. Proceedings of the 12th Koli Calling International Conference on Computing Education Research, 93–99. https://doi.org/10.1145/2401796.2401807 . [GS Search]

Lorena, A. C., & Carvalho, A. C. (2007). Uma introdução às support vector machines. Revista de Informática Teórica e Aplicada, 14(2), 43–67. [GS Search]

Lundberg, S. M., Erion, G., Chen, H., DeGrave, A., Prutkin, J. M., Nair, B., Katz, R., Himmelfarb, J., Bansal, N., & Lee, S.-I. (2020). From local explanations to global understanding with explainable AI for trees. Nature machine intelligence, 2(1), 56–67. https://doi.org/10.1038/s42256-019-0138-9. [GS Search]

Manning, C. D., Raghavan, P., & Schutze, H. (2008). Introduction to Information Retrieval. Cambridge University Press. [GS Search]

McCabe, T. J. (1976). A complexity measure. IEEE Transactions on software Engineering, (4), 308–320. https://doi.org/10.1109/TSE.1976.233837. [GS Search]

Meisalo, V., Sutinen, E., & Torvinen, S. (2004). Classification of exercises in a virtual programming course. 34th Annual Frontiers in Education, 2004. FIE 2004., S3D–1. https://doi.org/10.1109/FIE.2004.1408764. [GS Search]

Myers, L., & Sirois, M. J. (2006). Spearman Correlation Coefficients, Differences between. Encyclopedia of Statistical Sciences. https://doi.org/10.1002/0471667196.ess5050.pub2. [GS Search]

Naser, M. Z., & Alavi, A. H. (2021). Error metrics and performance fitness indicators for artificial intelligence and machine learning in engineering and sciences. Archit. Struct. Constr. https://doi.org/10.1007/s44150-021-00015-8. [GS Search]

Paes, R. B., Malaquias, R., Guimarães, M., & Almeida, H. (2013). Ferramenta para a Avaliação de Aprendizado de Alunos em Programação de Computadores. Anais dos Workshops do Congresso Brasileiro de Informática na Educação, 2, 203–212. https://doi.org/10.5753/CBIE.WCBIE.2013.203. [GS Search]

Pelz, F. D., Jesus, E. A., & Raabe, A. L. (2012). Um mecanismo para correção automática de exercıcios práticos de programação introdutória. Brazilian Symposium on Computers in Education (Simpósio Brasileiro de Informática na Educaçao-SBIE), 23(1). [GS Search]

Pereira, F. D., Fonseca, S. C., Oliveira, E. H., Cristea, A. I., Bellhäuser, H., Rodrigues, L., Oliveira, D. B., Isotani, S., & Carvalho, L. S. (2021). Explaining Individual and Collective Programming Students’ Behavior by Interpreting a Black-Box Predictive Model. IEEE Access, 9, 117097–117119. https://doi.org/10.1109/ACCESS.2021.3105956. [GS Search]

Pereira, F. D., Junior, H. B., Rodriguez, L., Toda, A., Oliveira, E. H., Cristea, A. I., Oliveira, D. B., Carvalho, L. S., Fonseca, S. C., Alamri, A., et al. (2021). A recommender system based on effort: Towards minimising negative affects and maximising achievement in CS1 learning. International Conference on Intelligent Tutoring Systems, 466–480. https://doi.org/10.1007/978-3-030-80421-3_51. [GS Search]

Petit, J., Giménez, O., & Roura, S. (2012). Jutge. org: an educational programming judge. Proceedings of the 43rd ACM technical symposium on Computer Science Education, 445–450. https://doi.org/10.1145/2157136.2157267. [GS Search]

Prisco, A., dos Santos, R., Botelho, S., Tonin, N., & Bez, J. (2017). Using information technology for personalizing the computer science teaching. 2017 IEEE Frontiers in Education Conference (FIE), 1–7. https://doi.org/10.1109/FIE.2017.8190727. [GS Search]

Robinson, P. (2001). Task complexity, task difficulty, and task production: Exploring interactions in a componential framework. Applied linguistics, 22(1), 27–57. https://doi.org/10.1093/applin/22.1.27. [GS Search]

Rouse, W. B., & SH, R. (1979). Measures of Complexity of Fault Diagnosis Tasks. IEEE Transactions on Systems, Man, and Cybernetics, 9(11), 720–727. https://doi.org/10.1109/TSMC.1979.4310112. [GS Search]

Santos, P., Carvalho, L. S. G., Oliveira, E., & Fernandes, D. (2019). Classificação de dificuldade de questões de programação com base na inteligibilidade do enunciado. Simpósio Brasileiro de Informática na Educação (SBIE), 30(1), 1886. https://doi.org/10.5753/cbie.sbie.2019.1886. [GS Search]

Scarton, C. E., & Aluísio, S. M. (2010). Análise da Inteligibilidade de textos via ferramentas de Processamento de Língua Natural: adaptando as métricas do Coh-Metrix para o Português. Linguamática, 2(1), 45–61. [Link]. [GS Search]

Schölkopf, B., Smola, A. J., Williamson, R. C., & Bartlett, P. L. (2000). New support vector algorithms. Neural computation, 12(5), 1207–1245. https://doi.org/10.1162/089976600300015565. [GS Search]

Silva, É. S., Carvalho, L. S. G., Oliveira, D. B. F., Oliveira, E. H. T., Lauschner, T., P., L. M. A., & Pereira, F. D. (2022). Previsão de indicadores de dificuldade de questões de programação a partir de métricas extraídas do código de solução. Anais do XXXIII Simpósio Brasileiro de Informática na Educação (SBIE). https://doi.org/10.5753/sbie.2022.224724. [GS Search]

Vihavainen, A., Luukkainen, M., & Pärtel, M. (2013). Test my code: An automatic assessment service for the extreme apprenticeship method. 2nd International Workshop on Evidence-based Technology Enhanced Learning, 109–116. https://doi.org/10.1007/978-3-319-00554-6_14. [GS Search]

Viloria, A., Lezama, O. B. P., & Mercado-Caruzo, N. (2020). Unbalanced data processing using oversampling: Machine learning. Procedia Computer Science, 175, 108–113. https://doi.org/10.1016/j.procs.2020.07.018. [GS Search]

Wang, T., Su, X., Ma, P., Wang, Y., & Wang, K. (2011). Ability-training-oriented automated assessment in introductory programming course. Computers & Education, 56(1), 220–226. https://doi.org/10.1016/j.compedu.2010.08.003. [GS Search]

Wohlin, C. (2014). Guidelines for snowballing in systematic literature studies and a replication in software engineering. Proceedings of the 18th international conference on evaluation and assessment in software engineering, 1–10. https://doi.org/10.1145/2601248.2601268. [GS Search]

Published

2024-01-16

How to Cite

FERNANDES, J. C.; CARVALHO, L. S. G. de; OLIVEIRA, D. B. F. de; OLIVEIRA, E. H. T. de; PEREIRA, F. D.; LAUSCHNER, T. Complexity versus difficulty: An analysis of their correlation in programming questions in online judges. Brazilian Journal of Computers in Education, [S. l.], v. 32, p. 22–49, 2024. DOI: 10.5753/rbie.2024.3587. Disponível em: https://journals-sol.sbc.org.br/index.php/rbie/article/view/3587. Acesso em: 4 jul. 2024.

Issue

Section

Awarded Papers :: EduComp 2023