Development of AI-Assisted Teaching Tools for Computer Science Education

Authors

DOI:

https://doi.org/10.5753/rbie.2026.6311

Keywords:

Large Language Models, Computer Science Teaching Tools, Interactive Interfaces, Computer Science Education

Abstract

This work presents a quantitative and qualitative analysis of large language models (LLMs) for computer science education, evaluating their performance across various educational aspects. We first evaluated four models (ChatGPT, Claude, Copilot, and Gemini) on developing graphical interfaces in Python within the Google Colab environment, exploring diverse request formats and defining metrics for comparison. Evaluation metrics include compilation, execution, and functionality errors, as well as request size, number of interactions, and number of generated code lines. From over 350 trials, 240 functional tools were generated, achieving an 84% success rate. We then evaluated the models' capability to translate code from Python to JavaScript, exploring varied resources for browser-based graphical interfaces and LLM performance in JavaScript. We also assessed performance on interfaces with editing capabilities for graph problems and on the creation of domain-specific languages for teaching graphs and machine learning models. We further developed examples of other applications, such as quiz generation, quiz evaluation, and assessment of generated code. The evaluated examples span diverse domains, all documented and made available as a dataset. Results indicate a significant acceleration in creating interactive and engaging educational tools, highlighting promising directions for using LLMs to develop educational resources for computer science.
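The per-trial error categories named in the abstract (compilation, execution, and functionality errors versus a functional tool) lend themselves to a simple tally. The sketch below is purely illustrative: the trial log and category labels are hypothetical and do not come from the paper's dataset.

```python
from collections import Counter

# Hypothetical outcome log for LLM code-generation trials, using the
# error categories named in the abstract: compilation, execution, and
# functionality errors; "ok" marks a functional generated tool.
trials = (
    ["ok"] * 9
    + ["compilation_error", "execution_error", "functionality_error"]
)

def success_rate(outcomes):
    """Fraction of trials that yielded a functional tool."""
    counts = Counter(outcomes)
    return counts["ok"] / len(outcomes)

print(f"Success rate: {success_rate(trials):.0%}")  # prints: Success rate: 75%
```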


References

Al-Shetairy, M., Hindy, H., Khattab, D., & Aref, M. M. (2024). Transformers Utilization in Chart Understanding: A Review of Recent Advances & Future Trends. arXiv preprint arXiv:2410.13883. https://doi.org/10.48550/arXiv.2410.13883 [GS Search].

Barbosa, L. L., Couto, C. M. S., & Terra, R. (2016). PortuCol: uma pseudo linguagem inspirada em C ANSI para o Ensino de Lógica de Programação e Algoritmos. Workshop sobre Educação em Computação (WEI), 2343–2352. https://doi.org/10.5753/wei.2016.9678 [GS Search].

Bender, E. M., Gebru, T., McMillan-Major, A., & Mitchell, M. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT). https://doi.org/10.1145/3442188.3445922 [GS Search].

Cai, Y., Mao, S., Wu, W., Wang, Z., Liang, Y., Ge, T., Wu, C., You, W., Song, T., Xia, Y., Duan, N., & Wei, F. (2023). Low-code LLM: Graphical User Interface over Large Language Models. arXiv preprint arXiv:2304.08103. https://doi.org/10.18653/v1/2024.naacl-demo.2 [GS Search].

Canesche, M., Bragança, L., Vilela Neto, O. P., Nacif, J. A., & Ferreira, R. (2021). Google Colab CAD4U: Hands-on cloud laboratories for digital design. 2021 IEEE International Symposium on Circuits and Systems (ISCAS), 1–5. https://doi.org/10.1109/ISCAS51556.2021.9401151 [GS Search].

Chen, B., Zhang, Z., Langrené, N., & Zhu, S. (2023). Unleashing the potential of prompt engineering in Large Language Models: a comprehensive review. arXiv preprint arXiv:2310.14735. https://doi.org/10.1016/j.patter.2025.101260 [GS Search].

Chen, C., Sonnert, G., Sadler, P. M., & Malan, D. J. (2020). Computational thinking and assignment resubmission predict persistence in a computer science MOOC. Journal of Computer Assisted Learning, 36(5), 581–594. https://doi.org/10.1111/jcal.12427 [GS Search].

Chen, L., Guo, Q., Jia, H., Zeng, Z., Wang, X., Xu, Y., Wu, J., Wang, Y., Gao, Q., Wang, J., Ye, W., & Zhang, S. (2024). A survey on evaluating large language models in code generation tasks. arXiv preprint arXiv:2408.16498. https://doi.org/10.48550/arXiv.2408.16498 [GS Search].

Chu, Z., Wang, S., Xie, J., Zhu, T., Yan, Y., Ye, J., Zhong, A., Hu, X., Liang, J., Yu, P. S., & Wen, Q. (2025). LLM Agents for Education: Advances and Applications. arXiv preprint arXiv:2503.11733. https://doi.org/10.48550/arXiv.2503.11733 [GS Search].

Coura, P., Freitas, I., Costa, H., Nacif, J., & Ferreira, R. (2025). Desmistificando o Ensino de Inteligência Artificial e Aprendizado de Máquina. Simpósio Brasileiro de Educação em Computação (EDUCOMP), 25–27. https://doi.org/10.5753/educomp_estendido.2025.6578 [GS Search].

Del, M., & Fishel, M. (2022). True detective: a deep abductive reasoning benchmark undoable for GPT-3 and challenging for GPT-4. arXiv preprint arXiv:2212.10114. https://doi.org/10.18653/v1/2023.starsem-1.28 [GS Search].

Elon University. (2025, March). Survey: 52% of U.S. adults now use AI large language models like ChatGPT. Available at: [Link]. Accessed: March 24, 2026.

Ferreira, R., Canesche, M., Jamieson, P., Vilela Neto, O. P., & Nacif, J. A. (2024). Examples and tutorials on using Google Colab and Gradio to create online interactive student-learning modules. Computer Applications in Engineering Education, e22729. https://doi.org/10.1002/cae.22729 [GS Search].

Ferreira, R., Sabino, C., Canesche, M., Vilela Neto, O. P., & Nacif, J. A. (2024). AIoT tool integration for enriching teaching resources and monitoring student engagement. Internet of Things, 26, 101045. https://doi.org/10.1016/j.iot.2023.101045 [GS Search].

Figueiredo, G. A. R., Souza, E. S., Rodrigues, J. H. F., Nacif, J. A., & Ferreira, R. (2024). Desenvolvendo Ferramentas para Ensino de RISC-V com Python, Verilog, Matplotlib, SVG e ChatGPT. International Journal of Computer Architecture Education, 13(1), 43–52. https://doi.org/10.5753/ijcae.2024.5343 [GS Search].

Floridi, L., & Cowls, J. (2022). A unified framework of five principles for AI in society. Machine learning and the city: Applications in architecture and urban design, 535–545. https://doi.org/10.1162/99608f92.8cd550d1 [GS Search].

Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Iii, H. D., & Crawford, K. (2021). Datasheets for datasets. Communications of the ACM, 64(12), 86–92. https://doi.org/10.1145/3458723 [GS Search].

Hu, B., Zhu, J., Pei, Y., & Gu, X. (2025). Exploring the potential of LLM to enhance teaching plans through teaching simulation. npj Science of Learning, 12(1), 1–12. https://doi.org/10.1038/s41539-025-00300-x [GS Search].

Jimenez, C. E., Yang, J., Wettig, A., Yao, S., Pei, K., Press, O., & Narasimhan, K. (2023). SWE-bench: Can language models resolve real-world GitHub issues? arXiv preprint arXiv:2310.06770. https://doi.org/10.48550/arXiv.2310.06770 [GS Search].

Joel, S., Wu, J. J., & Fard, F. H. (2024). A survey on LLM-based code generation for low-resource and domain-specific programming languages. arXiv preprint arXiv:2410.03981. https://doi.org/10.1145/3770084 [GS Search].

Joshi, S. (2025). Open-Source vs. Commercial Coding Assistants: A 2025 Comparison of DeepSeek R1, Qwen 2.5 and Claude 3.7. International Journal of Computer Applications Technology and Research, 14(09), 6–18. [GS Search].

Julio, J. P. F., Campano Junior, M. M., Aylon, L. B. R., Fonseca, K. O., & Emmendörfer, L. R. (2024). Jogos educativos para Estruturas de Dados: Um Mapeamento Sistemático. Simpósio Brasileiro de Jogos e Entretenimento Digital (SBGames), 1186–1199. https://doi.org/10.5753/sbgames.2024.240933 [GS Search].

Kazemitabaar, M., Hou, X., Henley, A., Ericson, B. J., Weintrop, D., & Grossman, T. (2023). How novices use LLM-based code generators to solve CS1 coding tasks in a self-paced learning environment. Proceedings of the 23rd Koli calling international conference on computing education research, 1–12. https://doi.org/10.1145/3631802.3631806 [GS Search].

Khowaja, S. A., Khuwaja, P., Dev, K., Wang, W., & Nkenyereye, L. (2024). ChatGPT needs spade (sustainability, privacy, digital divide, and ethics) evaluation: A review. Cognitive Computation, 1–23. https://doi.org/10.1007/s12559-024-10285-1 [GS Search].

Kiesler, N., & Schiffner, D. (2023). Large Language Models in Introductory Programming Education: ChatGPT's Performance and Implications for Assessments. arXiv preprint arXiv:2308.08572. https://doi.org/10.48550/arXiv.2308.08572 [GS Search].

Kumar, K. P., Reddy, G. A., Vignesh, D., Pavan, A., Vamsi, T., Rishi, K., & Kumar, K. S. (2025). Automated Question Paper Generator Using LLM. International Journal of Research and Innovation in Applied Science, 10(4), 266–275. https://doi.org/10.51584/IJRIAS.2025.10040020 [GS Search].

Lisboa, M. O., Costa, H., Coura, P., Freitas, I., Villela, M. L. B., & Ferreira, R. (2025). Modelos Generativos de Linguagem na Construção de Ferramentas de Ensino de Computação com Interface Gráfica. Simpósio Brasileiro de Educação em Computação (EDUCOMP), 639–650. https://doi.org/10.5753/educomp.2025.4927 [GS Search].

Liu, J., Liu, A., Lu, X., Welleck, S., West, P., Bras, R. L., Choi, Y., & Hajishirzi, H. (2021). Generated knowledge prompting for commonsense reasoning. arXiv preprint arXiv:2110.08387. https://doi.org/10.18653/v1/2022.acl-long.225 [GS Search].

Logan IV, R. L., Balažević, I., Wallace, E., Petroni, F., Singh, S., & Riedel, S. (2021). Cutting down on prompts and parameters: Simple few-shot learning with language models. arXiv preprint arXiv:2106.13353. https://doi.org/10.18653/v1/2022.findings-acl.222 [GS Search].

Lyu, W., Wang, Y., Sun, Y., & Zhang, Y. (2025). Will Your Next Pair Programming Partner Be Human? An Empirical Evaluation of Generative AI as a Collaborative Teammate in a Semester-Long Classroom Setting. Proceedings of the Twelfth ACM Conference on Learning@ Scale, 83–94. https://doi.org/10.1145/3698205.3729544 [GS Search].

Ma, X., Liu, Q., Jiang, D., Zhang, G., Ma, Z., & Chen, W. (2025). General-reasoner: Advancing llm reasoning across all domains. arXiv preprint arXiv:2505.14652. https://doi.org/10.48550/arXiv.2505.14652 [GS Search].

Martins, F. L. B., Oliveira, A. C. A., Vasconcelos, D. R., & Menezes, M. V. (2025). Avaliando a habilidade do ChatGPT de realizar provas de Dedução Natural em Lógica Proposicional e Lógica de Predicados. Revista Brasileira de Informática na Educação, 33, 244–278. https://doi.org/10.5753/rbie.2025.4500 [GS Search].

Martins, R. M., von Wangenheim, C. G., Rauber, M. F., & Hauck, J. C. (2024). Machine learning for all!—introducing machine learning in middle and high school. International Journal of Artificial Intelligence in Education, 34(2), 185–223. https://doi.org/10.1007/s40593-022-00325-y [GS Search].

Mitchell, M. (2019). Artificial intelligence: A guide for thinking humans. Penguin UK. [GS Search].

Mutanga, M. B., Msane, J., Mndaweni, T. N., Hlongwane, B. B., & Ngcobo, N. Z. (2025). Exploring the Impact of LLM Prompting on Students' Learning. Trends in Higher Education, 4(3), 31. https://doi.org/10.3390/higheredu4030031 [GS Search].

Pan, R., Ibrahimzada, A. R., Krishna, R., Sankar, D., Wassi, L. P., Merler, M., Sobolev, B., Pavuluri, R., Sinha, S., & Jabbarvand, R. (2024). Lost in translation: A study of bugs introduced by large language models while translating code. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, 1–13. https://doi.org/10.1145/3597503.3639226 [GS Search].

Park, J., Teo, T. W., Teo, A., Chang, J., Huang, J. S., & Koo, S. (2023). Integrating artificial intelligence into science lessons: Teachers' experiences and views. International Journal of STEM Education, 10(1), 61. https://doi.org/10.1186/s40594-023-00454-3 [GS Search].

Pereira Filho, L. C., Souza, T. P. C., & Paula, L. B. (2025). Análise das Respostas de LLMs em Relação ao Conteúdo Introdutório de Programação: um Comparativo entre o ChatGPT e o Gemini. Revista Brasileira de Informática na Educação, 33, 722–747. https://doi.org/10.5753/rbie.2025.4477 [GS Search].

Phogat, R., Arora, D., Mehra, P. S., Sharma, J., & Chawla, D. (2025). A Comparative Study of Large Language Models: ChatGPT, DeepSeek, Claude and Qwen. 2025 3rd International Conference on Device Intelligence, Computing and Communication Technologies (DICCT), 609–613. https://doi.org/10.1109/DICCT64131.2025.10986449 [GS Search].

Russo, F. A. I., Rabelo e Sant'Anna, N., & Imai, R. H. (2023). Relato de experiência educacional com o uso de inteligências artificiais sintetizadoras de imagens: debate sobre avanços recentes e possibilidades em síntese criativa. Revista Brasileira de Informática na Educação, 31, 814–828. https://doi.org/10.5753/rbie.2023.2914 [GS Search].

Sato, Y., Suzuki, A., & Mineshima, K. (2024). Building a Large Dataset of Human-Generated Captions for Science Diagrams. International Conference on Theory and Application of Diagrams, 393–401. https://doi.org/10.1007/978-3-031-71291-3_32 [GS Search].

UFV. (2025). Universidade Federal de Viçosa - Supplementary Material. [Link].

Vasconcelos, V. R. C., & Frota, D. A. (2025). IA como Aliada da Criatividade Humana: O Desenvolvimento do Aplicativo "Terra e Universo" com Auxílio do ChatGPT. Revista Brasileira de Informática na Educação, 33, 451–471. https://doi.org/10.5753/rbie.2025.5204 [GS Search].

Verhalen, L. E. C., Castro, M. M. M., & Maciel, C. (2025). Percepções e Ferramentas sobre Recriação Digital de Educadores por meio de Inteligência Artificial: Uma Revisão Sistemática de Literatura. Revista Brasileira de Informática na Educação, 33, 565–582. https://doi.org/10.5753/rbie.2025.5279 [GS Search].

Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., Le, Q. V., & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems, 35, 24824–24837. [GS Search].

White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., & Schmidt, D. C. (2023). A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382. https://doi.org/10.48550/ARXIV.2302.11382 [GS Search].

Xu, X., Tao, C., Shen, T., Xu, C., Xu, H., Long, G., Lou, J.-g., & Ma, S. (2023). Re-reading improves reasoning in language models. arXiv preprint arXiv:2309.06275. [GS Search].

Yan, L., Sha, L., Zhao, L., Li, Y., Martinez-Maldonado, R., Chen, G., Li, X., Jin, Y., & Gašević, D. (2024). Practical and ethical challenges of large language models in education: A systematic scoping review. British Journal of Educational Technology, 55(1), 90–112. https://doi.org/10.1111/bjet.13370 [GS Search].

Yan, Y.-M., Chen, C.-Q., Hu, Y.-B., & Ye, X.-D. (2025). LLM-based collaborative programming: impact on students' computational thinking and self-efficacy. Humanities and Social Sciences Communications, 12(1), 1–12. https://doi.org/10.1057/s41599-025-04471-1 [GS Search].

Yang, Z., Li, L., Wang, J., Lin, K., Azarnasab, E., Ahmed, F., Liu, Z., Liu, C., Zeng, M., & Wang, L. (2023). MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action. arXiv preprint arXiv:2303.11381. https://doi.org/10.48550/arXiv.2303.11381 [GS Search].

Yang, Z., & Zhu, Z. (2024). Heuristic question sequence generation based on retrieval augmentation. Education and Lifelong Development Research. https://doi.org/10.46690/elder.2024.02.03 [GS Search].

Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T., Cao, Y., & Narasimhan, K. (2024). Tree of thoughts: Deliberate problem solving with large language models. Advances in Neural Information Processing Systems, 36. [GS Search].

Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2022). React: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629. https://doi.org/10.48550/ARXIV.2210.03629 [GS Search].

Yuan, Y., Li, Z., & Zhao, B. (2025). A survey of multimodal learning: Methods, applications, and future. ACM Computing Surveys, 57(7), 1–34. https://doi.org/10.1145/3713070 [GS Search].

Zala, A., Lin, H., Cho, J., & Bansal, M. (2023). DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning. arXiv preprint arXiv:2310.12128. https://doi.org/10.48550/arXiv.2310.12128 [GS Search].

Zanini, A. S., & Raabe, A. L. A. (2012). Análise dos enunciados utilizados nos problemas de programação introdutória em cursos de Ciência da Computação no Brasil. Workshop sobre Educação em Computação (WEI), 11–20. [GS Search].

Zhou, D., Schärli, N., Hou, L., Wei, J., Scales, N., Wang, X., Schuurmans, D., Cui, C., Bousquet, O., Le, Q., & Chi, E. (2022). Least-to-most prompting enables complex reasoning in large language models. arXiv preprint arXiv:2205.10625. https://doi.org/10.48550/arXiv.2205.10625 [GS Search].

Published

2026-03-25

How to Cite

ARAÚJO, H. C.; LISBOA, M. O.; PEREIRA, P. H. C.; FREITAS, I. de C.; VILLELA, M. L. B.; FERREIRA, R. Development of AI-Assisted Teaching Tools for Computer Science Education. Brazilian Journal of Computers in Education, [S. l.], v. 34, p. 279–313, 2026. DOI: 10.5753/rbie.2026.6311. Available at: https://journals-sol.sbc.org.br/index.php/rbie/article/view/6311. Accessed: 3 Apr. 2026.

Issue

Section

Award-Winning Articles :: EduComp