Evaluating ChatGPT's Ability to Perform Natural Deduction Proofs in Propositional Logic and Predicate Logic
DOI: https://doi.org/10.5753/rbie.2025.4500
Keywords: Natural Deduction, Propositional Logic, Predicate Logic, Logical Reasoning, Large Language Models, ChatGPT
Abstract
With the recent popularization of Large Language Models such as ChatGPT, many studies have explored how these tools can support teaching and learning. These systems can process enormous amounts of data, which allows them, for example, to provide individualized support to students as they solve exercises. However, their reasoning is generally grounded in statistical methods, such as machine learning algorithms, which can hallucinate and produce incorrect answers in tasks that demand logical reasoning. This article evaluates ChatGPT's ability to solve Natural Deduction exercises in Propositional Logic and Predicate Logic. To this end, experiments were conducted on a database of natural deduction exercises; the results confirm that the tool's reasoning process is not yet adequate for solving this type of exercise, making it, for now, unsuitable for the task.
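For readers unfamiliar with the exercise format, the sketch below illustrates the kind of Natural Deduction goal discussed in the abstract, written as machine-checkable proofs in Lean 4. The specific formulas (a hypothetical syllogism and a universal-quantifier derivation) are illustrative assumptions, not items drawn from the paper's exercise database.

    -- Illustrative only: assumed examples, not taken from the paper's dataset.
    -- Propositional Logic: from p → q and q → r, derive p → r
    -- (→-introduction over two →-eliminations).
    example (p q r : Prop) (h1 : p → q) (h2 : q → r) : p → r :=
      fun hp => h2 (h1 hp)

    -- Predicate Logic: from ∀x (P x → Q x) and ∀x P x, derive ∀x Q x
    -- (∀-introduction over ∀-eliminations and →-elimination).
    example (α : Type) (P Q : α → Prop)
        (h1 : ∀ x, P x → Q x) (h2 : ∀ x, P x) : ∀ x, Q x :=
      fun x => h1 x (h2 x)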
References
Ando, R., Morishita, T., Abe, H., Mineshima, K., & Okada, M. (2023, June). Evaluating Large Language Models with NeuBAROCO: Syllogistic Reasoning Ability and Human-like Biases. In S. Chatzikyriakidis & V. de Paiva (Eds.), Proceedings of the 4th Natural Logic Meets Machine Learning Workshop (pp. 1–11). Association for Computational Linguistics. [link] [GS Search]
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems (NeurIPS), 33, 1877–1901. [link] [GS Search]
Carl, M. (2023). Using large language models for (de-) formalization and natural argumentation exercises for beginner’s students. arXiv preprint arXiv:2304.06186. https://doi.org/10.48550/arXiv.2304.06186 [GS Search]
Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A., Barham, P., Chung, H. W., Sutton, C., Gehrmann, S., Schuh, P., Shi, K., Tsvyashchenko, S., Maynez, J., Rao, A., Barnes, P., Tay, Y., Shazeer, N., Prabhakaran, V., Fiedel, N. (2023). PaLM: Scaling Language Modeling with Pathways. Journal of Machine Learning Research, 24(240), 1–113. [link] [GS Search]
Chung, H. W., Hou, L., Longpre, S., Zoph, B., Tay, Y., Fedus, W., Li, Y., Wang, X., Dehghani, M., Brahma, S., Webson, A., Gu, S. S., Dai, Z., Suzgun, M., Chen, X., Chowdhery, A., Castro-Ros, A., Pellat, M., Robinson, K., Wei, J. (2024). Scaling Instruction-Finetuned Language Models. Journal of Machine Learning Research, 25(70), 1–53. [link] [GS Search]
The Coq Development Team. (1996). The Coq Proof Assistant Reference Manual (Version 5). INRIA Rocquencourt and ENS Lyon. [GS Search]
De Moura, L., Kong, S., Avigad, J., Van Doorn, F., & von Raumer, J. (2015). The Lean theorem prover (system description). Automated Deduction-CADE-25: 25th International Conference on Automated Deduction, Berlin, Germany, August 1-7, 2015, Proceedings 25, 378–388. https://doi.org/10.1007/978-3-319-21401-6_26 [GS Search]
Enderton, H. B. (1972). A mathematical introduction to logic. Academic Press. https://doi.org/10.2307/2272104 [GS Search]
Ferreirós, J. (2001). The road to modern logic—an interpretation. Bulletin of Symbolic Logic, 7(4), 441–484. https://doi.org/10.2307/268779 [GS Search]
First, E., Rabe, M., Ringer, T., & Brun, Y. (2023). Baldur: Whole-proof generation and repair with large language models. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 1229–1241. https://doi.org/10.1145/3611643.3616243 [GS Search]
Hammer, E. M. (1998). Semantics for existential graphs. Journal of Philosophical Logic, 27, 489–503. https://doi.org/10.1023/A:1017908108789 [GS Search]
Han, J. M., Rute, J., Wu, Y., Ayers, E. W., & Polu, S. (2022). Proof Artifact Co-training for Theorem Proving with Language Models. International Conference on Learning Representations. https://doi.org/10.48550/arXiv.2102.06203 [GS Search]
Huth, M., & Ryan, M. (2004). Logic in Computer Science: Modelling and Reasoning about Systems (2nd Ed.). Cambridge University Press. https://doi.org/10.1017/CBO9780511810275 [GS Search]
Jiang, A. Q., Li, W., Han, J. M., & Wu, Y. (2021). LISA: Language models of Isabelle proofs. 6th Conference on Artificial Intelligence and Theorem Proving (AITP), 378–392. [link] [GS Search]
Jiang, A. Q., Li, W., Tworkowski, S., Czechowski, K., Odrzygóźdź, T., Miłoś, P., Wu, Y., & Jamnik, M. (2022). Thor: Wielding hammers to integrate language models and automated theorem provers. 36th Conference on Neural Information Processing Systems (NeurIPS), 35, 8360–8373. https://doi.org/10.48550/arXiv.2205.10893 [GS Search]
Kasneci, E., Seßler, K., Küchemann, S., Bannert, M., Dementieva, D., Fischer, F., Gasser, U., Groh, G., Günnemann, S., Hüllermeier, E., et al. (2023). ChatGPT for good? On opportunities and challenges of large language models for education. Learning and Individual Differences, 103, 102274. https://doi.org/10.1016/j.lindif.2023.102274 [GS Search]
Lample, G., Lacroix, T., Lachaux, M.-A., Rodriguez, A., Hayat, A., Lavril, T., Ebner, G., & Martinet, X. (2022). Hypertree proof search for neural theorem proving. Advances in Neural Information Processing Systems (NeurIPS), 35, 26337–26349. https://doi.org/10.48550/arXiv.2205.11491 [GS Search]
Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., Stoyanov, V., & Zettlemoyer, L. (2020, July). BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. In D. Jurafsky, J. Chai, N. Schluter, & J. Tetreault (Eds.), Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 7871–7880). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.703 [GS Search]
Liu, H., Ning, R., Teng, Z., Liu, J., Zhou, Q., & Zhang, Y. (2023). Evaluating the logical reasoning ability of ChatGPT and GPT-4. arXiv preprint arXiv:2304.03439. https://doi.org/10.48550/arXiv.2304.03439 [GS Search]
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692. https://doi.org/10.48550/arXiv.1907.11692 [GS Search]
Martins, F., Oliveira, A., Vasconcelos, D., & Menezes, M. (2023). Avaliando a habilidade do ChatGPT de realizar provas de Dedução Natural em Lógica Proposicional. Anais do XXXIV Simpósio Brasileiro de Informática na Educação, 1282–1292. https://doi.org/10.5753/sbie.2023.234658 [GS Search]
Nipkow, T., Wenzel, M., & Paulson, L. C. (2002). Isabelle/HOL: a proof assistant for higher-order logic. Springer. https://doi.org/10.1007/3-540-45949-9_5 [GS Search]
OpenAI. (2021). ChatGPT. Accessed: June 13, 2023. Available at [link].
Pelletier, F. J. (1999). A brief history of natural deduction. History and Philosophy of Logic, 20(1), 1–31. https://doi.org/10.1080/014453499298165 [GS Search]
Pelletier, F. J. (2000). A history of natural deduction and elementary logic textbooks. Logical consequence: Rival approaches, 1, 105–138. [link] [GS Search]
Polu, S., Han, J. M., Zheng, K., Baksys, M., Babuschkin, I., & Sutskever, I. (2022). Formal mathematics statement curriculum learning. arXiv preprint arXiv:2202.01344. https://doi.org/10.48550/arXiv.2202.01344 [GS Search]
Polu, S., & Sutskever, I. (2020). Generative language modeling for automated theorem proving. arXiv preprint arXiv:2009.03393. https://doi.org/10.48550/arXiv.2009.03393 [GS Search]
Russell, S. J., & Norvig, P. (2010). Artificial intelligence: A modern approach. Pearson Education, Inc. [GS Search]
Saparov, A., Pang, R. Y., Padmakumar, V., Joshi, N., Kazemi, M., Kim, N., & He, H. (2023). Testing the General Deductive Reasoning Capacity of Large Language Models Using OOD Examples. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, & S. Levine (Eds.), Advances in Neural Information Processing Systems (NeurIPS), 36, pp. 3083–3105. Curran Associates, Inc. [link] [GS Search]
Schick, T., Dwivedi-Yu, J., Dessì, R., Raileanu, R., Lomeli, M., Hambro, E., Zettlemoyer, L., Cancedda, N., & Scialom, T. (2023). Toolformer: Language models can teach themselves to use tools. Advances in Neural Information Processing Systems (NeurIPS), 36. https://doi.org/10.48550/arXiv.2302.04761 [GS Search]
Shikishima, C., Hiraishi, K., Yamagata, S., Sugimoto, Y., Takemura, R., Ozaki, K., Okada, M., Toda, T., & Ando, J. (2009). Is g an entity? A Japanese twin study using syllogisms and intelligence tests. Intelligence, 37(3), 256–267. https://doi.org/10.1016/j.intell.2008.10.010 [GS Search]
Tlili, A., Shehata, B., Adarkwah, M. A., Bozkurt, A., Hickey, D. T., Huang, R., & Agyemang, B. (2023). What if the devil is my guardian angel: ChatGPT as a case study of using chatbots in education. Smart Learning Environments, 10(1), 15. https://doi.org/10.1186/s40561-023-00237-x [GS Search]
Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., Rodriguez, A., Joulin, A., Grave, E., & Lample, G. (2023). LLaMA: Open and Efficient Foundation Language Models. https://doi.org/10.48550/arXiv.2302.13971 [GS Search]
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems (NIPS), 30. https://doi.org/10.48550/arXiv.1706.03762 [GS Search]
Wang, H., Yuan, Y., Liu, Z., Shen, J., Yin, Y., Xiong, J., Xie, E., Shi, H., Li, Y., Li, L., et al. (2023). DT-Solver: Automated theorem proving with dynamic-tree sampling guided by proof-level value function. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 12632–12646. https://doi.org/10.18653/v1/2023.acl-long.706 [GS Search]
Weber, F., Wambsganss, T., Rüttimann, D., & Söllner, M. (2021). Pedagogical agents for interactive learning: A taxonomy of conversational agents in education. Forty-Second International Conference on Information Systems. Austin, Texas, 1–17. [link] [GS Search]
Yang, K., & Deng, J. (2019). Learning to prove theorems via interacting with proof assistants. International Conference on Machine Learning, 97, 6984–6994. [link] [GS Search]
Yang, K., Swope, A., Gu, A., Chalamala, R., Song, P., Yu, S., Godil, S., Prenger, R. J., & Anandkumar, A. (2023). LeanDojo: Theorem proving with retrieval-augmented language models. Advances in Neural Information Processing Systems (NeurIPS), 36. https://doi.org/10.48550/arXiv.2306.15626 [GS Search]
License
Copyright (c) 2025 Francisco Leonardo Batista Martins, Augusto César Araújo Oliveira, Davi Romero Vasconcelos, Maria Viviane de Menezes

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.