Evaluating ChatGPT's Ability to Perform Natural Deduction Proofs in Propositional Logic and Predicate Logic

Authors

F. L. B. Martins, A. C. A. de Oliveira, D. R. de Vasconcelos, M. V. de Menezes

DOI:

https://doi.org/10.5753/rbie.2025.4500

Keywords:

Natural deduction, Propositional logic, Predicate logic, Logical reasoning, Large language models, ChatGPT

Abstract

With the recent popularization of Large Language Models such as ChatGPT, numerous studies have explored the use of these tools to support teaching and learning. These systems can understand and process enormous amounts of data, which allows them, for example, to provide individualized support to students solving exercises. However, it is essential to consider that these systems generally rely on a reasoning process based on statistical methods, such as machine learning algorithms, which can produce hallucinations and lead to incorrect answers in tasks that require logical reasoning. This article evaluates ChatGPT's ability to solve Natural Deduction exercises in Propositional Logic and Predicate Logic. To this end, experiments were carried out using a database of natural deduction exercises, and the results confirm that the tool's reasoning process is not yet suited to solving this type of exercise, making it, for now, inadequate for the task.
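To illustrate the kind of task evaluated, consider a minimal natural deduction proof, formalized here in Lean 4. This is an illustrative sketch only, not an exercise drawn from the paper's database: from the premises p → q and q → r, the conclusion p → r follows by one implication introduction and two implication eliminations (modus ponens).

    -- Illustrative only (not from the paper's dataset): a natural deduction
    -- proof that {p → q, q → r} ⊢ p → r, written as a Lean 4 term.
    example (p q r : Prop) (h₁ : p → q) (h₂ : q → r) : p → r :=
      fun hp =>      -- →-introduction: assume p
        h₂ (h₁ hp)   -- →-elimination twice: h₁ yields q from p, then h₂ yields r

A proof assistant such as Lean checks each inference step mechanically, which is precisely the kind of step-by-step rule application the paper tests ChatGPT on.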




Published

2025-05-02

How to Cite

MARTINS, F. L. B.; OLIVEIRA, A. C. A. de; VASCONCELOS, D. R. de; MENEZES, M. V. de. Evaluando la capacidad de ChatGPT para realizar pruebas de Deducción Natural en Lógica Proposicional y Lógica de Predicados. Revista Brasileña de Informática en la Educación, [S. l.], v. 33, p. 244–278, 2025. DOI: 10.5753/rbie.2025.4500. Available at: https://journals-sol.sbc.org.br/index.php/rbie/article/view/4500. Accessed: 30 Jan. 2026.

Issue

Section

Award-Winning Papers :: CBIE