Evaluating LLMs on Argument Mining Tasks in Brazilian Portuguese Debate Data
DOI: https://doi.org/10.5753/jbcs.2025.5824
Keywords: Argument Mining, Debate, LLM
Abstract
This study investigates Argument Mining (AM) in Brazilian Portuguese data, focusing on audio transcriptions of semi-structured debates. It proposes an experimental setup to evaluate the effectiveness of Large Language Models (LLMs) in AM tasks. The research addresses key challenges in the field: the lack of universally accepted definitions, the absence of a cohesive theoretical framework for dataset standardization, the limited availability of annotated datasets, and underexplored evaluation methods for Artificial Intelligence (AI) models, particularly LLMs. To help bridge these gaps, especially in the underrepresented Brazilian Portuguese context, the study employs three prompt engineering strategies: Single-Prompt, 2-Prompts, and 4-Prompts. The 4-Prompts approach, which integrates few-shot and chain-of-thought (CoT) prompting, demonstrated the best overall performance. The evaluated LLMs are ChatGPT-3.5 Turbo, ChatGPT-4, Gemini, LLaMA 70B, and Sabiá 3. Results show that while LLMs can achieve an F1 score of up to 74% in basic argument detection, their performance drops significantly in more complex AM tasks that require nuanced interpretation, where the maximum F1 score is 43%. Portuguese-specialized models such as Sabiá 3 performed similarly to or worse than multilingual models. Surprisingly, LLaMA 70B emerged as the best-performing model across most AM tasks. These findings underscore the need for continued development of AM methodologies and highlight the importance of expanding Natural Language Processing (NLP) research to languages beyond English.
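The F1 scores reported in the abstract combine precision and recall into a single number. The following is a minimal sketch of how such a score could be computed for binary argument detection. The label names, the toy data, and the `f1_score` helper are illustrative assumptions, not the authors' evaluation code, and the abstract does not state which averaging scheme (binary, macro, or weighted) the study used.

```python
def f1_score(gold, predicted, positive="argument"):
    """Binary F1 for one positive label (hypothetical helper, not from the paper)."""
    pairs = list(zip(gold, predicted))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: F1 is defined here as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy example (invented labels, not data from the study).
gold = ["argument", "argument", "non-argument", "argument"]
predicted = ["argument", "non-argument", "argument", "argument"]
score = f1_score(gold, predicted)
print(f"F1 = {score:.2f}")
```

In this toy case there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and the F1 score is also 2/3.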
License
Copyright (c) 2025 David Eduardo Pereira, Daniela Thuaslar Simão Gomes, Claudio E. C. Campelo

This work is licensed under a Creative Commons Attribution 4.0 International License.

