Evaluating Large Language Models for Brazilian Portuguese Sentiment Analysis: A Comparative Study of Multilingual State-of-the-Art vs. Brazilian Portuguese Fine-Tuned LLMs
DOI: https://doi.org/10.5753/jbcs.2025.5793

Keywords: Large Language Models, Sentiment Analysis, Brazilian Portuguese, In-context Learning, Comparative Evaluation, Natural Language Processing, Model Fine-tuning

Abstract
This study presents an extensive comparative analysis of Large Language Models (LLMs) for sentiment analysis in Brazilian Portuguese texts. We evaluated 23 LLMs (13 state-of-the-art multilingual models and 10 models specifically fine-tuned for Portuguese) across 12 public annotated datasets from diverse domains, employing the in-context learning paradigm. Our findings demonstrate that large-scale models such as Claude 3.5 Sonnet, GPT-4o, DeepSeek-V3, and Sabiá-3 delivered superior results, with accuracies exceeding 92%, while smaller models (7-13B parameters) also showed compelling performance, with top performers achieving accuracies above 90%. Notably, linguistic specialization through fine-tuning produced mixed results: it significantly reduced hallucination rates for some models but did not consistently yield performance improvements across all model types. We also observed that newer model generations frequently outperformed their predecessors, and in the one dataset where the original authors had applied traditional machine learning methods to sentiment classification, all evaluated LLMs substantially surpassed those approaches. Moreover, smaller-scale models exhibited a tendency toward overgeneration despite explicit instructions. These findings contribute valuable insights to the discourse on language-specific model optimization and establish empirical benchmarks for both multilingual and Portuguese-specialized LLMs in sentiment analysis tasks.
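To make the evaluation paradigm concrete, the minimal Python sketch below illustrates the kind of in-context learning setup described above. It is our illustration, not the authors' evaluation harness: the openai client, the "gpt-4o" model name, the Portuguese few-shot prompt, and the strict label parsing (which flags overgeneration and invented labels separately from ordinary misclassifications) are all assumptions for demonstration.

# Minimal sketch of few-shot, in-context sentiment classification for
# Brazilian Portuguese (illustrative only; not the paper's exact protocol).
from openai import OpenAI  # assumes an OpenAI-compatible chat endpoint

client = OpenAI()
VALID_LABELS = {"positivo", "negativo", "neutro"}

# Few-shot prompt: two labeled examples, then the review to classify.
FEW_SHOT = """Classifique o sentimento da resenha como positivo, negativo ou neutro.
Responda apenas com o rótulo.

Resenha: "Produto excelente, chegou antes do prazo." Rótulo: positivo
Resenha: "Veio quebrado e ninguém responde." Rótulo: negativo
Resenha: "{text}" Rótulo:"""

def classify(text: str, model: str = "gpt-4o") -> tuple[str, bool]:
    """Return (label, well_formed); well_formed is False when the model
    overgenerates extra text or invents a label outside VALID_LABELS."""
    resp = client.chat.completions.create(
        model=model,
        temperature=0,  # reduce sampling variance between runs
        messages=[{"role": "user", "content": FEW_SHOT.format(text=text)}],
    )
    reply = (resp.choices[0].message.content or "").strip().lower()
    label = reply.split()[0].strip(".,") if reply else ""
    return label, label in VALID_LABELS and reply == label

print(classify("Atendimento ruim, mas o produto é bom."))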
References
Abonizio, H., Almeida, T. S., Laitz, T., et al. (2024). Sabiá-3 technical report. DOI: 10.48550/arXiv.2410.12049.
Ainslie, J., Lee-Thorp, J., de Jong, M., et al. (2023). GQA: Training generalized multi-query transformer models from multi-head checkpoints. DOI: 10.48550/arXiv.2305.13245.
Anthropic (2023). Introducing Claude. Available online [link].
Anthropic (2024a). Claude 3 model card. Available online [link].
Anthropic (2024b). Claude 3.5 Sonnet. Available online [link].
Anthropic (2024c). Introducing the next generation of Claude. Available online [link].
Araujo, M., Reis, J., Pereira, A., et al. (2016). An evaluation of machine translation for multilingual sentence-level sentiment analysis. In Proceedings of the 31st Annual ACM Symposium on Applied Computing, page 1140–1145. Association for Computing Machinery. DOI: 10.1145/2851613.2851817.
Atil, B., Aykent, S., Chittams, A., Fu, L., Passonneau, R. J., Radcliffe, E., Rajagopal, G. R., Sloan, A., Tudrej, T., Ture, F., Wu, Z., Xu, L., and Baldwin, B. (2025). Non-determinism of "deterministic" LLM settings. DOI: 10.48550/arXiv.2408.04667.
Bai, J., Bai, S., Chu, Y., et al. (2023). Qwen technical report. DOI: 10.48550/arXiv.2309.16609.
Belisário, L., Ferreira, L. G., and Pardo, T. A. S. (2019). Classificação de subjetividade para o português: Métodos baseados em aprendizado de máquina e em léxico. In 27º Simpósio Internacional de Iniciação Científica e Tecnológica da USP (SIICUSP), pages 1-1. Available online [link].
Beltagy, I., Peters, M. E., and Cohan, A. (2020). Longformer: The long-document transformer. DOI: 10.48550/arXiv.2004.05150.
Bender, E. M., Gebru, T., McMillan-Major, A., and Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, FAccT '21, page 610–623, New York, NY, USA. Association for Computing Machinery. DOI: 10.1145/3442188.3445922.
BotBot AI (2024a). BotBot. Available online [link].
BotBot AI (2024b). botbot-ai. Available online: https://huggingface.co/botbot-ai.
BotBot AI (2024c). botbot-ai/cabrallama3-8b · Hugging Face. Available online [link].
Brown, T. B., Mann, B., Ryder, N., et al. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877-1901. DOI: 10.48550/arXiv.2005.14165.
Brum, H. and das Graças Volpe Nunes, M. (2018). Building a Sentiment Corpus of Tweets in Brazilian Portuguese. In Calzolari, N. (Conference Chair), Choukri, K., Cieri, C., Declerck, T., Goggi, S., Hasida, K., Isahara, H., Maegaard, B., Mariani, J., Mazo, H., Moreno, A., Odijk, J., Piperidis, S., and Tokunaga, T., editors, Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association (ELRA). DOI: 10.48550/arXiv.1712.08917.
Buscemi, A. and Proverbio, D. (2024). ChatGPT vs Gemini vs LLaMA on multilingual sentiment analysis. DOI: 10.48550/arXiv.2402.01715.
Cai, Z., Cao, M., Chen, H., et al. (2024). InternLM2 technical report. DOI: 10.48550/arXiv.2403.17297.
Chen, B., Zhang, Z., Langrené, N., et al. (2023). Unleashing the potential of prompt engineering in large language models: a comprehensive review. DOI: 10.48550/arXiv.2310.14735.
Chowdhery, A., Narang, S., Devlin, J., et al. (2022). PaLM: Scaling language modeling with pathways. DOI: 10.48550/arXiv.2204.02311.
Cui, Y., Yang, Z., and Yao, X. (2024). Efficient and effective text encoding for Chinese LLaMA and Alpaca. DOI: 10.48550/arXiv.2304.08177.
Cui, Y. and Yao, X. (2024). Rethinking LLM language adaptation: A case study on Chinese Mixtral. DOI: 10.48550/arXiv.2403.01851.
de Araujo, G., de Melo, T., and Figueiredo, C. M. S. (2024). Is ChatGPT an effective solver of sentiment analysis tasks in Portuguese? A preliminary study. In Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 1. Available online [link].
DeepSeek-AI, Bi, X., et al. (2024a). DeepSeek LLM: Scaling open-source language models with longtermism. DOI: 10.48550/arXiv.2401.02954.
DeepSeek-AI, Guo, D., Yang, D., et al. (2025a). DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. DOI: 10.48550/arXiv.2501.12948.
DeepSeek-AI, Liu, A., Feng, B., et al. (2024b). DeepSeek-V2: A strong, economical, and efficient mixture-of-experts language model. DOI: 10.48550/arXiv.2405.04434.
DeepSeek-AI, Liu, A., Feng, B., et al. (2025b). DeepSeek-V3 technical report. DOI: 10.48550/arXiv.2412.19437.
Dettmers, T., Pagnoni, A., Holtzman, A., et al. (2023). QLoRA: Efficient finetuning of quantized LLMs.
Devlin, J., Chang, M.-W., Lee, K., et al. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. DOI: 10.48550/arXiv.1810.04805.
Ding, X., Chen, L., Emani, M., et al. (2023). HPC-GPT: Integrating large language model for high-performance computing. In Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, SC-W '23, page 951–960, New York, NY, USA. Association for Computing Machinery. DOI: 10.1145/3624062.3624172.
Dong, Q., Li, L., Dai, D., et al. (2023). A survey on in-context learning. DOI: 10.48550/arXiv.2301.00234.
Dong, Y., Jiang, X., Liu, H., et al. (2024). Generalization or memorization: Data contamination and trustworthy evaluation for large language models. DOI: 10.48550/arXiv.2402.15938.
dos Santos Silva, L. N., Real, L., Zandavalle, A. C. B., et al. (2024). RePro: A benchmark for opinion mining for Brazilian Portuguese. In Gamallo, P., Claro, D., Teixeira, A., Real, L., Garcia, M., Oliveira, H. G., and Amaro, R., editors, Proceedings of the 16th International Conference on Computational Processing of Portuguese, page 432–440. Association for Computational Linguistics. Available online [link].
Du, X., Yu, Z., Gao, S., et al. (2024). Chinese Tiny LLM: Pretraining a Chinese-centric large language model. DOI: 10.48550/arXiv.2404.04167.
Elangovan, A., He, J., and Verspoor, K. (2021). Memorization vs. generalization: Quantifying data leakage in NLP performance evaluation. In Merlo, P., Tiedemann, J., and Tsarfaty, R., editors, Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 1325-1335, Online. Association for Computational Linguistics. DOI: 10.18653/v1/2021.eacl-main.113.
Freitas, C., Motta, E., Milidiú, R., et al. (2014). Sparkling vampire... lol! annotating opinions in a book review corpus. New language technologies and linguistic research: a two-way Road, pages 128-146. Available online [link].
Garcia, E. A. S. (2024). Open Portuguese LLM Leaderboard. Available online [link].
Garcia, G. L., Paiola, P. H., Garcia, E., et al. (2025). GemBode and PhiBode: Adapting small language models to Brazilian Portuguese. In Hernández-García, R., Barrientos, R. J., and Velastin, S. A., editors, Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, pages 228-243, Cham. Springer Nature Switzerland. DOI: 10.1007/978-3-031-76607-7_17.
Garcia, G. L., Paiola, P. H., Morelli, L. H., et al. (2024). Introducing Bode: A fine-tuned large language model for Portuguese prompt-based task. DOI: 10.48550/arXiv.2401.02909.
Gemini Team, Anil, R., Borgeaud, S., Alayrac, J.-B., Yu, J., et al. (2023). Gemini: A family of highly capable multimodal models. DOI: 10.48550/arXiv.2312.11805.
Gemma Team, Mesnard, T., Hardin, C., et al. (2024a). Gemma: Open models based on Gemini research and technology. DOI: 10.48550/arXiv.2403.08295.
Gemma Team, Riviere, M., Pathak, S., Sessa, P. G., Hardin, C., et al. (2024b). Gemma 2: Improving open language models at a practical size. DOI: 10.48550/arXiv.2408.00118.
Giray, L. (2023). Prompt engineering with ChatGPT: A guide for academic writers. Annals of Biomedical Engineering, 51:2629-2633. DOI: 10.1007/s10439-023-03272-4.
Goyal, N., Gao, C., Chaudhary, V., et al. (2022). The FLORES-101 evaluation benchmark for low-resource and multilingual machine translation. Transactions of the Association for Computational Linguistics, 10:522-538. DOI: 10.1162/tacl_a_00474.
Grattafiori, A., Dubey, A., Jauhri, A., Pandey, A., Kadian, A., et al. (2024). The Llama 3 herd of models. DOI: 10.48550/arXiv.2407.21783.
Gunasekar, S., Zhang, Y., Aneja, J., et al. (2023). Textbooks are all you need. DOI: 10.48550/arXiv.2306.11644.
Han, X., Zhang, Z., Ding, N., et al. (2021). Pre-trained models: Past, present and future. DOI: 10.48550/arXiv.2106.07139.
Hartmann, J., Heitmann, M., Siebert, C., et al. (2023). More than a feeling: Accuracy and application of sentiment analysis. International Journal of Research in Marketing, 40(1):75-87. DOI: 10.1016/j.ijresmar.2022.05.005.
Hendrycks, D., Burns, C., Basart, S., et al. (2020). Measuring massive multitask language understanding. DOI: 10.48550/arXiv.2009.03300.
Hershcovich, D., Webersinke, N., Kraus, M., Bingler, J. A., and Leippold, M. (2022). Towards climate awareness in NLP research. DOI: 10.18653/v1/2022.emnlp-main.159.
Hoffmann, J., Borgeaud, S., Mensch, A., et al. (2022). Training compute-optimal large language models. DOI: 10.48550/arXiv.2203.15556.
Holmes, D. T. (2020). Chapter 2 - Statistical methods in laboratory medicine. In Clarke, W. and Marzinke, M. A., editors, Contemporary Practice in Clinical Chemistry (Fourth Edition), pages 15-35. Academic Press, fourth edition. DOI: 10.1016/B978-0-12-815499-1.00002-8.
Hu, E. J., Shen, Y., Wallis, P., et al. (2021). LoRA: Low-rank adaptation of large language models. DOI: 10.48550/arXiv.2106.09685.
InternLM Team (2023). InternLM: A multilingual language model with progressively enhanced capabilities. Available online [link].
Jiang, A. Q., Sablayrolles, A., Mensch, A., et al. (2023). Mistral 7B. DOI: 10.48550/arXiv.2310.06825.
Klishevich, E., Denisov-Blanch, Y., Obstbaum, S., Ciobanu, I., and Kosinski, M. (2025). Measuring determinism in large language models for software code review.
Krugmann, J. O. and Hartmann, J. (2024). Sentiment analysis in the age of generative AI. Customer Needs and Solutions, 11:3. DOI: 10.1007/s40547-024-00143-4.
Lacoste, A., Luccioni, A., Schmidt, V., and Dandres, T. (2019). Quantifying the carbon emissions of machine learning. DOI: 10.48550/arXiv.1910.09700.
Larcher, C., Piau, M., Finardi, P., et al. (2023). Cabrita: Closing the gap for foreign languages. DOI: 10.48550/arXiv.2308.11878.
Li, X. and Qiu, X. (2023). Finding support examples for in-context learning. DOI: 10.48550/arXiv.2302.13539.
Liu, J., Shen, D., Zhang, Y., et al. (2022). What makes good in-context examples for GPT-3? In Agirre, E., Apidianaki, M., and Vulić, I., editors, Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures, pages 100-114, Dublin, Ireland and Online. Association for Computational Linguistics. DOI: 10.18653/v1/2022.deelio-1.10.
Liu, P., Yuan, W., Fu, J., et al. (2021). Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. DOI: 10.48550/arXiv.2107.13586.
Liu, Y., Ott, M., Goyal, N., et al. (2019). RoBERTa: A robustly optimized BERT pretraining approach. DOI: 10.48550/arXiv.1907.11692.
Lu, Y., Bartolo, M., Moore, A., et al. (2022). Fantastically ordered prompts and where to find them: Overcoming few-shot prompt order sensitivity. In Muresan, S., Nakov, P., and Villavicencio, A., editors, Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8086-8098, Dublin, Ireland. Association for Computational Linguistics. DOI: 10.18653/v1/2022.acl-long.556.
Maas, A. L., Daly, R. E., Pham, P. T., et al. (2011). Learning word vectors for sentiment analysis. Available online [link].
Meta (2024). Introducing Meta Llama 3: The most capable openly available LLM to date. Available online [link].
Minaee, S., Mikolov, T., Nikzad, N., et al. (2024). Large language models: A survey. DOI: 10.48550/arXiv.2402.06196.
Mistral AI Team (2023). Mistral 7B in short. Available online [link].
Moraes, S., Santos, A., Redecker, M., et al. (2016). Comparing approaches to subjectivity classification: A study on portuguese tweets. In Silva, J., Ribeiro, R., Quaresma, P., Adami, A., and Branco, A., editors, Lecture Notes in Computer Science, volume 9727, page 86–94. Springer International Publishing. DOI: 10.1007/978-3-319-41552-9_8.
Mosbach, M., Pimentel, T., Ravfogel, S., et al. (2023). Few-shot fine-tuning vs. in-context learning: A fair comparison and evaluation. DOI: 10.48550/arXiv.2305.16938.
Naveed, H., Khan, A. U., Qiu, S., et al. (2024). A comprehensive overview of large language models. DOI: 10.48550/arXiv.2307.06435.
Oliveira, M. V. and de Melo, T. (2020). Investigating sets of linguistic features for two sentiment analysis tasks in Brazilian Portuguese web reviews. Anais Estendidos do Simpósio Brasileiro de Sistemas Multimídia e Web (WebMedia), pages 45-48. DOI: 10.5753/webmedia_estendido.2020.13060.
OpenAI, Hurst, A., Lerer, A., Goucher, A. P., Perelman, A., et al. (2024a). GPT-4o system card. DOI: 10.48550/arXiv.2410.21276.
OpenAI, Achiam, J., Adler, S., Agarwal, S., Ahmad, L., et al. (2024b). GPT-4 technical report. DOI: 10.48550/arXiv.2303.08774.
Overwijk, A., Xiong, C., and Callan, J. (2022a). ClueWeb22: 10 billion web documents with rich information. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '22, page 3360–3362, New York, NY, USA. Association for Computing Machinery. DOI: 10.1145/3477495.3536321.
Overwijk, A., Xiong, C., Liu, X., et al. (2022b). ClueWeb22: 10 billion web documents with visual and semantic information. DOI: 10.48550/arXiv.2211.15848.
Pires, R., Abonizio, H., Almeida, T. S., et al. (2023). Sabiá: Portuguese large language models. In Naldi, M. C. and Bianchi, R. A. C., editors, Lecture Notes in Computer Science, page 226–240. Springer Nature Switzerland. DOI: 10.1007/978-3-031-45392-2_15.
Přibáň, P., Šmíd, J., Steinberger, J., et al. (2024). A comparative study of cross-lingual sentiment analysis. Expert Systems with Applications, 247:123247. DOI: 10.1016/j.eswa.2024.123247.
Qiu, X., Sun, T., Xu, Y., et al. (2020). Pre-trained models for natural language processing: A survey. Science China Technological Sciences, 63:1872-1897. DOI: 10.1007/s11431-020-1647-3.
Qwen Team (2024). Introducing Qwen1.5. Available online [link].
Radford, A., Narasimhan, K., Salimans, T., et al. (2018). Improving language understanding by generative pre-training. Available online [link].
Radford, A., Wu, J., Child, R., et al. (2019). Language models are unsupervised multitask learners. Available online [link].
Rae, J. W., Borgeaud, S., Cai, T., et al. (2022). Scaling language models: Methods, analysis & insights from training Gopher. DOI: 10.48550/arXiv.2112.11446.
Real, L., Oshiro, M., and Mafra, A. (2019). B2W-Reviews01: An open product reviews corpus. In Proceedings of the XII Symposium in Information and Human Language Technology, pages 200-208. Sociedade Brasileira de Computação (SBC). Available online [link].
Reid, M., Savinov, N., Teplyashin, D., et al. (2024). Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. DOI: 10.48550/arXiv.2403.05530.
Reynolds, L. and McDonell, K. (2021). Prompt programming for large language models: Beyond the few-shot paradigm. DOI: 10.48550/arXiv.2102.07350.
Rubin, O., Herzig, J., and Berant, J. (2022). Learning to retrieve prompts for in-context learning. In Carpuat, M., de Marneffe, M.-C., and Meza Ruiz, I. V., editors, Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2655-2671, Seattle, United States. Association for Computational Linguistics. DOI: 10.18653/v1/2022.naacl-main.191.
Sainz, O., Campos, J., García-Ferrero, I., Etxaniz, J., de Lacalle, O. L., and Agirre, E. (2023). NLP evaluation in trouble: On the need to measure LLM data contamination for each benchmark. In Bouamor, H., Pino, J., and Bali, K., editors, Findings of the Association for Computational Linguistics: EMNLP 2023, pages 10776-10787, Singapore. Association for Computational Linguistics. DOI: 10.18653/v1/2023.findings-emnlp.722.
Sales Almeida, T., Abonizio, H., Nogueira, R., et al. (2024). Sabiá-2: A new generation of Portuguese large language models. DOI: 10.48550/arXiv.2403.09887.
Scheff, S. W. (2016). Chapter 8 - nonparametric statistics. In Scheff, S. W., editor, Fundamental Statistical Principles for the Neurobiologist, pages 157-182. Academic Press. DOI: 10.1016/B978-0-12-804753-8.00008-7.
Silva, R. R. and Pardo, T. A. S. (2019). Corpus 4P: um córpus anotado de opiniões em português sobre produtos eletrônicos para fins de sumarização contrastiva de opinião. In Proceedings of the 6ª Jornada de Descrição do Português (JDP), pages 1-9. Sociedade Brasileira de Computação. Available online [link].
Simmering, P. F. and Huoviala, P. (2023). Large language models for aspect-based sentiment analysis. DOI: 10.48550/arXiv.2310.18025.
Socher, R., Perelygin, A., Wu, J., et al. (2013). Recursive deep models for semantic compositionality over a sentiment treebank. In Yarowsky, D., Baldwin, T., Korhonen, A., Livescu, K., and Bethard, S., editors, Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1631-1642, Seattle, Washington, USA. Association for Computational Linguistics. DOI: 10.18653/v1/d13-1170.
Song, Y., Wang, G., Li, S., and Lin, B. Y. (2024). The good, the bad, and the greedy: Evaluation of LLMs should not ignore non-determinism. DOI: 10.18653/v1/2025.naacl-long.211.
Souza, F., Nogueira, R., and Lotufo, R. (2020). BERTimbau: Pretrained BERT models for Brazilian Portuguese. In Cerri, R. and Prati, R. C., editors, Intelligent Systems, pages 403-417, Cham. Springer International Publishing. DOI: 10.1007/978-3-030-61377-8_28.
Souza, F. D. and Filho, J. B. d. O. e. S. (2022). BERT for sentiment analysis: Pre-trained and fine-tuned alternatives. In Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D., Magro, C., and Pinto, H., editors, Computational Processing of the Portuguese Language, pages 209-218, Cham. Springer International Publishing. DOI: 10.1007/978-3-030-98305-5_20.
Strubell, E., Ganesh, A., and McCallum, A. (2020). Energy and policy considerations for modern deep learning research. Proceedings of the AAAI Conference on Artificial Intelligence, 34(09):13693-13696. DOI: 10.1609/aaai.v34i09.7123.
Touvron, H., Lavril, T., Izacard, G., et al. (2023a). LLaMA: Open and efficient foundation language models. DOI: 10.48550/arXiv.2302.13971.
Touvron, H., Martin, L., Stone, K., et al. (2023b). Llama 2: Open foundation and fine-tuned chat models. DOI: 10.48550/arXiv.2307.09288.
Vargas, F. A., Sanches, R., and Rocha, P. R. (2020). Identifying fine-grained opinion and classifying polarity on coronavirus pandemic. In Proceedings of the 9th Brazilian Conference on Intelligent Systems (BRACIS 2020), page 511–520. Springer-Verlag. DOI: 10.1007/978-3-030-61377-8_35.
Vaswani, A., Shazeer, N., Parmar, N., et al. (2017). Attention is all you need. DOI: 10.48550/arXiv.1706.03762.
Wang, A., Singh, A., Michael, J., et al. (2019). GLUE: A multi-task benchmark and analysis platform for natural language understanding. DOI: 10.18653/v1/w18-5446.
Wang, J. J. and Wang, V. X. (2025). Assessing consistency and reproducibility in the outputs of large language models: Evidence across diverse finance and accounting tasks.
Wang, Y., Kordi, Y., Mishra, S., et al. (2022). Self-Instruct: Aligning language models with self-generated instructions. DOI: 10.48550/arXiv.2212.10560.
Wang, Z., Xie, Q., Ding, Z., et al. (2023). Is ChatGPT a good sentiment analyzer? A preliminary study. DOI: 10.48550/arXiv.2304.04339.
Wei, J., Tay, Y., Bommasani, R., et al. (2022). Emergent abilities of large language models. DOI: 10.48550/arXiv.2206.07682.
White, J., Fu, Q., Hays, S., et al. (2023). A prompt pattern catalog to enhance prompt engineering with ChatGPT. DOI: 10.48550/arXiv.2302.11382.
Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin, 1(6):80-83. DOI: 10.2307/3001968.
Yang, A., Yang, B., Hui, B., Zheng, B., Yu, B., et al. (2024a). Qwen2 technical report. Available online [link].
Yang, J., Jin, H., Tang, R., et al. (2024b). Harnessing the power of LLMs in practice: A survey on ChatGPT and beyond. ACM Trans. Knowl. Discov. Data, page 30. DOI: 10.1145/3649506.
Yao, Y., Duan, J., Xu, K., et al. (2024). A survey on large language model (LLM) security and privacy: The good, the bad, and the ugly. High-Confidence Computing, page 100211. DOI: 10.1016/j.hcc.2024.100211.
Ye, J., Wu, Z., Feng, J., et al. (2023). Compositional exemplars for in-context learning. DOI: 10.48550/arXiv.2302.05698.
Yu, B. (2023). Benchmarking large language model volatility. DOI: 10.48550/arXiv.2311.15180.
Zeng, A., Liu, X., Du, Z., et al. (2023). GLM-130b: An open bilingual pre-trained model. In The Eleventh International Conference on Learning Representations. Available online [link].
Zhang, W., Deng, Y., Liu, B., et al. (2023). Sentiment analysis in the era of large language models: A reality check. DOI: 10.48550/arXiv.2305.15005.
Zhao, J., Liu, K., and Xu, L. (2016). Sentiment analysis: Mining opinions, sentiments, and emotions. Computational Linguistics, 42:595-598. DOI: 10.1162/COLI_r_00259.
Zhao, W. X., Zhou, K., Li, J., et al. (2023). A survey of large language models. Available online [link].
Zhong, Q., Ding, L., Liu, J., et al. (2023). Can ChatGPT understand too? A comparative study on ChatGPT and fine-tuned BERT. DOI: 10.48550/arXiv.2302.10198.
Zhou, Y., Muresanu, A. I., Han, Z., et al. (2022). Large language models are human-level prompt engineers. DOI: 10.48550/arXiv.2211.01910.
License
Copyright (c) 2025 André da Fonseca Schuck, Gabriel Lino Garcia, João Renato Ribeiro Manesco, Pedro Henrique Paiola, João Paulo Papa

This work is licensed under a Creative Commons Attribution 4.0 International License.

