The Bode Family of Large Language Models: Investigating the Frontiers of LLMs in Brazilian Portuguese
DOI: https://doi.org/10.5753/jbcs.2025.5812
Keywords: Large Language Models, Brazilian Portuguese, Natural Language Processing, Small Language Models
Abstract
The rapid advancement of Large Language Models (LLMs) has significantly impacted Natural Language Processing, yet their effectiveness remains uneven across languages. Most state-of-the-art models are trained predominantly on English data, leading to performance disparities in lower-resource languages such as Brazilian Portuguese (BP). This paper explores fine-tuning strategies for adapting open-weight LLMs to BP, focusing on dataset translation techniques, linguistic adaptation challenges, and parameter-efficient fine-tuning methods such as LoRA and QLoRA. We present a benchmark analysis evaluating multiple fine-tuning approaches across various open models, establishing a guiding framework for future BP-specific adaptations. Our results underscore the importance of specialized fine-tuning for improving cross-lingual transfer and NLP performance in BP, contributing to the broader goal of enhancing multilingual language model accessibility.
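To make the parameter-efficient adaptation strategy mentioned above concrete, the sketch below shows a typical QLoRA-style setup: the base model is loaded in 4-bit precision and only small low-rank adapter matrices are trained. This is an illustrative sketch only, assuming the Hugging Face transformers, peft, and bitsandbytes libraries; the base-model name, adapter hyperparameters, and dataset choice are placeholders and do not reproduce the exact configuration evaluated in the paper.

```python
# Illustrative QLoRA-style fine-tuning setup for Brazilian Portuguese adaptation.
# Assumes transformers, peft, and bitsandbytes are installed; names below are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model = "meta-llama/Llama-2-7b-hf"  # placeholder open-weight base model

# 4-bit NF4 quantization keeps the frozen base weights small (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Low-rank adapters on the attention projections are the only trainable parameters.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total weights

# From here, training proceeds with any standard causal-LM loop or Trainer,
# using a Portuguese instruction dataset (e.g., a translated Alpaca-style set).
```

In this configuration the memory footprint is dominated by the quantized, frozen base model, which is what makes single-GPU fine-tuning of 7B-scale models feasible for BP-specific adaptation.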
License
Copyright (c) 2025 Pedro Henrique Paiola, Gabriel Lino Garcia, João Vitor Mariano Correia, João Renato Ribeiro Manesco, Ana Lara Alves Garcia, João Paulo Papa

This work is licensed under a Creative Commons Attribution 4.0 International License.

