The Cocoruta Hub: Open and Curated Corpora, Datasets and Language Models on Brazilian Ocean Law
DOI: https://doi.org/10.5753/jbcs.2025.5791

Keywords: Legal Large Language Models, Evaluation of Large Language Models, Domain-oriented Datasets, Open Data

Abstract
This paper describes Cocoruta, an open-source hub of computational linguistics resources and language models for the Brazilian legal context, with a focus on ocean law. The hub comprises open-access, curated corpora and datasets and fine-tuned language models, organized into two sets of resources: the first is built around a training dataset for question-answering tasks, while the second features a dataset prepared to support model training for dialogue tasks and was produced with refined curation procedures. We provide a comprehensive analysis of Cocoruta's contributions, including the information needed to ensure transparency and reproducibility of its construction process, the fine-tuning of its language models, and quantitative and qualitative evaluations of those models. Quantitative evaluations rely on standard performance metrics, while qualitative evaluations follow human assessment procedures. This work advances fundamental resources for specialized language domains and fosters research and development in Brazilian legal natural language processing, serving as a hub that brings together efforts in this field.
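As an illustration of the metric-based quantitative evaluation described above, the following is a minimal sketch in Python, not the hub's actual evaluation code: it scores a hypothetical model answer against a gold reference with smoothed sentence-level BLEU (Chen and Cherry, 2014) and ROUGE-L (Lin, 2004), two of the metrics cited in the references below. The example strings and the nltk and rouge-score packages are assumptions made for illustration.

# Minimal sketch (assumed setup, not the authors' code): score one generated
# answer against one gold reference. Requires: pip install nltk rouge-score
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

# Hypothetical question-answering pair on Brazilian ocean law.
reference = "A zona econômica exclusiva brasileira estende-se por 200 milhas náuticas."
candidate = "A zona econômica exclusiva estende-se por 200 milhas náuticas."

# Sentence-level BLEU needs smoothing on short texts (Chen and Cherry, 2014).
bleu = sentence_bleu(
    [reference.split()],
    candidate.split(),
    smoothing_function=SmoothingFunction().method1,
)

# ROUGE-L scores the longest common subsequence of candidate and reference.
rouge = rouge_scorer.RougeScorer(["rougeL"]).score(reference, candidate)

print(f"BLEU: {bleu:.3f}  ROUGE-L F1: {rouge['rougeL'].fmeasure:.3f}")

Corpus-level aggregation and embedding-based metrics such as BERTScore would follow the same pattern.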
References
Al-Qaesm, R., Hendi, M., and Tantour, B. (2025). Alkafi-Llama3: Fine-tuning LLMs for precise legal understanding in Palestine. Discover Artificial Intelligence, 5:107. DOI: 10.1007/s44163-025-00313-w.
Almeida, T. S., Abonizio, H., Nogueira, R., and Pires, R. (2024). Sabiá-2: A new generation of Portuguese large language models. arXiv:2403.09887. DOI: 10.48550/arXiv.2403.09887.
Ariai, F. and Demartini, G. (2025). Natural language processing for the legal domain: A survey of tasks, datasets, models, and challenges. arXiv:2410.21306. DOI: 10.48550/arXiv.2410.21306.
Bertalan, V. G. F. and Ruiz, E. E. S. (2020). Predicting judicial outcomes in the Brazilian legal system using textual features. In Workshop on Digital Humanities and Natural Language Processing (DHandNLP), Évora, Portugal. Available at: [link].
Bhardwaj, E., Gujral, H., Wu, S., Zogheib, C., Maharaj, T., and Becker, C. (2024). The state of data curation at NeurIPS: An assessment of dataset development practices in the datasets and benchmarks track. arXiv:2410.22473. DOI: 10.48550/arXiv.2410.22473.
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., and Amodei, D. (2020). Language models are few-shot learners. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., and Lin, H., editors, Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), volume 33, pages 1877-1901, Virtual-only Conference. Curran Associates, Inc. DOI: 10.48550/arXiv.2005.14165.
Burnell, R., Schellaert, W., Burden, J., Ullman, T. D., Martinez-Plumed, F., Tenenbaum, J. B., Rutar, D., Cheke, L. G., Sohl-Dickstein, J., Mitchell, M., Kiela, D., Shanahan, M., Voorhees, E. M., Cohn, A. G., Leibo, J. Z., and Hernandez-Orallo, J. (2023). Rethink reporting of evaluation results in AI. Science, 380(6641):136-138. DOI: 10.1126/science.adf6369.
Canaverde, B., Pires, T. P., Ribeiro, L. M., and Martins, A. F. T. (2025). LegalBench.PT: A benchmark for Portuguese law. arXiv:2502.16357. DOI: 10.48550/arXiv.2502.16357.
Chalkidis, I., Fergadiotis, M., Malakasiotis, P., Aletras, N., and Androutsopoulos, I. (2020). LEGAL-BERT: The Muppets straight out of law school. In Cohn, T., He, Y., and Liu, Y., editors, Findings of the Association for Computational Linguistics: EMNLP 2020, pages 2898-2904, Online. Association for Computational Linguistics. DOI: 10.18653/v1/2020.findings-emnlp.261.
Chang, Y., Wang, X., Wang, J., Wu, Y., Yang, L., Zhu, K., Chen, H., Yi, X., Wang, C., Wang, Y., Ye, W., Zhang, Y., Chang, Y., Yu, P. S., Yang, Q., and Xie, X. (2024). A survey on evaluation of large language models. ACM Transactions on Intelligent Systems and Technology, 15(3):1-45. DOI: 10.1145/3641289.
Chen, B. and Cherry, C. (2014). A systematic comparison of smoothing techniques for sentence-level BLEU. In Bojar, O., Buck, C., Federmann, C., Haddow, B., Koehn, P., Monz, C., Post, M., and Specia, L., editors, Proceedings of the Ninth Workshop on Statistical Machine Translation, pages 362-367, Baltimore, Maryland, USA. Association for Computational Linguistics. DOI: 10.3115/v1/W14-3346.
Chen, Z. Z., Ma, J., Zhang, X., Hao, N., Yan, A., Nourbakhsh, A., Yang, X., McAuley, J., Petzold, L., and Wang, W. Y. (2024). A survey on large language models for critical societal domains: Finance, healthcare, and law. Transactions on Machine Learning Research. DOI: 10.48550/arXiv.2405.01769.
Colombo, P., Pires, T. P., Boudiaf, M., Culver, D., Melo, R., Corro, C., Martins, A. F. T., Esposito, F., Raposo, V. L., Morgado, S., and Desa, M. (2024). SaulLM-7B: A pioneering large language model for law. arXiv:2403.03883. DOI: 10.48550/arXiv.2403.03883.
Cui, J., Shen, X., and Wen, S. (2023). A survey on legal judgment prediction: Datasets, metrics, models and challenges. IEEE Access, 11:102050-102071. DOI: 10.1109/ACCESS.2023.3317083.
do Espírito Santo, F. O., Peres, S. M., Gramacho, G. d. S., Brandão, A. A. F., and Cozman, F. G. (2024). Legal document-based, domain-driven Q&A system: LLMs in perspective. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), pages 1-9, Yokohama, Japan. DOI: 10.1109/IJCNN60899.2024.10650895.
Garcia, E. A. S., Silva, N. F. F., Siqueira, F., Gomes, J. R. S., Albuquerque, H. O., Souza, E., Lima, E., and De Carvalho, A. (2024). RoBERTaLexPT: A legal RoBERTa model pretrained with deduplication for Portuguese. In Proceedings of the Computational Processing of the Portuguese Language (PROPOR) - Vol. 1, pages 374-383, Santiago de Compostela, Spain. Association for Computational Linguistics. Available at: [link].
Grattafiori, A., Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Letman, A., Mathur, A., Schelten, A., Vaughan, A., Yang, A., Fan, A., Goyal, A., Hartshorn, A., Yang, A., Mitra, A., Sravankumar, A., Korenev, A., Hinsvark, A., Rao, A., Zhang, A., Rodriguez, A., Gregerson, A., Spataru, A., Roziere, B., Biron, B., Tang, B., Chern, B., Caucheteux, C., Nayak, C., Bi, C., Marra, C., McConnell, C., Keller, C., Touret, C., Wu, C., Wong, C., Ferrer, C. C., Nikolaidis, C., Allonsius, D., Song, D., Pintz, D., Livshits, D., Wyatt, D., Esiobu, D., Choudhary, D., Mahajan, D., Garcia-Olano, D., Perino, D., Hupkes, D., Lakomkin, E., AlBadawy, E., Lobanova, E., Dinan, E., Smith, E. M., Radenovic, F., Guzman, F., Zhang, F., Synnaeve, G., Lee, G., Anderson, G. L., Thattai, G., Nail, G., Mialon, G., Pang, G., Cucurell, G., Nguyen, H., Korevaar, H., Xu, H., Touvron, H., Zarov, I., Ibarra, I. A., Kloumann, I., Misra, I., Evtimov, I., Zhang, J., Copet, J., Lee, J., Geffert, J., Vranes, J., Park, J., Mahadeokar, J., Shah, J., van der Linde, J., Billock, J., Hong, J., Lee, J., Fu, J., Chi, J., Huang, J., Liu, J., Wang, J., Yu, J., Bitton, J., Spisak, J., Park, J., Rocca, J., Johnstun, J., Saxe, J., Jia, J., and Papakipos, Z. (2024). The Llama 3 herd of models. CoRR, abs/2407.21783. DOI: 10.48550/arXiv.2407.21783.
Greenleaf, G., Mowbray, A., and Chung, P. (2018). Building sustainable free legal advisory systems: Experiences from the history of AI & law. Computer Law & Security Review, 34(2):314-326. DOI: 10.1016/j.clsr.2018.02.007.
Greenleaf, G., Mowbray, A., and Tyree, A. (1987). Legal expert systems: Words, words, words...? International Review of Law, Computers & Technology, 3(1):119-135. DOI: 10.1080/13600869.1987.9966258.
Guha, N., Nyarko, J., Ho, D. E., Ré, C., Chilton, A., Narayana, A., Chohlas-Wood, A., Peters, A., Waldon, B., Rockmore, D. N., Zambrano, D., Talisman, D., Hoque, E., Surani, F., Fagan, F., Sarfaty, G., Dickinson, G. M., Porat, H., Hegland, J., Wu, J., Nudell, J., Niklaus, J., Nay, J., Choi, J. H., Tobia, K., Hagan, M., Ma, M., Livermore, M., Rasumov-Rahe, N., Holzenberger, N., Kolt, N., Henderson, P., Rehaag, S., Goel, S., Gao, S., Williams, S., Gandhi, S., Zur, T., Iyer, V., and Li, Z. (2023). LegalBench: A collaboratively built benchmark for measuring legal reasoning in large language models. In Proceedings of the 37th International Conference on Neural Information Processing Systems (NeurIPS), NIPS '23, Red Hook, NY, USA. Curran Associates Inc. DOI: 10.2139/ssrn.4583531.
Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., and Chen, W. (2022). LoRA: Low-rank adaptation of large language models. In Proceedings of the 10th International Conference on Learning Representations (ICLR), Virtual-only Conference. DOI: 10.48550/arXiv.2106.09685.
Huang, L., Yu, W., Ma, W., Zhong, W., Feng, Z., Wang, H., Chen, Q., Peng, W., Feng, X., Qin, B., and Liu, T. (2025). A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. ACM Transactions on Information Systems, 43(2). DOI: 10.1145/3703155.
Hutchinson, B., Rostamzadeh, N., Greer, C., Heller, K., and Prabhakaran, V. (2022). Evaluation gaps in machine learning practice. In Proceedings of the 5th ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT), pages 1859-1876, New York, NY, USA. Association for Computing Machinery. DOI: 10.1145/3531146.3533233.
Kugler, L. (2025). How do you measure AI? Communications of the ACM, 68(4):15-17. DOI: 10.1145/3708972.
Lage-Freitas, A., Allende-Cid, H., Santana, O., and Oliveira-Lage, L. (2022). Predicting Brazilian court decisions. PeerJ Computer Science, 8:e904. DOI: 10.7717/peerj-cs.904.
Lai, J., Gan, W., Wu, J., Qi, Z., and Yu, P. S. (2024). Large language models in law: A survey. AI Open, 5:181-196. DOI: 10.1016/j.aiopen.2024.09.002.
Liang, P., Bommasani, R., Lee, T., Tsipras, D., Soylu, D., Yasunaga, M., Zhang, Y., Narayanan, D., Wu, Y., Kumar, A., Newman, B., Yuan, B., Yan, B., Zhang, C., Cosgrove, C., Manning, C. D., Ré, C., Acosta-Navas, D., Hudson, D. A., Zelikman, E., Durmus, E., Ladhak, F., Rong, F., Ren, H., Yao, H., Wang, J., Santhanam, K., Orr, L., Zheng, L., Yuksekgonul, M., Suzgun, M., Kim, N., Guha, N., Chatterji, N., Khattab, O., Henderson, P., Huang, Q., Chi, R., Xie, S. M., Santurkar, S., Ganguli, S., Hashimoto, T., Icard, T., Zhang, T., Chaudhary, V., Wang, W., Li, X., Mai, Y., Zhang, Y., and Koreeda, Y. (2022). Holistic evaluation of language models. Transactions on Machine Learning Research. DOI: 10.48550/arXiv.2211.09110.
Lin, C.-Y. (2004). ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74-81, Barcelona, Spain. Association for Computational Linguistics. Available at: [link].
Ling, C., Zhao, X., Lu, J., Deng, C., Zheng, C., Wang, J., Chowdhury, T., Li, Y., Cui, H., Zhang, X., Zhao, T., Panalkar, A., Mehta, D., Pasquali, S., Cheng, W., Wang, H., Liu, Y., Chen, Z., Chen, H., White, C., Gu, Q., Pei, J., Yang, C., and Zhao, L. (2024). Domain specialization as the key to make large language models disruptive: A comprehensive survey. arXiv:2305.18703. DOI: 10.48550/arXiv.2305.18703.
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv:1907.11692. DOI: 10.48550/arXiv.1907.11692.
Malaquias Junior, R., Pires, R., Almeida, T. S., Sakiyama, K., Romero, R. A. F., and Nogueira, R. (2025). The interplay between domain specialization and model size. CoRR, abs/2501.02068. DOI: 10.48550/arXiv.2501.02068.
Malaquias Junior, R., Pires, R., Romero, R., and Nogueira, R. (2024). Juru: Legal Brazilian large language model from reputable sources. arXiv:2403.18140. DOI: 10.48550/arXiv.2403.18140.
Menezes-Neto, E. J. d. and Clementino, M. B. M. (2022). Using deep learning to predict outcomes of legal appeals better than human experts: A study with data from Brazilian federal courts. PLOS ONE, 17(7):1-20. DOI: 10.1371/journal.pone.0272287.
Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). BLEU: A method for automatic evaluation of machine translation. In Isabelle, P., Charniak, E., and Lin, D., editors, Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), pages 311-318, Philadelphia, Pennsylvania, USA. Association for Computational Linguistics. DOI: 10.3115/1073083.1073135.
Pires, R., Abonizio, H., Almeida, T. S., and Nogueira, R. (2023). Sabiá: Portuguese large language models. In Naldi, M. C. and Bianchi, R. A. C., editors, Proceedings of the 12th Brazilian Conference on Intelligent Systems (BRACIS), Lecture Notes in Computer Science, pages 226-240, Belo Horizonte, MG, Brazil. Springer Nature Switzerland. Available at: [link].
Pirozelli, P., Castro, A. B. R., de Oliveira, A. L. C., Oliveira, A. S., Cação, F. N., Silveira, I. C., Campos, J. G. M., Motheo, L. C., Figueiredo, L. F., Pellicer, L. F. A. O., José, M. A., José, M. M., Ligabue, P. M., Grava, R. S., Tavares, R. M., Matos, V. B., Sym, Y. V., Costa, A. H. R., Brandão, A. A. F., Mauá, D. D., Cozman, F. G., and Peres, S. M. (2022). The Blue Amazon Brain (BLAB): A modular architecture of services about the Brazilian maritime territory. In IJCAI-ECAI Workshops: Modeling Oceans and Climate Change - AIMOCC. DOI: 10.48550/arXiv.2209.07928.
Schluter, N. (2017). The limits of automatic summarisation according to ROUGE. In Lapata, M., Blunsom, P., and Koller, A., editors, Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL), volume 2, pages 41-45, Valencia, Spain. Association for Computational Linguistics. DOI: 10.18653/v1/E17-2007.
Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava, P., Bhosale, S., Bikel, D., Blecher, L., Ferrer, C. C., Chen, M., Cucurull, G., Esiobu, D., Fernandes, J., Fu, J., Fu, W., Fuller, B., Gao, C., Goswami, V., Goyal, N., Hartshorn, A., Hosseini, S., Hou, R., Inan, H., Kardas, M., Kerkez, V., Khabsa, M., Kloumann, I., Korenev, A., Koura, P. S., Lachaux, M.-A., Lavril, T., Lee, J., Liskovich, D., Lu, Y., Mao, Y., Martinet, X., Mihaylov, T., Mishra, P., Molybog, I., Nie, Y., Poulton, A., Reizenstein, J., Rungta, R., Saladi, K., Schelten, A., Silva, R., Smith, E. M., Subramanian, R., Tan, X. E., Tang, B., Taylor, R., Williams, A., Kuan, J. X., Xu, P., Yan, Z., Zarov, I., Zhang, Y., Fan, A., Kambadur, M., Narang, S., Rodriguez, A., Stojnic, R., Edunov, S., and Scialom, T. (2023). Llama 2: Open foundation and fine-tuned chat models. arXiv:2307.09288. DOI: 10.48550/arXiv.2307.09288.
Waterman, D. A., Paul, J., and Peterson, M. (1986). Expert systems for legal decision making. Expert Systems, 3(4):212-226. DOI: 10.1111/j.1468-0394.1986.tb00203.x.
Xiao, C., Hu, X., Liu, Z., Tu, C., and Sun, M. (2021). Lawformer: A pre-trained language model for Chinese legal long documents. AI Open, 2:79-84. DOI: 10.1016/j.aiopen.2021.06.003.
Yang, J., Jin, H., Tang, R., Han, X., Feng, Q., Jiang, H., Zhong, S., Yin, B., and Hu, X. (2024). Harnessing the power of LLMs in practice: A survey on ChatGPT and beyond. ACM Transactions on Knowledge Discovery from Data, 18(6). DOI: 10.1145/3649506.
Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q., and Artzi, Y. (2020). BERTScore: Evaluating text generation with BERT. arXiv:1904.09675. DOI: 10.48550/arXiv.1904.09675.
Zhao, W., Peyrard, M., Liu, F., Gao, Y., Meyer, C. M., and Eger, S. (2019). MoverScore: Text generation evaluating with contextualized embeddings and earth mover distance. In Inui, K., Jiang, J., Ng, V., and Wan, X., editors, Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 563-578, Hong Kong, China. Association for Computational Linguistics. DOI: 10.18653/v1/D19-1053.
Zhu, L., Yang, L., Li, C., Hu, S., Liu, L., and Yin, B. (2024). LegiLM: A fine-tuned legal language model for data compliance. arXiv:2409.13721. DOI: 10.48550/arXiv.2409.13721.
License
Copyright (c) 2025 Felipe Oliveira do Espírito Santo, Sarajane Marques Peres, Bernardo Gonçalves, Fabio José Muneratti Ortega, Vinícius Bitencourt Matos, André Paulino de Lima, Anarosa Alves Franco Brandão, Fábio Gagliardi Cozman

This work is licensed under a Creative Commons Attribution 4.0 International License.

