ALMEIDA, T. S.; NOGUEIRA, R.; PEDRINI, H. Building High-Quality Datasets for Portuguese LLMs: From Common Crawl Snapshots to Industrial-Grade Corpora. Journal of the Brazilian Computer Society, [S. l.], v. 31, n. 1, p. 1246–1262, 2025. DOI: 10.5753/jbcs.2025.5788. Available at: https://journals-sol.sbc.org.br/index.php/jbcs/article/view/5788. Accessed: 31 Jan. 2026.