ALMEIDA, T. S.; NOGUEIRA, R.; PEDRINI, H. Building High-Quality Datasets for Portuguese LLMs: From Common Crawl Snapshots to Industrial-Grade Corpora. Journal of the Brazilian Computer Society, [S. l.], v. 31, n. 1, p. 1247–1263, 2025. DOI: 10.5753/jbcs.2025.5788. Available at: https://journals-sol.sbc.org.br/index.php/jbcs/article/view/5788. Accessed on: 5 Dec. 2025.