Almeida, Thales Sales, Rodrigo Nogueira, and Helio Pedrini. “Building High-Quality Datasets for Portuguese LLMs: From Common Crawl Snapshots to Industrial-Grade Corpora.” Journal of the Brazilian Computer Society 31, no. 1 (October 27, 2025): 1246–1262. Accessed January 31, 2026. https://journals-sol.sbc.org.br/index.php/jbcs/article/view/5788.