Almeida, Thales Sales, Rodrigo Nogueira, and Helio Pedrini. 2025. “Building High-Quality Datasets for Portuguese LLMs: From Common Crawl Snapshots to Industrial-Grade Corpora.” Journal of the Brazilian Computer Society 31 (1): 1247-63. https://doi.org/10.5753/jbcs.2025.5788.