Almeida, T. S., R. Nogueira, and H. Pedrini. “Building High-Quality Datasets for Portuguese LLMs: From Common Crawl Snapshots to Industrial-Grade Corpora”. Journal of the Brazilian Computer Society, vol. 31, no. 1, Oct. 2025, pp. 1247-63, doi:10.5753/jbcs.2025.5788.