Almeida, T. S., Nogueira, R. and Pedrini, H. (2025) “Building High-Quality Datasets for Portuguese LLMs: From Common Crawl Snapshots to Industrial-Grade Corpora”, Journal of the Brazilian Computer Society, 31(1), pp. 1247–1263. doi: 10.5753/jbcs.2025.5788.