Almeida, T. S., Nogueira, R., & Pedrini, H. (2025). Building High-Quality Datasets for Portuguese LLMs: From Common Crawl Snapshots to Industrial-Grade Corpora. Journal of the Brazilian Computer Society, 31(1), 1247–1263. https://doi.org/10.5753/jbcs.2025.5788