Almeida, Thales Sales, Rodrigo Nogueira, and Helio Pedrini. “Building High-Quality Datasets for Portuguese LLMs: From Common Crawl Snapshots to Industrial-Grade Corpora.” Journal of the Brazilian Computer Society 31, no. 1 (October 27, 2025): 1247–1263. Accessed December 5, 2025. https://journals-sol.sbc.org.br/index.php/jbcs/article/view/5788.