1.
Almeida TS, Nogueira R, Pedrini H. Building High-Quality Datasets for Portuguese LLMs: From Common Crawl Snapshots to Industrial-Grade Corpora. JBCS [Internet]. 2025 Oct 27 [cited 2026 Jan 31];31(1):1246-62. Available from: https://journals-sol.sbc.org.br/index.php/jbcs/article/view/5788