1.
Almeida TS, Nogueira R, Pedrini H. Building High-Quality Datasets for Portuguese LLMs: From Common Crawl Snapshots to Industrial-Grade Corpora. JBCS [Internet]. 2025 Oct 27 [cited 2025 Dec 5];31(1):1247-63. Available from: https://journals-sol.sbc.org.br/index.php/jbcs/article/view/5788