[1]
Almeida, T.S., Nogueira, R. and Pedrini, H. 2025. Building High-Quality Datasets for Portuguese LLMs: From Common Crawl Snapshots to Industrial-Grade Corpora. Journal of the Brazilian Computer Society. 31, 1 (Oct. 2025), 1247–1263. DOI:https://doi.org/10.5753/jbcs.2025.5788.