[1] T. S. Almeida, R. Nogueira, and H. Pedrini, “Building High-Quality Datasets for Portuguese LLMs: From Common Crawl Snapshots to Industrial-Grade Corpora”, JBCS, vol. 31, no. 1, pp. 1247–1263, Oct. 2025.