Leveraging LLMs for Topic Modeling and Classification in Brazilian Funk Lyrics

Authors

Yepez Rojas, J. D., Tavares Santos, B., de Carvalho Leite Peres, F., and Becker, K.

DOI:

https://doi.org/10.5753/jidm.2026.5578

Keywords:

Lyrics, topic modeling, BERTopic, prompt engineering

Abstract

Song lyrics pose particular challenges for topic modeling and classification because of their implicit discourse, their reliance on figurative and poetic language, and their use of slang. As a cultural expression of urban peripheries, Brazilian funk provides a rich social narrative. This work proposes LLMusic, an end-to-end framework for topic extraction and classification of song lyrics, using Brazilian funk as a case study. LLMusic combines prompt-based Large Language Models (LLMs) with topic modeling techniques such as BERTopic to address the limitations of traditional methods in identifying topics that are expressed subjectively in text. It also uses zero-shot prompting for unsupervised classification of new lyrics into the identified topics. Our assessments show that LLMusic outperforms BERTopic in identifying subjectively expressed topics while achieving strong performance in unsupervised topic classification. The paper describes the LLMusic components for topic identification and topic classification and illustrates the framework's effectiveness by analyzing the discourse of the most popular funk songs, highlighting its potential for large-scale lyrical analysis.
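
As a rough illustration of the zero-shot classification step mentioned in the abstract, the sketch below builds a prompt from a list of topics and maps the model's free-text answer back to one of them. It is not the LLMusic implementation: the prompt wording, the topic names, the `classify_lyrics` helper, and the `llm` callable (any function that takes a prompt string and returns the model's reply) are all assumptions made for this example.

```python
from typing import Callable, Sequence

# Hypothetical topic labels; in practice they would come from the
# topic-identification step, not from this hard-coded list.
TOPICS = ["ostentation", "romance", "party"]


def build_zero_shot_prompt(lyrics: str, topics: Sequence[str]) -> str:
    """Compose a prompt that asks for exactly one topic, with no examples."""
    options = "\n".join(f"- {topic}" for topic in topics)
    return (
        "Classify the song lyrics below into exactly one of these topics:\n"
        f"{options}\n"
        "Answer with the topic name only.\n\n"
        f"Lyrics:\n{lyrics}"
    )


def classify_lyrics(
    lyrics: str,
    topics: Sequence[str],
    llm: Callable[[str], str],
    fallback: str = "unknown",
) -> str:
    """Query the model and map its answer to a known topic.

    Returns the first topic (in list order) whose name appears in the
    answer, or `fallback` when no topic name is found.
    """
    answer = llm(build_zero_shot_prompt(lyrics, topics)).strip().lower()
    for topic in topics:
        if topic.lower() in answer:
            return topic
    return fallback


if __name__ == "__main__":
    # Stand-in for a real model call, so the sketch runs offline.
    def fake_llm(prompt: str) -> str:
        return " Ostentation.\n"

    print(classify_lyrics("Gold chains and fast cars", TOPICS, fake_llm))
```

Passing the model in as a plain callable keeps the prompt construction and answer parsing independent of any particular LLM provider. The fallback label guards against answers that name none of the known topics.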


Published

2026-03-13

How to Cite

Yepez Rojas, J. D., Tavares Santos, B., de Carvalho Leite Peres, F., & Becker, K. (2026). Leveraging LLMs for Topic Modeling and Classification in Brazilian Funk Lyrics. Journal of Information and Data Management, 17(1), 156–170. https://doi.org/10.5753/jidm.2026.5578

Issue

Vol. 17 No. 1 (2026)

Section

SBBD 2024 Full papers - Extended papers