Leveraging LLMs for Topic Modeling and Classification in Brazilian Funk Lyrics
DOI: https://doi.org/10.5753/jidm.2026.5578
Keywords: Lyrics, topic modeling, BERTopic, prompt engineering
Abstract
Song lyrics present unique challenges for topic modeling and classification due to their implicit discourse, reliance on figurative and poetic language, and use of slang. As a cultural expression of urban peripheries, Brazilian funk provides a rich social narrative. This work proposes LLMusic, an end-to-end framework for topic extraction and classification of song lyrics, using Brazilian funk as a case study. LLMusic synergistically combines prompt-based Large Language Models (LLMs) with advanced topic modeling techniques such as BERTopic, aiming to address the limitations of traditional methods for identifying subjectively represented topics in texts. Zero-shot prompting is also deployed for unsupervised classification of new lyrics based on the identified topics. Our assessments demonstrate that LLMusic outperforms BERTopic in identifying subjectively expressed topics while achieving strong performance in unsupervised topic classification. The paper describes the components of LLMusic for topic identification and topic classification and illustrates its effectiveness by analyzing the discourse in the most popular funk songs, highlighting its potential for large-scale lyrical analysis.
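The abstract describes a zero-shot prompting stage that assigns each new lyric to one of the previously identified topics. As a minimal sketch of what such a stage can look like (the topic labels, prompt wording, and function name below are hypothetical illustrations, not LLMusic's actual prompts or taxonomy), the classification request sent to an LLM could be assembled like this:

```python
# Illustrative sketch only: the topic labels and prompt wording are
# hypothetical examples, NOT the prompts or taxonomy used by LLMusic.

TOPICS = ["ostentação", "festa", "romance", "crítica social"]  # hypothetical labels

def build_zero_shot_prompt(lyrics: str, topics: list[str]) -> str:
    """Assemble a single-turn prompt asking an LLM to pick one topic label."""
    topic_lines = "\n".join(f"- {t}" for t in topics)
    return (
        "You are a classifier of Brazilian funk lyrics.\n"
        "Choose exactly one topic from the list below for the given lyrics.\n\n"
        f"Topics:\n{topic_lines}\n\n"
        f"Lyrics:\n{lyrics}\n\n"
        "Answer with the topic name only."
    )

if __name__ == "__main__":
    print(build_zero_shot_prompt("Baile de favela, hoje é festa...", TOPICS))
```

In this kind of setup, the returned string would be sent to an LLM endpoint; constraining the answer to the topic name alone simplifies parsing the model's reply into a label.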
References
Bencke, L., Paula, F., dos Santos, B., and Moreira, V. P. Can we trust LLMs as relevance judges? In Anais do XXXIX Simpósio Brasileiro de Bancos de Dados. SBC, Porto Alegre, RS, Brasil, pp. 600–612, 2024.
Betti, L., Abrate, C., and Kaltenbrunner, A. Large scale analysis of gender bias and sexism in song lyrics. EPJ
Data Science 12 (1): 10, 2023.
Blei, D. M., Ng, A. Y., and Jordan, M. I. Latent dirichlet allocation. Journal of Machine Learning Research vol. 3,
pp. 993–1022, Mar., 2003.
Brilhante, A. V. M., Giaxa, R. R. B., Branco, J. G. d. O., and Vieira, L. J. E. d. S. Cultura do estupro e violência ostentação: uma análise a partir da artefactualidade do funk. Interface-Comunicação Saúde, Educação vol. 23, pp. e170621, 2019.
Calcina, E. and Novak, E. Measuring the similarity of song artists using topic modelling. In Proc. of the 25th Intl. Multiconference Information Society - Data Mining and Data Warehouses (SiKDD). pp. 103–106, 2022.
Devi, M. D. and Saharia, N. Exploiting topic modelling to classify sentiment from lyrics. In Proc. of the 2nd Intl. Conference on Machine Learning, Image Processing, Network Security and Data Sciences (MIND). pp. 411–423,
Facina, A. Que batida é essa? In Funk, que batida é essa?, A. Castro and J. Haiad (Eds.). Hunter Books, pp. –228, 2012.
Grootendorst, M. BERTopic: Leveraging BERT and topic modeling for efficient document clustering. https://maartengr.github.io/BERTopic, 2022.
Huang, L., Yu, W., Ma, W., Zhong, W., Feng, Z., Wang, H., Chen, Q., Peng, W., Feng, X., Qin, B., et al.
A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. arXiv
preprint arXiv:2311.05232 , 2023.
Junior, J. S., Rossi, R., and Lobato, F. Uma abordagem baseada em letras para a descoberta de conhecimento da
música brasileira: o sertanejo como um estudo de caso. In Anais do XVI Encontro Nacional de Inteligência Artificial
e Computacional. Salvador, pp. 949–960, 2019.
Liu, P., Yuan, W., Fu, J., Zhengbao, J., Hayashi, H., and Neubig, G. Pre-train, prompt, and predict: A systematic
survey of prompting methods in natural language processing. ACM Computing Surveys 55 (9): 1–35, 2023.
Lopes, A. C. Funk-se Quem Quiser: No Batidão Negro Da Cidade Carioca. Bom Texto FAPERJ, 2011.
Lopes, A. C. and Facina, A. Cidade do funk: expressões da diáspora negra nas favelas cariocas. Revista do Arquivo
Geral da Cidade do Rio de Janeiro vol. 6, pp. 193–206, 2012.
Oramas, S., Espinosa-Anke, L., Gómez, F., and Serra, X. Natural language processing for music knowledge discovery. Journal of New Music Research vol. 47, pp. 365–382, 2018.
Pereira, A. B. Funk ostentação em São Paulo: imaginação, consumo e novas tecnologias da informação e da comunicação. Revista Estudos Culturais (1): 1–18, 2014.
Peres, F. C. Puta ou santa: as relações com mulheres enquanto elemento constituinte das masculinidades do funk
brasileiro? In Anais do IV Encontro Anual de Antropologia do Mercosul, 2023.
Pham, C. M., Hoyle, A., Sun, S., Resnik, P., and Iyyer, M. Topicgpt: A prompt-based topic modeling framework.
https://doi.org/10.48550/arXiv.2311.01449, 2024.
Qin, C., Zhang, A., Zhang, Z., Chen, J., Yasunaga, M., and Yang, D. Is ChatGPT a general-purpose natural language processing task solver? In Proc. of the 2023 EMNLP. pp. 1339–1384, 2023.
Röder, M., Both, A., and Hinneburg, A. Exploring the space of topic coherence measures. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining. pp. 399–408, 2015.
Watanabe, K. and Goto, M. Lyrics information processing: Analysis, generation, and applications. In Proceedings of
the 1st Workshop on NLP for Music and Audio (NLP4MusA). pp. 6–12, 2020.
Yepez, J., Tavares, B., Peres, F., and Becker, K. LLMs got the funk: leveraging LLM, prompt engineering and fine-tuning for topic modeling on Brazilian funk lyrics. In Proc. of the 2024 Conf. on Web Intelligence and Intelligent Agent Technology (WI-IAT), 2024a.
Yepez, J., Tavares, B., Peres, F., and Becker, K. Na batida do funk: modelagem de tópicos combinando LLM, engenharia de prompt e BERTopic. In Anais do XXXIX Simpósio Brasileiro de Bancos de Dados. SBC, Porto Alegre, RS, Brasil, pp. 613–625, 2024b.
Zhang, T., Ladhak, F., Durmus, E., Liang, P., McKeown, K., and Hashimoto, T. B. Benchmarking large language models for news summarization. Transactions of the Association for Computational Linguistics vol. 12, pp. 39–57, 2024.
Zhao, W. X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z., Du, Y., Yang, C., Chen, Y., Chen, Z., Jiang, J., Ren, R., Li, Y., Tang, X., Liu, Z., Liu, P., Nie, J.-Y., and Wen, J.-R. A survey of large language models. arXiv preprint arXiv:2303.18223, 2023.
Ziems, C., Held, W., Shaikh, O., Chen, J., Zhang, Z., and Yang, D. Can large language models transform computational social science? Computational Linguistics vol. 50, 2024.

