SONBRA: A Public and Annotated Dataset for Research on Brazilian Music Genre Classification

Authors

A. V. Ribeiro, J. M. de O. Santana, M. A. de Oliveira, C. Gomes

DOI:

https://doi.org/10.5753/reic.2026.7222

Keywords:

Machine learning, music information retrieval, music genre classification

Abstract

This paper introduces SONBRA, a publicly available annotated dataset designed to support research on the automatic classification of Brazilian music genres. The dataset covers eight representative national genres: Bossa Nova, Forró, Forró Piseiro, Funk, Pagode, Samba, Samba-Enredo, and Sertanejo. To build it, music tracks were collected and a 30-second excerpt was taken from each of five positions in every track (beginning, beginning-to-middle, middle, middle-to-end, and end), yielding 10,000 samples in total. Each sample is annotated with its genre label and described by 785 features extracted from the audio, including Chroma CENS, Chroma CQT, Chroma STFT, Fourier Tempogram, Mel-spectrogram, MFCC, RMS, Spectral Bandwidth, Spectral Centroid, Spectral Roll-Off, Tempogram, Tonnetz, and ZCR. To validate the dataset’s utility for machine learning tasks, we ran a classification benchmark with the following algorithms: Decision Tree, K-Nearest Neighbors (KNN), Multi-Layer Perceptron (MLP), Random Forest, Support Vector Machines (SVM), XGBoost, and a Voting ensemble. The Voting classifier achieved the highest accuracy (83.2%), and feature combinations involving Mel-spectrogram, MFCC, Tempogram, and Chroma CENS were the most informative for genre prediction. This paper details the curation methodology and structure of the SONBRA dataset, which we release to the scientific community as a public benchmark to stimulate further computational research into the richness of Brazilian music.
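
The five-position excerpting described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact offsets used for SONBRA, so the even spacing below is an assumption, and the names `excerpt_windows`, `POSITIONS`, and `EXCERPT_SECONDS` are hypothetical and introduced only for illustration.

```python
from typing import List, Tuple

EXCERPT_SECONDS = 30.0  # excerpt length stated in the abstract
POSITIONS = (
    "beginning",
    "beginning-to-middle",
    "middle",
    "middle-to-end",
    "end",
)


def excerpt_windows(
    track_seconds: float, excerpt_seconds: float = EXCERPT_SECONDS
) -> List[Tuple[str, float, float]]:
    """Return (position, start, end) in seconds for five excerpts of one track.

    Assumption: start times are spaced evenly between 0 and the last
    possible start (track_seconds - excerpt_seconds). The offsets actually
    used to build the dataset are not stated in the abstract.
    """
    if track_seconds < excerpt_seconds:
        raise ValueError("track is shorter than one excerpt")
    last_start = track_seconds - excerpt_seconds
    step = last_start / (len(POSITIONS) - 1)
    return [
        (name, i * step, i * step + excerpt_seconds)
        for i, name in enumerate(POSITIONS)
    ]


if __name__ == "__main__":
    # A hypothetical 3-minute track.
    for name, start, end in excerpt_windows(180.0):
        print(f"{name:>20}: {start:6.1f}s to {end:6.1f}s")
```

Each `(start, end)` pair could then be used to read the matching audio slice, for example through the `offset` and `duration` arguments of `librosa.load`, before the features are computed for that excerpt.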

Published

2026-01-30

How to Cite

Ribeiro, A. V., Santana, J. M. de O., Oliveira, M. A. de, & Gomes, C. (2026). SONBRA: A Public and Annotated Dataset for Research on Brazilian Music Genre Classification. Electronic Journal of Undergraduate Research on Computing, 24(1), 57–65. https://doi.org/10.5753/reic.2026.7222

Issue

Vol. 24 No. 1 (2026)

Section

Full Papers