Song Emotion Recognition: a Performance Comparison Between Audio Features and Artificial Neural Networks
Keywords:
Deep learning, Neural networks, Emotion recognition, Digital signal processing, Music Information Retrieval
Abstract
When songs are composed or performed, the singer or songwriter often intends to express feelings or emotions through them. For humans, matching the emotion conveyed by a musical composition or performance with the subjective perception of an audience can be quite challenging. Fortunately, the machine learning approach to this problem is more straightforward. Typically, audio features are extracted from a dataset and presented to a data-driven model, which is then trained to predict the probability that an input song matches each target emotion and to select the most probable one. In this paper, we studied the features and models most commonly used in recent publications to tackle this problem, revealing which of them are best suited to a cappella songs.
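The generic pipeline summarized above (extract audio features from a dataset, then train a data-driven model to output the probability that a song matches each target emotion) can be sketched in a few lines. This is a minimal, standard-library-only illustration, not the authors' method: the two features (RMS energy and zero-crossing rate), the softmax (multinomial logistic regression) classifier standing in for the neural networks compared in the paper, the hypothetical labels `calm` and `excited`, and the synthetic sine tones used in place of sung recordings are all assumptions chosen for brevity.

```python
import math
import random

EMOTIONS = ["calm", "excited"]  # hypothetical target emotions, for illustration only


def synth_tone(freq, amp, sample_rate=8000, seconds=0.1):
    """Synthetic stand-in for a sung clip (assumption): a pure sine tone."""
    n = int(sample_rate * seconds)
    return [amp * math.sin(2 * math.pi * freq * t / sample_rate) for t in range(n)]


def extract_features(signal, sample_rate):
    """Return [RMS energy, zero-crossing rate in crossings per second]."""
    n = len(signal)
    rms = math.sqrt(sum(x * x for x in signal) / n)
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a >= 0) != (b >= 0))
    return [rms, crossings * sample_rate / n]


def standardize(rows):
    """Z-score each feature column; also return the statistics for reuse at test time."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    scaled = [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]
    return scaled, means, stds


def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(W, x):
    """Probability of each emotion class for one standardized feature vector."""
    xb = x + [1.0]  # constant input so the last weight in each row acts as a bias
    return softmax([sum(w * v for w, v in zip(row, xb)) for row in W])


def train_softmax(X, y, n_classes, lr=0.5, epochs=300):
    """Multinomial logistic regression fitted by batch gradient descent on cross-entropy."""
    dim = len(X[0]) + 1
    W = [[0.0] * dim for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * dim for _ in range(n_classes)]
        for xi, yi in zip(X, y):
            xb = xi + [1.0]
            probs = predict_proba(W, xi)
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == yi else 0.0)
                for j in range(dim):
                    grad[k][j] += err * xb[j]
        for k in range(n_classes):
            for j in range(dim):
                W[k][j] -= lr * grad[k][j] / len(X)
    return W


def build_demo_model(n_per_class=20, seed=42):
    """Train on synthetic clips: quiet, low tones labelled calm; loud, high tones excited."""
    rng = random.Random(seed)
    clips, labels = [], []
    for _ in range(n_per_class):
        clips.append(synth_tone(rng.uniform(150, 250), rng.uniform(0.1, 0.3)))
        labels.append(0)
        clips.append(synth_tone(rng.uniform(600, 900), rng.uniform(0.6, 0.9)))
        labels.append(1)
    X, means, stds = standardize([extract_features(c, 8000) for c in clips])
    return train_softmax(X, labels, len(EMOTIONS)), means, stds


def classify(W, means, stds, signal, sample_rate=8000):
    """Return the most probable emotion and the full probability vector."""
    feats = extract_features(signal, sample_rate)
    x = [(v - m) / s for v, m, s in zip(feats, means, stds)]
    probs = predict_proba(W, x)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs


if __name__ == "__main__":
    W, means, stds = build_demo_model()
    label, probs = classify(W, means, stds, synth_tone(800, 0.8))
    print(label, [round(p, 3) for p in probs])  # a loud, high tone should be labelled "excited"
```

In practice one would substitute a labelled corpus (for example, the song portion of the RAVDESS database listed in the references) and richer features or a neural network; the softmax output stage shown here is the usual way such models turn scores into per-emotion probabilities.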
References
Atmaja, B. T. and Akagi, M. (2020). On the differences between song and speech emotion recognition: Effect of feature sets, feature types, and classifiers. In 2020 IEEE Region 10 Conference (TENCON), pages 968–972.
Casper, L. (2020). Creating a speech and music emotion recognition system for mixed source audio. Master’s thesis, Universiteit Utrecht.
Cunningham, S., Ridley, H., Weinel, J., and Picking, R. (2020). Supervised machine learning for audio emotion recognition. Personal and Ubiquitous Computing.
Cunningham, S., Weinel, J., and Picking, R. (2018). High level analysis of audio features for identifying emotional valence in human singing. In Proceedings of the Audio Mostly 2018 on Sound in Immersion and Emotion, pages 1–4.
Du, P., Li, X., and Gao, Y. (2020). Dynamic music emotion recognition based on CNN-BiLSTM. In 2020 IEEE 5th Information Technology and Mechatronics Engineering Conference (ITOEC), pages 1372–1376.
Flamia Azevedo, B. and Bressan, G. (2018). A comparison of classifiers for musical genres classification and music emotion recognition. Advances in Mathematical Sciences and Applications, pages 241–262.
Gao, Z., Qiu, L., Qi, P., and Sun, Y. (2020). A novel music emotion recognition model for scratch-generated music. In 2020 International Wireless Communications and Mobile Computing (IWCMC), pages 1794–1799.
Kim, W. (2020). Musemo: Express musical emotion based on neural network. Master’s thesis, Ulsan National Institute of Science and Technology.
Kim, Y. E., Schmidt, E. M., Migneco, R., Morton, B. G., Richardson, P., Scott, J., Speck, J. A., and Turnbull, D. (2010). Music emotion recognition: A state of the art review. In Proc. ISMIR, volume 86, pages 937–952.
Livingstone, S. R. and Russo, F. A. (2018). The Ryerson audio-visual database of emotional speech and song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE, 13(5).
Ospitia Medina, Y., Beltrán Blázquez, J. R., and Baldassarri, S. (2020). Emotional classification of music using neural networks with the MediaEval dataset. Personal and Ubiquitous Computing.
Panda, R., Malheiro, R. M., and Paiva, R. P. (2020). Audio features for music emotion recognition: A survey. IEEE Transactions on Affective Computing, pages 1–1.
Pandrea, A. G., Gómez-Cañon, J. S., and Herrera, P. (2020). Cross-dataset music emotion recognition: An end-to-end approach.
Priore, I. and Stover, C. (2014). The subversive songs of bossa nova: Tom Jobim in the era of censorship. Analytical Approaches to World Music, 3(2):1–32.
Rajesh, S. and Nalini, N. J. (2020). Musical instrument emotion recognition using deep recurrent neural network. Procedia Computer Science, 167:16–25. International Conference on Computational Intelligence and Data Science.
Russo, M., Kraljević, L., Stella, M., and Sikora, M. (2020). Cochleogram-based approach for detecting perceived emotions in music. Information Processing & Management, 57(5):102270.
Santos, A., Rosero Jácome, K., and Masiero, B. (2021). Song emotion recognition: A study of the state of the art. In Anais do XVIII Simpósio Brasileiro de Computação Musical, pages 209–212. Porto Alegre: SBC. doi:10.5753/sbcm.2021.19449
Yadav, A. and Vishwakarma, D. K. (2020). A multilingual framework of CNN and BiLSTM for emotion classification. In 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), pages 1–6.
License
Copyright (c) 2022 The Authors
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.