Intelligent Emotion Tracking System VIRE: Evaluation of Neural Network Architectures in Facial Emotion Recognition
DOI:
https://doi.org/10.5753/jbcs.2026.5370

Keywords:
Convolutional Neural Networks, Emotion Recognition, Artificial Intelligence, Deep Networks

Abstract
This work proposes an emotional monitoring system called Visual Identification of Recognition of Emotions (VIRE), based on convolutional neural networks (CNNs) for analyzing facial expressions. Taking as a reference the six basic emotions proposed by Paul Ekman, which can be identified from the combined states of various facial muscles, VIRE aims to assist in the diagnosis of mental health conditions. Although emotions are communicated in many ways, this research focuses on facial expressions because the mobility of the facial muscles makes them especially expressive. The methodology comprised collecting data from the FER2013 dataset, preprocessing the images, tuning hyperparameters, and training three architectures: AlexNet, DenseNet, and a custom CNN. The system classifies expressions into the basic emotions, and the models' performance is evaluated in terms of accuracy and other metrics. VIRE has shown potential, achieving an accuracy of about 60%, although improvements are still needed for practical application. The ultimate goal is a tool that integrates technology and health, facilitating the identification of emotional states that may indicate mental health issues and thereby contributing to more accurate and effective diagnoses.
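As an illustration of the kind of custom CNN the methodology describes, the sketch below builds a small PyTorch classifier for FER2013-style input (48×48 grayscale images, seven emotion classes: the six basic emotions plus neutral). This is a minimal, hypothetical example, not the architecture evaluated in this work; the layer sizes and dropout rate are arbitrary illustrative choices.

```python
import torch
import torch.nn as nn


class SmallFERCNN(nn.Module):
    """Illustrative CNN for 48x48 grayscale FER2013-style images."""

    def __init__(self, num_classes: int = 7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),   # 48x48 -> 48x48
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 24x24
            nn.Conv2d(32, 64, kernel_size=3, padding=1),  # 24x24 -> 24x24
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 12x12
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 12 * 12, 128),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(128, num_classes),  # raw logits; softmax is folded into the loss
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


model = SmallFERCNN()
batch = torch.rand(4, 1, 48, 48)  # fake batch of normalized images
logits = model(batch)
print(logits.shape)
```

In practice such a model would be trained with a cross-entropy loss on the FER2013 training split, with hyperparameters tuned as the methodology describes.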
References
Albraikan, A. A., Alzahrani, J. S., Alshahrani, R., Yafoz, A., Alsini, R., Hilal, A. M., Alkhayyat, A., and Gupta, D. (2022). Intelligent facial expression recognition and classification using optimal deep transfer learning model. Image and Vision Computing, 128:104583. DOI: 10.1016/j.imavis.2022.104583.
American Psychiatric Association (2014). DSM-5: Manual diagnóstico e estatístico de transtornos mentais. Artmed Editora. Book.
Attrah, S. (2025). Emotion estimation from video footage with LSTM. DOI: 10.48550/ARXIV.2501.13432.
Bergstra, J. and Bengio, Y. (2012). Random search for hyper-parameter optimization. J. Mach. Learn. Res., 13(10):281–305. Available at:[link].
Bergstra, J., Komer, B., Eliasmith, C., Yamins, D., and Cox, D. D. (2015). Hyperopt: a python library for model selection and hyperparameter optimization. Computational Science & Discovery, 8(1):014008. DOI: 10.1088/1749-4699/8/1/014008.
Bishop, C. (2016). Pattern Recognition and Machine Learning. Information Science and Statistics. Springer New York. DOI: 10.1117/1.2819119.
Castro, C. L. d. and Braga, A. P. (2011). Aprendizado supervisionado com conjuntos de dados desbalanceados. Sba: Controle & Automação Sociedade Brasileira de Automatica, 22(5):441–466. DOI: 10.1590/s0103-17592011000500002.
Chollet, F. (2018). Deep Learning with Python. Manning Publications, Shelter Island. Book.
Dalgalarrondo, P. (2019). Psicopatologia e semiologia dos transtornos mentais. Artmed, Porto Alegre, 3 edition. Book.
Goodfellow, I., Erhan, D., Cukierski, W., and Bengio, Y. (2013). Challenges in representation learning: Facial expression recognition challenge. Available at:[link].
Ekman, P. (2003). Emotions Revealed. Times Books, New York. DOI: 10.1136/sbmj.0405184.
Ekman, P. and Friesen, W. V. (1971). Constants across cultures in the face and emotion. Journal of Personality and Social Psychology, 17(2):124-129. DOI: 10.1037/h0030377.
Filho, G. P. R., Meneguette, R. I., Mendonça, F. L. L. d., Enamoto, L., Pessin, G., and Gonçalves, V. P. (2024). Toward an emotion efficient architecture based on the sound spectrum from the voice of portuguese speakers. Neural Computing and Applications, 36(32):19939-19950. DOI: 10.1007/s00521-024-10249-4.
Frenay, B. and Verleysen, M. (2014). Classification in the presence of label noise: A survey. IEEE Transactions on Neural Networks and Learning Systems, 25(5):845–869. DOI: 10.1109/tnnls.2013.2292894.
Géron, A. (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O'Reilly Media. Available at:[link].
Gonzalez, R. and Woods, R. (2008). Digital Image Processing. Prentice Hall. DOI: 10.2307/1574313.
Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. Adaptive Computation and Machine Learning series. MIT Press. DOI: 10.1038/nature14539.
Huang, G., Liu, Z., van der Maaten, L., and Weinberger, K. Q. (2017). Densely Connected Convolutional Networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2261-2269, Honolulu, HI, USA. IEEE Computer Society. DOI: 10.1109/cvpr.2017.243.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 1, NIPS'12, pages 1097–1105, Red Hook, NY, USA. Curran Associates Inc. DOI: 10.1145/3065386.
Lonkar, S. (2021). Facial expressions recognition with convolutional neural networks. DOI: 10.48550/ARXIV.2107.08640.
Mariano, D. (2021). Métricas de avaliação em machine learning: acurácia, sensibilidade, precisão, especificidade e F-score. Alfahelix. DOI: 10.51780/978-6-599-275326-15.
Minaee, S., Minaei, M., and Abdolrashidi, A. (2021). Deep-emotion: Facial expression recognition using attentional convolutional network. Sensors, 21(9):3046. DOI: 10.3390/s21093046.
Oguine, O. C., Oguine, K. J., Bisallah, H. I., and Ofuani, D. (2022). Hybrid facial expression recognition (FER2013) model for real-time emotion classification and prediction. DOI: 10.48550/ARXIV.2206.09509.
Qu, D., Dhakal, S., and Carrillo, D. (2023). Facial emotion recognition using CNN in PyTorch. DOI: 10.48550/ARXIV.2312.10818.
Solomon, A. (2014). O demônio do meio-dia: uma anatomia da depressão. Companhia das Letras, São Paulo, 2 edition. Translated by Myriam Campello. Book.
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. (2015). Rethinking the inception architecture for computer vision. DOI: 10.48550/ARXIV.1512.00567.
Zhang, Z., Lin, W., Liu, M., and Mahmoud, M. (2020). Multimodal deep learning framework for mental disorder recognition. In 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020), pages 344-350. DOI: 10.1109/FG47880.2020.00033.
License
Copyright (c) 2026 Nathan Ferraz da Silva, Geraldo Pereira Rocha Filho, Roger Immich, Rodolfo Ipolito Meneguette, Vinícius Pereira Gonçalves

This work is licensed under a Creative Commons Attribution 4.0 International License.

