Detection by face of learning emotions: an approach based on deep neural networks and on the emotions flow
DOI: https://doi.org/10.5753/rbie.2023.2936
Keywords: emotion recognition, deep learning, learning emotions
Abstract
Automatic facial recognition of emotions has the potential to make human-computer interaction an increasingly natural experience. In intelligent learning environments in particular, emotion detection benefits students: their affective information can be used directly to perceive their difficulties, adapt the pedagogical intervention, and keep them engaged. This article presents a model capable of recognizing, from the face, the emotions most commonly experienced by students during interaction sessions with learning environments: engagement, confusion, frustration, and boredom. The proposed model uses deep neural networks to classify these emotions, extracting statistical, temporal, and spatial features from the training videos, including eye and facial movements. The main contribution of this work is taking the flow of emotions (the sequence of emotions in the order a student experiences them) into account as a means of increasing emotion detection accuracy. We tested several model configurations and compared their effectiveness against state-of-the-art models. Results show that providing the sequence of learning emotions as model input improves those algorithms' effectiveness. Training the model on the DAiSEE dataset, we achieved a 26.27% improvement in F1 score (from 0.5122 to 0.6468) when including the emotion history in the model.
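The central idea described above, supplying the classifier with the recent flow of emotions alongside the visual features extracted from each clip, can be illustrated with a minimal sketch. The code below is not the authors' implementation: the feature dimension, history length, layer sizes, and all identifiers are illustrative assumptions, and the model described in the paper uses richer video-based architectures.

```python
# Minimal sketch (assumptions only, not the authors' code) of feeding the
# "emotion flow" (the last few detected learning emotions) to a classifier
# together with per-clip visual features.
import numpy as np
from tensorflow.keras import layers, Model

EMOTIONS = ["engagement", "confusion", "frustration", "boredom"]
N_CLASSES = len(EMOTIONS)
FEAT_DIM = 128      # assumed size of the statistical/temporal/spatial clip features
HISTORY_LEN = 3     # assumed number of previous emotions kept as context

# Input 1: features extracted from the video clip (eye/face movement statistics, etc.)
clip_features = layers.Input(shape=(FEAT_DIM,), name="clip_features")
# Input 2: the emotion flow, one-hot encoded for the last HISTORY_LEN detections
emotion_history = layers.Input(shape=(HISTORY_LEN * N_CLASSES,), name="emotion_history")

# Concatenate both sources and classify one of the four learning emotions
x = layers.Concatenate()([clip_features, emotion_history])
x = layers.Dense(64, activation="relu")(x)
x = layers.Dropout(0.3)(x)
outputs = layers.Dense(N_CLASSES, activation="softmax", name="emotion")(x)

model = Model(inputs=[clip_features, emotion_history], outputs=outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Toy forward pass: random clip features plus a history of
# [engagement, engagement, confusion] encoded as one-hot vectors.
feats = np.random.rand(1, FEAT_DIM).astype("float32")
history = np.eye(N_CLASSES)[[0, 0, 1]].ravel()[None, :].astype("float32")
print(model.predict([feats, history]).round(3))
```

In this sketch the emotion history is simply concatenated with the clip features before the dense layers; the paper's reported gain comes from making that history available to the model, however the fusion is realized.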
References
Ackermann, P., Kohlschein, C., Bitsch, J. A., Wehrle, K., & Jeschke, S. (2016). Eeg-based automatic emotion recognition: Feature extraction, selection and classification methods. In Int. conf. on e-health networking, applications and services (pp. 1–6). [GS Search]
Baltrusaitis, T., Zadeh, A., Lim, Y. C., & Morency, L.-P. (2018). OpenFace 2.0: Facial behavior analysis toolkit. In IEEE Int. Conf. on Automatic Face & Gesture Recognition (pp. 59–66). [GS Search]
Bianco, S., Cadene, R., Celona, L., & Napoletano, P. (2018). Benchmark analysis of representative deep neural network architectures. IEEE access, 6, 64270–64277. [GS Search]
Bradski, G. (2000). The OpenCV Library. Dr. Dobb’s Journal of Software Tools. [GS Search]
Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). Smote: synthetic minority over-sampling technique. J. of Artif. Intel. Research, 16, 321–357. [GS Search]
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database. In IEEE CVPR (pp. 248–255). [GS Search]
Dewan, M. A. A., Lin, F., Wen, D., Murshed, M., & Uddin, Z. (2018). A deep learning approach to detecting engagement of online learners. In IEEE SmartWorld (pp. 1895–1902). [GS Search]
Dhall, A., Goecke, R., Lucey, S., & Gedeon, T. (2011a). Acted facial expressions in the wild database. Australian National University, Technical Report TR-CS-11, 2, 1. [GS Search]
Dhall, A., Goecke, R., Lucey, S., & Gedeon, T. (2011b). Static facial expression analysis in tough conditions. In IEEE ICCV Workshops (pp. 2106–2112). [GS Search]
D’Mello, S., & Calvo, R. A. (2013). Beyond the basic emotions: what should affective computing compute? In CHI’13 Extended Abstracts (pp. 2287–2294). [GS Search]
D’Mello, S., & Graesser, A. (2012). Dynamics of affective states during complex learning. Learning and Instruction, 22(2), 145–157. [GS Search]
D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153–170. [GS Search]
Ekman, P. (1992). An argument for basic emotions. Cognition & emotion, 6(3-4), 169–200. [GS Search]
Ekman, P. (1999). Basic emotions. Handbook of cognition and emotion, 45–60. [GS Search]
Elrahman, S. M. A., & Abraham, A. (2013). A review of class imbalance problem. Journal of Network and Innovative Computing, 1(2013), 332–340. [GS Search]
Fredrickson, B. L. (1998). What good are positive emotions? Review of general psychology, 2(3), 300–319. [GS Search]
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press. [GS Search]
Goodfellow, I., Erhan, D., Carrier, P. L., Courville, A., Mirza, M., Hamner, B., . . . Lee, D.-H. (2013). Challenges in representation learning: A report on three machine learning contests. In International conference on neural information processing (pp. 117–124). [GS Search]
Goodman, L. A. (1961). Snowball sampling. The annals of mathematical statistics, 148–170. [GS Search]
Gupta, A., D’Cunha, A., Awasthi, K., & Balasubramanian, V. (2016). DAiSEE: Towards user engagement recognition in the wild. arXiv preprint arXiv:1609.01885. [GS Search]
Gupta, A., Jaiswal, R., Adhikari, S., & Balasubramanian, V. N. (2016). DAiSEE: Dataset for affective states in e-learning environments. arXiv, 1–22. [GS Search]
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In IEEE CVPR (pp. 770–778). [GS Search]
Heckathorn, D. D. (1997). Respondent-driven sampling: a new approach to the study of hidden populations. Social problems, 44(2), 174–199. [GS Search]
Hinton, G. E., Osindero, S., & Teh, Y.-W. (2006). A fast learning algorithm for deep belief nets. Neural computation, 18(7), 1527–1554. [GS Search]
Hu, W.-S., Li, H.-C., Pan, L., Li, W., Tao, R., & Du, Q. (2020). Spatial–spectral feature extraction via deep convlstm neural networks for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 58(6), 4237–4250. [GS Search]
Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In IEEE CVPR (pp. 4700–4708). [GS Search]
Huang, W., Song, G., Li, M., Hu, W., & Xie, K. (2013). Adaptive weight optimization for classification of imbalanced data. In Int. Conf. on Intelligent Science and Big Data Engineering (pp. 546–553). [GS Search]
Jaques, P. A., Seffrin, H., Rubi, G. L., de Morais, F., Ghilardi, C., Bittencourt, I. I., & Isotani, S. (2013). Rule-based expert systems to support step-by-step guidance in algebraic problem solving: The case of the tutor PAT2Math. Expert Systems with Applications, 40(14), 5456–5465. doi: 10.1016/j.eswa.2013.04.004 [GS Search]
Kaggle (2013). Challenges in representation learning: Facial expression recognition challenge. Available at [Link]. Accessed: Jul. 17, 2019.
Kaur, A., Mustafa, A., Mehta, L., & Dhall, A. (2018). Prediction and localization of student engagement in the wild. In 2018 digital image computing: Techniques and applications (dicta) (pp. 1–8). [GS Search]
King, D. E. (2009). Dlib-ml: A machine learning toolkit. Journal of Machine Learning Research, 10(Jul), 1755–1758. [GS Search]
Krawczyk, B. (2016). Learning from imbalanced data: open challenges and future directions. Progress in Artificial Intelligence, 5(4), 221–232. [GS Search]
Kreifelts, B., Wildgruber, D., & Ethofer, T. (2013). Audiovisual integration of emotional information from voice and face. In Integrating face and voice in person perception (pp. 225–251). Springer. [GS Search]
Lai, M.-L., Tsai, M.-J., Yang, F.-Y., Hsu, C.-Y., Liu, T.-C., Lee, S. W.-Y., . . . Tsai, C.-C. (2013). A review of using eye-tracking technology in exploring learning from 2000 to 2012. Educational research review, 10, 90–115. [GS Search]
Lazarus, R. S. (1982). Thoughts on the relations between emotion and cognition. American psychologist, 37(9), 1019. [GS Search]
Lea, C., Flynn, M. D., Vidal, R., Reiter, A., & Hager, G. D. (2017). Temporal convolutional networks for action segmentation and detection. In IEEE CVPR (pp. 156–165). [GS Search]
Li, S., & Deng, W. (2018). Deep facial expression recognition: A survey. arXiv preprint arXiv:1804.08348. [GS Search]
Liu, C., Tang, T., Lv, K., & Wang, M. (2018). Multi-feature based emotion recognition for video clips. In Int. Conf. on Multimodal Interaction (pp. 630–634). [GS Search]
Longadge, R., & Dongre, S. (2013). Class imbalance problem in data mining review. arXiv preprint arXiv:1305.1707. [GS Search]
Lucey, P., Cohn, J. F., Kanade, T., Saragih, J., Ambadar, Z., & Matthews, I. (2010). The extended cohn-kanade dataset (ck+): A complete dataset for action unit and emotion-specified expression. In IEEE CVPR-Workshops (pp. 94–101). [GS Search]
Marcus, G. (2018). Deep learning: A critical appraisal. arXiv preprint arXiv:1801.00631. [GS Search]
Morais, F., & Jaques, P. A. (2022). Dinâmica de afetos em um sistema tutor inteligente de matemática no contexto brasileiro: uma análise da transição de emoções acadêmicas. Revista Brasileira de Informática na Educação, 30, 519-541. doi: 10.5753/rbie.2022.2577 [GS Search]
Morais, F., Kautzmann, T. R., Bittencourt, I. I., & Jaques, P. (2019). Emap-ml: A protocol of emotions and behaviors annotation for machine learning labels. In EC-TEL. [GS Search]
Nardelli, M., Valenza, G., Greco, A., Lanata, A., & Scilingo, E. P. (2015). Recognizing emotions induced by affective sounds through heart rate variability. IEEE Transactions on Affective Computing, 6(4), 385–394. [GS Search]
Nezami, O. M., Dras, M., Hamey, L., Richards, D., Wan, S., & Paris, C. (2018). Automatic recognition of student engagement using deep learning and facial expression. arXiv preprint arXiv:1808.02324. [GS Search]
Ocumpaugh, J. (2015). Baker Rodrigo Ocumpaugh Monitoring Protocol (BROMP) 2.0 technical and training manual. New York, NY and Manila, Philippines: Teachers College, Columbia University and Ateneo Laboratory for the Learning Sciences, 60. [GS Search]
Pekrun, R. (2011). Emotions as drivers of learning and cognitive development. In New perspectives on affect and learning technologies (pp. 23–39). Springer. [GS Search]
Reis, H., Alvares, D., Jaques, P., & Isotani, S. (2018). Analysis of permanence time in emotional states: A case study using educational software. In ITS (pp. 180–190). [GS Search]
Reis, H., Jaques, P., & Isotani, S. (2018, 03). Sistemas tutores inteligentes que detectam as emoções dos estudantes: um mapeamento sistemático. Revista Brasileira de Informática na Educação, 26, 76-107. doi: 10.5753/rbie.2018.26.03.76 [GS Search]
Sariyanidi, E., Gunes, H., & Cavallaro, A. (2014). Automatic analysis of facial affect: A survey of registration, representation, and recognition. IEEE Trans. on Pattern Analysis and Machine Intelligence, 37(6), 1113–1133. [GS Search]
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. [GS Search]
Subramanian, R., Wache, J., Abadi, M. K., Vieriu, R. L., Winkler, S., & Sebe, N. (2016). Ascertain: Emotion and personality recognition using commercial sensors. IEEE Transactions on Affective Computing, 9(2), 147–160. [GS Search]
Szegedy, C., Ioffe, S., Vanhoucke, V., & Alemi, A. A. (2017). Inception-v4, inception-resnet and the impact of residual connections on learning. In AAAI Conf. on Artif. Intellig. [GS Search]
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the inception architecture for computer vision. In IEEE CVPR (pp. 2818–2826). [GS Search]
Thomas, C., Nair, N., & Jayagopi, D. B. (2018). Predicting engagement intensity in the wild using temporal convolutional network. In Int. Conf. on Multimodal Interaction (pp. 604–610). [GS Search]
Tran, D., Bourdev, L., Fergus, R., Torresani, L., & Paluri, M. (2015). Learning spatiotemporal features with 3d convolutional networks. In IEEE ICCV (pp. 4489–4497). [GS Search]
Wang, S., Liu, W., Wu, J., Cao, L., Meng, Q., & Kennedy, P. J. (2016). Training deep neural networks on imbalanced data sets. In IJCNN (pp. 4368–4374). [GS Search]
Werlang, P. (2022). GitHub: werlang/emolearn-ml-model. Available at [Link]. Accessed: Oct. 3, 2019.
Whitehill, J., Serpell, Z., Lin, Y.-C., Foster, A., & Movellan, J. R. (2014). The faces of engagement: Automatic recognition of student engagement from facial expressions. IEEE Transactions on Affective Computing, 5(1), 86–98. [GS Search]
Yang, J., Wang, K., Peng, X., & Qiao, Y. (2018). Deep recurrent multi-instance learning with spatio-temporal features for engagement intensity prediction. In Int. Conf. on Multimodal Interaction (pp. 594–598). [GS Search]
License
Copyright (c) 2023 Pablo Werlang, Patrícia Augustin Jaques
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.