Face-based detection of learning emotions: an approach based on deep neural networks and emotion flow
DOI: https://doi.org/10.5753/rbie.2023.2936

Keywords: emotion recognition, deep neural networks, emotions in learning

Abstract
Automatic facial emotion recognition has the potential to make interacting with a computer a more natural experience. In intelligent learning environments in particular, emotion detection directly benefits students: their affective information can be used to perceive their difficulties, adapt the pedagogical intervention, and keep them engaged. This article presents a machine learning model capable of recognizing, from face videos, the emotions engagement, confusion, frustration, and boredom experienced by students during interaction sessions with learning environments. The proposed model uses deep neural networks to classify each video into one of these emotions, extracting statistical, temporal, and spatial features from the training videos, including eye movements and facial muscle movements. The work's main contribution is taking the flow of emotions as input, that is, the sequence in which the emotions are manifested. Several deep learning model configurations were tested and their effectiveness compared to the state of the art. The results provide evidence that considering the sequence of students' learning emotions as model input improves the effectiveness of these algorithms. When trained on the DAiSEE dataset, the F1 score improved by 26.27% (from 0.5122 to 0.6468) once the emotion history was included in the model.
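To make the idea of the "emotion flow" as an additional model input concrete, the sketch below shows a minimal PyTorch classifier that fuses per-clip facial features (for example, action-unit and eye-gaze statistics such as those produced by OpenFace) with an LSTM summary of the student's recent emotion history. This is an illustrative sketch only: the feature dimension, history length, and layer sizes are hypothetical choices, not the architecture evaluated in the paper.

```python
# Minimal sketch of fusing face-video features with the student's emotion history.
# All dimensions and layer sizes are hypothetical, not the authors' architecture.
import torch
import torch.nn as nn

NUM_EMOTIONS = 4  # engagement, confusion, frustration, boredom


class EmotionFlowClassifier(nn.Module):
    def __init__(self, face_feat_dim=64, hidden_dim=32):
        super().__init__()
        # Branch 1: statistical/temporal/spatial features extracted from the face video
        # (e.g., action-unit intensities and eye-gaze statistics).
        self.face_branch = nn.Sequential(
            nn.Linear(face_feat_dim, hidden_dim),
            nn.ReLU(),
        )
        # Branch 2: the "emotion flow" -- the sequence of previously observed emotions,
        # encoded as one-hot vectors and summarized by an LSTM.
        self.history_branch = nn.LSTM(
            input_size=NUM_EMOTIONS, hidden_size=hidden_dim, batch_first=True
        )
        # Fusion + final classification over the four learning emotions.
        self.classifier = nn.Linear(2 * hidden_dim, NUM_EMOTIONS)

    def forward(self, face_feats, emotion_history):
        # face_feats:      (batch, face_feat_dim)
        # emotion_history: (batch, history_len, NUM_EMOTIONS), one-hot per past segment
        f = self.face_branch(face_feats)
        _, (h, _) = self.history_branch(emotion_history)
        fused = torch.cat([f, h[-1]], dim=1)
        return self.classifier(fused)  # logits over the 4 emotions


if __name__ == "__main__":
    model = EmotionFlowClassifier()
    face = torch.randn(8, 64)                # dummy per-clip face features
    history = torch.zeros(8, 5, NUM_EMOTIONS)
    history[:, :, 0] = 1.0                   # e.g., last 5 segments labeled engagement
    print(model(face, history).shape)        # torch.Size([8, 4])
```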
References
Ackermann, P., Kohlschein, C., Bitsch, J. A., Wehrle, K., & Jeschke, S. (2016). EEG-based automatic emotion recognition: Feature extraction, selection and classification methods. In Int. Conf. on e-Health Networking, Applications and Services (pp. 1–6). [GS Search]
Baltrusaitis, T., Zadeh, A., Lim, Y. C., & Morency, L.-P. (2018). OpenFace 2.0. In IEEE Int. Conf. on Automatic Face & Gesture Recognition (pp. 59–66). [GS Search]
Bianco, S., Cadene, R., Celona, L., & Napoletano, P. (2018). Benchmark analysis of representative deep neural network architectures. IEEE Access, 6, 64270–64277. [GS Search]
Bradski, G. (2000). The OpenCV Library. Dr. Dobb’s Journal of Software Tools. [GS Search]
Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357. [GS Search]
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database. In IEEE CVPR (pp. 248–255). [GS Search]
Dewan, M. A. A., Lin, F., Wen, D., Murshed, M., & Uddin, Z. (2018). A deep learning approach to detecting engagement of online learners. In IEEE SmartWorld (pp. 1895–1902). [GS Search]
Dhall, A., Goecke, R., Lucey, S., & Gedeon, T. (2011a). Acted facial expressions in the wild database. Australian National University, Technical Report TR-CS-11, 2, 1. [GS Search]
Dhall, A., Goecke, R., Lucey, S., & Gedeon, T. (2011b). Static facial expression analysis in tough conditions. In IEEE ICCV Workshops (pp. 2106–2112). [GS Search]
D’Mello, S., & Calvo, R. A. (2013). Beyond the basic emotions: what should affective computing compute? In CHI’13 Extended Abstracts (pp. 2287–2294). [GS Search]
D’Mello, S., & Graesser, A. (2012). Dynamics of affective states during complex learning. Learning and Instruction, 22(2), 145–157. [GS Search]
D’Mello, S., Lehman, B., Pekrun, R., & Graesser, A. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153–170. [GS Search]
Ekman, P. (1992). An argument for basic emotions. Cognition & Emotion, 6(3-4), 169–200. [GS Search]
Ekman, P. (1999). Basic emotions. Handbook of cognition and emotion, 45–60. [GS Search]
Elrahman, S. M. A., & Abraham, A. (2013). A review of class imbalance problem. Journal of Network and Innovative Computing, 1(2013), 332–340. [GS Search]
Fredrickson, B. L. (1998). What good are positive emotions? Review of general psychology, 2(3), 300–319. [GS Search]
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press. [GS Search]
Goodfellow, I., Erhan, D., Carrier, P. L., Courville, A., Mirza, M., Hamner, B., . . . Lee, D.-H. (2013). Challenges in representation learning: A report on three machine learning contests. In International Conference on Neural Information Processing (pp. 117–124). [GS Search]
Goodman, L. A. (1961). Snowball sampling. The annals of mathematical statistics, 148–170. [GS Search]
Gupta, A., D’Cunha, A., Awasthi, K., & Balasubramanian, V. (2016). DAiSEE: Towards user engagement recognition in the wild. arXiv preprint arXiv:1609.01885. [GS Search]
Gupta, A., Jaiswal, R., Adhikari, S., & Balasubramanian, V. N. (2016). DAiSEE: Dataset for affective states in e-learning environments. arXiv, 1–22. [GS Search]
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In IEEE CVPR (pp. 770–778). [GS Search]
Heckathorn, D. D. (1997). Respondent-driven sampling: a new approach to the study of hidden populations. Social problems, 44(2), 174–199. [GS Search]
Hinton, G. E., Osindero, S., & Teh, Y.-W. (2006). A fast learning algorithm for deep belief nets. Neural computation, 18(7), 1527–1554. [GS Search]
Hu, W.-S., Li, H.-C., Pan, L., Li, W., Tao, R., & Du, Q. (2020). Spatial–spectral feature extraction via deep ConvLSTM neural networks for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 58(6), 4237–4250. [GS Search]
Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In IEEE CVPR (pp. 4700–4708). [GS Search]
Huang, W., Song, G., Li, M., Hu, W., & Xie, K. (2013). Adaptive weight optimization for classification of imbalanced data. In Int. Conf. on Intelligent Science and Big Data Engineering (pp. 546–553). [GS Search]
Jaques, P. A., Seffrin, H., Rubi, G. L., de Morais, F., Ghilardi, C., Bittencourt, I. I., & Isotani, S. (2013). Rule-based expert systems to support step-by-step guidance in algebraic problem solving: The case of the tutor PAT2Math. Expert Systems with Applications, 40(14), 5456–5465. doi: 10.1016/j.eswa.2013.04.004 [GS Search]
Kaggle (2013). Challenges in representation learning: Facial expression recognition challenge. Available at [Link]. Accessed: 17 Jul. 2019.
Kaur, A., Mustafa, A., Mehta, L., & Dhall, A. (2018). Prediction and localization of student engagement in the wild. In 2018 Digital Image Computing: Techniques and Applications (DICTA) (pp. 1–8). [GS Search]
King, D. E. (2009). Dlib-ml: A machine learning toolkit. Journal of Machine Learning Research, 10(Jul), 1755–1758. [GS Search]
Krawczyk, B. (2016). Learning from imbalanced data: open challenges and future directions. Progress in Artificial Intelligence, 5(4), 221–232. [GS Search]
Kreifelts, B., Wildgruber, D., & Ethofer, T. (2013). Audiovisual integration of emotional information from voice and face. In Integrating face and voice in person perception (pp. 225–251). Springer. [GS Search]
Lai, M.-L., Tsai, M.-J., Yang, F.-Y., Hsu, C.-Y., Liu, T.-C., Lee, S. W.-Y., . . . Tsai, C.-C. (2013). A review of using eye-tracking technology in exploring learning from 2000 to 2012. Educational research review, 10, 90–115. [GS Search]
Lazarus, R. S. (1982). Thoughts on the relations between emotion and cognition. American Psychologist, 37(9), 1019. [GS Search]
Lea, C., Flynn, M. D., Vidal, R., Reiter, A., & Hager, G. D. (2017). Temporal convolutional networks for action segmentation and detection. In IEEE CVPR (pp. 156–165). [GS Search]
Li, S., & Deng, W. (2018). Deep facial expression recognition: A survey. arXiv preprint arXiv:1804.08348. [GS Search]
Liu, C., Tang, T., Lv, K., & Wang, M. (2018). Multi-feature based emotion recognition for video clips. In Int. Conf. on Multimodal Interaction (pp. 630–634). [GS Search]
Longadge, R., & Dongre, S. (2013). Class imbalance problem in data mining review. arXiv preprint arXiv:1305.1707. [GS Search]
Lucey, P., Cohn, J. F., Kanade, T., Saragih, J., Ambadar, Z., & Matthews, I. (2010). The extended cohn-kanade dataset (ck+): A complete dataset for action unit and emotion-specified expression. In IEEE CVPR-Workshops (pp. 94–101). [GS Search]
Marcus, G. (2018). Deep learning: A critical appraisal. arXiv preprint arXiv:1801.00631. [GS Search]
Morais, F., & Jaques, P. A. (2022). Dinâmica de afetos em um sistema tutor inteligente de matemática no contexto brasileiro: uma análise da transição de emoções acadêmicas. Revista Brasileira de Informática na Educação, 30, 519-541. doi: 10.5753/rbie.2022.2577 [GS Search]
Morais, F., Kautzmann, T. R., Bittencourt, I. I., & Jaques, P. (2019). Emap-ml: A protocol of emotions and behaviors annotation for machine learning labels. In EC-TEL. [GS Search]
Nardelli, M., Valenza, G., Greco, A., Lanata, A., & Scilingo, E. P. (2015). Recognizing emotions induced by affective sounds through heart rate variability. IEEE Transactions on Affective Computing, 6(4), 385–394. [GS Search]
Nezami, O. M., Dras, M., Hamey, L., Richards, D., Wan, S., & Paris, C. (2018). Automatic recognition of student engagement using deep learning and facial expression. arXiv preprint arXiv:1808.02324. [GS Search]
Ocumpaugh, J. (2015). Baker Rodrigo Ocumpaugh Monitoring Protocol (BROMP) 2.0 technical and training manual. New York, NY and Manila, Philippines: Teachers College, Columbia University and Ateneo Laboratory for the Learning Sciences, 60. [GS Search]
Pekrun, R. (2011). Emotions as drivers of learning and cognitive development. In New perspectives on affect and learning technologies (pp. 23–39). Springer. [GS Search]
Reis, H., Alvares, D., Jaques, P., & Isotani, S. (2018). Analysis of permanence time in emotional states: A case study using educational software. In ITS (pp. 180–190). [GS Search]
Reis, H., Jaques, P., & Isotani, S. (2018, 03). Sistemas tutores inteligentes que detectam as emoções dos estudantes: um mapeamento sistemático. Revista Brasileira de Informática na Educação, 26, 76-107. doi: 10.5753/rbie.2018.26.03.76 [GS Search]
Sariyanidi, E., Gunes, H., & Cavallaro, A. (2014). Automatic analysis of facial affect: A survey of registration, representation, and recognition. IEEE Trans. on Pattern Analysis and Machine Intelligence, 37(6), 1113–1133. [GS Search]
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. [GS Search]
Subramanian, R., Wache, J., Abadi, M. K., Vieriu, R. L., Winkler, S., & Sebe, N. (2016). Ascertain: Emotion and personality recognition using commercial sensors. IEEE Transactions on Affective Computing, 9(2), 147–160. [GS Search]
Szegedy, C., Ioffe, S., Vanhoucke, V., & Alemi, A. A. (2017). Inception-v4, inception-resnet and the impact of residual connections on learning. In AAAI Conf. on Artif. Intellig. [GS Search]
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the inception architecture for computer vision. In IEEE CVPR (pp. 2818–2826). [GS Search]
Thomas, C., Nair, N., & Jayagopi, D. B. (2018). Predicting engagement intensity in the wild using temporal convolutional network. In Int. Conf. on Multimodal Interaction (pp. 604– 610). [GS Search]
Tran, D., Bourdev, L., Fergus, R., Torresani, L., & Paluri, M. (2015). Learning spatiotemporal features with 3d convolutional networks. In IEEE ICCV (pp. 4489–4497). [GS Search]
Wang, S., Liu, W., Wu, J., Cao, L., Meng, Q., & Kennedy, P. J. (2016). Training deep neural networks on imbalanced data sets. In IJCNN (pp. 4368–4374). [GS Search]
Werlang, P. (2022). GitHub: werlang/emolearn-ml-model. Available at [Link]. Accessed: 3 Oct. 2019.
Whitehill, J., Serpell, Z., Lin, Y.-C., Foster, A., & Movellan, J. R. (2014). The faces of engagement: Automatic analysis of student engagement from facial expressions. IEEE Transactions on Affective Computing, 5(1), 86–98. [GS Search]
Yang, J., Wang, K., Peng, X., & Qiao, Y. (2018). Deep recurrent multi-instance learning with spatio-temporal features for engagement intensity prediction. In Int. Conf. on Multimodal Interaction (pp. 594–598). [GS Search]
License
Copyright (c) 2023 Pablo Werlang, Patrícia Augustin Jaques
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.