SentiLexBR: An Automatic Methodology of Building Sentiment Lexicons for the Portuguese Language
DOI:
https://doi.org/10.5753/jidm.2022.2504
Keywords:
Natural Language Processing, Portuguese Language, Sentiment Analysis, Sentiment Lexicon
Abstract
User reviews are readily available on the Web and widely used for sentiment analysis tasks. Sentiment lexicons play an important role in sentiment analysis: each sentiment word is given a sentiment label (positive or negative) or a score (1 or -1). However, a sentiment word may express a different polarity depending on the domain. In addition, only a few studies on Portuguese sentiment analysis have been reported, owing to the lack of resources such as domain-specific sentiment lexicons. In this paper, we present an effective methodology, called SentiLexBR, which uses probabilities derived from Bayes' theorem to build a set of sentiment lexicons. An unsupervised algorithm is proposed to automatically identify sentiment words and their polarities for the Portuguese language. Experimental results on user review datasets from 12 different domains indicate the effectiveness of our methodology in domain-specific sentiment lexicon generation for Portuguese. In addition, the sentiment lexicon produced by SentiLexBR significantly outperforms several alternative approaches for building domain-specific sentiment lexicons.
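To make the general idea of a Bayes-based polarity score concrete, the following is a minimal sketch and not the SentiLexBR algorithm itself, whose details are not given in this abstract. It assumes each review already carries a coarse polarity signal (for example, one derived from a star rating), which is an assumption of this illustration. The function name `build_lexicon` and the parameters `min_count`, `alpha`, and `margin` are hypothetical.

```python
from collections import Counter


def build_lexicon(reviews, min_count=2, alpha=1.0, margin=0.1):
    """Illustrative only: score words with Bayes' theorem.

    reviews: iterable of (text, label) pairs, label in {"pos", "neg"}.
    Returns a dict mapping word -> 1 (positive) or -1 (negative).
    """
    word_counts = {"pos": Counter(), "neg": Counter()}
    class_counts = Counter()
    for text, label in reviews:
        class_counts[label] += 1
        # Count each word at most once per review (document frequency).
        word_counts[label].update(set(text.lower().split()))

    total = sum(class_counts.values())
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    lexicon = {}
    for word in vocab:
        n_pos = word_counts["pos"][word]
        n_neg = word_counts["neg"][word]
        if n_pos + n_neg < min_count:
            continue  # too rare to score reliably
        # Likelihoods P(word | class), with Laplace smoothing.
        p_w_pos = (n_pos + alpha) / (class_counts["pos"] + 2 * alpha)
        p_w_neg = (n_neg + alpha) / (class_counts["neg"] + 2 * alpha)
        # Class priors P(class).
        prior_pos = class_counts["pos"] / total
        prior_neg = class_counts["neg"] / total
        # Bayes' theorem: P(pos | word).
        numerator = p_w_pos * prior_pos
        posterior_pos = numerator / (numerator + p_w_neg * prior_neg)
        # Keep only words whose posterior is clearly away from 0.5.
        if abs(posterior_pos - 0.5) < margin:
            continue
        lexicon[word] = 1 if posterior_pos > 0.5 else -1
    return lexicon


sample_reviews = [
    ("produto bom", "pos"),
    ("atendimento bom", "pos"),
    ("produto ruim", "neg"),
    ("atendimento ruim", "neg"),
]
lexicon = build_lexicon(sample_reviews)
```

In this toy sample, words that occur equally often in both classes (such as "produto") fall inside the margin and are left out, while "bom" and "ruim" receive opposite scores. Running the same function separately on the reviews of each domain would yield one lexicon per domain, which is the sense in which such a lexicon is domain-specific.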