Mining Comparative Opinions in Portuguese: A Lexicon-based Approach
DOI:
https://doi.org/10.5753/jbcs.2024.2830
Keywords:
Opinion Mining, Sentiment Analysis, Comparative Opinions, Preference Detection
Abstract
The constant expansion of e-commerce, recently boosted by the coronavirus pandemic, has led to a massive increase in online shopping by increasingly demanding customers, who seek comments and reviews on the Web to support their purchase decisions. Some of the opinions found in these reviews are comparisons, which contrast aspects and express a preference for one object over others. However, traditional sentiment analysis techniques neglect this information and are not applicable to comparisons, since comparisons do not directly express positive or negative sentiment. Despite efforts for the English language, almost no studies have developed solutions for analyzing comparisons in Portuguese. This work presents one of the first studies on comparative opinions in Portuguese. Its four main contributions are: (1) a hierarchical approach for detecting comparative opinions, consisting of an initial binary step that separates regular opinions from comparatives, followed by a second step that further categorizes the comparatives into five opinion groups: (i) Non-Comparative; (ii) Non-Equal Gradable; (iii) Equative; (iv) Superlative; and (v) Non-Gradable. The results are promising, reaching 87% Macro-F1 and 0.94 AUC (Area Under the Curve) for the binary step, and 61% Macro-F1 for the multiclass step; (2) a lexicon-based algorithm to detect the entity expressed as preferred in comparative sentences, reaching 94% Macro-F1 for Superlative sentences; (3) two new datasets with approximately 5,000 comparative and non-comparative sentences in Portuguese; and (4) a lexicon of words and expressions frequently used to make comparisons in Portuguese.
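To make the two-step structure of the hierarchical approach concrete, the following is a minimal sketch in Python. It is an illustration under loud assumptions, not the authors' method: the cue patterns in `CUES` are hypothetical examples chosen for this sketch (they are not the lexicon released with the paper), and simple keyword rules stand in for whatever models the paper actually uses at each step. The function names `is_comparative`, `comparative_type`, and `classify` are likewise invented here.

```python
import re

# Hypothetical cue patterns, for illustration only.
# This is NOT the lexicon released with the paper.
CUES = {
    "Superlative": [r"\b[oa] (melhor|pior)\b"],
    "Equative": [r"\btão\b.*\bquanto\b", r"\bigual a\b"],
    "Non-Equal Gradable": [
        r"\b(melhor|pior) (do )?que\b",
        r"\b(mais|menos)\b.*\b(do )?que\b",
    ],
    "Non-Gradable": [r"\bdiferente de\b"],
}


def is_comparative(sentence: str) -> bool:
    """Step 1 (binary): flag sentences containing any comparative cue."""
    text = sentence.lower()
    return any(re.search(p, text) for pats in CUES.values() for p in pats)


def comparative_type(sentence: str) -> str:
    """Step 2 (multiclass): label a flagged sentence; first matching group wins."""
    text = sentence.lower()
    for label, patterns in CUES.items():
        if any(re.search(p, text) for p in patterns):
            return label
    # Step 2 may still decide that a flagged sentence is not a comparison.
    return "Non-Comparative"


def classify(sentence: str) -> str:
    """Run the binary step first, then the type step only if needed."""
    if not is_comparative(sentence):
        return "Non-Comparative"
    return comparative_type(sentence)


if __name__ == "__main__":
    for s in [
        "O Galaxy é melhor que o iPhone",
        "Este celular é tão bom quanto o outro",
        "Este é o melhor celular da loja",
        "A entrega foi rápida",
    ]:
        print(f"{s!r} -> {classify(s)}")
```

In this sketch both steps share the same toy patterns, so the hierarchy is only structural. In practice each step could be replaced by a trained classifier while `classify` keeps the same shape.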
License
Copyright (c) 2024 Daniel Kansaon, Michele A. Brandão, Júlio C. S. Reis, Fabrício Benevenuto
This work is licensed under a Creative Commons Attribution 4.0 International License.