Sentiment Analysis in Tweets: Exploring Instance-based Transfer Learning for Dataset Enrichment
DOI:
https://doi.org/10.5753/jidm.2022.2564
Keywords:
machine learning, sentiment analysis, supervised learning, transfer learning, Twitter
Abstract
Due to the popularity of user-generated content driven by social networks, sentiment analysis has become a rich and influential research field. A challenging problem in this classification task is curating enough labeled data to train a classifier that performs well. To address this issue, a promising strategy is to enrich the dataset of interest with labeled data from other datasets of different domains. However, another issue emerges: how to properly select data from a broad set of datasets so that the classifier's performance improves. This manuscript presents instance-based transfer learning strategies to enrich a training set that is initially composed of the labeled target-dataset alone. Notably, we investigate the benefits of selecting similar and dissimilar instances from a set of source-datasets and transferring them to the target-dataset. Our results show that one of the strategies produces a statistically significant performance improvement and that diversity plays an essential role in enhancing performance.
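To make the idea of instance selection concrete, the following is a minimal sketch and not the method evaluated in the manuscript. It assumes that tweets are already encoded as numeric feature vectors and that similarity is measured as the cosine between each source instance and the centroid of the target-dataset; the similarity measure, the function names `select_source_instances` and `enrich`, and the parameter `k` are all hypothetical choices made for illustration.

```python
import numpy as np


def select_source_instances(target_X, source_X, k, mode="similar"):
    """Return indices of k source rows, ranked by cosine similarity
    to the centroid of the target rows (an assumed similarity measure)."""
    centroid = target_X.mean(axis=0)
    # Cosine similarity between every source row and the target centroid.
    norms = np.linalg.norm(source_X, axis=1) * np.linalg.norm(centroid)
    sims = (source_X @ centroid) / np.where(norms == 0, 1.0, norms)
    order = np.argsort(sims, kind="stable")  # ascending similarity
    if mode == "similar":
        return order[::-1][:k]  # k most similar source instances
    if mode == "dissimilar":
        return order[:k]  # k least similar source instances
    raise ValueError("mode must be 'similar' or 'dissimilar'")


def enrich(target_X, target_y, source_X, source_y, k, mode="similar"):
    """Append the selected source instances to the target training set."""
    idx = select_source_instances(target_X, source_X, k, mode)
    X = np.vstack([target_X, source_X[idx]])
    y = np.concatenate([target_y, source_y[idx]])
    return X, y
```

Under these assumptions, calling `enrich(..., mode="dissimilar")` adds the source instances that differ most from the target data, which is one simple way to probe the role of diversity mentioned above.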