Incremental Learning for Fake News Detection
DOI: https://doi.org/10.5753/jidm.2022.2542
Keywords: fake news, online learning, text categorization, machine learning
Abstract
Fake news has affected people’s lives for a long time. The problem has worsened considerably with the rise of social media, which has become fertile ground for fake news to spread quickly and affect humanity’s social, political, and economic future. Despite several studies on fake news detection, some critical gaps remain. One is that most studies are unrealistic because they rely on offline learning models. The language used in communication changes continuously, reflecting the nature of society. Therefore, because the facts covered by the news are dynamic, the static models produced by offline learning methods can quickly become obsolete. This study evaluates fake news detection under the online learning paradigm, which is best suited for dynamic problems whose underlying data distribution can change over time. We examine how automatic fake news classification suffers from concept drift. To this end, we apply state-of-the-art methods that can learn incrementally to classify documents covering two historical events: the United States presidential election and the coronavirus disease (Covid-19) pandemic. We also evaluate three types of feedback (uncertain, delayed, and immediate) and two training strategies: (i) updating the model only when it makes a prediction error and (ii) updating it after every prediction, whether correct or incorrect. The results of our carefully designed experiments indicate that the performance of online learning models improved over time, while offline models did not sustain their performance.
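The two training strategies named in the abstract can be illustrated with a minimal test-then-train loop. The sketch below is not the authors' implementation and does not use any of the methods they evaluated: the tiny bag-of-words logistic regression, the `prequential_run` helper, and the eight-word toy stream (which switches topic halfway to imitate concept drift) are all hypothetical choices made only to show where the two strategies differ. It assumes immediate feedback, so delayed and uncertain feedback are not covered.

```python
import math


class OnlineLogisticRegression:
    """Hypothetical bag-of-words logistic regression trained one document at a time."""

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate
        self.weights = {}
        self.bias = 0.0

    def predict_proba(self, tokens):
        score = self.bias + sum(self.weights.get(t, 0.0) for t in tokens)
        score = max(-30.0, min(30.0, score))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-score))

    def predict(self, tokens):
        return 1 if self.predict_proba(tokens) >= 0.5 else 0

    def update(self, tokens, label):
        # One stochastic gradient step on the log loss for this document.
        error = label - self.predict_proba(tokens)
        self.bias += self.learning_rate * error
        for t in tokens:
            self.weights[t] = self.weights.get(t, 0.0) + self.learning_rate * error


def prequential_run(stream, update_on_error_only):
    """Predict each document first, then learn from it according to the strategy."""
    model = OnlineLogisticRegression()
    correct = []
    for text, label in stream:
        tokens = text.lower().split()
        predicted = model.predict(tokens)
        correct.append(predicted == label)
        # Strategy (i): learn only from mistakes. Strategy (ii): always learn.
        if not update_on_error_only or predicted != label:
            model.update(tokens, label)
    return model, correct


# Toy stream (label 1 = fake, 0 = legitimate); the topic changes halfway through.
election_docs = [("secret ballots rigged", 1), ("results certified officially", 0)] * 5
covid_docs = [("miracle cure hidden", 1), ("vaccine trial published", 0)] * 5
stream = election_docs + covid_docs

for error_only in (True, False):
    _, correct = prequential_run(stream, update_on_error_only=error_only)
    name = "error-only" if error_only else "always"
    print(f"{name}: {sum(correct)}/{len(correct)} correct")
```

Because each document is scored before the model sees its label, the running count of correct predictions is an estimate of performance on unseen data at every point in the stream, including right after the topic change.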