Assessing the combination of DistilBERT news representations and diffusion topological features to classify fake news
DOI: https://doi.org/10.5753/jidm.2021.1895
Keywords: distilBERT, fake news, fake news classification, topological features, ensembles
Abstract
Fake news (FN) has affected people's lives in unimaginable ways. Automatic FN classification is a vital tool for preventing its dissemination and supporting fact-checking. Related work has shown that FN spreads faster, deeper, and more broadly than truthful news on social media. Deep learning has produced state-of-the-art solutions in this field, mainly based on textual attributes. In this paper, we propose combining compact representations of the news text, generated using DistilBERT, with topological metrics extracted from the news propagation network on social media. Using a politics-related dataset and several learning algorithms, we extensively assessed each component of the proposed solution. For the textual attributes, using only the news title and content, we reached results comparable to state-of-the-art solutions, which is useful for early FN detection. We assessed which topological metrics are most influential and the effect of combining them with the textual features. We also explored the use of ensembles. Our results were very promising, revealing the potential of both the proposed features and the adoption of ensembles.
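As a rough illustration of the pipeline the abstract describes, the following Python sketch derives a few topological metrics from a news diffusion tree, concatenates them with a text vector (early fusion), and combines base-classifier outputs by majority vote. It is a minimal sketch under stated assumptions, not the paper's code. The metric set (node count, edge count, depth, maximum breadth, maximum out-degree, leaf fraction), the function names, the label encoding (1 = fake, 0 = real), and the choice of hard majority voting are all hypothetical. The text vector is a placeholder for a DistilBERT representation of the title and content.

```python
from collections import Counter, defaultdict, deque
from typing import Dict, List, Sequence, Tuple

# (parent, child): the child account reshared the news from the parent.
# Assumption: edges form a tree rooted at the news source.
Edge = Tuple[str, str]


def propagation_features(root: str, edges: Sequence[Edge]) -> List[float]:
    """Hypothetical topological metrics of a diffusion tree (not the paper's exact set)."""
    children: Dict[str, List[str]] = defaultdict(list)
    nodes = {root}
    for parent, child in edges:
        children[parent].append(child)
        nodes.update((parent, child))

    # Breadth-first search from the source assigns each node its level (hops from root).
    level = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in level:
                level[child] = level[node] + 1
                queue.append(child)

    breadth_per_level = Counter(level.values())  # number of nodes at each level
    leaves = sum(1 for n in nodes if not children.get(n))
    return [
        float(len(nodes)),                                             # cascade size
        float(len(edges)),                                             # number of reshares
        float(max(level.values())),                                    # depth
        float(max(breadth_per_level.values())),                        # maximum breadth
        float(max((len(c) for c in children.values()), default=0)),   # max out-degree
        leaves / len(nodes),                                           # fraction of leaf nodes
    ]


def early_fusion(text_vector: Sequence[float], topo: Sequence[float]) -> List[float]:
    """Concatenate a text representation (placeholder for DistilBERT) with topology features."""
    return list(text_vector) + list(topo)


def majority_vote(predictions: Sequence[int]) -> int:
    """Hard-voting ensemble over binary labels (assumed 1 = fake); ties resolve to 0."""
    return 1 if sum(predictions) > len(predictions) / 2 else 0


if __name__ == "__main__":
    # Toy cascade: the source is reshared by a and b; c reshares from a.
    edges = [("src", "a"), ("src", "b"), ("a", "c")]
    topo = propagation_features("src", edges)
    print(topo)  # [4.0, 3.0, 2.0, 2.0, 2.0, 0.5]

    fused = early_fusion([0.1, -0.2, 0.3], topo)  # stand-in for a real text embedding
    print(len(fused))  # 9

    print(majority_vote([1, 0, 1]))  # 1
```

In practice the topological features would usually be standardized before concatenation, since raw counts such as cascade size can dwarf the embedding values; the sketch omits this to stay dependency-free.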