A Comprehensive Dataset of Brazilian Fact-Checking Stories

Authors

  • Igor Marques, Universidade Federal de Minas Gerais
  • Isadora Salles, Universidade Federal de Minas Gerais
  • João M. M. Couto, Universidade Federal de Minas Gerais
  • Breno C. Pimenta, Universidade Federal de Minas Gerais
  • Samuel Assis, Universidade Federal de Minas Gerais
  • Julio C. S. Reis, Universidade Federal de Viçosa
  • Ana Paula C. da Silva, Universidade Federal de Minas Gerais
  • Jussara M. de Almeida, Universidade Federal de Minas Gerais
  • Fabrício Benevenuto, Universidade Federal de Minas Gerais

DOI:

https://doi.org/10.5753/jidm.2022.2354

Keywords:

Fact-checking, Social Media, Misinformation, Fake news

Abstract

In recent years, digital platforms have become a powerful means of large-scale information diffusion worldwide, and particularly in Brazil. Understanding the key aspects that drive misinformation diffusion is of paramount importance to the design and implementation of new tools that automatically detect misinformation. In this scenario, fact-checking performed by high-credibility agencies provides rich labeled data, which is fundamental to building tools capable of detecting and mitigating the effects of misinformation. This paper releases a novel dataset, referred to as FactCenter, to the research community. It contains 11,647 fact-check instances collected from 6 different Brazilian fact-checking agencies, covering several topics and domains. We present an initial analysis of the collected data, enriched by data from Facebook, which demonstrates the potential of our repository for future studies.

Published

2022-08-15

How to Cite

Marques, I., Salles, I., Couto, J. M. M., Pimenta, B. C., Assis, S., Reis, J. C. S., da Silva, A. P. C., de Almeida, J. M., & Benevenuto, F. (2022). A Comprehensive Dataset of Brazilian Fact-Checking Stories. Journal of Information and Data Management, 13(1). https://doi.org/10.5753/jidm.2022.2354

Section

Dataset Showcase Workshop 2021 - Extended Papers