Automatic Inference of Brazilian Websites' Reliability for Combating Fake News: Domain and Geolocation Features

Marcos Paulo Cezar de Mendonça; Igor Monteiro Moraes; Diogo Menezes Ferrazani Mattos

doi:10.5753/jisa.2025.5035

Authors

Marcos Paulo Cezar de Mendonça Universidade Federal Fluminense https://orcid.org/0009-0005-7524-6069
Igor Monteiro Moraes Universidade Federal Fluminense https://orcid.org/0000-0002-5919-7923
Diogo Menezes Ferrazani Mattos Universidade Federal Fluminense https://orcid.org/0000-0002-1279-7366

Keywords:

Data Analysis, Machine Learning, Classification, Fake News, Websites Reliability

Abstract

Evaluating the reliability of websites that propagate news is critical to combating disinformation. Websites with low reliability often serve as a breeding ground for fake news that spreads rapidly across social networks. In response, this paper introduces an automatic approach to assessing the reliability of Brazilian websites by analyzing network-related features, which eliminates the need for exhaustive content scanning.
Unlike previous methodologies focused on social network analysis, our approach relies on publicly available website features: domain-related features, geolocation data, and TLS certificate attributes.
The paper proposes a supervised learning model and curates a dataset of reliable and unreliable sites. Trained and evaluated on disjoint data, the model achieves an accuracy greater than 75% in identifying websites with reliable content.
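
The evaluation setup summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the record fields (`domain_age_days`, `hosted_in_brazil`, `tls_valid_days`), the split fraction, and all function names are hypothetical stand-ins for the three feature families named in the abstract. The sketch shows only how a site record might be turned into a numeric feature vector, how training and test sets can be kept disjoint by domain, and how accuracy is computed.

```python
import random
from dataclasses import dataclass


@dataclass
class SiteRecord:
    # Hypothetical fields, one per feature family named in the abstract.
    domain: str
    domain_age_days: int      # domain-related feature (assumed)
    hosted_in_brazil: bool    # geolocation feature (assumed)
    tls_valid_days: int       # TLS certificate attribute (assumed)
    reliable: bool            # label: True for a reliable site


def to_features(site):
    """Map a record to a numeric vector usable by a supervised classifier."""
    return [
        float(site.domain_age_days),
        1.0 if site.hosted_in_brazil else 0.0,
        float(site.tls_valid_days),
    ]


def disjoint_split(sites, test_fraction=0.25, seed=0):
    """Split by domain so that no domain appears in both train and test."""
    domains = sorted({s.domain for s in sites})
    random.Random(seed).shuffle(domains)
    n_test = max(1, int(len(domains) * test_fraction))
    test_domains = set(domains[:n_test])
    train = [s for s in sites if s.domain not in test_domains]
    test = [s for s in sites if s.domain in test_domains]
    return train, test


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)
```

Any supervised classifier could be fitted on `to_features` vectors from the training set and scored with `accuracy` on the held-out set; which classifier and which exact features the paper uses are not stated in the abstract.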
