Comparing Explainable AI Techniques In Language Models: A Case Study For Fake News Detection in Portuguese
DOI: https://doi.org/10.5753/jbcs.2026.5787

Keywords: Explainable artificial intelligence, BERTimbau, Local interpretable model-agnostic explanations, Integrated gradients, Natural language processing, Machine Learning in Healthcare, Deep learning, Language models, Transformer

Abstract
Language models are widely used in natural language processing, but their complexity makes interpretation difficult, limiting their adoption in critical decision-making. This work explores Explainable Artificial Intelligence (XAI) techniques, namely Local Interpretable Model-Agnostic Explanations (LIME) and Integrated Gradients (IG), to interpret these models. The study evaluates the effectiveness of BERTimbau in classifying Portuguese news as true or fake, using the FakeRecogna and Fake.Br Corpus datasets. In the experiments, LIME proved easier to interpret than IG, and both methods showed limitations when applied to text, as they operate only at the morphological and lexical levels, ignoring other important linguistic levels.
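To make the two attribution techniques named in the abstract concrete, the sketches below show how they are commonly applied to a BERTimbau-based classifier in Python, using the Hugging Face transformers library together with lime and captum. These are minimal illustrations under stated assumptions, not the authors' experimental pipeline: the fine-tuned checkpoint name, class labels, example text, and sampling parameters are placeholders.

```python
# Minimal sketch: explaining a BERTimbau fake-news classifier with LIME.
# Assumes a model already fine-tuned for binary (true/fake) classification;
# the checkpoint name "bertimbau-fakenews" below is hypothetical.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from lime.lime_text import LimeTextExplainer

tokenizer = AutoTokenizer.from_pretrained("neuralmind/bert-base-portuguese-cased")
model = AutoModelForSequenceClassification.from_pretrained("bertimbau-fakenews")  # hypothetical checkpoint
model.eval()

def predict_proba(texts):
    """LIME expects a function mapping a list of texts to class probabilities."""
    enc = tokenizer(list(texts), padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    return torch.softmax(logits, dim=-1).numpy()

explainer = LimeTextExplainer(class_names=["true", "fake"])
explanation = explainer.explain_instance(
    "Texto da notícia a ser explicada...",  # placeholder news text
    predict_proba,
    num_features=10,   # top-10 words by attribution weight
    num_samples=500,   # perturbed samples around the instance
)
print(explanation.as_list())  # (word, weight) pairs for the explained class
```

Integrated Gradients, in contrast, attributes the prediction to the input embeddings by accumulating gradients along a path from a baseline input to the actual input. A corresponding sketch with Captum's LayerIntegratedGradients, again assuming a BERT-architecture checkpoint like the one loaded above, could look as follows.

```python
# Minimal sketch: token-level Integrated Gradients with Captum.
# Uses an all-padding baseline as a simplification; reuses `model` and
# `tokenizer` from the LIME sketch above.
from captum.attr import LayerIntegratedGradients

def forward_logit(input_ids, attention_mask):
    return model(input_ids=input_ids, attention_mask=attention_mask).logits

enc = tokenizer("Texto da notícia a ser explicada...", truncation=True, max_length=512, return_tensors="pt")
baseline_ids = torch.full_like(enc["input_ids"], tokenizer.pad_token_id)

lig = LayerIntegratedGradients(forward_logit, model.bert.embeddings)
attributions = lig.attribute(
    enc["input_ids"],
    baselines=baseline_ids,
    additional_forward_args=(enc["attention_mask"],),
    target=1,      # attribute towards the hypothetical "fake" class
    n_steps=50,    # integration steps along the path
)
token_scores = attributions.sum(dim=-1).squeeze(0)  # one attribution score per token
```

In both cases the output is a weight per word or subword token, which is consistent with the abstract's observation that these explanations remain at the lexical level.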
References
Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin, M., et al. (2015). TensorFlow: Large-scale machine learning on heterogeneous systems. Available at: [link].
Ahmed, I., Jeon, G., and Piccialli, F. (2022). From artificial intelligence to explainable artificial intelligence in industry 4.0: a survey on what, how, and where. IEEE Transactions on Industrial Informatics, 18(8):5031-5042. DOI: 10.1109/tii.2022.3146552.
Desai, V., Gattani, A., and Dalvi, H. (2024). Explainable models for the detection of incidents of fake news and hate speech. In Text and Social Media Analytics for Fake News and Hate Speech Detection, pages 114-136. Chapman and Hall/CRC. DOI: 10.1201/9781003409519-6.
Garcia, G., Afonso, L., and Papa, J. (2022). FakeRecogna: A New Brazilian Corpus for Fake News Detection, pages 57-67. DOI: 10.1007/978-3-030-98305-5_6.
Gohel, P., Singh, P., and Mohanty, M. (2021). Explainable AI: current status and future directions. arXiv preprint arXiv:2107.07045. DOI: 10.48550/arxiv.2107.07045.
Lakkaraju, H., Kamar, E., Caruana, R., and Leskovec, J. (2019). Faithful and customizable explanations of black box models. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, pages 131-138. DOI: 10.1145/3306618.3314229.
Lima, T. B., Rolim, V., Nascimento, A. C., Miranda, P., Macario, V., Rodrigues, L., Freitas, E., Gašević, D., and Mello, R. F. (2024). Towards explainable automatic punctuation restoration for portuguese using transformers. Expert Systems with Applications, 257:125097. DOI: 10.1016/j.eswa.2024.125097.
Lundberg, S. M. and Lee, S.-I. (2017). A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30. DOI: 10.48550/arxiv.1705.07874.
Mersha, M. A., Yigezu, M. G., Shakil, H., AlShami, A. K., Byun, S., and Kalita, J. (2025). A unified framework with novel metrics for evaluating the effectiveness of XAI techniques in LLMs. arXiv preprint arXiv:2503.05050. DOI: 10.48550/arxiv.2503.05050.
Moradi, M. and Samwald, M. (2021). Explaining black-box models for biomedical text classification. IEEE Journal of Biomedical and Health Informatics, 25(8):3112-3120. DOI: 10.1109/jbhi.2021.3056748.
Moraliyage, H., Kulawardana, G., De Silva, D., Issadeen, Z., Manic, M., and Katsura, S. (2025). Explainable artificial intelligence with integrated gradients for the detection of adversarial attacks on text classifiers. Applied System Innovation, 8(1):17. DOI: 10.3390/asi8010017.
Oliveira, H., Ferreira Mello, R., Barreiros Rosa, B. A., Rakovic, M., Miranda, P., Cordeiro, T., Isotani, S., Bittencourt, I., and Gasevic, D. (2023). Towards explainable prediction of essay cohesion in portuguese and english. In LAK23: 13th International Learning Analytics and Knowledge Conference, pages 509-519. DOI: 10.1145/3576050.3576152.
Pendyala, V. S. and Hall, C. E. (2024). Explaining misinformation detection using large language models. Electronics, 13(9):1673. DOI: 10.3390/electronics13091673.
Ribeiro, M. T., Singh, S., and Guestrin, C. (2016). "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135-1144. DOI: 10.18653/v1/n16-3020.
Santos, R. L., Monteiro, R. A., and Pardo, T. A. (2018). The Fake.Br Corpus: a corpus of fake news for Brazilian Portuguese. In Latin American and Iberian Languages Open Corpora Forum (OpenCor), pages 1-2. DOI: 10.5753/erbd.2023.229495.
Shevskaya, N. V. (2021). Explainable artificial intelligence approaches: challenges and perspectives. In 2021 International Conference on Quality Management, Transport and Information Security, Information Technologies (IT&QM&IS), pages 540-543. IEEE. DOI: 10.1109/itqmis53292.2021.9642869.
Souza, F., Nogueira, R., and Lotufo, R. (2020). BERTimbau: pretrained BERT models for Brazilian Portuguese. In Brazilian Conference on Intelligent Systems, pages 403-417. Springer. DOI: 10.1007/978-3-030-61377-8_28.
Sundararajan, M., Taly, A., and Yan, Q. (2017). Axiomatic attribution for deep networks. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, ICML'17, pages 3319-3328. JMLR.org. DOI: 10.48550/arxiv.1703.01365.
License
Copyright (c) 2026 Jéssica Vicentini, Rafael Bezerra de Menezes Rodrigues, Arnaldo Candido Junior, Ivan Rizzo Guilherme

This work is licensed under a Creative Commons Attribution 4.0 International License.

