Do Calibrated Recommendations Affect Explanations? A Study on Post-Hoc Adjustments

Paul Dany Flores Atauchi; André Levi Zanon; Leonardo Chaves Dutra da Rocha; Marcelo Garcia Manzato

doi:10.5753/jis.2025.5563

Authors

Paul Dany Flores Atauchi Universidade de São Paulo https://orcid.org/0000-0003-2600-7836
André Levi Zanon University College Cork https://orcid.org/0000-0003-0526-2678
Leonardo Chaves Dutra da Rocha Universidade Federal de São João del-Rei https://orcid.org/0000-0002-4913-4902
Marcelo Garcia Manzato Universidade de São Paulo https://orcid.org/0000-0003-3215-6918

Keywords:

Recommender Systems, Calibration, Explanation, Graph Embeddings

Abstract

Recommender systems generate suggestions by identifying relationships among past interactions, user similarities, and item metadata. Recently, there has been increasing interest in evaluating recommendations not only on accuracy but also on aspects such as transparency and calibration. Transparency matters because explanations can increase user trust and persuasiveness, while calibration aligns recommendation lists with users' interests, improving fairness and reducing popularity bias. Both calibration and explanation are traditionally applied as post-processing steps. Our study addresses two research gaps: (1) the impact of graph embeddings on model-agnostic knowledge graph explanations, exploring their under-researched potential, relative to syntactic approaches, for producing meaningful explanations; and (2) the effect of calibration on recommendation explanations, assessing whether calibrated reordering of recommendation lists influences the outcomes of explanation algorithms. We evaluate explanation quality with a set of offline metrics, including diversity, which measures how well the user's different interests are covered; popularity, which assesses how well explanations avoid favoring already popular items; and recency, which examines the inclusion of recently interacted items. Our findings show that graph embedding methods are effective at generating high-quality explanations according to these offline metrics, and that post-hoc knowledge graph explanation algorithms are robust to the changes introduced by calibration.
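
The calibrated reordering mentioned above is a post-processing step applied to a base recommender's candidate list. The sketch below illustrates one common formulation, the greedy KL-divergence reranking from Steck's "Calibrated recommendations" (RecSys 2018), assuming genres as the item categories and uniform item weights. The function names (`genre_distribution`, `kl_calibration`, `calibrated_rerank`) and parameters (`lam`, `alpha`) are hypothetical, and the study's own calibration setup may differ in its weighting and categories.

```python
import math
from typing import Dict, List, Sequence, Set


def genre_distribution(items: Sequence[str], item_genres: Dict[str, Set[str]]) -> Dict[str, float]:
    """Genre distribution over a list of items.

    Assumption: every item has weight 1, split evenly across its genres
    (recency- or rank-based item weights are omitted in this sketch).
    """
    dist: Dict[str, float] = {}
    for item in items:
        genres = item_genres[item]
        for g in genres:
            dist[g] = dist.get(g, 0.0) + 1.0 / len(genres)
    total = sum(dist.values())
    return {g: w / total for g, w in dist.items()} if total > 0 else {}


def kl_calibration(p: Dict[str, float], q: Dict[str, float], alpha: float = 0.01) -> float:
    """KL(p || q_tilde), where q_tilde = (1 - alpha) * q + alpha * p.

    Mixing a small share of p into q keeps q_tilde > 0 wherever p > 0,
    so the logarithm is always defined.
    """
    kl = 0.0
    for g, pg in p.items():
        if pg > 0:
            q_tilde = (1 - alpha) * q.get(g, 0.0) + alpha * pg
            kl += pg * math.log(pg / q_tilde)
    return kl


def calibrated_rerank(
    candidates: Sequence[str],
    scores: Dict[str, float],
    history: Sequence[str],
    item_genres: Dict[str, Set[str]],
    k: int,
    lam: float = 0.5,
) -> List[str]:
    """Greedily build a top-k list maximizing
    (1 - lam) * sum of relevance scores - lam * KL(p || q_tilde)."""
    p = genre_distribution(history, item_genres)  # user's historical genre profile
    selected: List[str] = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        best_item, best_val = None, -math.inf
        for item in remaining:
            trial = selected + [item]
            q = genre_distribution(trial, item_genres)  # genre profile of the trial list
            relevance = sum(scores[i] for i in trial)
            val = (1 - lam) * relevance - lam * kl_calibration(p, q)
            if val > best_val:
                best_item, best_val = item, val
        selected.append(best_item)
        remaining.remove(best_item)
    return selected


if __name__ == "__main__":
    # Toy data (hypothetical): the user's history is two dramas and one comedy.
    item_genres = {
        "h1": {"Drama"}, "h2": {"Drama"}, "h3": {"Comedy"},
        "a": {"Drama"}, "b": {"Drama"}, "c": {"Comedy"}, "d": {"Drama"},
    }
    history = ["h1", "h2", "h3"]
    candidates = ["a", "b", "d", "c"]
    scores = {"a": 0.9, "b": 0.85, "d": 0.8, "c": 0.5}
    print(calibrated_rerank(candidates, scores, history, item_genres, k=3, lam=0.0))  # ['a', 'b', 'd']
    print(calibrated_rerank(candidates, scores, history, item_genres, k=3, lam=0.9))  # ['a', 'c', 'b']
```

With `lam=0.0` the sketch reduces to ranking by relevance score alone. With `lam=0.9` it promotes the lower-scored comedy so that the list's 2:1 drama/comedy mix matches the user's history. This kind of reordering is what an explanation algorithm downstream would then have to explain.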
