Beyond Recommendations: Intrinsic Evaluation Strategies for Item Embeddings in Recommender Systems

Authors

Pires, P. R. and Almeida, T. A.

DOI:

https://doi.org/10.5753/jbcs.2025.5426

Keywords:

Embeddings, Intrinsic Evaluation, Qualitative Evaluation, Recommender Systems, Similarity Tables, Intruder Detection, Autotagging

Abstract

With the constant growth in available information and the widespread adoption of technology, recommender systems have to deal with an ever-growing number of users and items. To alleviate problems of scalability and sparsity that arise with this growth, many recommender systems aim to generate low-dimensional dense representations of items. Among different strategies with this shared goal, e.g., matrix factorization and graph-based techniques, neural embeddings have gained significant attention in recent literature. This type of representation leverages neural networks to learn dense vectors that encapsulate intrinsic meaning. However, most studies proposing embeddings for recommender systems, regardless of the underlying strategy, tend to ignore this property and focus primarily on extrinsic evaluations. This study aims to bridge this gap by presenting a guideline for assessing the intrinsic quality of matrix factorization and neural-based embedding models for collaborative filtering. To enrich the evaluation pipeline, we adapt an intrinsic evaluation task commonly used in Natural Language Processing and propose a novel strategy for evaluating the learned representation in comparison to a content-based scenario. We apply these techniques to established and state-of-the-art recommender models, discussing and comparing the results with those of traditional extrinsic evaluations. Results show how vector representations that do not yield good recommendations can still be useful in other tasks that demand intrinsic knowledge. Conversely, models excelling at generating recommendations may not perform as well in intrinsic tasks. These results underscore the importance of considering intrinsic evaluation, a perspective often overlooked in the literature, and highlight its potential to uncover valuable insights about embedding models.
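As a rough illustration of what an intrinsic evaluation can look like, the sketch below builds a small similarity table: for a query item, it lists the nearest neighbors in embedding space by cosine similarity, so a reader can judge whether the neighbors make sense. It is not the authors' implementation. The item names, the three-dimensional vectors, and the helper names `cosine` and `similarity_table` are hypothetical and chosen only for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_table(embeddings, query, k=2):
    """Return the k items most similar to `query`, as (item, score) pairs."""
    query_vec = embeddings[query]
    scores = [
        (item, cosine(query_vec, vec))
        for item, vec in embeddings.items()
        if item != query  # the query itself is excluded from its own table
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Hypothetical toy embeddings; a real model would learn these from interactions.
toy_embeddings = {
    "space_opera_1": [0.9, 0.1, 0.0],
    "space_opera_2": [0.8, 0.2, 0.1],
    "romcom_1": [0.1, 0.9, 0.2],
    "romcom_2": [0.0, 0.8, 0.3],
}

neighbors = similarity_table(toy_embeddings, "space_opera_1", k=2)
for item, score in neighbors:
    print(f"{item}\t{score:.3f}")
```

In this toy setup the closest neighbor of `space_opera_1` is `space_opera_2`, which is the kind of qualitative check a similarity table is meant to support. Whether such neighbors agree with item content is a separate question from how well the same embeddings rank recommendations.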

Published

2025-08-18

How to Cite

Pires, P. R., & Almeida, T. A. (2025). Beyond Recommendations: Intrinsic Evaluation Strategies for Item Embeddings in Recommender Systems. Journal of the Brazilian Computer Society, 31(1), 655–673. https://doi.org/10.5753/jbcs.2025.5426

Section

Articles