A sentence similarity-based approach for enhancing entity linking

Authors

Ítalo M. Pereira, A. A. Ferreira

DOI:

https://doi.org/10.5753/jbcs.2025.5427

Keywords:

Natural Language Processing, Entity Linking, Linked Open Data, Sentence Encoder, Sentence Similarity, Entity Similarity, Embedding, Disambiguation, Knowledge Graph

Abstract

Entity linking associates mentions of entities in natural language texts, such as references to people or locations, with specific entity representations in knowledge graphs like DBpedia or Wikidata. This process is essential in natural language processing tasks because it helps disambiguate entities in unstructured data, thereby improving comprehension and semantic processing. However, entity linking is challenging because natural languages are complex and ambiguous, and because the forms of textual entity mentions often differ from the forms of entity representations. This study extends our previous work on E-BELA (Enhanced Embedding-Based Entity Linking Approach), which is based on literal embeddings. E-BELA associates mentions and entity representations using a similarity or distance metric between their vector representations in a shared vector space. We extend the earlier work by evaluating E-BELA on a new dataset, conducting a comprehensive analysis of failure cases and limitations, and discussing our results further. The results suggest that our approach achieves performance comparable to other state-of-the-art methods while employing a much simpler model.
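The general idea of linking a mention to an entity by similarity in a shared vector space can be illustrated with a minimal sketch. This is not the E-BELA implementation: the abstract does not specify the encoder or the metric, so the sketch assumes cosine similarity and uses a toy character-trigram embedding as a stand-in for a real sentence encoder. The function names (`embed`, `cosine`, `link`) and the candidate labels are hypothetical.

```python
import math
from collections import Counter


def embed(text, n=3):
    """Toy stand-in for a sentence encoder (assumption): character n-gram counts."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link(mention_text, candidates):
    """Return (entity_id, score) for the candidate most similar to the mention."""
    mention_vec = embed(mention_text)
    scored = {
        entity_id: cosine(mention_vec, embed(label))
        for entity_id, label in candidates.items()
    }
    best = max(scored, key=scored.get)
    return best, scored[best]


# Hypothetical candidate entities with illustrative literal descriptions.
candidates = {
    "dbr:Paris": "Paris capital city of France",
    "dbr:Paris_Hilton": "Paris Hilton American media personality",
}

entity, score = link("Paris is the capital of France", candidates)
print(entity, round(score, 3))
```

In practice the toy `embed` function would be replaced by a trained sentence encoder, and the candidate set would come from a knowledge graph lookup rather than a hard-coded dictionary.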

Published

2025-08-08

How to Cite

Pereira, Ítalo M., & Ferreira, A. A. (2025). A sentence similarity-based approach for enhancing entity linking. Journal of the Brazilian Computer Society, 31(1), 598–612. https://doi.org/10.5753/jbcs.2025.5427

Section

Articles