BENCH4T3: A Framework to Create Benchmarks for Text-to-Triples Alignment Generation

Authors

V. J. S. Chico, A. G. Regino, J. C. dos Reis

DOI:

https://doi.org/10.5753/jbcs.2026.5809

Keywords:

Semantic Web; RDF; Triples to Text; Large Language Models

Abstract

Integrating Large Language Models (LLMs) with Knowledge Graphs (KGs) can significantly enhance their capabilities, combining LLMs' text-generation skills with KGs' explanatory power. However, establishing this connection is challenging and demands proper alignment between unstructured texts and triples. Building benchmarks requires massive human effort in data curation and, for non-English languages, translation. The scarcity of adequate benchmarks for validation purposes hinders research advancement. This study proposes an end-to-end framework that guides the automatic construction of text-to-triple alignment benchmarks for any language, using KGs as input. Our solution extracts relations from the input triples and processes them to create accurately mapped texts. The proposed pipeline applies data curation through prompt engineering and data augmentation to enhance diversity in the generated examples. We experimentally evaluate our framework for creating a bimodal representation of RDF triples and natural-language texts, assessing its ability to generate natural language from these triples. A key focus is the development of a benchmark for the underrepresented Portuguese language, facilitating the construction of models that connect structured data (triples) with text. Our solution is well suited to creating benchmarks that improve the alignment between KG triples and textual data. The results indicate that the generated benchmark outperforms existing solutions. The generative approach benefits from our Portuguese benchmark, achieving results competitive with established literature benchmarks. Our solution enables the automatic generation of benchmarks for aligning triples and text.
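The abstract's pipeline (verbalize KG triples via a prompted LLM, then pair each generated text with its source triples as a benchmark example) can be sketched as follows. This is a minimal illustration only: the function names, prompt wording, and sample triples are hypothetical and not taken from the BENCH4T3 implementation, and the LLM call is mocked.

```python
def triple_to_text_prompt(triples):
    """Build a prompt asking an LLM to verbalize a set of RDF-style triples.

    `triples` is a list of (subject, predicate, object) string tuples.
    """
    lines = [f"({s} | {p} | {o})" for s, p, o in triples]
    return (
        "Write one fluent sentence in Portuguese that expresses exactly "
        "the facts below, adding nothing else:\n" + "\n".join(lines)
    )

def align(triples, generated_text):
    """Pair the source triples with the generated text as one benchmark example."""
    return {"triples": triples, "text": generated_text}

# Hypothetical sample triples for illustration.
triples = [("Campinas", "country", "Brazil"),
           ("Campinas", "population", "1213792")]
prompt = triple_to_text_prompt(triples)

# A real pipeline would send `prompt` to an LLM; here the response is mocked.
example = align(triples, "Campinas, no Brasil, tem 1.213.792 habitantes.")
```

In a full pipeline, data-augmentation and curation steps (e.g., paraphrase prompts, filtering of unfaithful outputs) would operate on such aligned examples before they enter the benchmark.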

References

Agarwal, O., Ge, H., Shakeri, S., and Al-Rfou, R. (2021). Knowledge graph based synthetic corpus generation for knowledge-enhanced language model pre-training. In Toutanova, K., Rumshisky, A., Zettlemoyer, L., Hakkani-Tur, D., Beltagy, I., Bethard, S., Cotterell, R., Chakraborty, T., and Zhou, Y., editors, Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3554-3565, Online. Association for Computational Linguistics. DOI: 10.18653/v1/2021.naacl-main.278.

Chico, V. J. S. and dos Reis, J. C. (2024). Learning knowledge representation by aligning text and triples via finetuned pretrained language models. In Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 2: KEOD, pages 51-62. INSTICC, SciTePress. DOI: 10.5220/0013015100003838.

Fan, W., Ding, Y., Ning, L., Wang, S., Li, H., Yin, D., Chua, T.-S., and Li, Q. (2024). A survey on RAG meeting LLMs: Towards retrieval-augmented large language models. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD '24, page 6491–6501, New York, NY, USA. Association for Computing Machinery. DOI: 10.1145/3637528.3671470.

Ferreira, T. C., Gardent, C., Ilinykh, N., van der Lee, C., Mille, S., Moussallem, D., and Shimorina, A. (2020). The 2020 bilingual, bi-directional WebNLG+ shared task overview and evaluation results (WebNLG+ 2020). In Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+). DOI: 10.5281/zenodo.6552785.

Gardent, C., Shimorina, A., Narayan, S., and Perez-Beltrachini, L. (2017). Creating training corpora for NLG micro-planners. In Barzilay, R. and Kan, M.-Y., editors, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 179-188, Vancouver, Canada. Association for Computational Linguistics. DOI: 10.18653/v1/P17-1017.

Huang, L., Yu, W., Ma, W., Zhong, W., Feng, Z., Wang, H., Chen, Q., Peng, W., Feng, X., Qin, B., and Liu, T. (2024). A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. ACM Trans. Inf. Syst. Just Accepted. DOI: 10.1145/3703155.

Järvelin, K. and Kekäläinen, J. (2002). Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst., 20(4):422–446. DOI: 10.1145/582415.582418.

Ji, S., Pan, S., Cambria, E., Marttinen, P., and Yu, P. S. (2022). A survey on knowledge graphs: Representation, acquisition, and applications. IEEE Transactions on Neural Networks and Learning Systems, 33(2):494-514. DOI: 10.1109/TNNLS.2021.3070843.

Jiang, A. Q., Sablayrolles, A., Mensch, A., Bamford, C., Chaplot, D. S., de las Casas, D., Bressand, F., Lengyel, G., Lample, G., Saulnier, L., Lavaud, L. R., Lachaux, M.-A., Stock, P., Scao, T. L., Lavril, T., Wang, T., Lacroix, T., and Sayed, W. E. (2023). Mistral 7B. DOI: 10.48550/arxiv.2310.06825.

Lieb, A. and Goel, T. (2024). Student interaction with NewtBot: An LLM-as-tutor chatbot for secondary physics education. In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, CHI EA '24, New York, NY, USA. Association for Computing Machinery. DOI: 10.1145/3613905.3647957.

Lin, C.-Y. (2004). ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74-81, Barcelona, Spain. Association for Computational Linguistics. Available at: [link].

Mihindukulasooriya, N., Tiwari, S., Enguix, C. F., and Lata, K. (2023). Text2KGBench: A benchmark for ontology-driven knowledge graph generation from text. In Payne, T. R., Presutti, V., Qi, G., Poveda-Villalón, M., Stoilos, G., Hollink, L., Kaoudi, Z., Cheng, G., and Li, J., editors, The Semantic Web - ISWC 2023, pages 247-265, Cham. Springer Nature Switzerland. DOI: 10.1007/978-3-031-47243-5_14.

Min, B., Ross, H., Sulem, E., Veyseh, A. P. B., Nguyen, T. H., Sainz, O., Agirre, E., Heintz, I., and Roth, D. (2023). Recent advances in natural language processing via large pre-trained language models: A survey. ACM Comput. Surv., 56(2). DOI: 10.1145/3605943.

Pan, S., Luo, L., Wang, Y., Chen, C., Wang, J., and Wu, X. (2024). Unifying large language models and knowledge graphs: A roadmap. IEEE Transactions on Knowledge and Data Engineering, 36(7):3580-3599. DOI: 10.1109/TKDE.2024.3352100.

Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). BLEU: a method for automatic evaluation of machine translation. In Isabelle, P., Charniak, E., and Lin, D., editors, Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311-318, Philadelphia, Pennsylvania, USA. Association for Computational Linguistics. DOI: 10.3115/1073083.1073135.

Pérez, J., Arenas, M., and Gutierrez, C. (2009). Semantics and complexity of SPARQL. ACM Trans. Database Syst., 34(3). DOI: 10.1145/1567274.1567278.

Thirunavukarasu, A. J., Ting, D. S. J., Elangovan, K., Gutierrez, L., Tan, T. F., and Ting, D. S. W. (2023). Large language models in medicine. Nature Medicine, 29(8):1930-1940. DOI: 10.1038/s41591-023-02448-8.

Wang, H. and Na, T. (2024). Rethinking e-commerce search. SIGIR Forum, 57(2). DOI: 10.1145/3642979.3643007.

Wang, L., Yang, N., Huang, X., Yang, L., Majumder, R., and Wei, F. (2024). Multilingual e5 text embeddings: A technical report. DOI: 10.48550/arxiv.2402.05672.

Zhang, L. and Braun, D. (2024). Twente-BMS-NLP at PerspectiveArg 2024: Combining bi-encoder and cross-encoder for argument retrieval. In Ajjour, Y., Bar-Haim, R., El Baff, R., Liu, Z., and Skitalinskaya, G., editors, Proceedings of the 11th Workshop on Argument Mining (ArgMining 2024), pages 164-168, Bangkok, Thailand. Association for Computational Linguistics. DOI: 10.18653/v1/2024.argmining-1.17.

Zhu, Y., Lu, S., Zheng, L., Guo, J., Zhang, W., Wang, J., and Yu, Y. (2018). Texygen: A benchmarking platform for text generation models. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, SIGIR '18, page 1097–1100, New York, NY, USA. Association for Computing Machinery. DOI: 10.1145/3209978.3210080.

Published

2026-02-06

How to Cite

Chico, V. J. S., Regino, A. G., & dos Reis, J. C. (2026). BENCH4T3: A Framework to Create Benchmarks for Text-to-Triples Alignment Generation. Journal of the Brazilian Computer Society, 32(1), 85–101. https://doi.org/10.5753/jbcs.2026.5809

Section

Regular Issue