JFUSE: Json FUll Schema Extractor

DOI:

https://doi.org/10.5753/jidm.2026.5748

Keywords:

schema discovering, schema extraction, JSON, metamodel, tagged union, enumeration

Abstract

Lately, we have witnessed a flood of data generated by data-centric applications, and these data are available in a wide variety of formats.
However, those data are mostly weakly structured, irregular, and incomplete; they do not follow a predefined schema. A challenging task is to understand how those data are organized and structured. JSON has become a popular format for data-centric applications to store and share data. Its success is due to embodying structure and data in the same representation.
This makes JSON documents loosely coupled with schemas. Still, schemas are essential for applications to process the data efficiently. In this paper, we propose JFUSE, a tool for discovering a schema from JSON collections. Besides inferring basic types (e.g., atomic types, arrays, and objects), JFUSE also discovers fields that act as keys in the collection, minimum/maximum value constraints on fields, enumerations, tagged unions, metadata as data, objects as collections, and arrays as tuples. We propose a metamodel that can be easily transformed into any schema language (e.g., JSON Schema). Our experiments show that the proposed approach infers concise and correct schemas from (huge) JSON collections.
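
To make some of the properties listed in the abstract concrete, the following is a minimal Python sketch of how field types, minimum/maximum values, candidate enumerations, and candidate keys might be collected from a small JSON collection. It is not the JFUSE algorithm or its metamodel: the function names (`type_name`, `summarize`), the `enum_limit` threshold, and the sample documents are assumptions made for illustration, and only top-level fields are inspected.

```python
import json
from collections import defaultdict


def type_name(value):
    """Map a parsed JSON value to a JSON type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def summarize(docs, enum_limit=3):
    """Summarize the top-level fields of a list of parsed JSON objects.

    enum_limit is a hypothetical threshold: a string field with at most
    this many distinct, repeating values is reported as an enumeration.
    """
    stats = defaultdict(lambda: {"types": set(), "values": set(), "count": 0})
    for doc in docs:
        for field, value in doc.items():
            s = stats[field]
            s["count"] += 1
            t = type_name(value)
            s["types"].add(t)
            if t in ("number", "string"):
                s["values"].add(value)

    schema = {}
    for field, s in stats.items():
        required = s["count"] == len(docs)
        entry = {"types": sorted(s["types"]), "required": required}
        if s["types"] == {"number"}:
            entry["minimum"] = min(s["values"])
            entry["maximum"] = max(s["values"])
        # Candidate enumeration: few distinct string values, with repeats.
        if (s["types"] == {"string"}
                and len(s["values"]) <= enum_limit
                and len(s["values"]) < s["count"]):
            entry["enum"] = sorted(s["values"])
        # Candidate key: present in every document, all values distinct.
        entry["key_candidate"] = (
            required
            and len(s["types"]) == 1
            and len(s["values"]) == s["count"]
        )
        schema[field] = entry
    return schema


# Hypothetical sample collection, one JSON document per string.
docs = [json.loads(line) for line in (
    '{"id": 1, "status": "open", "score": 7.5}',
    '{"id": 2, "status": "closed", "score": 3}',
    '{"id": 3, "status": "open"}',
)]

if __name__ == "__main__":
    print(json.dumps(summarize(docs), indent=2))
```

In this sketch, `id` is flagged as a candidate key because it appears in every document with distinct values, `status` is reported as a candidate enumeration, and `score` is marked optional with its observed numeric range. Tagged unions, tuples, and objects used as collections would need additional analysis that this sketch does not attempt.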

Published

2026-03-13

How to Cite

Benhara, N., Duarte, D., Schreiner, G. A., & Feitosa, S. (2026). JFUSE: Json FUll Schema Extractor. Journal of Information and Data Management, 17(1), 1–9. https://doi.org/10.5753/jidm.2026.5748

Section

SBBD 2024 Full papers - Extended papers