Ecore4PROV-DM: A Metamodel for Enhancing Data Provenance Adoption in Information Systems

DOI:

https://doi.org/10.5753/isys.2024.4691

Keywords:

Data Provenance, Model-Driven Engineering, W3C PROV, W3C PROV-DM, Metamodel Evaluation

Abstract

Effective management of data provenance is essential in Information Systems, particularly for data-intensive applications. Despite the W3C PROV family of documents establishing a standard for representing provenance, integrating this information into software development processes remains a significant challenge. This paper addresses the problem by introducing the Ecore4PROV-DM metamodel, developed using Model-Driven Engineering techniques to align with the W3C PROV data model (PROV-DM). The metamodel's application is demonstrated through real-world scenarios, including the Urban Observatory project at Newcastle University. Evaluated using a subset of the Metamodel Quality Requirements and Evaluation (MQuaRE) framework, focusing on three key quality requirements, Ecore4PROV-DM exhibits high accuracy and completeness, making it a robust tool for provenance modeling. By bridging the gap between the conceptual richness of W3C PROV-DM and practical implementation needs, Ecore4PROV-DM facilitates precise provenance representation and seamless integration into diverse Information Systems.

Author Biographies

Marcos Alves Vieira, Instituto Federal Goiano (IF Goiano) – Campus Iporá / Universidade Federal de Goiás (UFG) - Instituto de Informática (INF)

Professor at the Instituto Federal Goiano (IF Goiano) - Campus Iporá. He holds a Bachelor's degree in Informatics from the Instituto Federal de Goiás (IFG), a Master's degree and a Ph.D. in Computer Science from the Instituto de Informática (INF) of the Universidade Federal de Goiás (UFG). During his master's degree, he developed research focused on Smart Spaces for Ubiquitous Computing and Model-Driven Engineering. During his doctorate, he applied Model-Driven Engineering techniques to develop a metamodel enabling the instantiation of Data Provenance models in accordance with the W3C PROV standard, which was used as the basis for building a graphical provenance modeling tool.

Gislainy Crisostomo Velasco, Universidade Federal de Goiás (UFG) - Instituto de Informática (INF)

Graduated in Information Systems from the Federal University of Goiás (2018); Master's in Computer Science at UFG. Currently Staff Software Engineer at Tropix, Inc. Has experience in Computer Science, working on the following subjects: blockchain, smart contracts, and non-fungible tokens (NFTs).

Sergio T. Carvalho, Universidade Federal de Goiás (UFG) - Instituto de Informática (INF)

Sergio Teixeira de Carvalho has experience in research and development projects in Computer Science with an emphasis on computer systems, distributed systems and software engineering. He conducts research in the fields of ubiquitous computing, software architecture, digital health and computing applied to health (mobile health applications, gamification and game development). He holds a PhD in Computer Science from the Fluminense Federal University (UFF), a master's degree in Applied Computing and Automation, also from UFF, and a bachelor's degree in Computer Science from the Federal University of Goiás (UFG). He is currently an associate professor at the Institute of Informatics (INF), UFG, Brazil, working in the postgraduate program in Computer Science (doctoral and master's advisor), in the specialization in Digital Health (postgraduate lato sensu studies), and in graduate studies in Computer Science courses. He is a full member of ACM (Association for Computing Machinery), SBC (Brazilian Computer Society) and SBIS (Brazilian Society of Health Informatics). He is a member of the Board of Directors of INF/UFG. He served as Director of INF/UFG from June 2017 to June 2021, having also served as Vice Director from June 2013 to June 2017. He has taught at various colleges and university centres in the Computer Science area. He also served as Director of Information Systems and of Technological Support, both Information Technology areas of the Court of Justice of the State of Goiás, Brazil. He was also Vice Coordinator of the Health Information Governance Commission (CGIS/UFG) from August 2022 to June 2023.
For more than seventeen years, he worked as an Information Technology professional in private companies and public institutions at the three levels of government (municipal, state and federal), acting as director of software systems development, director of datacenter support, manager and coordinator of software projects, and in the areas of software development, database administration and network and systems administration.

Published

2024-12-26

How to Cite

Alves Vieira, M., Crisostomo Velasco, G., & Carvalho, S. (2024). Ecore4PROV-DM: A Metamodel for Enhancing Data Provenance Adoption in Information Systems. ISys - Brazilian Journal of Information Systems, 17(1), 13:1 – 13:31. https://doi.org/10.5753/isys.2024.4691

Section

Extended versions of selected articles