APProve: Facilitating Active Monitoring, Versioning and Provenance Tracking of Data Management Plans
DOI:
https://doi.org/10.5753/jidm.2025.4358
Keywords:
Data Management Plans, FAIR Data Principles, Versioning, Data Provenance, Data Governance
Abstract
As modern societies become increasingly data-driven, reliance on research data generated by complex experimental setups and scientific systems has grown. However, applying this scientific knowledge to diverse contexts can be challenging. This paper introduces the Active Plans Provenance (APProve) framework, which provides mechanisms to monitor changes to Data Management Plans (DMPs) and trace their lineage in a simple and dynamic manner. APProve lets DMPs evolve in step with their research projects while keeping them accessible, shareable, and maintainable over the long term. To assist researchers in tracking DMP versions, we developed a loosely coupled architecture that integrates with traditional DMP generation tools. To evaluate the framework's effectiveness, we conducted two experiments, both involving DMPs generated by the ARGOS system. The first experiment assessed APProve's functionality using a static DMP from the VODAN BR project. The second experiment analyzed an active DMP (a-DMP) from the OpenSoils platform project, a Brazilian soil data governance system. APProve enables users to monitor changes, visualize version histories, and trace retrospective provenance related to DMP modifications. Additionally, it simplifies the DMP import process and provides comprehensive project visualization, including automated comparison of DMP versions across projects. The experiments demonstrated the framework's efficacy in tracking and visualizing DMP version histories, thereby supporting the management and evolution of these plans. In the VODAN BR case, APProve captured provenance details and enabled comparisons across revisions. For OpenSoils, the framework supported dynamic updates, keeping the plan aligned with ongoing project changes and highlighting discrepancies. Both cases confirmed APProve's ability to improve accessibility and usability for researchers.
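The automated comparison of DMP versions mentioned above can be illustrated with a minimal sketch. It assumes that each DMP version is available as a nested, JSON-like dictionary; the field names (`title`, `dataset.storage`, and so on) and the `flatten` and `diff_versions` helpers are hypothetical and are not taken from APProve or ARGOS.

```python
from typing import Any


def flatten(doc: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted-path keys; lists are kept as leaf values."""
    flat: dict[str, Any] = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_versions(old: dict[str, Any], new: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one change record per field that was added, removed, or modified."""
    old_flat, new_flat = flatten(old), flatten(new)
    changes = []
    for path in sorted(old_flat.keys() | new_flat.keys()):
        if path not in old_flat:
            changes.append({"field": path, "change": "added",
                            "old": None, "new": new_flat[path]})
        elif path not in new_flat:
            changes.append({"field": path, "change": "removed",
                            "old": old_flat[path], "new": None})
        elif old_flat[path] != new_flat[path]:
            changes.append({"field": path, "change": "modified",
                            "old": old_flat[path], "new": new_flat[path]})
    return changes


# Two hypothetical versions of the same plan (illustrative content only).
v1 = {"title": "Soil data plan",
      "dataset": {"license": "CC-BY-4.0", "storage": "local server"}}
v2 = {"title": "Soil data plan",
      "dataset": {"license": "CC-BY-4.0",
                  "storage": "institutional repository",
                  "format": "CSV"}}

changes = diff_versions(v1, v2)
for change in changes:
    print(change)
```

Flattening to dotted paths keeps each change record tied to a single field, which is one simple way to make revisions comparable; a real system would likely also record who made the change and when.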

