Capturing Provenance from Deep Learning Applications Using Keras-Prov and Colab: a Practical Approach

Authors

  • Débora Pina, COPPE/Federal University of Rio de Janeiro
  • Liliane Kunstmann, COPPE/Federal University of Rio de Janeiro
  • Felipe Bevilaqua, COPPE/Federal University of Rio de Janeiro
  • Isabela Siqueira, COPPE/Federal University of Rio de Janeiro
  • Alan Lyra, COPPE/Federal University of Rio de Janeiro
  • Daniel de Oliveira, Fluminense Federal University
  • Marta Mattoso, COPPE/Federal University of Rio de Janeiro

DOI:

https://doi.org/10.5753/jidm.2022.2544

Keywords:

Deep Learning, Metadata, Provenance

Abstract

Because training deep neural networks (DNNs) is exploratory, deep learning (DL) specialists often need to modify the input dataset, change a filter when preprocessing input data, or fine-tune the models’ hyperparameters while analyzing how training evolves. However, specialists may lose track of which hyperparameter configurations have been used and tuned if these data are not properly recorded. Thus, these configurations must be tracked and made available for the user’s analysis. One way to do this is to use provenance data, whose derivation traces support hyperparameter fine-tuning by providing a global picture of the data with clear dependencies. Current solutions present provenance data that is disconnected from the W3C PROV recommendation, which makes it difficult to reproduce and to compare with other provenance data. To address these challenges, we present Keras-Prov, an extension to the Keras deep learning library that collects PROV-compliant provenance data. To show the flexibility of Keras-Prov, we extend a previous Keras-Prov demonstration paper with larger experiments that run on GPUs through Google Colab. Despite the challenges of running a DBMS in virtual environments, DL analysis with provenance gains added trust and persistence from databases and PROV serializations. Experiments show that Keras-Prov data analysis during training supports hyperparameter fine-tuning decisions, favoring the comparison and reproducibility of such DL experiments. Keras-Prov is open source and can be downloaded from https://github.com/dbpina/keras-prov.
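
To make the idea of tracing hyperparameters through derivation dependencies concrete, the following is a minimal sketch in plain Python. It is not the Keras-Prov API: the class ProvStore, its methods, the run identifiers, and all hyperparameter and metric values are hypothetical, chosen only for illustration. It borrows the W3C PROV terms entity, activity, used, and wasGeneratedBy to link each training run to its configuration and its results.

```python
# Hypothetical illustration only: this is NOT the Keras-Prov API.
# It mimics PROV-style relations with the standard library.
from dataclasses import dataclass, field


@dataclass
class ProvStore:
    """Tiny in-memory store using PROV terms (entity, activity, used, wasGeneratedBy)."""
    entities: dict = field(default_factory=dict)
    activities: dict = field(default_factory=dict)
    used: list = field(default_factory=list)              # (activity_id, entity_id)
    was_generated_by: list = field(default_factory=list)  # (entity_id, activity_id)

    def record_training(self, run_id, hyperparameters, metrics):
        """Register one training run with its hyperparameters and resulting metrics."""
        hp_id = f"hyperparameters:{run_id}"
        metrics_id = f"metrics:{run_id}"
        activity_id = f"training:{run_id}"
        self.entities[hp_id] = dict(hyperparameters)
        self.entities[metrics_id] = dict(metrics)
        self.activities[activity_id] = {"type": "training"}
        self.used.append((activity_id, hp_id))
        self.was_generated_by.append((metrics_id, activity_id))
        return metrics_id

    def hyperparameters_behind(self, metrics_id):
        """Follow wasGeneratedBy, then used, back to the hyperparameters."""
        activity_ids = {a for e, a in self.was_generated_by if e == metrics_id}
        return [self.entities[e] for a, e in self.used if a in activity_ids]


store = ProvStore()
# Placeholder values, not results from the paper.
first = store.record_training(
    "run1", {"learning_rate": 0.01, "epochs": 10}, {"accuracy": 0.81})
second = store.record_training(
    "run2", {"learning_rate": 0.001, "epochs": 10}, {"accuracy": 0.86})

traced = store.hyperparameters_behind(second)
print(traced)
```

Because every metric is linked to the activity that produced it, and every activity to the configuration it used, a specialist can ask which hyperparameters led to a given result without relying on memory or ad hoc notes. A real deployment would persist these relations in a database rather than in memory.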

Published

2022-12-19

How to Cite

Pina, D., Kunstmann, L., Bevilaqua, F., Siqueira, I., Lyra, A., de Oliveira, D., & Mattoso, M. (2022). Capturing Provenance from Deep Learning Applications Using Keras-Prov and Colab: a Practical Approach. Journal of Information and Data Management, 13(5). https://doi.org/10.5753/jidm.2022.2544

Section

SBBD Demonstrations 2021 - Extended Papers