Curating, Enriching and FAIRifying Datasets from the Brazilian COVID-19 Vaccination
DOI:
https://doi.org/10.5753/jidm.2022.2356
Keywords:
Data Science, COVID-19, Data Provenance, FAIR Pipelines, Data paper
Abstract
As the world struggles to face the challenges of vaccination against COVID-19, more attention needs to be paid to the lack of transparency and accessibility of curated vaccination datasets. Among the strategies to combat COVID-19, vaccination and data-centered epidemiological investigations stand out as the most effective. This paper presents the process of building curated and annotated datasets enriched with provenance metadata. The primary dataset is based on the registration data of the Vaccination Campaign against COVID-19 in Brazil and contains thousands of records processed up to March 2021. The data were analyzed, cleaned, cross-checked, and linked with other sources to correct and complement them, resulting in curated datasets aligned with the FAIR Data principles.