Covid Data Analytics Repository: An interdisciplinary look into the COVID-19 pandemic in Brazil
DOI: https://doi.org/10.5753/jidm.2022.2266
Keywords: coronavirus, datasets, digital health, social networks
Abstract
This article describes the construction and deployment of the Covid Data Analytics Repository, a source for interdisciplinary studies on the impact of the COVID-19 pandemic in Brazil. We collected different types of data from official (IBGE, DATASUS) and non-official (Brasil.IO) sources, online social networks (Instagram, Twitter), and a search engine analysis tool (Google Trends). We used these data in investigations aimed at understanding the impacts of COVID-19 in the country, from economics to social behavior. At the time of publication of this article, our repository contains 1,508 documents, classified into two main types: (i) databases and tables downloaded from the aforementioned sources; and (ii) papers, reports, maps, and graphs resulting from our analyses. To allow reproducibility and foster follow-up studies, we have released the repository for public use.