It's All About the Database

by David Keeshin


June 10, 2020


While sheltering in place recently, I modeled a data management solution for tracking COVID-19 data. It is a working model that tracks country, US state, and US county case data from a variety of sources.

Background


The solution has three main components. The first is an on-premises data staging database and asynchronous controller. The second is an asynchronous pipeline. The third is a data vault.
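The flow between these three components can be illustrated with a small in-process Python sketch. It is not the author's implementation, which uses a C# controller, Azure Service Bus, Azure Functions, and Azure SQL Database. Here an `asyncio.Queue` stands in for the message bus, a list of dicts stands in for the staging database, and a plain dict stands in for the data vault. The function names `controller`, `vault_loader`, and `run_pipeline` are hypothetical.

```python
import asyncio
import json


async def controller(staging_rows, bus):
    """Hypothetical controller: publish each staged row as a JSON message."""
    for row in staging_rows:
        await bus.put(json.dumps(row))
    await bus.put(None)  # sentinel: tells the consumer there are no more messages


async def vault_loader(bus, vault):
    """Hypothetical consumer (stand-in for a serverless function): land messages in the vault."""
    while True:
        message = await bus.get()
        if message is None:
            break
        record = json.loads(message)
        # Key each record by its business key: location plus report date.
        vault[(record["location"], record["date"])] = record


async def run_pipeline(staging_rows):
    bus = asyncio.Queue()  # in-memory stand-in for a message bus such as Azure Service Bus
    vault = {}
    # Producer and consumer run concurrently and communicate only through the queue.
    await asyncio.gather(controller(staging_rows, bus), vault_loader(bus, vault))
    return vault


if __name__ == "__main__":
    # Made-up placeholder values, not real case counts.
    staged = [
        {"location": "Location A", "date": "2020-06-01", "cases": 1},
        {"location": "Location B", "date": "2020-06-01", "cases": 2},
    ]
    print(asyncio.run(run_pipeline(staged)))
```

Because the producer and consumer only share the queue, either side can be replaced (for example, by a real message broker client) without changing the other, which is the main appeal of an asynchronous pipeline.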


Figure 1. Asynchronous Data Vault Pipeline with Message Controller


How It Works


The first step is an automated process that uses PowerShell scripts to make REST calls and gather JSON and/or CSV data directly from these sites:


The process then performs some minor data cleansing and adds population totals from the World Bank or the US Census Bureau. Next, it sends the COVID-19 totals as JSON messages over an asynchronous pipeline. I control the pipeline with a custom asynchronous controller program built in C#. The controller sends and receives JSON messages through Azure Service Bus using serverless Azure Functions, and the data lands in a data vault built in Azure SQL Database.
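The cleansing and enrichment step could look roughly like the Python sketch below; the author's version is written as PowerShell scripts. The column names (`date`, `state`, `cases`, `deaths`) follow the layout of a state-level CSV such as the one in the New York Times repository, but treat them as an assumption. The population lookup is a plain dict standing in for World Bank or US Census Bureau figures, and both the message shape and the function name `build_messages` are hypothetical.

```python
import csv
import io
import json


def build_messages(csv_text, population_by_state):
    """Hypothetical helper: clean CSV rows, add population, and emit JSON messages."""
    messages = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        state = row["state"].strip()
        # Minor cleansing: skip rows without a usable location or case count.
        if not state or not row["cases"].strip():
            continue
        record = {
            "date": row["date"].strip(),
            "state": state,
            "cases": int(row["cases"]),
            # Treat a blank deaths value as zero.
            "deaths": int(row["deaths"].strip() or 0),
            # Enrichment: population from a separate reference source (None if unknown).
            "population": population_by_state.get(state),
        }
        messages.append(json.dumps(record))
    return messages


if __name__ == "__main__":
    # Made-up placeholder values, not real case or population figures.
    sample = "date,state,fips,cases,deaths\n2020-06-01,Illinois,17,10,1\n2020-06-01,,,5,0\n"
    for message in build_messages(sample, {"Illinois": 1000}):
        print(message)
```

Keeping each output record as a self-contained JSON string means it can be handed directly to whatever message transport the pipeline uses.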


You can check out the schema for the data vault, as well as the current integration code, on GitHub at https://github.com/dkeeshin/covid19datavault.
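For readers new to data vault modeling, here is a generic, minimal illustration of its core hub-and-satellite pattern using Python's built-in `sqlite3` module. It is not the schema from the repository above: the table and column names (`hub_location`, `sat_location_cases`) and the helper functions are hypothetical, and the MD5 hash keys follow a common Data Vault 2.0 convention rather than anything confirmed in the source. A hub stores each business key once; a satellite stores descriptive values over time, and a new satellite row is added only when those values change.

```python
import hashlib
import sqlite3


def hash_key(*parts):
    """MD5 over normalized, delimited parts (a common Data Vault 2.0 convention)."""
    normalized = "|".join(p.strip().upper() for p in parts)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def create_schema(conn):
    """Create one hypothetical hub table and one satellite table."""
    conn.executescript("""
        CREATE TABLE hub_location (
            location_hk   TEXT PRIMARY KEY,  -- hash of the business key
            location_name TEXT NOT NULL,     -- the business key itself
            load_dts      TEXT NOT NULL,
            record_source TEXT NOT NULL
        );
        CREATE TABLE sat_location_cases (
            location_hk   TEXT NOT NULL REFERENCES hub_location(location_hk),
            load_dts      TEXT NOT NULL,
            hash_diff     TEXT NOT NULL,     -- hash of the descriptive attributes
            report_date   TEXT NOT NULL,
            cases         INTEGER,
            deaths        INTEGER,
            record_source TEXT NOT NULL,
            PRIMARY KEY (location_hk, load_dts)
        );
    """)


def load_record(conn, location, report_date, cases, deaths, source, load_dts):
    """Insert the hub row once, and a satellite row only if the attributes changed."""
    hk = hash_key(location)
    conn.execute(
        "INSERT OR IGNORE INTO hub_location VALUES (?, ?, ?, ?)",
        (hk, location, load_dts, source),
    )
    diff = hash_key(report_date, str(cases), str(deaths))
    latest = conn.execute(
        "SELECT hash_diff FROM sat_location_cases WHERE location_hk = ? "
        "ORDER BY load_dts DESC LIMIT 1",
        (hk,),
    ).fetchone()
    if latest is None or latest[0] != diff:
        conn.execute(
            "INSERT INTO sat_location_cases VALUES (?, ?, ?, ?, ?, ?, ?)",
            (hk, load_dts, diff, report_date, cases, deaths, source),
        )
```

Because satellites are insert-only, the vault keeps the full history of what each source reported and when it was loaded, which is useful when sources revise earlier figures.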

Two Main Reasons


1. Contribute. I have been working with data from https://covidtracking.com, https://ourworldindata.org, and https://github.com/nytimes/covid-19-data for two months now, and I have gained insights that could help improve how this data is managed. I have offered these insights, free of charge, to the organizations that provide the data.


https://covidtracking.com and https://ourworldindata.org are not-for-profit organizations. Please visit their sites to learn more about who they are and what they do, and make a contribution if you can.


2. The data from these sites is just that: data. To become meaningful information, it needs structure, such as the structure a data vault provides. A data vault can easily be made available to experts for research and analytics. Hopefully, I have provided that.


Be well.

