Image for post
Image for post
Photo by NASA on Unsplash

Python, Pandas and COVID-19

We live in a modern world of data, and we should be using it in every way that we can to understand trends. But for many, it will be Microsoft Excel that they may turn to for any sort of analysis. But after Googling for transposing columns to rows, and then trying to merge row, you kinda give up.

The answer is to use Python and Pandas, and which allows us to quickly process data and get it into the right format. Let’s start on a data set which contains date, country, province, lat, long, confirmed cases, recovered cases, and deaths:

Image for post
Image for post

To say that we just want the current date (9 April 2020), and ignore all the other dates. For this we can create the code of:

This will filter out all the rows which are not matched to ‘2020–04–09’ and will give:

Next, we will fill in any Not A Number (NAN) with a blank, and drop the Date, Lat and Long columns:

Next we will merge the Country/Region and Province/State columns, so that we get a unique column (“Name”), as, for example, China, France and the UK define its data with provinces. We can then drop the Country/Region and Province/State columns, and add the new one:

Finally we can save our data frame to a CSV file, and load it up for analysis:

This gives:

Image for post
Image for post

The code for this is here:

Now let’s load this up for analysis [here]:

Image for post
Image for post

Those on the upper left hand side are the countries which have more recovered over confirmed. And for our data analysis we get:

Spreadsheets are history, Python is the future!

Written by

Professor of Cryptography. Serial innovator. Believer in fairness, justice & freedom. EU Citizen. Auld Reekie native. Old World Breaker. New World Creator.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store