AnswerMiner
March 16, 2020 · 10 min read

Coronavirus (COVID-19) exploratory data analysis with AnswerMiner

This blog post was last updated on April 27.

You may have already read Bence Buday’s other article on our blog about analyzing a Facebook page with AnswerMiner. This time he did some research and spent time analyzing and visualizing coronavirus data (Jan 24 - Apr 27); the post will be updated.

We all know what COVID-19 is. All media platforms are full of this virus and its economic effects. It is hard to write about it without emotions or comments. I am sitting at home in voluntary quarantine, have lost almost ⅓ of my savings, and have no idea what the future holds for us.

Anyway, this post won’t tell you how deadly the coronavirus is or how much toilet paper you should buy, so please keep reading only if you are interested in data visualization!

What have you seen so far on the web?

Based on my research on the topic, the most used charts are the following:

  • column chart
  • histogram
  • and line diagrams

Besides these, most of the information comes as plain text and numbers. I do not like this, since no data is attached that could explain the WHY and HOW.

Let's see an example:

Media news: “Today, 345 people died in Italy”. This is horrible, and there are no words for such a loss. Is there a war in Italy? Actually yes, against an invisible enemy that does exactly the same damn thing humans do: it wants to live.

Let me try to add some extra information to this sentence to make it more understandable. “Yesterday 345 people died out of the 26407 cases in Italy, where the healthcare system is totally overloaded. Most of them had serious health issues and were over 65.” I hope you can feel the difference between the two headlines. Not better, but much more understandable.

Any nice visualisation?

Some institutes and research companies have developed global map-based visualisations, which show how the virus spread around the Earth over time. They also show the number of cases and the affected countries.

These maps are pretty nice. With them you can see how quickly a virus can travel around the world. I think a map like this is the most appropriate visualisation of the coronavirus.

My own dataset

For my own visualisation I had to build my own dataset from different sources all over the internet.

PLEASE KEEP IN MIND: THESE ARE THE REGISTERED CASES!

There must be many, many unregistered cases, since the hosts have no idea that the virus is in their body.

I gathered the following:

  • all cases, [1]
  • number of recovered cases, [1]
  • number of deaths, [1]
  • number of affected countries, [2]
  • iShare MSCI WORLD UCITS ETF (DIST) value ($), [3]
  • Argus US Jet Fuel price ($/gallon). [4]

All of the other columns are calculated from the ones mentioned above. I will present them later. If you are wondering what the last two columns are for, please bear with me. One of them gives a good overview of the stock market these days; the other shows how the oil price per gallon in USD behaves during the coronavirus.

Let's visualize!

What I wanted to see first is how all cases and deaths look together. These numbers show us the minimum death rate of the virus (based on the registered cases). The chart below shows the number of all cases colored by the number of total deaths on each recorded day.
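As an illustration of how such calculated columns can be derived from the cumulative ones, here is a minimal Python sketch. The record layout, the names `DayRecord` and `derive_columns`, and the sample numbers in the usage note are assumptions made for this example, not the actual dataset or the AnswerMiner workflow. It also assumes that active cases are all cases minus recovered cases minus deaths.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DayRecord:
    """One row of the gathered data (hypothetical layout)."""
    date: str       # e.g. "2020-03-16"
    all_cases: int  # cumulative registered cases
    recovered: int  # cumulative recovered cases
    deaths: int     # cumulative deaths


def derive_columns(records: List[DayRecord]) -> List[dict]:
    """Compute active cases and daily increments from cumulative columns."""
    rows = []
    prev: Optional[DayRecord] = None
    for rec in records:
        rows.append({
            "date": rec.date,
            # Assumption: active = all cases - recovered - deaths
            "active_cases": rec.all_cases - rec.recovered - rec.deaths,
            # Daily values are differences between consecutive days;
            # the first day has no previous day, so they stay None.
            "new_cases": None if prev is None else rec.all_cases - prev.all_cases,
            "new_deaths": None if prev is None else rec.deaths - prev.deaths,
        })
        prev = rec
    return rows
```

With two made-up days, `derive_columns([DayRecord("day 1", 100, 10, 2), DayRecord("day 2", 150, 20, 5)])` gives 125 active cases, 50 new cases and 3 new deaths for the second day.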

All cases colored by amount of death (UPDATED at 04.27.)

More cases mean more deaths, this is clear. However, the coloring alone won’t tell us how the rate changes over time, so for the next visualization I used “Death Rate - 2” (death cases / all cases).

Updated at April-13

It is still not at the peak point, but the growth looks linear rather than exponential.

Updated at April-27

I was wondering how I could show how near any peak point of the virus is, and realized that the total number of cases is not the best way to do this. The reason is clear: the total number of cases can never decrease. Active cases, however, can. On the chart below you can see that the dots do not fit a straight line, or at least not all of them. (Just grab a ruler and try it yourself!) Unfortunately it is still not possible to find any dot whose neighbours are both below it. So no peak point as of the 27th of April.

Active Cases (Created at 04.27.)
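The "dot whose neighbours are both below it" test can also be written down in code. The following Python sketch is only an illustration: `local_peaks` is a hypothetical helper name, and the numbers in the usage note are made up, not taken from the dataset.

```python
from typing import List, Sequence


def local_peaks(values: Sequence[float]) -> List[int]:
    """Return the indices of values that are strictly higher than both neighbours.

    The first and last points are never reported, because they have
    only one neighbour and a peak cannot be confirmed there.
    """
    return [
        i
        for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] > values[i + 1]
    ]
```

A series that only grows, such as `[1, 2, 3, 4]`, has no peak, so the function returns an empty list. For `[1, 3, 2, 5, 4]` it returns the positions 1 and 3. Note that this only finds small local peaks; it cannot tell whether one of them is the final peak of the pandemic.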

Death rate - 2 in time (UPDATED at 04.27.)

(March-16) 4% - This rate is growing slowly because full recovery takes more time than dying, which lets us assume that “Death Rate - 1” (death cases / all closed cases) should show a declining trend. Sad, but true.
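To make the two rates concrete, here is a small Python sketch of both formulas. The function names are chosen for this example, and it assumes that "closed cases" means deaths plus recovered cases; the numbers in the usage note are made up.

```python
from typing import Optional


def death_rate_1(deaths: int, recovered: int) -> Optional[float]:
    """Death Rate - 1: death cases / all closed cases.

    Assumption: closed cases = deaths + recovered.
    """
    closed = deaths + recovered
    return deaths / closed if closed else None


def death_rate_2(deaths: int, all_cases: int) -> Optional[float]:
    """Death Rate - 2: death cases / all registered cases."""
    return deaths / all_cases if all_cases else None
```

For example, with 40 deaths, 460 recovered and 1000 registered cases, Death Rate - 1 is 40 / 500 = 8% and Death Rate - 2 is 40 / 1000 = 4%. Death Rate - 2 can never be higher than Death Rate - 1, because closed cases are a subset of all cases, and the two become equal once every case is closed.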

Updated at March-22

4.3% - 5 days ago, I expected that Death Rate - 2 would increase slowly. It seems that I was right. At this point I must mention the Italian cases, where there are no free beds, equipment or nurses in the hospitals, and doctors have to decide about lives day by day. This fact (besides the very high average age of the patients) supports my expectation for Death Rate - 2, but will hurt Death Rate - 1.

Updated at March-29

4.68% - Unfortunately the line in this chart is still going upwards. There could be many reasons behind this:

  • the lack of capacity of the healthcare system,
  • only those people are tested who have symptoms, or
  • the virus has mutated and become more deadly.

Updated at April-13

6.29% - this is the 4th version of the chart and, as you can see above, this rate is still going up.

Updated at April-27

6.90% - 5th version. However, this is not the highest value, since the highest Death Rate - 2 so far was 7.01%. This is great news, because I have found a chart where some kind of peak point can be shown.

Death rate - 1 in time (UPDATED at 04.13.)

(March-16) 7% - My assumption looks valid, since in the meantime this rate has been going down and sits around 7%. The two numbers must meet somewhere. I do not know where (5%? 6%?) and when, but time will tell.

Updated at March-22

12.3% - almost a week ago it seemed to me that at the end of the pandemic the total death rate could settle around 5-6%. However, the situation in Italy is pushing the numbers up. Well, how about 8%? (Keep in mind, we are talking about the registered cases.)

Updated at March-29

18% - in the last week this rate grew continuously, except on Sunday (03.29). Could this be the end of the growth? Anyhow, the lowest value on this chart since the 1st of February was 5.63 (03.07). That is a dream at the moment.

Updated at April-13

21.51% - according to this chart, the growth of Death Rate - 1 seems to have stalled.

Updated at April-27

18.67% - another peak point confirmed! Great!

Number of affected countries in time (UPDATED at 04.27.)

(March-16) At the moment it is hard to say whether the number of affected countries affects the rates above or not. A couple of weeks are needed to tell. (This post will be updated.)

Updated at March-22

179 of 195 countries are infected.

Updated at March-29

“The WHO counts big cruise ships and territories as individual countries too which cause numbers above 195.”

Updated at April-13

208 countries, territories and cruise ships are affected so far. That means the virus is “everywhere” on Earth.

I hardly need to point it out, but this virus hit the stock market and the oil industry as well.

Updated at April-27

Still 208

Beyond the line and column charts

Column and line diagrams are very useful and easy for everyone to understand, but there are other ways to represent data.

How about a bubble plot? The numbers of deaths and recoveries won’t decrease in the future. Only active cases can be 0 at the end of this crisis. In a bubble plot this results in a growing line of bubbles. Each bubble is a date. The further right a bubble is, the more recoveries there are. The higher a bubble is, the more deaths there are. For better understanding: the best case is when the bubble is close to the x axis and far from the y axis.

Recovered and death bubble plot sized by active cases (UPDATED at 04.27.)

In the upcoming weeks I hope that the movement to the right will be much bigger than the movement upwards. (An update of this post is coming later.)

Updated at March-29

The new bubbles move in two directions, of which upwards is the bad one. At the moment the movement seems linear rather than logarithmic.

Updated at April-27

Since both death rates have decreased, this chart now looks more logarithmic than linear, which is also great.

The next chart is a totally different one. I could encode information with coloring and sizing. The names of the weekdays are sized by the sum of the new cases on that weekday. In other words, of all cases, most were registered on Sundays. The runner-up in this competition is Thursday. The coloring tells which day is the deadliest: the redder the coloring, the deadlier the weekday.
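The aggregation behind such a weekday chart can be sketched in a few lines of Python. This is only an illustration of the idea, not how AnswerMiner computes it; the function name `totals_by_weekday`, the input layout and the numbers in the usage note are assumptions.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple

# Fixed English names, so the result does not depend on the system locale.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def totals_by_weekday(
    daily: Iterable[Tuple[str, int, int]]
) -> Dict[str, Tuple[int, int]]:
    """Sum daily new cases and daily deaths per weekday.

    `daily` holds (ISO date, new cases, new deaths) tuples.
    Returns {weekday name: (sum of new cases, sum of deaths)}.
    """
    sums = defaultdict(lambda: [0, 0])
    for iso_date, new_cases, new_deaths in daily:
        name = WEEKDAYS[date.fromisoformat(iso_date).weekday()]
        sums[name][0] += new_cases   # drives the size in the chart
        sums[name][1] += new_deaths  # drives the color in the chart
    return {name: (cases, deaths) for name, (cases, deaths) in sums.items()}
```

Calling it with the made-up rows `("2020-03-16", 10, 1)`, `("2020-03-23", 5, 2)` and `("2020-03-22", 7, 0)` puts the first two rows under Monday, with 15 cases and 3 deaths, and the third under Sunday.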

Updated at April-27

Friday is the ultimate winner at the moment.

Name of week days stacked by daily new cases (sum) and colored by daily death (sum) (UPDATED at 04.13.)

Updated at March-22

According to my source [2], there are still a few countries without any registered COVID-19 patients (16).

Updated at March-29

For now, Saturday has become the deadliest day in COVID-19 history.

Updated at April-13

Saturday is still the deadliest day, but Thursday is catching up.

The heat map is also a pretty spectacular chart. Here you can see how the iShare MSCI WORLD UCITS ETF (DIST) value ($) went down day by day, separated by the number of affected countries. Will the dive stop when all countries on Earth have the virus? Good question! Update - according to my source [2], there are still a few countries without any registered COVID-19 patients (16).

iShare MSCI WORLD UCITS ETF (DIST) value ($) in time separated by number of affected countries:

(March-16) More bubbles? Yes! Here you can see each day's new cases as a percentage of all cases. Please note that 40% of all cases in the dataset (Jan 24 - Mar 16) were registered in the last 6 days.

Updated at March-22

The last 7 days (03.16 - 03.22) account for 49% of all cases. This means that the spreading speed of the virus is +50% / week. Is it possible that there will be 471,000 total cases on 03.29? Will there be 21,120 deaths? You will get the answer in the next update of this post.
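The weekly share quoted here can be computed directly from the cumulative totals. Below is a minimal Python sketch with hypothetical function names and made-up numbers. It also shows a second, related measure: the share is taken against the current total, while growth is taken against the total a week earlier, so a share s corresponds to a growth of s / (1 - s).

```python
from typing import Sequence


def weekly_share(cumulative: Sequence[int], window: int = 7) -> float:
    """Share of the latest total that was registered in the last `window` days."""
    if len(cumulative) <= window:
        raise ValueError("need more than `window` data points")
    latest = cumulative[-1]
    earlier = cumulative[-1 - window]
    return (latest - earlier) / latest


def weekly_growth(cumulative: Sequence[int], window: int = 7) -> float:
    """Growth of the total over the last `window` days, relative to the earlier total."""
    if len(cumulative) <= window:
        raise ValueError("need more than `window` data points")
    latest = cumulative[-1]
    earlier = cumulative[-1 - window]
    return (latest - earlier) / earlier
```

For the made-up series `[51, 60, 70, 80, 85, 90, 95, 100]`, the weekly share is 49 / 100 = 49%, while the growth relative to the total a week earlier is 49 / 51, or roughly 96%. Which of the two is meant should be stated whenever a "% / week" figure is quoted.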

Updated at March-29

52% / week.

Updated at April-13

30% / week, which does not mean that we have fewer new cases, but that the spread is more linear than exponential.

Updated at April-27

21% / week - this also confirms that the growth of the pandemic is in a linear or logarithmic phase rather than an exponential one.

Daily new cases percentage of all cases (UPDATED at 04.27.)

Summary

(March-16) What I can see from my charts is that at some point in the future all countries may be affected, and the death rate should settle between 4% and 7% of registered cases. The next weeks will show how the spread continues. Hopefully the virus is now in a peak period, but this is not 100% sure, since many countries have just reported their first patient.

Updated at March-22 - the death rate can be between 4% and 7%, but it really depends on the load on the healthcare system. The peak period theory seems correct, however the absolute peak point is still far away.

Updated at March-29

The current measures worldwide do not seem to be enough to slow the spread, but we have to wait a little to see the effect of the restrictions imposed by the governments. Are we already in the peak period? If you have any advice, suggestion or idea to be presented with the AnswerMiner app, write to us! Thanks for still reading this blog post!

Updated at April-27.

Fortunately I could find a few small peak points; however, the big one has still not been reached.

Please follow the WHO instructions as well as the laws of your country!

Stay healthy!

Used sources:
