My 2 Cents on the COVID-19 Dashboard by JHU

For what it’s worth: a few remarks on the currently extremely popular COVID-19 Dashboard provided by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU)

When the Corona pandemic started, I was tempted to build and publish a dashboard about the spread of COVID-19, too. But then I decided not to. Not only because much brighter people than me beat me to it, but also because the topic is very sensitive. People suffer and people die. Not the type of data I want to analyze and visualize. Hence, I stayed out, stayed home and wrote a few articles about something else (see my recent posts).

Today I changed my mind. I still don’t want to create a comprehensive COVID-19 dashboard, but I would like to add my 2 cents on what is surely the most popular corona data visualization, the COVID-19 Dashboard provided by Johns Hopkins University:

Image: COVID-19 Dashboard JHU

This afternoon, I read that this dashboard is currently clicked 1.2 billion (!) times a day.

First things first: it is a great dashboard providing the most important numbers at a glance and various options to drill down into the details. I am very impressed by its features, how quickly this visualization was made available and how often the underlying data is updated. My congratulations to the data science team of the JHU.

Having said that, I also see a few weaknesses and today’s post will discuss two of them.

Without further ado, here are my remarks:

Remark 1: The Lack of Context in the Table of Confirmed Cases

The most prominent position of the dashboard (top left) shows what is probably the most interesting and important information, the confirmed cases by country:

Image: COVID-19 Confirmed Cases as of 2020/04/13

The numbers are accurate, of course, but they do lack context in my humble opinion.

If you look at this table, you could get the impression that the United States is affected by the virus more than three times as hard as Spain. In absolute numbers, this is correct.

But you must not forget that the US has 7 times the population of Spain.

If you add the context of population sizes to the analysis, the ranking changes considerably. Please don’t get me wrong: I am not saying you should not show the absolute numbers. Of course you should. But you should also provide a relative metric like [Cases per 100,000 inhabitants] or [Number of Inhabitants per Case].

If you do that, you will get to something like this:

Image: Cases per 100k Inhabitants 1 to 15

For the very small countries at the top of the list (like San Marino and Andorra), you may argue that this could be an unfortunate coincidence because of the very small numbers (cases and population). And I agree.

However, switching from absolute to relative numbers also changes the view for much larger countries.

For instance, in absolute numbers, Spain has “only” 6% more cases than Italy. However, in relative numbers, Spain has 355 cases per 100k inhabitants, whilst Italy has 259, i.e. the relative number in Spain is 37% higher than in Italy.
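If you would like to reproduce these relative metrics outside of Excel, here is a minimal Python sketch. Please take it as an illustration only: the function names are my own invention, the example region with 1,000 cases and 500,000 inhabitants is made up, and only the two rates (355 and 259) are taken from the numbers quoted above.

```python
def cases_per_100k(cases: int, population: int) -> float:
    """Confirmed cases per 100,000 inhabitants."""
    return cases * 100_000 / population


def inhabitants_per_case(cases: int, population: int) -> float:
    """Number of inhabitants per confirmed case."""
    return population / cases


def relative_difference_pct(a: float, b: float) -> float:
    """By how many percent a is higher than b."""
    return (a / b - 1) * 100


# Hypothetical region, numbers invented purely for illustration
example_rate = cases_per_100k(cases=1_000, population=500_000)
example_ratio = inhabitants_per_case(cases=1_000, population=500_000)
print(example_rate)   # 200.0 cases per 100k inhabitants
print(example_ratio)  # 500.0 inhabitants per case

# Rates per 100k inhabitants as quoted in the text (Spain, Italy)
spain_rate, italy_rate = 355, 259
print(round(relative_difference_pct(spain_rate, italy_rate)))  # 37
```

Both relative metrics carry the same information. Which one you show is mostly a matter of what your readers find easier to grasp.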

Now, where is the USA? You have to scroll down to the next page and will see that the USA is currently ranked 19th on the metric [Confirmed Cases per 100k Inhabitants].

Image: Cases per 100k Inhabitants 16 to 30

170 out of 100,000 US Americans currently have a confirmed COVID-19 infection. Remember: Spain has 355, i.e. more than twice as many.

Now compare that to the view of the absolute numbers, where the US has more than 3 times the cases of Spain.

You see where I am going here?

Remark 2: Inexplicable Definition of Regions

To the right of the world map, the JHU COVID-19 dashboard shows a table of total deaths, sorted in descending order. These numbers lack context (relation to population), too, as described for the confirmed cases above.

But there is one additional thing which I find questionable: the definition of the regions. In this view you’ll find countries (like Italy or Spain), provinces in China (Hubei), provinces in Canada (like Quebec or British Columbia), cities in the US (like New York or Los Angeles), counties in the US (like Essex, New Jersey), etc.

Image: COVID-19 Deaths as of 2020/04/13

I can’t recognize any kind of decision rule for clustering the data like this. Maybe there is one, I just don’t understand it.

There are a few reasons why I find this view hard to understand:

  • the clustering of regions does not match the one in the view of confirmed cases by country
  • just as in the view of cases, the context of the population sizes is missing. You can’t compare the regions by absolute numbers if you don’t know how many people are living there
  • finally, the inexplicable clustering of regions

Conclusion

The COVID-19 Dashboard provided by Johns Hopkins University is a great dashboard and I appreciate and applaud the efforts of their Center for Systems Science and Engineering to make this available for the masses.

However, I do think there is some room for improvement, especially regarding the addition of more context to the country-level visualizations.

Download Link

If you are interested in the Excel workbook used for my calculations and screenshots, here it is:

Download COVID-19 Measures in Context (zipped Microsoft Excel workbook, 3.3MB)

Please note that the data in this workbook was downloaded from the European Centre for Disease Prevention and Control on the 13th of April 2020. The data in this workbook will not be updated.

Last, but not least: a personal note

Some of you may find this article and discussion inappropriate or even insensitive. I had my doubts, too, and it wasn’t an easy decision. People are suffering and people died. Please be assured that everyone who is infected and everyone who already lost a loved one has my deepest and heartfelt sympathies.

That being said, if we want to understand the data behind this pandemic, we need to discuss the best ways of analyzing and visualizing it. This post is meant as my humble contribution to this discussion. No more, no less.

Please let me know what you think.

Stay tuned.

Comments

5 responses to “My 2 Cents on the COVID-19 Dashboard by JHU”

  1. Mynda

    Hi Robert,
    I agree with your review. I also decided against building my own dashboard on this topic for the same reasons you cited.
    Another improvement I’d like to see is the red font on black background, which is very difficult to read, especially as the font gets smaller.
    Mynda

  2. Robert

    Mynda,
    many thanks for your comment. It is an honour to have you as a reader and commenter here.
    I fully agree with you. I have never been a big fan of the trending black background dashboards. I find them hard to read, too. And not only because of the red font color, but in general. I may be old school, but I still think the good old “Black on White” (or grey on white) is best in terms of readability.

  3. Microsoft Excel Recalc Or Die

    Hi Robert,
    Great post and critical feedforward on the subject. I think at least one positive thing about all this “dashboard hysteria” is what the FT datavis designer John Burn-Murdoch expressed:
    “Just as an aside, I think once things settle down, this is going to end up being an amazing resource in terms of public engagement with and response to data visualization.”
    Link: https://medium.com/nightingale/how-john-burn-murdochs-influential-dataviz-helped-the-world-understand-coronavirus-6cb4a09795ae
    On the other hand, the more I read about testing types and procedures, the more I realize that the confirmed case numbers are tricky to count properly due to their nature. As Thomas Lumley comments: “Counting rare things is hard, and false positives are overwhelmingly more important than false negatives, which is currently a problem for antibody tests.”
    Link: https://www.statschat.org.nz/2020/04/19/counting-rare-things-is-hard/
    Thirdly, I like the standpoint of [standards of analysis], as Elijah Meeks puts it: “We won’t be able to properly evaluate all this until we look back and understand how we can build standards….Still, the pandemic has surfaced some important conversations that have been ongoing within data visualization for some time, from how to visualize uncertainty, to finding ways to humanize data and figuring out when it’s better to say something, rather than display it.”
    Lastly (and my apologies for so many links), this topic is so complex, interesting and relevant for all that Amanda Makulec is right when she says: “Finding that sweet spot of visualization readability while conveying the full breadth of information, context and variance is an eternal challenge. That’s why a robust critique process is important…”
    Link: https://builtin.com/data-science/data-visualization-lessons-pandemic
    And here, Robert, I really thank you. You have been very critical of the work I have shared with you, and thanks again for blogging again. It ignites discussion, and we (society) need more of it. 🙂

  4. Jennifer

    Hi
    Will you be posting more blogs? It’s been a long time since you posted – hope you are well! Big fan!

  5. Robert

    Jennifer,
    many thanks for your comment and kind words. True, I haven’t posted anything new in the past one and a half years. I published quite a few posts in 2020, but there wasn’t much feedback on those articles.
    Most people aren’t reading blogs anymore, they apparently prefer watching YouTube and TikTok videos. Hence, the blog is currently suffering from my lack of motivation. I do have a couple of ideas in the pipeline, but I can’t make any promises that I will be reviving the blog. The existing content will stay online, though.
