Plotting preprocessed polygon data as Cartograms in Tableau: the results of the US Presidential Elections since 1900
Choropleth Maps or Filled Maps (as they are called in Tableau) are a great technique for analyzing geographical data, especially for maps with a high level of detail (e.g. US by counties or ZIP codes). They make it very easy to identify geographical hot spots first and then drill down into these regions using other visualization types.
Having said that, they also have their limitations and disadvantages. Even worse, under certain circumstances Choropleth Maps can be misleading and visualize the data incorrectly.
A classic example of a misleading Choropleth Map is the well-known US Presidential Election Map. We have all seen them, right? A map of the USA with color-coded states: a red fill color if the Republican candidate won the state and a blue fill color if the Democratic candidate won (and in some years other colors for independent candidates).
Although these maps correctly depict the geographical distribution of “who won where”, they are usually telling the wrong overall story.
How come?
Traditionally, Republicans very often win some of the larger states with a low population density, like Alaska, Montana and Wyoming (and therefore only a small number of electoral votes). As a consequence, red often dominates the color coding of the map, even if the Democratic candidate won the entire election.
Cartograms are a geographical visualization that avoids this problem. A Cartogram distorts the map by resizing its regions according to, for example, the population, the popular votes or the electoral votes. At the same time, the Cartogram algorithm tries to retain the topology of the map as far as possible.
Today’s article presents a dashboard on Tableau Public comparing a Filled Map with a Cartogram for every US Presidential Election since 1900.
Disadvantages of Choropleth (Filled) Maps
As already mentioned in the introduction, Choropleth or Filled Maps are a great geographical visualization technique, but they also come with a few disadvantages:
- they do not allow a direct comparison of regions
- they visualize only a snapshot of the data, i.e. no development over time
- they require a lot of real estate on a dashboard
- they do not show the exact values (the only viable option is showing tooltips when hovering over or selecting a region, everything else just clutters the view)
- there are possible perception problems with regard to the size of regions (e.g. Rhode Island is barely visible on a US map)
- there may be possible misinterpretations because the size of a region may have a greater impact on the user’s visual perception than the intensity of the fill color
Serious limitations, for sure, but if you use Choropleth Maps with caution, they can still be very helpful.
Having said that, depending on the geographical level of detail, the heterogeneity of the size of the regions and the measure you are visualizing, Choropleth Maps can not only be limited, they may even be misleading.
Let’s have a look at one of the standard examples of a misleading Choropleth Map. Here is the Map of the 2012 US Presidential Election (48 contiguous states plus District of Columbia, i.e. Alaska and Hawaii not included):
If you look at this map and pretend you did not know the results, it looks as if Mitt Romney (red) had won the election by a wide margin. As we all know, this wasn’t the case. On the contrary.
Now why is the Choropleth Map so misleading in this case?
Let’s have a look at the numbers (again Alaska and Hawaii excluded):
Barack Obama clearly won the election: 53% of the states, 66% of the popular votes, 62% of the electoral votes. To be crystal clear: the third column does not mean that 84.5 million Americans voted for Obama. The number follows the winner-take-all logic of the US Presidential Elections: Barack Obama won 26 states, and a total of 84.5 million votes were cast in those states.
The last two columns of the table make clear why the Choropleth Map is misleading: the 23 “red” states make up 56% of the total acreage. That’s why red dominates blue on the map.
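For readers who want to rebuild this kind of summary for another year, here is a minimal Python sketch of the winner-take-all aggregation. The field names ("winner", "votes", "area") and the three toy rows are assumptions made up for illustration; they are not the real election data.

```python
from collections import defaultdict


def shares_by_winner(states):
    """Return each winner's share of states, of total votes cast in the
    states they won (winner-take-all), and of total area.

    `states` is a list of dicts with the hypothetical keys
    'winner', 'votes' (all votes cast in the state) and 'area'.
    """
    totals = defaultdict(lambda: {"states": 0, "votes": 0, "area": 0.0})
    for s in states:
        t = totals[s["winner"]]
        t["states"] += 1
        t["votes"] += s["votes"]  # every vote in the state goes to the winner
        t["area"] += s["area"]

    n_states = len(states)
    all_votes = sum(s["votes"] for s in states)
    all_area = sum(s["area"] for s in states)
    return {
        winner: {
            "states": t["states"] / n_states,
            "votes": t["votes"] / all_votes,
            "area": t["area"] / all_area,
        }
        for winner, t in totals.items()
    }


# Made-up toy rows, NOT real election results.
toy_states = [
    {"state": "A", "winner": "blue", "votes": 900, "area": 10.0},
    {"state": "B", "winner": "blue", "votes": 600, "area": 20.0},
    {"state": "C", "winner": "red", "votes": 500, "area": 70.0},
]
toy_shares = shares_by_winner(toy_states)
```

In the toy data, blue wins most of the states and most of the votes but covers a minority of the area, which is the same pattern that makes the 2012 Filled Map look red.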
The problem comes from the considerable heterogeneity of the states regarding their acreage and the number of popular votes. A look at “popular votes per square mile” for each state makes this evident:
The numbers range from more than 4,200 (District of Columbia) down to 2.5 votes per square mile (Wyoming). Since the map only shows the 48 contiguous US states plus DC, the absolute minimum isn’t even included: Alaska had 0.45 votes per square mile in 2012.
There is only a weak correlation between votes and acreage (correlation coefficient = 0.37). For a Choropleth Map to represent this data fairly, however, the two would have to be strongly correlated.
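Both measures are easy to recompute. The sketch below assumes the coefficient quoted above is a plain Pearson correlation between votes and area per state (the article does not say which coefficient was used), and the function names are hypothetical.

```python
import math


def votes_per_square_mile(votes, area_sq_miles):
    """Vote density of a single state."""
    return votes / area_sq_miles


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Feeding in one list of votes and one list of areas, in the same state order, gives a value between -1 and 1. The closer it is to 1, the less the map distorts the picture.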
Cartograms
Since the Choropleth Map is so misleading and confusing, it shouldn’t be used at all in this case. However, a geographical visualization of election results is definitely a very interesting view.
So, instead of a Choropleth Map, we should use a Cartogram.
What is a Cartogram?
“A cartogram is a map in which some thematic mapping variable […] is substituted for land area or distance. The geometry or space of the map is distorted in order to convey the information of this alternate variable.
[…]
An area cartogram is sometimes referred to as a value-by-area map or an isodemographic map, the latter particularly for a population cartogram, which illustrates the relative sizes of the populations of the countries of the world by scaling the area of each country in proportion to its population; the shape and relative location of each country is retained to as large an extent as possible, but inevitably a large amount of distortion results.” (source: Wikipedia - Cartogram)
For the example of the US Presidential Elections, we could use the population, the electoral votes or the popular votes per state to scale the areas. I decided to go with the popular votes and created this Cartogram of the 2012 election results:
Now this is more like what we expected to see, isn’t it? Blue clearly dominates red. The blue areas add up to 66% of the total area of all regions and thereby represent the number of votes in the states won by Barack Obama.
Well, truth be told, this isn’t 100% accurate. The area of all blue states in the Cartogram is 65.85% of the total area instead of 65.89% (the Democratic share of the votes). However, this is close enough in my book.
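A check like this can be done with the shoelace formula on the distorted polygons. The sketch below is an assumption about how one might do it, not the procedure used for the numbers above: it measures planar area in the coordinates' own units and expects a list of (winner, ring) pairs, where a ring is a list of (x, y) points.

```python
def polygon_area(ring):
    """Unsigned planar area of a closed ring via the shoelace formula."""
    twice_area = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def area_share_by_winner(polygons):
    """Share of the total drawn area per winner.

    `polygons` is a list of (winner, ring) pairs; a state made of several
    rings simply contributes several pairs.
    """
    per_winner = {}
    for winner, ring in polygons:
        per_winner[winner] = per_winner.get(winner, 0.0) + polygon_area(ring)
    total = sum(per_winner.values())
    return {winner: area / total for winner, area in per_winner.items()}
```

Comparing these shares with the vote shares shows how far the Cartogram still is from its target sizes.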
The Tableau Public Visualization
My favorite Bible joke is the one about the 10 commandments:
Moses comes down from Mount Sinai and speaks to his people: “I have good news and bad news. The good news: I brought him down to ten. The bad news: adultery is still in.”
Why am I telling you this joke? Well, regarding Cartograms in Tableau, I have good news and bad news, too.
The good news: you can use Tableau to plot data as a Cartogram using the Polygon Map approach.
The bad news: Tableau has no built-in feature to create the distorted polygon data. You have to calculate the polygons first with another tool and use the results as a data source in Tableau.
Although Cartograms are not a native feature of Tableau, Tableau is a very good and fast option to visualize Cartogram polygon data. Let’s have a look at a larger data set. Here is an interactive dashboard showing a Filled Map and a Cartogram for all 29 US Presidential Elections since 1900 on Tableau Public:
Select a year from the drop down filter, click on a state to highlight it in both views and hover over the states to see more information in the tooltips.
Collecting and Preprocessing the Data
The laborious part of this project was collecting and preprocessing the data. This is what I did:
- Election Results
I compiled a database including the year, the state, the number of popular votes per state and the winner per state from The American Presidency Project. Many thanks to John T. Woolley and Gerhard Peters over at the University of California, Santa Barbara for providing the data.
- The Polygon Data of the United States
As the starting point for the Cartogram algorithm, I needed the polygon data of the boundaries per state. I first tried to get the polygons from an ESRI Shapefile using Richard Leeke’s ShapeToTab utility (more here: Create Your Own Filled Maps in Tableau), just like I did in the previous post (Create Excel Freeform Shapes from Polygons). This worked like a charm, but I ended up with more than 30,000 data points. This was way too detailed for my purposes and it was bloating the database.
Hence, I used the US polygons provided by Tableau in this knowledge base article: Creating Polygon-Shaded Maps. Less than 7,000 data points and still detailed enough for drawing a nice polygon map in Tableau.
- The Size of the States
I took the areas of the states from this Wikipedia page: List of U.S. states and territories by area. This isn’t really necessary, neither for the Cartogram algorithm nor for the Tableau visualization. I included the data only to be able to show the area in the tooltips.
- The Distorted Data – Polygons of the Cartograms
This was the hardest part. I searched the Internet and found a wealth of information on Cartograms: websites explaining the concept, papers describing different algorithms and even the open source code of some Cartogram implementations in C, Python and others.
I could have used one of them, but I wanted to fully understand how the algorithm works. Thus, I decided to write my own code to implement a Cartogram algorithm. For no particular reason, I used “An algorithm to construct continuous area cartograms” by J. Dougenik, N. Chrisman and D. Niemeyer, published in “Professional Geographer” back in 1985. The article is 30 years old and in the meantime more sophisticated algorithms have been developed, but as a starting point, this paper was perfect for me.
I implemented the algorithm, fed in the data, created 29 Cartogram polygon data sets for all US Presidential Maps since 1900 and ended up with a database I could use as the source data of the Tableau workbook shown above.
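To give an idea of what such an implementation involves, here is a minimal Python sketch of a single iteration of the force-based scheme from the Dougenik, Chrisman and Niemeyer paper, written from the commonly reproduced pseudocode. It is not the code used for the dashboard. The function names and the list-of-rings input are assumptions for illustration, every region is assumed to be one ring with a positive value, and the formulas should be checked against the paper before relying on them.

```python
import math


def area_and_centroid(ring):
    """Unsigned shoelace area and centroid of a closed ring [(x, y), ...]."""
    signed = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        cross = x0 * y1 - x1 * y0
        signed += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    signed *= 0.5
    return abs(signed), (cx / (6.0 * signed), cy / (6.0 * signed))


def cartogram_step(rings, values):
    """Move every vertex once; rings[i] should grow or shrink towards a
    share of the total area proportional to values[i] (all values > 0)."""
    stats = [area_and_centroid(ring) for ring in rings]
    total_area = sum(area for area, _ in stats)
    total_value = sum(values)

    bodies = []       # (centroid, radius, mass) per region
    size_errors = []
    for (area, centroid), value in zip(stats, values):
        desired = total_area * value / total_value
        radius = math.sqrt(area / math.pi)
        mass = math.sqrt(desired / math.pi) - radius  # > 0 means "grow"
        size_errors.append(max(area, desired) / min(area, desired))
        bodies.append((centroid, radius, mass))

    # Damping factor: the larger the mean size error, the smaller the step.
    reduction = 1.0 / (1.0 + sum(size_errors) / len(size_errors))

    def move(x, y):
        dx = dy = 0.0
        for (cx, cy), radius, mass in bodies:
            dist = math.hypot(x - cx, y - cy)
            if dist == 0.0:
                continue  # vertex sits on the centroid: no direction
            if dist > radius:
                force = mass * radius / dist
            else:
                ratio = dist / radius
                force = mass * ratio ** 2 * (4.0 - 3.0 * ratio)
            dx += force * (x - cx) / dist
            dy += force * (y - cy) / dist
        return (x + reduction * dx, y + reduction * dy)

    return [[move(x, y) for x, y in ring] for ring in rings]


# Two adjacent unit squares; the left one should end up three times larger.
left = [(0, 0), (1, 0), (1, 1), (0, 1)]
right = [(1, 0), (2, 0), (2, 1), (1, 1)]
stepped = cartogram_step([left, right], [3, 1])
```

Because the displacement depends only on a vertex's coordinates, points shared by two neighbouring regions move together and the borders stay closed. In practice the step would be repeated, feeding the output back in, until the size errors are close to 1.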
The Implementation in Tableau
After I had my ducks in a row (the database), the implementation of the views and dashboards in Tableau was a walk in the park:
- The Filled Map view is the usual suspect: double click on the [State] dimension, select filled maps on the Marks Shelf, put [Winner] on the Color Shelf and [Year] on the Filters Shelf. The rest is just formatting.
- The Cartogram uses the Polygon Marks Type. Have a look at this Tableau knowledge base article for the details: Creating Polygon-Shaded Maps
- For some of the information shown in the tooltips, I used the new Tableau 9 Level of Detail functionality. I won’t go into the details here. If you are interested, download the workbook and have a look for yourself.
As I said, the laborious part was the data collection and preprocessing. Creating the Tableau views and dashboards took less than 2 hours.
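For the Polygon Marks Type, Tableau needs one row per boundary point plus a field that gives the drawing order. A minimal Python sketch of flattening Cartogram polygons into such a table could look like this. The column names (Year, State, Winner, PolygonID, PointOrder, Longitude, Latitude) are assumptions for illustration, not the fields of the actual workbook.

```python
import csv

FIELDS = ["Year", "State", "Winner", "PolygonID",
          "PointOrder", "Longitude", "Latitude"]


def polygon_rows(year, state, winner, rings):
    """One row per point; PointOrder restarts for every ring of the state."""
    rows = []
    for polygon_id, ring in enumerate(rings, start=1):
        for point_order, (lon, lat) in enumerate(ring, start=1):
            rows.append({
                "Year": year,
                "State": state,
                "Winner": winner,
                "PolygonID": polygon_id,
                "PointOrder": point_order,  # goes on the Path shelf
                "Longitude": lon,
                "Latitude": lat,
            })
    return rows


def write_rows(rows, handle):
    """Write the rows as CSV to an open text file handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```

In Tableau, the point order field would go on the Path shelf and the state and polygon ID on Detail, as described in the knowledge base article mentioned above.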
The Drawbacks of Cartograms (generally and in Tableau)
Cartograms are a great alternative when Choropleth Maps fail. However, they also have some disadvantages:
- They aren’t very common. If you show a Cartogram to your Management Board or client for the very first time, I bet you will be looking at some very astonished faces
- Cartograms aren’t a Tableau native feature. You need another tool or – as I did – implement an algorithm and create the data outside of Tableau
- The more Cartograms you need for your Tableau dashboard, the more time it takes to create the polygons and the more the database will be bloated. For instance, 6,710 points (the polygons of the 49 regions) times 29 elections already make more than 194,000 rows. Imagine you need more Cartograms and / or a higher level of detail, e.g. counties or ZIP codes
- You have to know in advance which Cartograms shall be shown. This is obvious in the example I used (one Cartogram per election), but it may turn into a problem if you have a lot of possible combinations / filters
- Cartograms try to retain the original topology of the map, but by definition they distort the regions. Retaining the topology works well for the overall map (especially in Tableau, which displays the real map in the background), but you can’t really recognize the individual regions anymore.
An example:
Be honest: would you have recognized Wyoming by its shape without the tooltip?
Agreed, some disadvantages, but Cartograms still display the data in the US Presidential Elections showcase more accurately than a Choropleth Map. They definitely are a viable and useful visualization technique for geographical data.
I had a lot of fun with Cartograms, so I may do some more work on this and maybe write another post. Let’s see. In the meantime, let me know what you think about Cartograms.
Stay tuned.