The Cartogram Data Generator: a free Excel Tool to create your own polygon data for plotting Cartograms in Tableau
One of the recent posts showed how Cartograms can be a viable alternative when Choropleth or Filled Maps are stretched to their limits and tell the wrong overall story:
The article included a Tableau Public dashboard comparing Filled Maps with Cartograms for the results of the US Presidential Elections since 1900.
Since Tableau does not provide Cartograms as a built-in chart type, the distorted polygons have to be calculated outside of Tableau and then used as a data source to plot the maps using Tableau’s polygon map approach.
Today’s article is the follow-up post. It provides a free and open Microsoft Excel workbook to create Cartogram data with a few mouse clicks: copy in the data and the original polygon points of all regions and let the tool create a ready-to-use data set to plot Cartograms in Tableau.
How to use the Cartogram Data Generator
Step 1: Download the Tool
First download the Cartogram Data Generator:
Download Cartogram Data Generator (Microsoft Excel 2010 – 2013, 991.8K)
Step 2: Collect the Data
The tool requires a defined set up of the data and the polygons of the regions.
The data set up looks like this:The first three columns of the table represent the regions:
-
an ascending index (ID) starting at 1
-
the region’s abbreviation
-
the full name of the region
The following columns (from column E to the right) contain the data of the measures you want to create Cartograms for.
In the EU example shown in the screenshot, the table has 28 regions (EU member states) and 12 measures (Members of Parliament, Population 2014, etc.). You can expand this table to more regions and more measures, but it is crucial that everything is inside the table “tab_data” (List Object). This is important, because the code is referencing this table name and you therefore should not delete or rename it.
Step 3: Collect the Polygons
Next step is collecting the polygon data for all regions. In the end the polygon data has to look like this:
The first two columns contain the regions (ID and abbreviation), followed by the point order of the polygon and the latitude and longitude of this point.
You can find ready-to-use polygon data for some states or continents on the Internet. If you do not find what you are looking for, you can still try to find an ESRI Shapefile and use Richard Leeke’s ShapeToTab utility to transfer it to polygon data (more at the end of this article: Create Your Own Filled Maps in Tableau).
There is one very important thing to know: the algorithm handles only one polygon per region. If you have regions in your data with more than one polygon, you need to find out which one is the largest (e.g. the main land without the islands around it) and delete all others.
Again, do not rename or delete the table “tab_polygons”.
Step 4: Copy the Data and Polygons to the Tool
Open the tool with Microsoft Excel and make sure macros are enabled.
I left the example data of the European Union in the tool for demonstration purposes, so you have to clear the tables first.
-
go to the worksheet [data]
-
clear the existing values (from row 6 downwards)
-
clear the header row for the measure columns (cell E5 to the right).
-
add as many rows (regions) and columns (measures) as you need
-
insert your own data in the data body range and the measure names in the column headers
-
if applicable, delete unused (empty rows and columns) of the table
Do the same for the table “tab_polygons” on worksheet [polygons].
Step 5: Check Data Validity and Integrity
The tool / algorithm does not perform any checks on the validity, integrity or plausibility of the data. It expects to find a valid data set and valid polygons. Hence, data integrity and validity is in the user’s responsibility and you should especially check the following points:
-
the IDs of the regions are unique (no duplicates)
-
the values in the measure columns are either numbers or empty cells, i.e. no texts. Make sure the values are real number, not texts just looking like numbers
- all data rows of one region in the table “tab_polygons” (in the defined point order) make a valid, closed polygon, i.e. coordinates of the first and last point are identical
-
the IDs of the regions in the data table and the polygon table correspond correctly, e.g. if ID of Austria is 1 in the data table, all points with the ID 1 in the polygon table make the valid polygon of Austria
-
each region has exactly one polygon (see step 3): if a region / country in your polygon data consists of more than one polygon, you have to delete all except for the largest one, e.g. delete all islands of Italy, Spain, United Kingdom, etc. and keep only one polygon per country (the main land)
Step 6: Adjust the settings
Go to the worksheet [dashboard] and adjust the settings:
There are only two things you may or may not want to change:
-
The number of iterations of the algorithm (cell C4)
The originators of the algorithm (J. Dougenik, N. Chrisman and D. Niemeyer, 1985) recommend a minimum of 8, but I would even set this to 12 or 15.
The more iterations, the better the results and the longer the runtime and vice versa. -
The option to unpivot the data by clicking the checkbox
Unpivoting means the output of the data will be a normalized table, i.e. the measure columns will be transferred to two columns: the first including the measure name, the second including the value. This is especially helpful if the measures in your cross tab data are e.g. years (like in the US Presidential Election example of the previous post).
Step 7: Run the Algorithm
Go to the tab CARTOGRAM DATA GENERATOR and click on the icon “Create Cartogram Data”:
The algorithm runs and shows where it is in Excel’s status bar at bottom left:
Please be patient, this may take a few minutes. The runtime of the algorithm depends on
-
the number of regions
-
the number of measures (columns of the table “tab_data”)
-
the number of polygon points (rows of the table “tab_polygons”)
-
the number of iterations you defined in step 4
Agreed, it isn’t really fast, but it is not too bad. For the EU example used here (28 states, 12 measures, 23,637 polygon points, 15 iterations), it took ~ 3 minutes on my machine (without unpivoting). A welcome opportunity to grab a coffee, but not enough time to walk the dog. I am sure you will find something useful to do while the algorithm is running.
Step 8: Check the Results
On the worksheet [dashboard] you can see the main results of the algorithm:
The key performance indicator of the algorithm is the size error, i.e. the difference between the desired area of the region (according to the measure value) and the actual size of the polygon of this region in the Cartogram.
The dashboard shows various statistics of the size error: minimum, maximum, average, weighted average, standard deviation, 25%, 50% and 75% percentiles and the count of areas with a size error larger than the standard deviation. At the bottom of the dashboard you see the size errors by region.
The dashboard is using a fixed cell range and displays only the first 10 measures and the first 100 regions. As mentioned above, you can use more regions and measures in the tool, but the dashboard will only show the first 10 respectively 100. These limits may be increased by expanding the cell ranges on the dashboard and copying the formulas, but you would also have to change some constants in the VBA code. In any case, 10 measures and 100 regions already give a very good impression of the results.
With this dashboard, you can evaluate the quality of the algorithm’s outcome, i.e. how good the Cartogram polygons represent the values in the data. If the size errors are too high, you need to increase the number of iterations and try again.
Step 9: Check the Output
After the algorithm is finished, you will find a new, time-stamped Excel workbook in the same folder, called e.g. “cartogram_data_2015-06-20 12-00.xlsx”. This workbook contains two sheets: the data (as a cross tab or a normalized table, depending on the unpivot checkbox) and the Cartogram polygon data. You can directly connect to this data set with Tableau or import it into your database.
If you selected the unpivot option, you may want to change the generic column headers “Measure Name” to e.g. “Year” and “Measure Value” to “Popular votes”. You can do this directly in Excel or change the names later in Tableau.
Step 10: Create the Cartograms in Tableau
Let’s say we are connecting Tableau directly to Excel.
Case 1: Without unpivoting
Open Tableau, connect to the Excel workbook created in step 5 and drag both sheets to the top area of the data source page:
Tableau automatically sets the join by ID, so no need to change anything here.
Go to the worksheet and you can plot your Cartogram in Tableau using the polygon map approach and filtering by one of the measure names:
Again open Tableau, connect to the Excel workbook and drag both sheets to the top area of the data source page:
Make sure the join is by ID and Measure Name (or whatever you called the field in the Excel workbook).
A few more mouse clicks for the polygon map approach and you will see this:
Another Example – Cartograms of the European Union
You may have seen the Cartograms with Tableau in action already in the previous post, but I wanted to show another example: some selected statistics of the European Union displayed on a Filled Map and on a Cartogram. The dashboard lets you select the measure to be used to resize the regions and the measure to color the map:
The Implementation
As mentioned in the previous post, the Cartogram Data Generator uses “An algorithm to construct continuous area cartograms” by J. Dougenik, N. Chrisman and D. Niemeyer, published in “Professional Geographer” back in 1985. This paper is 30 years old and in the meantime there are more sophisticated and probably faster options available, but it still does the job.
My contribution to this is minimal: I simply implemented Dougenik et al’s algorithm as it is in Excel and VBA. Although it definitely is one of the more complex VBA projects published here on Clearly and Simply, I won’t go into the details. The VBA project is open, the size is manageable (17 procedures with ~ 500 lines of code) and I tried to clearly comment every procedure, variable and section of the code. If you are interested in the implementation, please have a look for yourself. If you have any recommendations how to improve the code, please leave me a comment or drop me an email. Any input is much appreciated.
What’s Next?
That’s it with Cartograms in Tableau for the time being.
Having said that, with this VBA implementation of a Cartogram algorithm and some of the techniques posted here previously (Faster Choropleth Maps with Microsoft Excel and Create Excel Freeform Shapes from Polygons), we have all components available to dynamically plot Cartograms in a stand-alone Excel workbook. We just have to combine them in one workbook. The next post will explain how to create Cartograms directly in Excel and provide an example workbook for free download.
Stay tuned.
Update on June 26, 2015
I just noticed that there were a few smaller bugs in the Cartogram Data Generator originally posted for download. The results of the algorithm (the Cartogram polygon data) were correct, but there have been some issues with the statistics displayed on the dashboard. I fixed the bugs and updated the post now.
To the few readers (less than 30) who downloaded the tool already: I apologize for any possible confusion and inconvenience. Please download the updated tool again.