An improved version of a Microsoft Excel Choropleth Map with a better performance for detailed maps
Very soon after starting this blog in 2009 I published a post with a set of Microsoft Excel Choropleth Map templates. This post is still one of the most popular articles and downloads here.
A lot of related posts followed and I am feeling very honored that my blogging colleague and France’s data visualization guru Bernard Lebelle of Impact Visuel used 2 of my maps published here on Clearly and Simply in his great new book “Convaincre avec des graphiques efficaces”. Bernard was kind enough to point his readers to my blog in the book. This is much appreciated. However, he should have heaped the praise on Tushar Mehta, who invented this technique. I only “stole” Tushar’s idea.
Tushar’s approach works great and I know from comments and emails that a lot of my readers have used it with great success.
There is only one minor drawback with Tushar’s approach: the performance decreases considerably when using it on a map with a lot of shapes, like the US by Counties.
Today’s post tries to heal this. It discusses how to considerably improve the performance of a detailed map. The article describes the original approach, the optimization potential, the improved implementation and - as always – provides the Excel workbook for free download.
The Original Approach
I know I mentioned this on several occasions before (and in the introduction, too), but I want to reemphasize it once more: the approach how to create Choropleth Maps in Excel wasn’t my idea. The whole concept and the VBA code is the brainchild of Tushar Mehta, long time Microsoft Excel MVP, who published this with an example workbook (Conditional Color of Shapes) a long time before I wrote my first article on Choropleth Maps with Excel here.
Here are the cornerstones of Tushar’s technique:
On the worksheet(s):
- A map with one Microsoft Office shape for each region (e.g. US state)
- Each shape has a name (e.g. S_CA for the shape of California)
- One named range for each cell containing the data to be visualized (e.g. D_CA for California)
- A lookup table assigning the named shapes to the named ranges (the data). This range with this lookup table is called “MapNameToShape”
- Another lookup table assigning RGB values (the colors) to a defined number of bins (thresholds to define which value will be visualized by which color). This lookup table has the range name “MapValueToColor”
The VBA consists of 2 main routines:
The sub UpdateMap loops through all cells of the lookup table MapNameToShape and calls the second sub “CheckColor” for each shape / data point.
The sub CheckColor looks up the shape name of the currently processed region (passed through by the calling sub UpdateMap) and detects the RGB color for the current value using the worksheet function VLOOKUP on the table MapValueToColor. Finally it sets the fill color of the current shape to this RGB color.
Named ranges and lookup tables in the worksheet, 30 (!) lines of VBA code and you are good to go. As I said in my first post on Choropleth Maps, Tushar’s idea is as brilliant as it is simple and I can’t thank him enough for sharing this technique.
The Challenge
Tushar’s technique is really straightforward and pretty easy to implement. Even more important it works well and proved to be very stable with all maps I ever used it for.
There is only one little downside: the performance of the code considerably decreases if you are using maps with a lot of regions. The US map by counties (more than 3,000 shapes), for instance, takes more than 21 seconds on my computer to update. This may count for little if you need to create a map as a one-off solution e.g. for inserting it into your PowerPoint slide deck. However, if you want to use a big map with a lot of shapes on an interactive dashboard and the map needs to be updated on user request, the performance is a real issue. 20 seconds to update a dashboard is everything else than a perfect user experience.
Good news: Tushar left some room for improvement. The VBA code is not as simple anymore, but the performance gain is worth the effort.
The Background
Why is the code becoming so slow with a large number of shapes?
Root cause of most performance issues of Excel VBA workbooks is the fact that Microsoft Excel and VBA are 2 separate applications. Going through the barrier between Excel and VBA is really costly in terms of performance.
With “going through the barrier” I mean reading data from the workbook by VBA, writing results from VBA to the workbook or using Microsoft Excel worksheet functions (Application.WorksheetFunction) in the VBA code. It goes without saying that it is inevitable to go through this barrier when using VBA in Excel workbooks. However, if you want to maximize the performance of your Excel VBA model, you should try to minimize the number of times your code is going through the barrier.
Exactly this is the starting point for today’s post. The original approach
- reads one cell to get the name of the named range of the data point
- reads the value of the named range
- detects the name of the shape
- checks if the value belongs to the first bin
- if the value belongs to the first bin, it gets the first RGB color
- otherwise it uses Excel’s VLOOKUP function to get the RGB value from the table “MapValueToColor”
- finally it formats the shape with the fill color found in step 5 or 6
Each of those steps goes through the barrier at least one time (the VLOOKUP even more often).
This is no problem as long as you are using a US map by state (i.e. 50 shapes) or the World Map (i.e. ~ 190 countries), but it definitely becomes an issue with several thousand shapes.
The Improved Version
What’s different in the improved version?
On the worksheet(s):
- The named ranges for the data points of each shape / region (e.g. D_CA) have been replaced by one named range (“myValues”) covering all data values of all regions. As a consequence, the named range “MapNameToShape” does not exist anymore.
- An additional column on the worksheet [data] contains the index number of each shape. This index is used in the VBA to assign a value to a shape. The VBA also contains a sub to create the values of this column automatically after inserting another map (see also section “The Bonus Features” below)
There are quite a few other things I changed on the worksheets, but all of those are for implementing the additional features which are not essentially for the pure Choropleth Map functionality.
The VBA Code:
I will not go into each detail of the VBA code here. I tried to comment the code the best I could, so if you are interested in the details, download the workbook and have a look for yourself. If you have any questions, please leave me a comment.
Here are just the major changes of the new approach in a nutshell:
- The code is not reading the data, the bins and the shapes step by step, but rather pushes them into VBA arrays in one go
- The code does not use the worksheet function VLOOKUP anymore
- The values / shapes are assigned to the bins, i.e. the code detects which shape / value belongs to which bin and which color it will get
- A shape range object is used to format the shapes not one by one, but rather format all shapes belonging to one bin in one go
This new approach minimizes the number of times the VBA has to go through the barrier to the Excel workbook and thereby considerably improves the performance of the map.
Considerably? Really?
Yes.
Updating a map now takes between 2 to 3 seconds on my computer (a bit more with the progress information in the status bar, see below) and compared to the 21 seconds we had before I think it is fair to call this a considerable improvement. Agreed, still not perfect, but way better than before.
The Bonus Features
In order to provide a nice workbook for your testing, I added a few new features:
- Real Data
The Choropleth Map visualizes the US unemployment rates from 2003 to June 2012 by county (source: Bureau of Labour Statistics).
- Different Color Scales
The workbook includes 23 different color scales I mostly took from the great online tool Color Brewer (color advice for cartography). You can change the colors of the map by simply selecting a new scale from a drop down.
- Legend
A legend on the map shows the colors and the size of the bins they represent next to the map. It goes without saying that this legend is automatically updated after you selected a new color scale.
- Optional Status Bar Progress Information
The performance of updating the map is somehow borderline in terms of user experience. Much faster than before, but still 2 or 3 seconds to update the map. You may want to keep your user informed about what is happening by displaying progress information in Excel’s status bar. However, updating the status bar takes additional time, so you can chose on worksheet control whether you want to do this or not.
- Recreate the Shape Index
Last but not least the code also provides a sub to automatically recreate the index of the shapes (see also the section “The Improved Version” above). Not a feature for everyday use, but helpful if you want to use the workbook as a template, insert your own shapes and create your own Choropleth Map.
That’s it.
The Download Link
Here is the Microsoft Excel workbook including the faster version of the Choropleth Map (United States by County) for free download:
Download Choropleth Map US Counties Faster (Microsoft Excel 2007/2010 workbook, 2,505.7K)
What’s next?
I am having one or two more ideas on VBA performance and I hope to publish them soon. Also, some interesting Tableau posts are still waiting in my pipeline.
More things to come soon, so please stay tuned.