Techniques to Get Past the Limitations - Part 1: Table Calculations
Clearly and Simply proudly presents another guest post: Richard Leeke is kind enough to contribute a series of 3 articles on how to overcome the limitations of my fragmentary and imperfect Site Catchment Analysis with Tableau 6. Richard is a co-founder of Equinox Ltd, a New Zealand based IT consulting firm, where he specializes in performance engineering. And Richard is one of the most active members in the Tableau Forums with an impressive number of almost 1,300 posts so far. A real Tableau expert. Enjoy.
I read Robert’s excellent Site Catchment Analysis with Tableau 6 post discussing the pluses and minuses of Tableau 6 compared with 5.2 and despite being really pleased with what version 6 has delivered, I could immediately relate to the limitations Robert had encountered.
I was fortunate enough to be on the Technology Preview programme for the new Tableau Data Engine and the alpha programme for the rest of version 6, so I have had a few months’ head-start on most of the Tableau community in terms of learning how to get the most out of the new version. One of the things that I have come to realize is that there are lots of completely different ways to tackle the same problem, so I couldn’t resist the challenge of seeing if I could get past any of the limits which Robert described. I managed to do so, and along the way learned a lot and also developed a couple of new (to me, at least) techniques which will be invaluable in my own analysis.
Today’s article describes how to use aggregate table calculations to overcome the limitations of Robert’s site catchment analysis. As always, including the workbook for direct access here on the blog via Tableau Public and for free download.
Tableau 6 is a huge leap forwards, introducing numerous powerful new features and in many cases dramatic performance improvements. However, with that much change, it was probably inevitable that there would be a few things which would become harder, slower, or (dare I say) impossible. But we know that the Tableau folk are very good at listening to feedback, so I’m sure those areas will improve again in due course. Within a couple of days of Robert’s post there was already a comment from James Baker of Tableau saying he had passed on the feedback to others back at base. No promises of course, but we know they are listening.
In his post, Robert identified three classes of issues he had encountered: restrictions on performing further aggregation on the results of aggregate calculations, the fact that you can’t trigger an action on changing a parameter value and some areas where the performance of the version 6 workbook was not as fast as with version 5.2. In this post I will describe how I got past the multiple aggregation roadblocks that Robert hit. Some techniques I used to speed the workbook up (one of which gets it back to about the same speed as version 5.2, though it’s a bit ugly under the covers) will be covered in the next article. The issue I can’t help with is triggering actions from a parameter change. That needs a new feature, which I agree would be very useful. Of course there was no way to do that in version 5.2, either.
Aggregates of Aggregates
It’s actually unfair to say that Tableau 6 can’t do aggregates of aggregates – that is exactly what the new table calculations provide lots more ways of doing. For a start, table calculations perform aggregations across the results of database aggregate queries. There are also lots of circumstances in which you can nest multiple levels of aggregate table calculations. But there are certain restrictions on what you can do with the results of a table calculation, and it does seem quite common to run into those constraints – as a wander through the Calculations section of the Tableau Forum will confirm. Perhaps the biggest problem I have found is that when you do hit a limitation like this, it isn’t always immediately clear what the issue is. I often find myself not knowing whether what I am trying to do is simply not allowed, or whether I am just doing something wrong.
Probably the most common restrictions I have run into are:
- you cannot use various functions on the results of a table calculation,
- in particular, there are a few restrictions around the use of the TOTAL() function, and
- you can’t use table calculation results as addressing or ordering fields within a further table calculation.
The TOTAL() function seems to have a few peculiarities of its own. Whilst TOTAL() is treated as a table calculation, with the same addressing and partitioning options as the others, unlike the rest of the table calculations TOTAL() is actually evaluated in the database before returning the rows to Tableau for further processing. This leads to a few differences between the rules for using TOTAL() and the other table calculation functions. For example, you can’t use TOTAL() on the results of other table calculations (including another call of TOTAL()).
I don’t really understand in any detail the reason for the restrictions on the use of table calculations for addressing or ordering fields, but given the complexity involved in table calculations it is easy to see why such restrictions may be necessary.
In cases where you are not able to nest table calculations, if your data connection is to a SQL database, it is possible (though somewhat tricky) to perform some of the required aggregate calculations in the database, by making use of the RAWSQL functions. This allows you to pass expressions to the database for evaluation. However that option is no longer available with extracts in Tableau 6, as the Data Engine does not have a SQL interface.
In the version 5.2 workbook, Robert used RAWSQL to obtain the latitude and longitude of the location he wanted to center the analysis on. This meant that these values were available in the data source, and could be used in further calculations, such as the distance of each location from the center. The key point here is that all of those calculations were therefore able to be done in the original query. So the root cause of the restriction Robert hit here is the fact that RAWSQL functions are no longer available when using the Data Engine in version 6.
Instead of RAWSQL, Robert used some trickery with table calculation aggregations to make the latitude and longitude values for the selected center location available in every row of the results. But because he had used table calculation aggregations to do that, he was unable to do certain further calculations requiring aggregates, using the approaches that he tried.
When I first looked at the workbook I also ran into issues that seemed to prevent further calculations, so I attacked the problem differently and came up with a couple of other approaches which allow latitude and longitude to be derived without requiring the use of table calculation aggregations at all. These then allowed further analysis using aggregate table calculations without running into those restrictions.
But when I came to write this up, I went back to Robert’s original in order to describe exactly what the restrictions were – and found that I could actually achieve all of the same analysis using Robert’s original approach by just using further table calculations. I think we probably both tried to use a TOTAL() function on the results of the table calculations for latitude and longitude, but that is not allowed. Simply replacing TOTAL() with WINDOW_SUM() gives the results Robert was aiming for.
My modified version of Robert’s workbook below has three dashboards demonstrating the three distinct approaches I have used to getting round the roadblocks. I will describe each in turn. The first one in this article today, the other two in the next part of this series.
The Workbook on Tableau Public
Completing the Analysis Using Robert’s Original Approach
I’ll start by describing the extra calculated fields necessary in Robert’s original to generate the summary statistics for the covered area that Robert had done in his version 5 workbook. This is shown in tab “Dashboard (original approach)”.
Robert was trying to show the total population and number of locations within the specified radius of the chosen center and also to show that population as a percentage of the overall population of Germany. To do this I started by adding the following fields.
[Total_Covered_Locations] = WINDOW_SUM([Is_Within_Radius])
[Total_Covered_Population] = WINDOW_SUM([Covered_Population])
[Total_Locations] = WINDOW_SUM(COUNT([Population]))
[Total_Population] = WINDOW_SUM(SUM([Population]))
[Population % within Radius] = [Total_Covered_Population] / [Total_Population]
Adding these fields to the Level of Detail shelf and into the window title of the “Covered” sheet and setting the addressing so that the window functions operate over a single partition including all rows gives the answers Robert was after. But this approach is far too slow - three minutes to refresh the sheet on my very high specification laptop.
The reason that is so slow is down to an issue with the way the WINDOW_xxx() functions work. I could be kind and refer to this as a “tuning opportunity”, but let’s be honest, it’s a bug. I’ll discuss the reason for it being so slow and a workaround which brings the sheet refresh time down to about 10 seconds later, for now I’ll just focus on describing how this approach works.
The key to this approach is in the order of evaluation of the various aspects of the analysis - in particular, the behavior of the filter on [Is_Within_Radius]. The calculation of [Is_Within_Radius] depends on [Distance] which in turn depends on [Lat_Center_Global] and [Long_Center_Global], which are table calculations using the TOTAL() function. As [Is_Within_Radius] depends on table calculations, the filter on this field does not happen when the rows are initially fetched from the database; rather it is applied in Tableau after evaluation of all table calculations. So all rows for the whole of Germany are fetched and are available for other table calculations, such as the WINDOW_SUMs in the fields I have added.
[Total_Locations] and [Total_Population] just add up appropriate measures over the whole partition, including the rows outside [Radius] which will subsequently be excluded from the final result by the filter. [Total_Covered_Locations] and [Total_Covered_Population] work by factoring in the result of the [Is_Within_Radius] calculated field in the WINDOW_SUM() calculations.
Note that Tableau is being clever here in deciding the order of evaluation of fields, so that table calculations which depend on the result of other table calculations are left till last. And yes, it notices if you attempt to tie it in knots – you get the message “[field] contains a circular reference” if you try.
Deferring the evaluation of filters which depend on table calculations opens up lots of possibilities for calculated fields, but it has one big drawback. It means that all rows need to be returned to Tableau. That works fine for the 8,000 rows in Robert’s workbook, but gets progressively less manageable as the number of rows increases. This dilutes the advantages of the Data Engine’s speed more and more, and for really large data sources will simply cause Tableau to run out of memory and fail to work.
The second part of the series will demonstrate 2 further techniques: Using data blending for lookups and Parameters as a Lookup Table.
Stay tuned.