How to create and use Calculated Fields in Tableau Software, illustrated with a show case of a site analysis for Germany
As soon as your data analyses and visualizations with Tableau Software become more complex, you will soon find that sometimes your data source does not include all the measures and dimensions you need.
One of the most common examples is a data set including the measures revenues and costs, but lacking a profit field.
This is where Tableau’s Calculated Fields come into play. This functionality enables you to add new fields based on dimensions and measures from your data source and/or based on other Calculated Fields.
If you are interested in professional data analysis and visualizations using Tableau Software, you definitely won’t get out of using Calculated Fields sooner or later.
Today’s post discusses the power of Tableau’s Calculated Fields, including a definition, a short how-to tutorial and a few additional tips. A show case of a site analysis for Germany illustrates the use of different types of Calculated Fields including the Tableau packaged workbook for free download.
What are Calculated Fields in Tableau?
As mentioned in the introduction, in most cases your source data will not provide all fields you need to answer all of your questions. Tableau’s Calculated Fields will help you to create all the dimensions and measures needed for your analysis.
Within a Calculated Field you may define a hardcoded constant (like the VAT rate, for instance), do very simple mathematical operations like subtraction (e.g. revenues minus cost), use more complex mathematical formulas, create new conditional categories, do type conversions and much more.
Once defined, Calculated Fields are available across the entire workbook as long as the worksheets are using the same data source. You can use Calculated Fields in the same way you are using the dimensions and measures from your source data.
A lot of people have done their data analysis and visualizations with Microsoft Excel before they joined the Tableau community. With an Excel user’s eye view, a Calculated Field is pretty much the same as a named cell in Excel. It may contain a fixed value or a formula based on the results of other cells. And it may be used as an input parameter for formulas in other cells or simply display the result.
How to Create and Use Calculated Fields in Tableau
Here is a how-to tutorial on Calculated Fields in a nutshell:
- Click on Analysis in the main menu or right click on a dimension or measure and select Create Calculated Field
- The following dialog window appears:
- Define a name for your Calculated Field
- Type in the formula to define your Calculated Field
- Insert fields
You see all fields from the data source and all Calculated Fields (if you already defined some) in an area called “Fields” at the bottom left of the window. Double clicking on one of these fields inserts it into your formula above. - Insert a function
The area “Functions” shows all available functions for Calculated Fields. Please be advised that the basic arithmetic operators are not included here. Again a double click on a function inserts it into your formula. The dropdown list above filters the list of functions by category and the text right to it provides a short information on how to use the selected function.
- Insert fields
- Click on the “Check Formula” button to make sure your calculation is valid and can directly be used in your workbook.
- Finally click OK to create the calculated field.
Now you see it: your new Calculated Field immediately shows up in the data window on the left, either as a dimension or as a measure, depending on the result of your formula. Tableau automatically identifies whether your Calculated Field is a dimension (i.e. returns a string) or a measure (i.e. returns a number). Tableau even recognizes if your Calculated Field returns a date and shows this with the calendar symbol in the dimensions area.
That’s it. Creating a Calculated Field is no rocket science. Sure, you have to get used to the names of the available functions and how to use them. However, the function names and their usage are very often pretty similar to Microsoft Excel functions. If you can write the formula in Excel, you can also write it in a Calculated Field in Tableau.
For more details on Calculated Fields I recommend reading the Tableau Offline Manual (download the pdf-file here 16.5MB).
3 simple recommendations for the use of Calculated Fields
Here are 3 additional tips for creating and maintaining Calculated Fields:
- Use a descriptive name
After creating a Calculated Field, Tableau assigns a default name like “Calculation1”. Take a few seconds to overwrite this by a speaking, descriptive name like “Distance” or “URL_link”. This improves the readability and maintainability of your workbook.
- Comments
You can and should use comments in your Calculated Fields. A comment starts with two slash characters (//). Everything right to these two characters until the end of the line is treated as a comment, i.e. Tableau ignores this text. Like descriptive names, comments increase the readability and maintainability of your workbook.
- Squared Brackets
If you double click on a dimension or measure in the fields area, Tableau inserts this field in square brackets by default (e.g. [fieldname]). As long as the field name does not contain a blank, you can delete the square brackets. If you are writing more complex formulas using a lot of parentheses, this is sometimes helpful to improve the readability of the formula.
The Show Case – Site Analysis of Germany
One small part of a site planning project is the analysis of the catchment area of different locations. Let’s assume you are planning to build a mall, a warehouse or an amusement park and you have several options for the location. Prior to your final decision for one of these alternatives, you will compare the locations using a series of relevant characteristics, like the infrastructure, legal and tax aspects, the neighborhood, etc. A very important characteristic is also the catchment area, i.e. the number of residents living within a certain distance from the location.
Today’s show case is a visualization of the catchment areas in Germany. Let’s assume we want to find the location with a maximum number of residents living within a beeline distance of less than 150 kilometers.
Make an educated guess. What is your first estimation? Berlin? Hamburg? Frankfurt? Munich? Not quite.
Have you ever heard of Kirchhundem? No? Don’t worry, I am living in Germany and I never heard of it before. However, it seems to be the heart of Germany, at least with regards to our question of the maximum catchment area. Almost 27 million people, i.e. almost one third of Germany’s total population lives within a distance of less than 150 km from Kirchhundem (as the crow flies).
Today’s Tableau show case allows catchment area analysis like this, including different visualizations.
The Calculated Fields in the Show Case
It is definitely impossible to cover all aspects and possibilities of Calculated Fields in one example or show case. However, I tried to do my best to find an interesting example including as many types of Calculated Fields as possible.
Here are the types and the Calculated Fields used in the show case:
Definition of constants
Calculated Fields do not necessarily use a formula. You can also define a hardcoded constant by simply typing in a number or a string. Here are the constants defined in Calculated Fields in the showcase:
[Radius]
The radius is a constant used in another Calculated Field called [Distance]. It is the earth’s radius in kilometers (6371). It makes sense to have this separate Calculated Field, since it is more convenient to change the constant here (e.g. switch to miles instead of kilometers) instead of being forced to change all formulas in all Calculated Fields based on this constant.
[Center_PLZ5]
This is another constant defined in a Calculated Field and this is also used in other Calculated Fields. This time, however, we abuse the Calculated Field as an input feature for our visualization. If you want to switch to another center location, simply right click on [Center_PLZ5], choose Edit and change the postcode 57399 to whatever you want (in the only line that is not commented out). In the comments of the Calculated Field you will find some examples.
Simple mathematics
Of course you can do some simple mathematical calculations in Calculated Fields as well. In the show case, for instance, [Population in % of total] calculates the population within the selected radius in percent of the total population of Germany:
Population/Total_Population_Germany
The numerator of the fraction is the measure [Population] from the data source, the denominator is another Calculated Field called [Total_Population_Germany] (definition see below).
Type conversion
Sometimes you have to do some type conversions of the measures in your data source. Again, Calculated Fields are an easy way of doing this. In the data source of our show case the postcodes [PLZ5] are stored as numbers. This is a problem, since there are postcodes starting with a leading zero. Stored as a number, for instance the first postcode is 1067 instead of 01067 (Dresden). We can correct this with a Calculated Field. We convert the number into a string and add a leading zero if necessary:
[PLZ5_str]
If LEN(STR(INT(PLZ5)))=4 Then
"0" + STR(INT(PLZ5))
Else
STR(INT(PLZ5))
End
We use exactly the same formula in [Center_PLZ5_str] to convert the postcode of the center location into a string with 5 characters.
Create an additional field based on a field in the data source
Geocoding data analyses in Germany very often use the so called PLZ2, i.e. the first 2 digits of the German postcodes, which divides Germany into 95 regions. Since we do not have this field available in the data source, we use a Calculated Field to create it based on the Calculated Field [PLZ5_str]:
[PLZ2_str]
LEFT(PLZ5_str,2)
String concatenation
To navigate the Google Map view to the correct URL, we need a dynamic string based on the latitude and longitude of the selected location. I already described this idea in the recent article The Power of Tableau Actions.
[URL_link]
“http://maps.google.com/maps?ll=”
+ STR(Latitude)+","+STR(Longitude)+
"&spn=.05,.05&t=k&hl=en&output=embed"
2 hard coded strings at the beginning and at the end are concatenated with [Latitude] and [Longitude] converted into strings.
Creation of additional categories
If you need to divide your data into different categories that are not included in your source data, you can create a new category with a Calculated Field. In our show case we want to separate the locations (postcodes) into the location defined as the center and all other locations:
[Location_type]
If Lat_Center = Latitude And Long_Center = Longitude Then
"Center"
Else
"Other location"
End
The Calculated Fields [Lat_Center] and [Long_Center] (definition see blow) are compared to the fields [Latitude] and [Longitude] in the data source. If both are equal [Location_type] is set to “Center”, otherwise [Location_type] is “Other location”. Using [Location_type] on the color shelf of the map visualizations enables us to highlight the selected center in red.
Data manipulation
As described above, we want to highlight the center on the map. However, using [Location_type] on the color shelf isn’t enough. Since [Population] is on the size shelf, you will not see the center location if it is a relatively small city. Thus we will do a little data manipulation. We increase the population of the center to a hardcoded 100,000 residents. Thus we make sure that the center is visible on the map:
[Map_Population]
If [Location_type] = "Center" Then
100000
Else
Population
End
Data manipulation sounds evil and usually it is. But this little trick does not distort our visualizations and analysis. We are using [Map_Population] only on the size shelf of the map. All other calculations and visualization are based on the data source field [Population]. Thus, we make sure to calculate and visualize the correct numbers. Please note that we even changed the tooltip on the map to show [Population] instead of [Map_population]. You see: our dirty little data manipulation has no influence on the accuracy of our results.
More complex mathematics
Calculated Fields can also be used to do more complex mathematics. And in our show case we have to: we calculate the beeline distance between the center and any other location using the spherical law of cosines:
distance = R * acos(sin(lat1)*sin(lat2)+cos(lat1)*cos(lat2)*cos(long2−long1))
This formula is based on the (wrong) assumption of a spherical earth, i.e. no ellipsoidal effects considered. However, the resulting errors of less than 0.5% are negligible.
I will not go into the details of this formula. You can find a lot of resources on the net explaining the formula and possible alternatives.
[Distance]
ROUND([Radius] * ACOS(SIN(RADIANS(Lat_Center)) * SIN(RADIANs(Latitude)) + COS(RADIANS(Lat_Center)) * COS(RADIANS(Latitude)) * COS(RADIANS(Longitude) - RADIANS(Long_Center))),2)
Direct access to the source data via SQL
This type of Calculated Fields needs some SQL know-how. Tableau’s so called “pass through” functions allow you to directly access the data source in a Calculated Field e.g. to get around the defined filters used in your workbook.
For instance: we want to show the sum of the filtered population in % of the total population. Since we use [Distance] as a global filter, we can’t get to the total population of Germany without a direct access to the data source in a Calculated Field:
[Total_Population_Germany]
RAWSQL_REAL('SELECT SUM("Population") FROM "TableauExtract"')
This pass through statement retrieves the total number of the German population without considering the filters set in the workbook.
Furthermore, we need 2 other pass through functions to find the latitude and longitude of the selected center location. This time it is a SELECT FROM WHERE statement with [PLZ5_Center] (postcode of the center location) as the WHERE condition:
[Lat_Center]
RAWSQL_REAL
('SELECT "Latitude" FROM "TableauExtract" WHERE "PLZ5" = %1',INT(Center_PLZ5))
[Long_Center]
RAWSQL_REAL('SELECT "Longitude" FROM "TableauExtract" WHERE "PLZ5" = %1',INT(Center_PLZ5))
As I said, not every possible type of Calculated Fields is covered in the show case, but these examples should give you a jump start.
Set up the Tableau workbook and the visualizations
After creating the Calculated Fields, setting up the rest of the Tableau workbook and the visualizations is a piece of cake:
- Create a map with [Location_type] on the color shelf, [Map_population] on the size shelf and [Distance] as an at-most filter. Finally change the title of the map to include the filter and the postcode of the center location
- Make [Distance] a global filter
- Add 3 worksheets to display one single number using the text shelf (locations, population and population in % of total)
- Add 3 worksheets to visualize SUM([Population]) by state, PLZ5 (full postcode) and PLZ2 (first 2 digits of the postcode) using bar charts. Set a predefined sort order descending by SUM([Population])
- Create a dashboard to show all visualizations and the distance filter on one page. Add a web page to the dashboard to show the Google Map view
- Finally implement 3 different dashboard actions to enhance the interactivity with filtering, highlighting and updating the Google Map View on the fly. Pretty much exactly what has been described in the recent article The Power of Tableau Actions
You see: nothing new under the sun. The usual suspects of developing Tableau workbooks and dashboards.
The Downside
This Tableau show case has one major disadvantage: there is no interactive feature on the dashboard allowing the user to select another site to be analyzed (i.e. another center).
If you want to see the catchment area of another location, you have to replace the constant in the Calculated Field [Center_PLZ5] by the postcode of the desired location.
Since you need to edit the Calculated Field, you can’t do this using Tableau Reader. You need Tableau Desktop installed on your computer.
The Results – some selected Screenshots
Here are some example dashboard screenshots. For a radius of 150 kilometers, Kirchhundem wins the race for the most residents in a distance less than 150 kilometers:
At the bottom of the scale we find Dranske on the island of Rügen:
You see the difference: almost 27 million people in a radius of 150 km around Kirchhundem, but only 1.3 million residents in the same distance around Dranske.
The Download Link
I would have loved to publish this on Tableau Public and make it available for direct access here on my blog. However, Tableau Public does not support workbooks using custom SQL statements. Thus, I can only post the packaged workbook for free download:
Download Site Analysis Germany (Tableau packaged workbook, 838.6K)
To open this workbook you need the free Tableau Reader or Tableau Desktop (14-days free trial).
Acknowledgements
Many thanks go again to Giedre Aleknonyte for reviewing my workbook prior to publishing and for her inputs and suggestions – invaluable, as always. Please visit Giedre’s blog add-knowledge and follow her on Twitter. Thanks, Giedre.
I would also like to thank Emil Bruusgaard for providing me with the idea to this article. I suspect Emil didn’t even notice that I took this inspiration from us working together on his Tableau project recently. But I did and this post is the result. Thanks, Emil.
What’s next?
Today’s article was the third post on Tableau in a row. I guess the time may have come to publish some Excel stuff again. Thus, the next posts will show some techniques, charts and dashboards in Microsoft Excel. However, for the Tableau aficionados out there: I am still having one or two Tableau aces up my sleeve.
Stay tuned.
Update on November 28, 2010
With the new Tableau 6 feature Parameters, you can easily implement an input field on the dashboard to let the user define the center of your site analysis. For more details please refer to this follow up post: Site Catchment Analysis with Tableau 6.