Concatenation, Conversion, Analysis and Extraction -
44 Formulas to work with Strings in Tableau’s Calculated Fields
“String Calculations” is a somehow weird expression. Calculations on texts sounds like a contradiction in terms.
Of course you do not really calculate strings. You manipulate and analyze them like concatenating texts, changing texts (e.g. to upper, lower or proper case), converting texts or parts of texts to numbers or dates, extracting parts or analyzing them (e.g. how many words or do they contain a number), etc.
If you do not have the option to do this type of things directly in your database, you will use Calculated Fields in Tableau Software to get what you want from the text dimensions in your data source. That’s why I called this post String Calculations in Tableau.
Today’s post contains a set of 44 more or less practical examples of concatenation, conversion, analysis and extraction of texts. I will not go into the basic string functions of Tableau, like LEFT, FIND, LEN, REPLACE, etc. You can easily look up how they work in the manual or read the explanations directly in the Calculated Field editor.
I rather tried to pull together a small library of 44 more complex formulas you may find useful when you have to work with strings in Tableau, like concatenate strings and a date, convert a string to a date, reverse words in a string, extract parts of a string, remove line feeds, check if a string contains a number, count the number of words in a string and many more.
The article lists and explains all 44 formulas. I do not delude myself into believing anyone would read today’s article from start to finish. It is more a reference type of post and this is on purpose.
However, I recommend having a brief look inside, even if you are not looking for a certain string calculation in Tableau at the moment. I am starting the article with a little text visualization example and I am also providing a Tableau packaged workbook (on Tableau Public ) including all examples for free download at the end of the post.
The King James Bible goes Tableau – A (simple) Use Case
Before I am coming to the very bone-dry and lengthy explanations of string calculations in Tableau, I felt I should do something motivating to get started: a small Tableau visualization putting at least one or two string formulas to practice.
Here it is.
The Good Book goes Tableau. A Tableau dashboard to explore the King James Bible:
What can you do with this visualization?
You can select a book at the top and read all verses of this book, if you want to. You can also filter this view by searching for a word or phrase. The bar chart below the text table shows statistics of all books (respectively of all books containing the search string): count of the verses, count of words, average words per verse and (if specified) occurrences of the search string. The selected book is highlighted in red.
Play around and explore the King James Bible using Tableau. For instance, search for names like “Moses” or “Job” or other words like “good” and “evil” or whatever you like.
Nothing spectacular, I know. Just a little example to demonstrate how string calculations could be used in Tableau.
Acknowledgement
Believe it or not, I didn’t compile the data for the King James Bible visualization on my own. I simply took it from John Walkenbach’s The Spreadsheet Page. It’s funny, because John also wrote all the Excel Bible books and then he published the real Bible in an Excel spreadsheet. Arizona-type of humor, I guess.
I really doubt John will ever read this, but anyway: many thanks John, for everything you did for the Excel community in the last 20+ years. I learned a lot from you. Much appreciated.
Now on to the academic part for the Tableau developers among you: 44 string calculation examples in Tableau Software.
The Database
For the development and test of this little library of string calculations in Tableau, I created a dummy data set of 200 fake records using the Fake Name Generator. The data is made up and totally pointless, but to understand the field names used in the formulas below, you should have a quick look on what is in the data source:
If you are interested in downloading the Excel workbook with this data, see the section Download Links at the end of the post.
Concatenation and Insertion
01 - Concatenate strings
Description: Concatenate two or more strings (dimensions and hardcoded)
Example: Concatenate [GivenName], [MiddleInitial] (if applicable) and [Surname] to one string
[GivenName] + " " +IfNULL([MiddleInitial] + ". ","") + [Surname]
How does it work?
The existing dimensions are concatenated using the plus operator (+). The IFNULL function checks, if a middle initial exists. If so, the spaces and the field [MiddleIntital] with a period are inserted between [GivenName] and [Surname]. If not, the space is inserted between [GivenName] and [Surname].
02 - Concatenate strings and date
Description: Concatenate two or more strings and a date or a part of a date
Example: Concatenate [GivenName], [MiddleInitial] (if applicable), [Surname] and the year of the birth to one string
[GivenName] + " " +IFNULL([MiddleInitial] + ". ","") + [Surname] +
" born in " + STR(YEAR([Birthday]))
How does it work?
For the first part of the formula, see Calculated Field "01–Concatenate strings".
At the end a hard coded text (" born in ") is added, followed by the year of birth, converted from the dimension [Birthday] using the functions STR(expression) and YEAR(date).
03 – Insert line feed
Description: Concatenate two or more strings and insert a line feed between them
Example: Concatenate [GivenName] and [Surname] to one string and insert a line feed in between
[GivenName] + CHAR(10) + [Surname]
How does it work?
CHAR(10) inserts a line feed. 10 is the ASCII code number of the line feed character.
Instead of using CHAR(10) you can also insert a line feed directly in the Calculated Field Editor like this:
[GivenName] + "
" + [Surname]
04 – Insert quotation marks inside of strings
Description: Concatenate strings and insert quotation marks inside the string
Example: Concatenate [GivenName] and [Surname] and add the initials in brackets and quotation marks at the end
[GivenName] + " " + [Surname] +
" (""" +
LEFT([GivenName],1) +
LEFT([Surname],1) +
""")"
How does it work?
Use double quotation marks to create quotation marks inside a hardcoded string.
05 – Fill string to right
Description: Fill a string after the last character to a specified total length with a specified character
Example: Fill the field [Surname] with e.g. hyphens or pipes (the value of [STR Parameter 1]) to a maximum length of e.g. 20 or 30 (value of [INT Parameter 1])
IF LEN([Surname]) < [INT Parameter 1] THEN
[Surname]+
REPLACE(SPACE([INT Parameter 1]-LEN([Surname]))," ",LEFT([STR Parameter 1],1))
ELSE
[Surname]
END
How does it work?
SPACE(number) creates a string with the specified number of blanks. All blanks in this string are then replaced by the specified character and the string is added to the end of [Surname].
LEFT([STR Parameter 1],1) ensures the fill text has only one character. The IF clause makes sure that fill characters are only added if the length of [Surname] is smaller than the specified maximum number ([INT Parameter 1]).
In the view (the table) this only looks good, if you are using a mono spaced font like Courier. This example may not have many practical uses cases, but the technique can be helpful as it mimics Excel’s REPT function which isn’t available in Tableau. You can – for instance – use this to fake a horizontal bar chart within a text, similar to what Andy Cotgreave did here: Charts inside a tooltip.
06 – Fill string from left
Description: Fill a string left to the first character to a specified total length with a specified character
Example: Fill the field [Surname] with e.g. hyphens or pipes (value of [STR Parameter 1]) from left to a maximum length of e.g. 20 or 30 (value of [INT Parameter 1])
IF LEN([Surname])< [INT Parameter 1] THEN
REPLACE(SPACE([INT Parameter 1]-LEN([Surname]))," ",LEFT([STR Parameter 1],1))+
[Surname]
ELSE
[Surname]
END
How does it work?
See Calculated Field [05 – Fill string to right]. The only difference is adding the field [Surname] to the created string instead of adding the created string to [Surname].
As mentioned above, this only works with mono spaced font types. A more practical application for this technique would be to e.g. left pad ZIP codes with zeros to a defined total number of digits, if leading zeros were lost during the ETL process.
07 – Ordinal numbers
Description: Convert a number into a string in English ordinal numbering format
Example: Ordinal numbers of field [Number], i.e. 1 = 1st, 2 = 2nd, 3 = 3rd, etc.
STR([Number]) +
IF RIGHT(STR([Number]),2) = "11" OR
RIGHT(STR([Number]),2) = "12" OR
RIGHT(STR([Number]),2) = "13" THEN
"th"
ELSE
CASE RIGHT(STR([Number]),1)
WHEN "1" THEN "st"
WHEN "2" THEN "nd"
WHEN "3" THEN "rd"
ELSE "th"
END
END
How does it work?
STR(expression) converts a number into a string and RIGHT(string, n) returns the last n characters from a string. Depending on the last 2 respectively the last character of the string, an IF clause and a CASE statement assign the correct extension to the number:
If the last 2 characters are 11, 12 or 13, the correct extension is “th”. In all other cases, it depends on the last character: 1 = “st”, 2 = “nd”, 3 = “rd”, all others are “th”.
Conversion and Manipulation
08 – One letter upper case
Description: Extract the first character of a string and make it upper case
Example: Convert [Gender] from “male” to “M” and from “female” to “F”
UPPER(LEFT([Gender],1))
How does it work?
LEFT(string, 1) returns the first character, UPPER(string) converts it to upper case.
09 – Number to upper case character
Description: Convert a number to the corresponding upper case character in the alphabet (1=A, 2=B, ..., 26=Z)
Example: Convert the field [Number] to the corresponding character in the alphabet
CHAR(65+([Number]-1)%26)
How does it work?
Upper case letters in the ASCII table start at 65, i.e. add the calculated number minus 1 to this starting number.
CHAR(number) returns the character of the ASCII code. The modulo operator (%) transfers any positive integer number into the range from 1 to 26:
1=1,..., 26=26, 27=1, 28=2,...,52=26, 53=1,54=2 etc.
Example calculation: [Number] = 49
65 + (49-1) % 26 =
65 + 48 % 26 = // modulo operation: 48 divided by 26 leaves a remainder of 22
65 + 22 = 87
CHAR (87) = “W”
10 – Number to lower case character
Description: Convert a number to the corresponding lower case character in the alphabet (1=a, 2=b, ..., 26=z)
Example: Convert the field [Number] to the corresponding character in the alphabet
CHAR(97+([Number]-1)%26)
How does it work?
See explanation of Calculated Field [09 - Number to upper case character]. It’s the same formula and logic except for starting at ASCII code 97 instead of 65.
11 – String to Date
Description: Convert a string to a date
Example: Convert field [String Date] in format "YYYYMMDD" (e.g. 20140131) into a date (e.g. 01/31/2014)
DATEPARSE("yyyyMMdd",[String Date])
How does it work?
DATEPARSE (format, string) converts a string to a date in the specified format.
12 – Date string
Description: Convert a date into a string in a defined format (other than the date field's default format)
Example: Convert the date [Birthday] into a string in format "Month Day, Year", e.g. "February 12, 2014"
DATENAME('month',[Birthday])
+ " " +
STR(DAY([Birthday]))+
", " +
STR(YEAR([Birthday]))
How does it work?
DATENAME returns the month of [Birthday] as a string. Finally add the day and year converted to strings at the end. STR(expression), converts an expression to a string. DAY (date), MONTH (date) and YEAR(date) return the date parts.
13 – Text bins
Description: Assign strings to bins according to their first character
Example: Assign the field [Surname] according to the first character to bins of "A-K", "L-Q" and "R-Z"
IF UPPER(LEFT([Surname],1)) <= "K" THEN
"A-K"
ELSEIF UPPER(LEFT([Surname],1)) <= "Q" THEN
"L-Q"
ELSE
"R-Z"
END
How does it work?
UPPER(LEFT([Surname],1)) returns the first character of the [Surname] converted to upper case. "K" and "Q" in the IF ELSE clause define the bins A-K, L-Q and R-Z. Adjust the characters and the texts as wanted.
14 – Reverse 2 words
Description: Reverse the order of 2 words in a string
Example: Reverse the first name and surname in the field [Name, Given Name] (separated by comma): “Doe, John” is turned into “John Doe”
MID([Name, Given Name], FIND([Name, Given Name],",")+2) + " "
+ LEFT([Name, Given Name],FIND([Name, Given Name],",")-1)
How does it work?
Get the given name using the function MID. Get the substring from 2 characters after the position of the comma (detected by FIND) and return all remaining characters (optional parameter [length] of MID is omitted). Add the all characters from the beginning of the string (LEFT) to the position of the comma (FIND) minus 1 (i.e. don’t return the comma).
15 – Reverse 3 words
Description: Reverse the order of the words in a string with exactly 3 words
Example: Reverse the first name. middle initial and surname in the field [Full Name] to surname, given name, middle initial: “John F. Doe” is turned into “Doe, John F.”
01: RIGHT(
02: RIGHT([Full Name],LEN([Full Name])-FIND([Full Name]," ")),
03: LEN(RIGHT([Full Name],LEN([Full Name])-FIND([Full Name]," ")))-
04: FIND(RIGHT([Full Name],LEN([Full Name])-FIND([Full Name]," "))," "))
05: +", "+
06: LEFT([Full Name],FIND([Full Name]," ")-1)+
07: " "+
08: IF CONTAINS ([Full Name],".") THEN
09: MID([Full Name],FIND([Full Name]," ")+1,2)
10: ELSE
11: ""
12: END
Note: the red line numbers in front of each line are only for the explanation and do not belong to the formula of this Calculated Field.
How does it work?
It is hard to explain this formula in plain English, so I will try to dissect the formula line by line:
01: get the last part of the string
02: get the rest of the string after the first blank
03: detect the length of the substring from the first blank to the end
04: minus the position of the second blank
05: add a comma and a blank
06: extract and add the first name
07: add a blank
08: is there a middle name (a period)?
09: if yes, extract and add the middle initial
10: if no
11: add an empty string
12: end if
16 – Imperial to metric
Description: Convert a string with a size displayed in imperial notation into a number in metric system format, e.g. 6’2’’ is converted to 187.96 cm
Example: Convert [FeetInches] (the body height stored as a string in feet and inches) into cm
INT(
LEFT(REPLACE([FeetInches]," ",""),
FIND(REPLACE([FeetInches]," ",""),"'")-1))*
30.48+
INT(MID(REPLACE([FeetInches]," ",""),
FIND(REPLACE([FeetInches]," ",""),"'")+1,
FIND(REPLACE([FeetInches]," ",""),CHAR(34))-
1-
FIND(REPLACE([FeetInches]," ",""),"'")))
*2.54
How does it work?
The LEFT statement extracts the characters left to the first apostrophe, i.e. the foot. The REPLACE statements inside remove all blanks first. INT converts the string to a number and the multiplication by 30.48 converts foot to centimeter.
The next part of the formula uses MID to extract everything right to the first and left to the quotation mark (CHAR(34)), i.e. the inches. Again INT converts the string to a number and multiplying by 2.54 returns the inches in centimeter.
The sum of the results of both formula parts is the size in centimeter.
17 – Proper case first 3
Description: Convert the first 3 words of a string to proper case, i.e. first character is upper case, all other characters are lower case, e.g. “this is text” is converted to “This Is Text”
Example: Convert the first 3 words of the dimension [Text lower case] to proper case
IF FIND(LTRIM([Text lower case])," ",FIND(LTRIM([Text lower case])," ")+1)>0 THEN
UPPER(LEFT(LTRIM([Text lower case]),1))+
MID(LOWER(LTRIM([Text lower case])),2,FIND(LTRIM([Text lower case])," ")-2)+
UPPER(MID(LTRIM([Text lower case]),FIND(LTRIM([Text lower case])," "),2))+
MID(LOWER(LTRIM([Text lower case])),FIND(LTRIM([Text lower case])," ")+2,FIND(LTRIM([Text lower case])," ",FIND(LTRIM([Text lower case])," ")+1)-FIND(LTRIM([Text lower case])," ")-2)+
UPPER(MID(LTRIM([Text lower case]),FIND(LTRIM([Text lower case])," ",FIND(LTRIM([Text lower case])," ")+1),2))+
MID(LOWER(LTRIM([Text lower case])),FIND(LTRIM([Text lower case])," ",FIND(LTRIM([Text lower case])," ")+1)+2)
ELSEIF FIND(LTRIM([Text lower case])," ")>0 THEN
UPPER(LEFT(LTRIM([Text lower case]),1))+
MID(LOWER(LTRIM([Text lower case])),2,FIND(LTRIM([Text lower case])," ")-2)+
UPPER(MID(LTRIM([Text lower case]),FIND(LTRIM([Text lower case])," "),2))+
MID(LOWER(LTRIM([Text lower case])),FIND(LTRIM([Text lower case])," ")+2,999)
ELSE
UPPER(LEFT(LTRIM([Text lower case]),1))+MID(LOWER(LTRIM([Text lower case])),2)
END
How does it work?
Well, look at this formula. I am sorry, but I don’t think I have a chance to explain this in all details. Thus, only a few remarks to point you into the right direction:
The IF THEN ELSEIF ELSE clause checks how many words the string contains: IF = 3, ELSEIF = 2, ELSE = 1. The formulas inside the branches of the IF statement use UPPER and LOWER to change the case of the characters, LTRIM to delete leading blanks and MID and FIND to return the parts of the first, second and third word / character to be converted to either upper or lower case.
Analysis
18 – Count blanks
Description: Count the number of blanks in a string
Example: Count the number of blanks in the dimension [Text]
LEN([Text])-LEN(REPLACE([Text]," ",""))
How does it work?
REPLACE ([Text]," ","") creates a temporary string without blanks. The count of blanks in a string is the length of the original string minus the length of this temporary string with no blanks.
19 – Count non blanks
Description: Count the number of characters in a string which are no blanks
Example: Count the number of non blank characters in the dimension [Text]
LEN(REPLACE([Text]," ",""))
How does it work?
REPLACE ([Text]," ","") creates a temporary string without blanks. The length of this string is the number of characters which are no blanks.
20 – Count words
Description: Count the number of words in a string
Example: Count the number of words in dimension [Text]
Condition: Words are separated by exactly one blank
LEN(TRIM([Text]))-LEN(REPLACE(TRIM([Text])," ",""))+1
How does it work?
REPLACE ([Text]," ","") creates a temporary string without blanks. TRIM removes all leading or trailing blanks. On condition that the words are separated by exactly one blank, the number of words is the length of the original string minus the length of temporary string without blanks plus 1 (no blank after last word).
21 – Contains number
Description: Returns TRUE if a string contains a number
Example: Check if the field [Text w/ and w/o Numbers] contains a number
CONTAINS([Text w/ and w/o Numbers],"0") OR
CONTAINS([Text w/ and w/o Numbers],"1") OR
CONTAINS([Text w/ and w/o Numbers],"2") OR
CONTAINS([Text w/ and w/o Numbers],"3") OR
CONTAINS([Text w/ and w/o Numbers],"4") OR
CONTAINS([Text w/ and w/o Numbers],"5") OR
CONTAINS([Text w/ and w/o Numbers],"6") OR
CONTAINS([Text w/ and w/o Numbers],"7") OR
CONTAINS([Text w/ and w/o Numbers],"8") OR
CONTAINS([Text w/ and w/o Numbers],"9")
How does it work?
CONTAINS (string, substring) is called for every possible digit ("0" to "9").
IF CONTAINS returns TRUE for at least one digit, the Calculated Field returns TRUE, otherwise FALSE.
22 – First char is alphabetic
Description: Returns TRUE, if the first character of a string is alphabetic (“a” to “z” or “A” to “Z”)
Example: Check if the first character of [Text w/ and w/o Numbers] is alphabetic
(LTRIM([Text w/ and w/o Numbers]) >= "A" AND
LTRIM([Text w/ and w/o Numbers]) <= "Z")
OR
(LTRIM([Text w/ and w/o Numbers]) >= "a" AND
LTRIM([Text w/ and w/o Numbers]) <= "z")
How does it work?
LTRIM(string) removes leading blanks and returns the first character in the string.
Boolean expressions (greater than and less than) combined by AND and OR operators check if the first character is inside the character ranges (upper and lower cases) and return TRUE if this is the case and otherwise FALSE.
23 – First char is numeric
Description: Returns TRUE, if the first character of a string is numeric (0 to 9)
Example: Check if the first character of [Text w/ and w/o Numbers] is numeric
ASCII(LTRIM([Text w/ and w/o Numbers])) >= 48 AND
ASCII(LTRIM([Text w/ and w/o Numbers])) <= 57
How does it work?
Check if the first character is inside the range of digits "0" to "9" in the ASCI code (48 to 57).
24 – Count substrings
Description: Count occurrences of a specified substring in a string
Example: Count occurrences of parameter [STR Parameter 2] in the dimension [Text]
(LEN([Text])-LEN(REPLACE([Text],[STR Parameter 2],""))) /
LEN([STR Parameter 2])
How does it work?
REPLACE ([Text],[STR Parameter 2],"") creates a temporary copy of the string and deletes all occurrences of [STR Parameter 2] in this copy. The length of the string minus the length of the temporary copy without the specified substring returns the number of characters of all substrings ([STR Parameter 2] in the string. To get to the occurrences of the substring in the string, you have to divide this by the length of the substring.
Example: let’s say our search string is “abc” and it occurs four times in the string. The REPLACE statement deletes “abc” from the string four times, i.e. it deletes 12 characters in total and the numerator of the division (first line) returns 12. If we divide this by the length of the search string (3), we get the correct count of occurrences of “abc” in the string: 4.
25 – Count comma separated items
Description: Count the number of substrings in a string separated by a comma
Example: Count the number of substrings in the dimension [Text] which are separated by a comma
LEN([Text])-LEN(REPLACE([Text],",",""))+1
How does it work?
See explanation of Calculated Field [20 – Count words]. It is the same formula, just replacing the commas instead of the blanks.
26 – Find first number
Description: Find the position of the first number in a string. Return 0, if the string does not contain a number
Example: Find the position of the first number in [Text w/ and w/o Numbers]
IF [21-Contains number] THEN
MIN(MIN(MIN(MIN(MIN(MIN(MIN(MIN(MIN(
FIND([Text w/ and w/o Numbers]+"0123456789","0"),
FIND([Text w/ and w/o Numbers]+"0123456789","1")),
FIND([Text w/ and w/o Numbers]+"0123456789","2")),
FIND([Text w/ and w/o Numbers]+"0123456789","3")),
FIND([Text w/ and w/o Numbers]+"0123456789","4")),
FIND([Text w/ and w/o Numbers]+"0123456789","5")),
FIND([Text w/ and w/o Numbers]+"0123456789","6")),
FIND([Text w/ and w/o Numbers]+"0123456789","7")),
FIND([Text w/ and w/o Numbers]+"0123456789","8")),
FIND([Text w/ and w/o Numbers]+"0123456789","9"))
ELSE
0
END
How does it work?
Check first if the string contains a number in the IF clause, using the Calculated Field [21-Contains Number]. If the string does contain a number, create a temporary string by adding a hardcoded substring with all digits ("0123456789") at the end of the string. Search for the position of each digit ("0", "1", etc.) in this temporary string. The minimum of all positions in the extended string is the position of the first digit in the original string.
Caution: this is a very complex calculation and may seriously hit the performance of your Tableau workbook. Use this with caution, i.e. only if you really need it.
27 – Nth occurrence
Description: Find the position of the nth occurrence of a substring within a string.
E.g. the second occurrence of “one” in “one formula is one formula is one formula” is 16, the third occurrence of “one” is 31.
Example: Find the [INT Parameter 2]th occurrence of substring [STR Parameter 2] in the dimension [Text].
Limitation: The formula works only up to the fifth occurrence.
IF [INT Parameter 2] > 5 THEN
9999
ELSEIF (LEN([Text])-LEN(REPLACE([Text],[STR Parameter 2],"")))/LEN([STR Parameter 2]) < [INT Parameter 2] THEN
0
ELSE
CASE [INT Parameter 2]
WHEN 1 THEN FIND([Text], [STR Parameter 2])
WHEN 2 THEN FIND([Text], [STR Parameter 2], FIND([Text], [STR Parameter 2]) + 1)
WHEN 3 THEN FIND([Text], [STR Parameter 2], FIND([Text], [STR Parameter 2], FIND([Text], [STR Parameter 2]) +1) +1)
WHEN 4 THEN FIND([Text], [STR Parameter 2], FIND([Text], [STR Parameter 2], FIND([Text], [STR Parameter 2], FIND([Text], [STR Parameter 2]) +1) +1) +1)
WHEN 5 THEN FIND([Text], [STR Parameter 2], FIND([Text], [STR Parameter 2], FIND([Text], [STR Parameter 2], FIND([Text], [STR Parameter 2], FIND([Text], [STR Parameter 2]) +1) +1) +1) + 1)
END
END
How does it work?
The IF clause first checks, if the maximum of n (=5) isn’t exceeded. If it is, the formula returns 9999. The ELSEIF part of the IF clause checks if the string contains the substring at all. If it doesn’t,the formula returns 0.
If n is less or equal than 5 and the string contains the search string (ELSE part), the position of the nth occurrence is calculated by nested FIND functions in a CASE WHEN statement. If n = 1, FIND looks from the start of the string. If n = 2, FIND looks from the character after the first occurrence of the search string, etc.
A brief aside:
This formula is limited to a maximum of the 5th occurrence and nested FINDs are everything else than an elegant solution. If you ever tried to solve this in Microsoft Excel, you probably know that there is a very elegant option in Excel for finding the nth occurrence by using a combination of FIND and SUBSTITUTE. The equivalent of SUBSTITUTE in Tableau is REPLACE. However, you cannot rebuild the Excel solution in Tableau, because unlike Excel’s SUBSTITUTE, Tableau’s REPLACE does not allow to specify the instance, i.e. which occurrence of the substring shall be replaced. Therefore – unfortunately – I do not have a better way to do this in Tableau. All I am having is this “monster” above.
Extraction
28 – Extract first character
Description: Extract the first character of a string
Example: Extract the initials from [GivenName] and [Surname] and concatenate them to a string
LEFT([GivenName],1) + LEFT([Surname],1)
How does it work?
LEFT(string, 1) returns the first character of a string.
29 – Extract first word
Description: Extract the first word of a string
Example: Extract the street number from [StreetAddress]: all characters left to the first space in the string
LEFT(LTRIM([StreetAddress]), FIND(LTRIM([StreetAddress]+" ")," ") -1)
How does it work?
LTRIM(string) removes leading blanks, FIND(LTRIM([StreetAddress]+" ")," ") returns the position of the first blank and LEFT(string, n) returns the first n characters of the string, i.e. all characters left to the first blank. A blank is added to the first parameter of the FIND statement to ensure the calculation works, if the string contains only one word.
30 – Extract first 2 words
Description: Extract the first two words of a string
Example: Extract the first two words from [Text]: all characters left to the second occurrence of a blank (excluding leading blanks)
LEFT(TRIM([Text]),
FIND(TRIM([Text])+" "," ",
FIND(TRIM([Text])+" ", " ")+1)-1)
How does it work?
TRIM(string) removes leading and trailing blanks. The first FIND statement looks for a blank starting one character right to the position of the first blank (second FIND). Three blanks are added to the trimmed string to ensure a blank will be found even if there are only one or two words.
31 – Extract last part
Description: Extract all text of a string after the first word
Example: Extract street name from [StreetAddress]: all characters right to space after the first word, i.e. after the street number
RIGHT([StreetAddress], LEN([StreetAddress]) - FIND([StreetAddress]," "))
How does it work?
The length of the string minus the first occurrence of a space = remaining length of the string after the first word, i.e. the number of characters to be extracted by using the RIGHT function.
32 – Extract mid part fixed
Description: Extract a substring from somewhere in a string if the position and the length of the substring are fix
Example: Extract the the bank identification code from an IBAN (8 characters from position 5 on) and insert blanks to make it more readable
MID([IBAN],5,3) +" " +
MID([IBAN],8,3) +" " +
MID([IBAN],11,2)
How does it work?
The bank identification code in an IBAN starts at position 5 and is 8 characters long. MID(string, start, [length]) extract from string “length” characters starting at character “start”. Blanks are inserted using string concatenation to make the identification code more readable.
33 – Extract mid part variable
Description: Extract a substring from a string (variable position, fixed length)
Example: Extract the middle initial from the full name (2 characters after the first blank in the full name)
Condition: The middle initial has exactly 2 characters, i.e. initial and a period
IF CONTAINS ([Full Name], ".") THEN
MID([Full Name], FIND([Full Name], " ") + 1, 2)
ELSE
""
END
How does it work?
The IF clause and CONTAINS(string, substring) checks if there is a period in the string, i.e. [Full Name] contains a middle initial. MID returns the initial and the period (2 characters) from the character right to the position where the first blank was found (FIND function).
34 – Extract 2 similar delimiters
Description: Extract a substring between 2 given, similar delimiters from a string
Example: Extract the short form of a URL from a full URL, i.e.get “www.clearlyandsimply.com” from “https://www.clearlyandsimply.com/about.html/”.
LEFT(
MID([Website],
FIND([Website],"://")+3),
FIND(MID([Website],FIND([Website],"://")+3),"/")-1)
How does it work?
The first MID statement extracts everything (optional parameter [length] of MID is omitted) right to the third character after the first occurrence of “://”, i.e. everything from “www…” on (first FIND statement). The second FIND looks for the next slash in the remaining string. LEFT starts at the result of the first FIND statement and returns the characters from this character to the character returned by the second statement.
35 – Extract 2 different delimiters
Description: Extract a substring between 2 given different delimiters from a string
Example: Extract the last name from an email address in a format [email protected]
MID(LEFT([E-Mail Address],FIND([E-Mail Address],"@")-1),
FIND([E-Mail Address],".")+1)
How does it work?
The LEFT statement returns the substring left to the "at" sign (@). MID starts extracting at the position of the period in this substring returned by LEFT and returns everything after the period (optional parameter [length] of MID is omitted).
36 – Extract substring in parentheses
Description: Extract the first substring in parentheses, i.e. "(....)"
Example: Extract the first substring in parentheses found in the field [Text with Parentheses]
IF FIND([Text with Parentheses],"(") > 0 THEN
MID([Text with Parentheses],
FIND([Text with Parentheses],"(")+1,
FIND([Text with Parentheses],")")-FIND([Text with Parentheses],"(")-1)
ELSE
""
END
How does it work?
The IF clause checks, if the string contains a left parenthesis (first FIND). MID extracts the substring starting one position right to the first occurrence of the left parenthesis (second FIND). The number of characters to be extracted is the position of the right parenthesis (third FIND) minus the position of the first left parenthesis (fourth FIND) minus 1.
37 – Extract first word after specified word
Description: Extract the first word from a string after a specified word
Example: Extract the first word in [Text] after the first occurrence specified in parameter [STR Parameter 2]
IF FIND(UPPER(" "+[Text])+" ",UPPER(" "+[STR Parameter 2])+" ") > 0 THEN
MID(" " +[Text]+" ",
FIND(UPPER(" " +[Text]+" "),UPPER(" "+[STR Parameter 2]+" "))+LEN([STR Parameter 2])+2,
FIND(
MID(" " +[Text]+" ",
FIND(UPPER(" " +[Text]+" "),UPPER(" "+[STR Parameter 2]+" "))+LEN([STR Parameter 2])+2)," ")-1)
ELSE
""
END
How does it work?
The IF clause checks if the string contains the specified word ([STR Parameter 2], first FIND statement)). MID starts to extract two positions right to the specified word (second FIND): the position of the first character of the specified word plus the length of the specified word plus 2 (blank right to it). The third FIND statement looks in the remaining string (i.e. the text after the specified word and the following blank) for the next blank, i.e. for the end of the next word after the specified word. Subtracting 1 cuts off the blank behind the searched word. This result is the number of characters, the first MID function will return. i.e. the first word after the specified word.
Adding blanks at the beginning and the end of the string and the search string ensures that only entire words are found (i.e. not parts of a word). UPPER functions make the search case insensitive
38 – Extract last word
Description: Extract the last word from a string
Example: Extract the surname (last word) from the dimension [Full Name]
Condition: Maximum 2 blanks in the full name, i.e. the formula does not work for strings with more than 3 words.
RIGHT(
RIGHT([Full Name],LEN([Full Name])-FIND([Full Name]," ")),
LEN(RIGHT([Full Name],LEN([Full Name])-FIND([Full Name]," ")))-
FIND(RIGHT([Full Name],LEN([Full Name])-FIND([Full Name]," "))," "))
How does it work?
The second RIGHT statement returns the rest of the string after the first blank. This substring is passed to the first RIGHT statement as the string. The number of characters to be returned is the length of this substring minus the position of the first blank in this substring, i.e. the position of the second blank in the original string.
39 – Extract first n words
Description: Extract the first n word from a string
Example: Extract the n (parameter [INT Parameter 2] first words from the dimension [Text]
Condition: n is limited to a maximum of 5
IF [INT Parameter 2] > 5 THEN
"n/a"
ELSEIF (LEN([Text])-LEN(REPLACE([Text]," ","")))/LEN(" ") < [INT Parameter 2] THEN
"n/a"
ELSE
CASE [INT Parameter 2]
WHEN 0 THEN "n/a"
WHEN 1 THEN LEFT([Text], FIND([Text], " ")-1)
WHEN 2 THEN LEFT([Text], FIND([Text], " ", FIND([Text], " ") + 1)-1)
WHEN 3 THEN
LEFT([Text], FIND([Text], " ", FIND([Text], " ", FIND([Text], " ") +1) +1)-1)
WHEN 4 THEN
LEFT([Text], FIND([Text], " ", FIND([Text], " ", FIND([Text], " ", FIND([Text], " ") +1) +1) +1)-1)
WHEN 5 THEN
LEFT([Text], FIND([Text], " " , FIND([Text], " ", FIND([Text], " ", FIND([Text], " ", FIND([Text], " ") +1) +1) +1) + 1)-1)
END
END
How does it work?
The IF clause first checks, if the maximum of n (=5) isn’t exceeded. If it is, the formula returns “n/a”. The ELSEIF part of the IF clause checks if the string contains at least n words. If it doesn’t,the formula returns “n/a”.
If n is less or equal than 5 and the string consists of n or more words (ELSE part), the first n words are extracted using LEFT and nested FIND functions. The formula uses the same logic as the Calculated Field [27 – Nth occurrence] (see above).
40 – Extract date from string
Description: Extract the first date in a string and convert it to a date
Example: Extract the first date found in the field [Text with a Date]
Conditions: The date in the string is in format MM/DD/YYYY. The formula does not work if a slash is in the string left to the date and it will always extract the first date in the string
DATE(MID([Text with a Date],FIND([Text with a Date],"/") -2,10))
How does it work?
The extraction of the date string uses MID starting 2 characters left to the position of the first slash ("MM/") and returns the next 10 characters (“MM/DD/YYYY”). DATE converts the result substring to a date.
41 – Remove all spaces
Description: Remove all blanks in a text
Example: Remove all blanks from field [Text with lots of spaces]
REPLACE([Text with lots of spaces]," ","")
How does it work?
REPLACE function replaces all blanks by nothing ("").
42 – Trim spaces
Description: Clean all unnecessary blanks from a string, i.e. replace 2 or 3 or 4, etc. consecutive blanks by one blank
Example: Clean the field [Text with lots of spaces] from all unnecessary blanks
REPLACE(
REPLACE(
REPLACE(
REPLACE(
TRIM([Text with lots of spaces]),
" "," "),
" "," "),
" "," "),
" "," ")
How does it work?
Nested REPLACE statements replace all unnecessary blanks. Replace 5 consecutive blanks first, then 4 consecutive blanks, etc. The TRIM function inside the nested REPLACE statements removes all leading and trailing spaces first.
Disclaimer: The solution eliminates blank substrings up to 38 spaces. Starting the REPLACEs with 6 spaces will work for substrings up to 208 blanks.
Alternative solution
If you know for sure that a certain character (e.g. a pipe "|") is not part of the string, you can also use this formula:
REPLACE(
REPLACE(
REPLACE(
TRIM([Text with lots of spaces])," "," "+"|"),"|"+" ",""),"|","")
43 – Remove line feeds
Description: Remove the line feeds from a string
Example: Remove the line feeds from the field [Name with non-printable chars]
REPLACE([Name with non-printable chars],"
"," ")
How does it work?
REPLACE(string, substring, replacement) replaces the line feed by a blank.
Note: REPLACE([Name with non-printable chars], CHAR(10)," ") does not work in Tableau!You have to insert the line feed in the string in the Calculated Field editor between the quotation marks, i.e. the line feed shown in the formula above is essential.
44 – Extract first number
Description: Extract the first number in a string
Example: Extract the first number found in the dimension [Text w/ and w/o Numbers] and return 0 if the field does not contain a number
Condition: Numbers and words are separated by a blank, i.e. “lorem 123 ipsum” will work, but “lorem123 ipsum” and “lorem 123ipsum” will not work
IF [26-Find first number] > 0 THEN
FLOAT(
MID(
[Text w/ and w/o Numbers],
[26-Find first number],
IF FIND([Text w/ and w/o Numbers]," ", [26-Find first number]) = 0 THEN
999
ELSE
FIND([Text w/ and w/o Numbers]," ", [26-Find first number])-[26-Find first number]+1
END))
ELSE
0
END
How does it work?
The formula is based on the Calculated Field [26-Find first number].
The outer IF clause checks if there is a number in the string. If not, the formula returns 0 (ELSE). If there is a number, the MID statement extracts a substring starting at the result of “26-Find first number”. The inner IF clause checks if there is a blank following after the starting position of MID. If there isn’t a blank in the rest of the string, all remaining characters to the end are extracted (999). Otherwise all characters from the starting point to one position left to the next blank are returned. FLOAT finally converts the extracted substring to a number.
Caution: this is a very very complex calculation and will definitely hit the performance of your Tableau workbook. Use this only if you really need it.
Download Links
The Tableau packaged workbook for free download on Tableau Public:
String Calculations in Tableau (on Tableau Public)
Finally, if you are interested, here is the Microsoft Excel workbook used as the data source:
Download String Example Data (79.5K)
Stay tuned.
Update on Saturday, 28th of June, 2014
Leonid Koyfman (again!) was kind enough to thoroughly review the entire article and all formulas. He found a couple of bugs in my original Calculated Fields and provided fixes for all of them as well as some great improvements to some other formulas.
I updated the post and the workbook. Since Leonid’s suggestions affected more than a dozen formulas, I refrained from explaining the bugs, fixes and improvements. I simply replaced my formulas and explanations by Leonid’s.
Many thanks again, Leonid!