Build Network Graphs in Tableau

Visualize Relationships, Connections and Associations in Networks with Tableau Software

Clearly and Simply proudly presents a new guest article: Michael Martin of Business Information Arts, Tableau Partner, Tableau Certified Consultant and leader of the Toronto Tableau User Group, shows us how to visualize Network Graphs using Tableau Software. Enjoy.

Network Graphs can help us see and measure relationships and connections between people, places, and things over time. This can be expressed as identifying, measuring and understanding process flows, the mix of products in shopping carts, social network and email traffic, affinities and interests people share (or don’t share), and the “hierarchies of influence” in business and / or social systems by identifying who or what triggers events, and the impacts they have on others.

Today’s post describes how you can build Network Graphs using Tableau Software versions 6 or 7, including a detailed how-to tutorial and some information on the background of Network Theory.

What is a Network Graph? A Picture says more than 1,000 words

What are Network Graphs for? Here are just a few practical examples:

Contacts Between Philanthropic Twitter Users


The Organization of Hierarchical Communities

The Path to Products People Buy

Last, but not least, a Network Graph built in Tableau:

Association of Food Groups, Brands and Flavors

Tableau’s Out of the Box Network Graphs

Tableau Desktop is one of my favorite data analysis and reporting tools. Other excellent products such as Visokio Omniscope support network graphs as one of a wide number of supported view types. But what I have always found so impressive about how Tableau is engineered is how various “loosely coupled” features can be re-assembled to create new ones. Examples of this include double axis graphs, bullet charts, and the support for bubble graphs and tree maps in the upcoming Tableau 8 release (Q1 of 2013). Tableau is a fabulous “Swiss Army Knife” for visualizing data.

Build Network Graphs with Tableau – The How to

My implementation of network graphs in Tableau leverages features that have been around since version 1: the circle and line mark types, support for scatter plots, and the ability to draw double axis graphs (hackable for years before being officially supported as “combo charts” in version 6). With a little bit of data preparation, this is all you need to draw a network graph in Tableau.

For me, the fun really starts when other great Tableau functionalities (actions, parameters, page field animation, filtering, highlighting, size by, color by, table calculations to name just a few) are brought into play.

Data Preparation

Key to my implementation is data preparation given the requirement to connect elements in the form of a transaction and lay out the design of the network graph in the Tableau view:

If you want to follow the step-by-step below using my example data, here is the Excel workbook with the data for free download:

Download Network Graphs Example Data (Microsoft Excel 2007/2010, 14.3K)
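For readers who prefer to generate the rows rather than type them, here is a minimal Python sketch of the data layout. It assumes two rows per transaction ID (one for each end of the line), that “Circle Y” repeats the node’s “Line Y” value, and that the second row carries the sale amount in “Sales For Display”. The node positions and sales figures are made up for illustration and are not the values in the example workbook.

```python
import csv
import io

# Hypothetical hand-placed node positions on a 1-10,000 scale
# (illustrative only, not the coordinates used in the example workbook).
positions = {
    "Jane": (5000, 5000),
    "Mary": (2000, 8000),
    "Ken": (8000, 8000),
    "Bill": (8000, 2000),
}

# Hypothetical transactions: (ID, seller, buyer, sales amount).
transactions = [
    (1, "Ken", "Bill", 500),
    (2, "Mary", "Jane", 1200),
]


def build_rows(transactions, positions):
    """Emit two rows per transaction, one for each end of the connecting line."""
    rows = []
    for tx_id, seller, buyer, sales in transactions:
        relationship = f"{seller} -> {buyer}"
        # Assumption: the first row carries Sales, the second row repeats the
        # amount in Sales For Display and has Sales set to zero.
        for node, sales_value, display_value in ((seller, sales, 0), (buyer, 0, sales)):
            x, y = positions[node]
            rows.append({
                "ID": tx_id,
                "Node Name": node,
                "Relationship": relationship,
                "Line X": x,
                "Line Y": y,
                "Circle Y": y,  # assumption: same Y as the line end point
                "Sales": sales_value,
                "Sales For Display": display_value,
            })
    return rows


rows = build_rows(transactions, positions)

# Write the rows as CSV text that could be saved and opened in Excel or Tableau.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Changing a node’s entry in `positions` moves that node (and every line end attached to it) in the resulting view, which is a quick way to experiment with different layouts.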

Demo – A simple Network Graph

Here’s a simple network graph based on the example data.

The Step-by-Step Tutorial

Step 1 – The Basic Set Up

To get started, put the “Line Y” field on the Rows shelf and the “Line X” field on the Columns shelf. Tableau will automatically set the mark type to circle and render a basic scatterplot. The “Line Y” and “Line X” co-ordinates in the source data are visible via the field value headers.

Step 2 – Dual Axis

Add the “Circle Y” field to the view on the Rows shelf as a dual axis, and synchronize the two Y axes (right-click on the axis and click “Synchronize Axis”).

Step 3 – Multiple Mark Types

The next step is to format the Tableau Marks Card to show “Multiple Mark Types”. Then cycle to the “Circle Y” mark and set the mark type to “Pie”. Drag the “Node Name” field to the Label pill. You can optionally color the “Node Name” field by ID by dragging ID to the Level of Detail shelf. Resize the Pie mark to make it larger; each pie slice represents a Transaction ID.

Step 4 – Connect the Dots

Then cycle to the “Line Y” mark. Drag the “ID” and “Relationship” fields to the Level of Detail shelf. Set the mark type to “Line”. Tableau will connect the dots – and you have a simple Network Graph. Resize the Line Y series to make the lines thinner and color the lines as desired.
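Conceptually, putting “ID” on the Level of Detail shelf groups the rows so that the points sharing a transaction ID are joined by one line. The Python sketch below mimics that grouping outside Tableau; the coordinates are hypothetical and it is only meant to illustrate the idea, not how Tableau implements it internally.

```python
from collections import defaultdict

# Hypothetical rows: two per transaction ID, one for each end of the line.
rows = [
    {"ID": 1, "Line X": 8000, "Line Y": 8000},
    {"ID": 1, "Line X": 8000, "Line Y": 2000},
    {"ID": 2, "Line X": 2000, "Line Y": 8000},
    {"ID": 2, "Line X": 5000, "Line Y": 5000},
]

# Group the points by transaction ID; each group becomes one line segment.
segments = defaultdict(list)
for row in rows:
    segments[row["ID"]].append((row["Line X"], row["Line Y"]))

for tx_id, points in sorted(segments.items()):
    print(f"Transaction {tx_id}: draw a line through {points}")
```

This is also why the “Relationship” text itself does not drive the drawing: the shared ID is what pairs the two end points.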

Optionally, you can format the canvas to include grid lines and turn brushing on in the Color Legend. Uncheck “Show Header” for the “Line X”, “Line Y”, and “Circle Y” fields on the Rows and Columns shelves.

Step 5 – The Tooltips

If you hover over nodes in the view with the mouse, you’ll see Tableau generated tooltip text:

We can do a few things to make the Tooltips more meaningful. With the “Line Y” Mark selected in the Marks Card, place the “Relationship” and “Sales vs Sales For Display” fields in the Level of Detail Shelf.

The “Sales vs Sales For Display” field is a calculated field that I will describe shortly.

Then cycle to the “Circle Y” field in the Marks Card and place the “Total Sales”, “InDegree”, “OutDegree”, “Node Name”, and “ID” fields in the Level of Detail Shelf.

Step 6 – Calculated Fields and the Tooltips again

The next step is to define a simple calculation which I named “Sales vs Sales For Display” in my workbook:

IF Sum([Sales])=0 THEN Sum([Sales For Display]) ELSE Sum([Sales]) END

The output of this calculation is the value of either the “Sales” or the “Sales For Display” data field associated with a single transaction. My implementation needs this calculation because, without it, the value of the “Sales” field would change to zero (or change from zero to the value of the sale) as the mouse passes the halfway point of the line between two connected nodes.

If you take another look at the source data, you’ll see that the value of the “Sales For Display” field is the same as the value for “Sales” in the previous row for a given transaction ID. As Tableau aggregates the “Sales” and “Sales For Display” metrics by Transaction ID, the value of the calculation will change as you pass the halfway point in the line connecting the nodes in the transaction with the mouse.
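The fallback logic can be tried outside Tableau with a few lines of Python. This is only a sketch: it assumes each end of a line is evaluated from a single row, and the two rows shown are hypothetical rather than taken from the workbook.

```python
def sales_vs_sales_for_display(sales, sales_for_display):
    """Mirror of: IF SUM([Sales])=0 THEN SUM([Sales For Display]) ELSE SUM([Sales]) END"""
    return sales_for_display if sales == 0 else sales


# Two hypothetical rows for one transaction ID (seller end and buyer end).
line_ends = [
    {"Node Name": "Ken", "Sales": 500, "Sales For Display": 0},
    {"Node Name": "Bill", "Sales": 0, "Sales For Display": 500},
]

for end in line_ends:
    shown = sales_vs_sales_for_display(end["Sales"], end["Sales For Display"])
    print(f'{end["Node Name"]}: Sales={end["Sales"]}, tooltip value={shown}')
```

On the row where “Sales” is zero the calculation falls back to “Sales For Display”, so the tooltip can show the sale amount at either end of the line.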

Then edit the tooltip text as shown in the figure below.

The first two lines will appear when the mouse hovers over a line connecting two nodes (the “Line Y” field). The remaining four lines will display when the mouse hovers over a Node (the “Circle Y” field).

Step 7 (optional) – A Summary Table

You can optionally make a summary sales table that sums the “Sales” field by the “Node Name” field. This includes creating a calculation named “Sales Label” that suppresses the display of zero values in the “Sales” field.

If you look at the source data, you’ll see that the “Relationship” field is encoded to show who the seller and buyer were. The value Ken → Bill describes a transaction where Ken was the seller and Bill was the buyer. Ken is listed as the “Initiating Person” and Bill is listed as the “Secondary Person”. The “Direction” field explains this in another way; from Ken’s point of view as the “Initiating Person”, this is an “Out Degree” connection. From Bill’s point of view as the buyer, this is an “In Degree” connection.

Step 8 (optional) – Filter Actions

You can optionally define a Tableau Filter Action to filter the data that will appear in the “Summary Sales Table” based on which transactions in the view are selected with the mouse. In my implementation, the Action is set to run “On Select” based on the values of the “ID” and “Node Name” data fields.

For more information on how to use Actions in Tableau, have a look at this how-to tutorial: The Power of Tableau Actions.

Step 9 (optional) – Animated View

You can optionally animate the view by dropping the “ID” field into the Pages Shelf and inserting the Pages Shelf into the Dashboard by selecting “Current Page”.

After you start the Page Player, transactions will come into the view sorted by the Transaction ID number. With the use of calculations and “Page History” settings, you can create very interesting animated views of transaction oriented data.

The Result

Here is the example packaged Tableau workbook for free download:

Download Prototype Scene Graph (Tableau 7 Packaged Workbook, 62.3K)

Network Metrics

The “Network Density” metric is commonly calculated as the number of actual connections divided by the number of possible connections. There are 9 actual connections and 56 possible connections in the example data, resulting in a Network Density value of .1607, which depending on the context could be considered low or high.
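The arithmetic can be checked with a short Python sketch. It assumes that the 56 possible connections come from 8 network members with directed connections (8 × 7 = 56); the function name is my own and not part of any library.

```python
def network_density(actual_connections, node_count, directed=True):
    """Actual connections divided by the number of possible connections."""
    possible = node_count * (node_count - 1)  # every ordered pair of distinct nodes
    if not directed:
        possible //= 2  # an undirected pair counts only once
    return actual_connections / possible


density = network_density(actual_connections=9, node_count=8)
print(round(density, 4))  # 0.1607
```

If connections were treated as undirected, the same 9 connections would be measured against 28 possible pairs, which doubles the density, so it is worth stating which convention a reported figure uses.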

The “Network Centralization” metric tells us how “centered” the network is around the member(s) of the network with the highest number of connections. In a network with three members, this metric is of little value – but in a network with thousands or millions of connections, knowing the person or people the network is centralized around is meaningful to our understanding of the network. In the data driving my implementation, Jane is involved in four of the nine transactions, so the metric would commonly be calculated as (4 / 9) = .444. This would be considered a high value in most cases, so you could say that the total network is highly centralized (around Jane).
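As a rough illustration of that share-of-transactions calculation, the Python sketch below counts how many transactions each member takes part in. The edge list is hypothetical: it was made up so that Jane appears in four of nine transactions and is not the workbook’s actual data.

```python
from collections import Counter

# Hypothetical (seller, buyer) pairs, invented for illustration only.
edges = [
    ("Ken", "Bill"), ("Mary", "Jane"), ("Wayne", "Marjory"),
    ("Wayne", "Mary"), ("Sally", "Jane"), ("Roger", "Jane"),
    ("Jane", "Bill"), ("Marjory", "Mary"), ("Bill", "Marjory"),
]

# Count the transactions each member is involved in, as seller or buyer.
involvement = Counter()
for seller, buyer in edges:
    involvement[seller] += 1
    involvement[buyer] += 1

top_node, top_count = involvement.most_common(1)[0]
centralization = top_count / len(edges)
print(top_node, round(centralization, 3))  # Jane 0.444
```

Note that this simple ratio is the measure described in the text; other definitions of network centralization exist and would give different numbers.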

The “Network Homophily” metric describes the degree that connected nodes share similar characteristics – i.e. are connected nodes largely alike? The richer the source data is, the more important and interesting this metric can be as the row count increases. This metric is of particular interest to marketers.

Switching to node-specific metrics: the “In Degree” metric is the count of incoming connections to a node from other nodes in the network. The “Out Degree” metric is the count of outgoing connections from a single node to other nodes in the network. These two metrics are often used to help analysts and marketers understand how “social” products within particular retail categories are with products in similar or different retail categories.
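Both counts are easy to derive from a list of directed connections. The Python sketch below uses a hypothetical seller-to-buyer edge list (not the workbook data) and treats a sale as an outgoing connection for the seller and an incoming one for the buyer.

```python
from collections import Counter

# Hypothetical (seller, buyer) pairs, invented for illustration only.
edges = [
    ("Ken", "Bill"), ("Mary", "Jane"), ("Wayne", "Marjory"),
    ("Wayne", "Mary"), ("Sally", "Jane"), ("Roger", "Jane"),
    ("Jane", "Bill"), ("Marjory", "Mary"), ("Bill", "Marjory"),
]

out_degree = Counter(seller for seller, _ in edges)  # outgoing connections
in_degree = Counter(buyer for _, buyer in edges)     # incoming connections

for node in sorted(set(out_degree) | set(in_degree)):
    print(f"{node}: in={in_degree[node]}, out={out_degree[node]}")
```

The resulting numbers could be stored as “InDegree” and “OutDegree” columns in the data source so that they are available for tooltips.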

The “Betweenness” metric helps us understand how important a particular node is to the overall “performance” of the network from the perspective of a particular metric or class of metrics. The example data describes connections through “Sales”. If Sally and Roger had made huge sales to each other or to Jane, removing Jane from the network would lower the “total value” of the network because Roger and Sally are in the network by virtue of their relationships to Jane.

The “Closeness” metric helps us understand how useful a given network member is for getting a message from outside the network circulated within the network as soon as possible. If an outside person wanted to circulate a message within the network described in the example data, the go-to person is Jane because she is directly connected (one hop away) to five other network members, who in turn are a hop away from the remaining network members (Roger and Ken).
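One common way to put a number on this is closeness centrality: the number of other reachable members divided by the total number of hops needed to reach them. The Python sketch below computes it with a breadth-first search, ignoring the direction of each connection. The edge list is hypothetical and does not reproduce the workbook’s connection counts, so only the method carries over, not the figures.

```python
from collections import defaultdict, deque

# Hypothetical connections, invented for illustration only.
edges = [
    ("Ken", "Bill"), ("Mary", "Jane"), ("Wayne", "Marjory"),
    ("Wayne", "Mary"), ("Sally", "Jane"), ("Roger", "Jane"),
    ("Jane", "Bill"), ("Marjory", "Mary"), ("Bill", "Marjory"),
]

# Build an undirected neighbor lookup.
neighbors = defaultdict(set)
for a, b in edges:
    neighbors[a].add(b)
    neighbors[b].add(a)


def hop_counts(start):
    """Breadth-first search: fewest hops from start to every reachable node."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return hops


def closeness(node):
    """Reachable nodes divided by the sum of hop counts (higher means closer)."""
    hops = hop_counts(node)
    total = sum(hops.values())
    return (len(hops) - 1) / total if total else 0.0


scores = {node: closeness(node) for node in list(neighbors)}
best = max(scores, key=scores.get)
print(best, round(scores[best], 2))  # Jane 0.7
```

In this made-up network Jane reaches four members in one hop and the remaining three in two hops, which is why she scores highest.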

Although the “Betweenness” and “Closeness” metrics are important, they don’t necessarily predict the ranking of members in a network by the governing metric (in this example, sales). The top seller in the network is Wayne by virtue of a 20k sale to Marjory. If you size the “Node ID” field by “Sales”, you would immediately realize how important Wayne is to the network from a sales performance point of view.

The “Eigenvector Centrality” metric explains the degree to which a given node is connected to the most important node in the network. In a given network, an “introverted” member with low “in degree” and “out degree” metrics and little or no “betweenness” or “closeness” could in fact be quite important due to its influence on members who are very well connected. If Jane is heavily influenced by Sally’s purchasing recommendations, Sally’s role in shaping the profile of the network is important given Jane’s position in the network as the most important buyer in the network.

Recommended Further Resources

There are many great resources on and off the web for learning about network theory and metrics. Here are a few that I’ve found helpful, with apologies to other great resources that I haven’t encountered yet.

University of Maryland Human / Computer Interaction Lab contains links to many interesting data visualization projects and white papers related to network data visualizations.

NodeXL is an Excel add-in for visualizing network graphs.

Analyzing Social Media Networks With NodeXL by Derek Hansen, Ben Shneiderman and Marc Smith, published by Morgan Kaufmann.

Gephi is an open source tool for visualizing network graphs.

Aaron Koblin provides a great visualization of airline flight patterns over North America.

About Michael

Michael Martin (email Michael) works internationally in a variety of business sectors that include market research, consumer packaged goods and retail, banking, hospitality, commercial construction, entertainment, governmental, and non-profit.

His project deliverables include business performance forecasts, strategic and operational case study white papers, operational dashboards and scorecards, associative and neural networks, customer / product segmentation and market-basket analyses.

Michael is a Tableau Partner, a Certified Tableau Consultant and leads the Tableau Toronto User Group.

Robert’s Note

A big time thank you very much to Michael for contributing this fantastic article. If you enjoyed what you have read, please drop Michael a line to say thank you by email (email link see above) or leave him a comment here.

Stay tuned.

Comments

43 responses to “Build Network Graphs in Tableau”

  1. Data Visualization

    Thanks for excellent article! I am wondering if you can share .TWBX file and/or link to Tableau Public with visualization you described above?
    Andrei

  2. Robert

    Andrei,
    thanks for your comment. Unfortunately the layout of my blog is not wide enough for publishing Michael’s dashboard as an interactive Tableau Public version here.
    However, I added a link to download the packaged workbook at the end of Michael’s how-to section (“The Result”, right below “Step 9”). I hope this will be helpful.

  3. Joe Mako

    Nicely done Michael!
    Here is another type of node chart in Tableau Public:
    http://public.tableausoftware.com/views/handoff-map/Dashboard
    and some details on where it came from:
    http://community.tableausoftware.com/thread/111665
    It uses calculations to place the nodes.

  4. Steve Scivally

    Thanks for the great article and work to explain how to achieve a network graph in Tableau. One question, do you know if there is a way to put the label on the mid-point of the line?

  5. Michael Martin

    Hi Steve: Glad you enjoyed the article. Re: your question about getting the label describing a transaction to appear at the mid point (or some other defined co-ordinate) between the nodes – I think this would involve configuring a third node (with its own X Y co-ordinates) to lie at the midpoint (or somewhere else) between two nodes that define a transaction. You would then put the value of the “Relationship” field in the “Node Name” field. This solution would probably also involve using shape files (sized by another special metric) so that the size of the new “relationship node” (mid way between the two nodes in a transaction) would be much much smaller than the two nodes that define the transaction. As is often the case, you can usually get Tableau to do something given extra data preparation. Bear in mind that the larger the number of connections, the more screen real estate will be needed to display the extra labels. Best wishes, Michael

  6. Michael Martin

    As per my earlier post above, you wouldn’t need to use shape files as long as you had another metric you could size nodes by when you plot the x y co-ordinates for the “relationship” label for a given transaction. The two nodes that define the transaction would have a much higher value in this “size by” field than the node placed between the two “transaction nodes” that describes the transaction – thus ensuring that the “relationship label” node is either very tiny or functionally invisible. As mentioned in the last post, you would place the value of the “Relationship” field in the “Node Name” field for the row in the data source that describes the transaction (i.e. Wayne –> Mary 14k)

  7. Marko

    Robert
    This is amazing. Would you be able to mimic this in Excel? I do routing for my truck drivers, so it would be great to utilize this feature in Excel for their daily routes.
    Appreciate it.

  8. Mobile Apps for Business

    Any mobile application you know which can use the same to show interactive view of this map.

  9. Michael Martin

    You could take a look at Tableau Mobile (for iPad and Android) from Tableau Software which requires an installation of Tableau Server to serve Web Pages. Alternatively, if you have Tableau Server, you could use a mobile browser (Safari, Chrome, etc.) to browse the html that Tableau Server generates instead of Tableau Mobile.
    Cheers, Michael Martin

  10. Stefan

    Hello Robert,
    You can do this as well in Excel with this wonderful add-in:
    http://nodexl.codeplex.com/
    best regards,
    Stefan

  11. Robert

    Marko,
    thanks for your comment and sorry for the late reply.
    Sure, you can bluff most of this in Excel with some data set up and (tweaked) Excel standard charts. However, the easiest way of creating network graphs in Excel is using the free add-in NodeXL Michael mentions in the “Recommended Further Resources” section of the article (link see Michael’s article above).

  12. Robert

    Stefan,
    thanks for your comment.
    I fully agree, NodeXL is a great Excel Add-In to analyze and visualize network data. Michael also mentioned it in the “Recommended Further Resources” section of the article. Anyway, thanks for the hint.

  13. Network Management Software

    Excellent post and wonderful blog, I really like this type of interesting articles keep it up.

  14. Marko

    Robert!
    Huge fan of your work/site.
    No update for the past few months….any chance you will be writing any items (hopefully Excel related :D)
    all the best,
    Marko

  15. Robert

    Marko,
    I know, my blog activities slowed down to a crawl in the past few months. Same old lame excuse: heavy workload in my paid projects. I am hoping to revive the blog during the next few weeks. I can’t promise it will be Excel related posts, though. I hope you and all other regular readers will stay tuned.

  16. Andrew

    How did you determine what values went in your Line X, Line Y, and Circle Y fields?

  17. Michael Martin

    Hi Andrew: The “Line X / Line Y” fields would typically be programmed output based upon the relationships between the items in a view that you want to emphasize. In the case of the example I provided, I determined them manually, based on the relationship I wanted to highlight – which was Jane’s central role in the Network. I used a 1-10k scale along the X and Y axis, but it could have just as easily been 1-10, 1-100, or 1-1000. There are a variety of ways to generate scene graphs, or you can write one of your own. There are some great resources re: this at http://www.codeplex.nodexl.com. Also, a Google search for “code to render network scene graphs” leads to other resources. Best wishes, Michael Martin

  18. Kevin

    Looking to build something like the “The Organization of Hierarchical Communities” chart you show above. Do you have a tableau twbx file for that? or directions on how to?

  19. Michael Martin

    Hi Kevin: Sorry, Kevin, I don’t have a pre-baked Tableau workbook that will automatically lay out a network graph for you as described. Building the type of network graph you have in mind involves laying out the data as nodes in X/Y co-ordinate space. To start, I would suggest that you download and experiment with the sample data I provided to build the network map in the article – and then change the values of the “Line X”, “Line Y”, and “Circle Y” fields, refresh the Tableau view, and note what happens. You’ll see the pattern right away. As I mentioned in my answer to Andrew directly above, you can use whatever X/Y numeric scale you want, as long as the scale for the X and Y axis is the same. For example, if you use a 1-10,000 scale, and give the “Line X”, “Line Y” and “Circle Y” fields the value 5,000, a node would appear in the middle of the Tableau canvas. If you have many hundreds of nodes, manually assigning a set of co-ordinates to every node is time consuming (it can be done of course) – but here is where you need some help from software. I would suggest that you take a good look at NodeXL (an add-in for Excel) at the codeplex.com web site, as they provide an environment you could use to lay out the graph you have in mind manually and capture the X/Y co-ordinates – plus they provide a programming interface for the automated creation of network graphs. This would be a good place to start given how much data lives within Excel. Hope this helps, and best wishes. Michael Martin

  20. rishimaths@gmail.com

    Hi
    What is LINE X , LINE Y & Circle y value?
    Is it to and from data? Or
    Plz tell me how to calculate this value?
    Regards
    R

  21. Robert

    R,
    LineX, LineY and CircleY are measures in the data source. Have a look at the Excel workbook provided in the article (in the section Data Preparation).
    Please have also a look at Michael’s comment right above yours.

  22. David

    Hi: Very nice work. With regard to the LineX, LineY etc. discussion just above, it appears to me that you need to know ahead of time what you want to highlight — it is based on that “bias” towards the data that one builds the network. Do you suppose it is possible to create a viz that enabled a more ‘exploratory’ map, i.e., without a preconceived notion of what the network is or how one node affects another? thx. David

  23. Michael Martin

    Hi David:
    If I understand you correctly, you’re asking about the possibility of creating an “unsupervised” environment (in Tableau) where the goal is not to create a “structured” presentation of one or more findings as outputs of an experiment – but rather a viz in which the nodes are more or less randomly “laid out” in a diagram for further exploration and (perhaps) manipulation.
    As we know, Tableau is a “read only” container for visualizing data, usually with a clear cut goal in mind (show me monthly sales, and the profit margin for the last 5 years). That said, one could build a view controlled by threshold settings and calculations by which random inputs are cyclically generated – that in turn could seed other calculations that would provide a variety of visual perspectives of a collection of nodes that could be combined in one or more dashboards via sizing, colouring, the use of text, parameters. Filtering could be used to serve as threshold settings under which the range of values feeding the calculations are constrained, if desired.
    One approach could be a Bubble Chart driven by Page Fields (which would cycle through one or more series of dimensional attributes, (including the X / Y position attributes) to provide animation – which could emulate changes of state in response to input.
    Best wishes, Michael Martin

  24. Dh

    Hi,
    Thanks for providing assistance in this. I am still wondering how you assign Line X, Line Y values to each individual row, and what the role of those columns is in this solution. C

  25. Robert

    Dh,
    have you seen Michael’s replies to Andrew’s and Kevin’s comments above? I think he answers your question there.

  26. asa letourneau

    Just wondering where the x and y values come from?
    Thanks,
    Asa

  27. Robert

    asa,
    have a look at Michael’s answers to Andrew’s and Kevin’s comments above.

  28. Paul H

    Thanks so much for this article. It has been very helpful to me!
    Do you happen to know a method to allow users to highlight a full path? For example, if cash is transferred from Node 1 to Node 2 to Node 3 and I click on Node 3, the full path from Node 1 to Node 3 highlights?

  29. Robert

    Paul,
    if you have a data source containing all predecessors for each node (i.e. in your example 1 and 2 would be predecessors of node 3), a highlight action should do the job.

  30. Creately

    hypothetically I think this tool has an amazing potential and it can draw good looking charts as well.

  31. Gabriel

    So I get that Node XL can be used to get the X/Y Coordinates. But how? I’ve been trying to do this with no success.
    Thanks!

  32. Robert

    Gabriel,
    have a look at Allan Walker’s post here:

    Easy network graphs for Tableau with NodeXL

  33. keyur

    In the example above Mary is connected to Jane, so while preparing the record set, ID 2 will have one row with node name Mary and relationship Mary –> Jane, and another row with the same ID 2, node name Jane and relationship Mary –> Jane.
    If Mary is connected to 4 persons and I try to connect the dots, then only one relationship is connected.

  34. JKB

    Hi! Fantastic stuff here. Do you have any examples that are dependency or timeline based by chance? Any direction would be greatly appreciated!

  35. Robert

    JKB,
    I am sorry, I do not understand your question. What do you exactly mean by “dependency or timeline based”?

  36. JKB

    Sorry Robert for the delay. Let’s say Jane is dependent on Mary to install hardwoods in the house and Wayne is dependent on Mary to FINISH the hardwoods in Jane’s house so he can paint. Mary needs Sally to ship the hardwoods to her, but they are delayed by 2 weeks. That now shifts the time line of the HW installation and painting of the house by 2 weeks.
    1. Can Network Graphs show not only their relationships/connections to each other but also shifts in timelines like in my example?
    2. Can there be an arrow at the end of the line depicting the direction of the connection? For example, Jane would have arrows pointing to Mary and Wayne. Wayne has a line with an arrow that points to Mary. Mary has a line with an arrow pointing to Sally.
    Hopefully that was a better question!

  37. Robert

    JKB,
    what you are describing is looking like a Project Plan displayed in the Network Diagram view. As much as I love working with Tableau Software, I wouldn’t use it to create this view. A dedicated Project Planning software like Microsoft Project would make this task much easier and more efficient than trying to create a data source which would display a Network Graph in Tableau.

  38. J

    Hi
    what if, for example, other than sale from Jane to Bill, there is also a sale from Bill to Jane. How would it be reflected? Is it possible?

  39. Kyle

    Hi, I’ve found this really useful in my own work, but do not quite understand how this works. Could you explain how the relationships work? i.e. the ‘Relationship’ column seems to be doing all the work but I don’t quite understand how Tableau uses this to form the line connecting them.
    Is Tableau pre-programmed to recognize the ‘–>’

  40. Robert

    Kyle,
    the relationships, i.e. the lines between the dots / pie charts are plotted by setting the Mark Type of [Line Y] to “Line” and by dragging [ID] to the Details Card of the Marks Shelf. The dimension [Relationship] is for informational purposes only (used in the Tooltips).
    Please have a look at section “Step 4 – Connect the Dots” in the article and the sheet [step 4] in the workbook posted for download.

  41. SAhuja

    Thanks for this amazing tutorial. I have another question. I want to show different weights on the lines. So for example, if the connection between A and B is stronger than the connection between A and C, I want to show the line between A and B to be darker(or bolder) or may be of a different color than the line between A and C.

  42. Robert

    SAhuja,
    simply drag the measure which represents the weight between the connections to the Color Shelf or (maybe even better) to the Size Shelf of the Line Chart.

  43. SAhuja

    Omg That was so simple ! Thank you !!
