Filled maps and low bandwidth connections

I regularly work from my home office which is where my demo server is also located. Having a gigabit Ethernet connection to the server means network bandwidth/latency isn’t something I pay any attention to most of the time. However, the other day I was doing a demo at a customer’s office (my laptop connected to the internet via my phone’s hotspot) and noticed that some of my workbooks were considerably slower to open than I remembered them to be. Not all, but certainly some – and with further investigation I narrowed it down to a couple of dashboards:


Notice anything in common? Yep – they both have filled maps. I wondered if that had anything to do with the slow performance.

Once I got home, I opened up the offending dashboards again and the response time was nice and quick. Not as quick as other workbooks (which can be almost instant) but still fast enough not to concern me. Thinking this was something to do with the network, I used Chrome’s developer tools (press CTRL-SHIFT-I) to look at the network traffic for the first dashboard above. There was the answer:

Because we are using client-side rendering, the bootstrap package has to pass all the data to the browser so it can draw it locally. This includes the marks which for filled maps are complex polygons requiring much more data to describe than a simple line or bar chart. Consequently, the bootstrap payload for a dashboard with filled maps can be ~1000x the size of an equivalent bar chart or even symbol map!
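To get a feel for the difference in scale, here's an illustrative sketch (this is not Tableau's actual wire format, just a rough sense of proportion) comparing the bytes needed to encode a single point mark against a filled region's outline:

```python
import json

# Illustrative only: not Tableau's wire format, just a sense of scale.
point_mark = {"lat": -33.87, "lon": 151.21}  # one symbol-map mark

# A filled region's outline is a polygon with hundreds of vertices
polygon_mark = {"path": [[round(-33.87 + 0.001 * i, 3), 151.21] for i in range(500)]}

point_bytes = len(json.dumps(point_mark))
polygon_bytes = len(json.dumps(polygon_mark))
print(polygon_bytes // point_bytes)  # hundreds of times larger per mark
```

Multiply that per-mark difference across every region on the map and it's easy to see how the payload balloons.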

To really test this, I created a simple workbook showing all 2,647 postcodes in Australia. One view displayed the data as a simple tabular list, another used a symbol map, and the third used a filled map. I published this workbook to my demo server and used the Chrome developer tools to record the bootstrap package size and the total data transfer to the browser. I also used the developer tools to throttle the network to simulate a 4G connection (a really neat feature!) and recorded the time to last byte. The results were:

  • Tabular list: bootstrap package = 2.6KB; total transfer = 38KB; total time = ~1.5s

  • Symbol map: bootstrap package = 61.9KB; total transfer = 182KB; total time = ~3.5s (including fetching map tiles)

  • Filled map: bootstrap package = a whopping 2.2MB; total transfer = 2.3MB; total time = ~10s (including fetching map tiles)

Clearly, filled maps require a much larger data transfer to the client, so they should be used sparingly in deployments where bandwidth is limited. Alternatively, you can switch to server-side rendering – here are the results for the same workbook using the ?:render=no URL parameter – much faster!

  • Tabular list: bootstrap package = 2.6KB; total transfer = 38KB; total time = ~1.5s
  • Symbol map: bootstrap package = 3KB; total transfer = 111KB; total time = ~2s
  • Filled map: bootstrap package = 3KB; total transfer = 139KB; total time = ~2s


Posted in Uncategorized | 3 Comments

Updated ABS boundaries for 2016

Edit July 2017 – It seems a lot of people are still coming to this page from search engines. If this is you, hi! If you are using Tableau 10.2 or later let me be clear… you no longer need these files nor do you need to use the techniques described to show custom regions. Tableau can now read the ESRI boundary files from the ABS website natively. This gives you a much better experience and makes your workbooks much simpler. Check out the Tableau blog post on this new feature here.

One of the most popular posts on my blog is Australian region boundaries in Tableau. Many people link through it to find the ASGS boundary data files I have converted for use in Tableau. The Australian Bureau of Statistics recently released an update to the ASGS main structure with changes to:

  • Mesh blocks
  • Statistical Area Levels 1 – 4 (SA1 – SA4)
  • Greater Capital City Statistical Areas (GCCSA)

You can read the details about the update here.

I’ve updated the boundary files for these structures and put them in the Dropbox folder, here.

I’ve also generated the boundaries at different resolutions – 10m, 100m and 200m. The finer the resolution, the more data points (e.g. the SA1 file has 781K points @ 200m, 1M points @ 100m, and 2.6M points @ 10m). Use the version that best suits your needs – if you are showing the whole country, use the 200m resolution file; if you are zoomed in on a small area use the 10m resolution file.




Description fields that work across record types

[Today’s post is just a little trick – not that hard to work out. But I liked it, and it’s my blog, so I’m writing about it. Also, after all the efficient workbook and densification stuff I could use the break.]

I’ve recently been working on some dashboards that run against the Tableau Server audit tables so I can see what my troublesome colleagues are up to on my demo server. One of the views I created was a timeline showing what is being done:

Event Timeline

I wanted the tooltip to show the name of the object that was being accessed/created/deleted/modified, but the name is in a different field depending on the object type – e.g. workbook, view, data source, data connection, etc. And only one or two of those fields would be valid for any event record. Additionally, some events require a concatenation of the names – e.g. if I’m accessing a view I want to also know the name of the workbook that it comes from:

Simply concatenating the fields together doesn’t work as some of the values will always be NULL, and any computation against a NULL results in a NULL. So we dutifully pull out the IFNULL() function and get to work:

ifnull([DC-Name], "") + ";" +
ifnull([DS-Name], "") + ";" +
ifnull([WB-Name], "") + ";" +
ifnull([VW-Name], "")

The problem with this is that for a data connection, the label shows:

Exasol 1GB TPC H;;;

Or a view shows:

;;Server Audit Data;Event Timeline

So – combining our previous finding (computation with NULL returns NULL) we can modify our calculation to be:

ifnull([DC-Name] + "; ", "") +
ifnull([DS-Name] + "; ", "") +
ifnull([WB-Name] + "; ", "") +
ifnull([VW-Name], "")

And voila! We have what we want.
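The same NULL-propagation trick can be sketched outside Tableau – here in Python, with None standing in for NULL (the helper names are hypothetical, chosen to mirror the Tableau functions):

```python
def ifnull(value, fallback):
    # Tableau's IFNULL(): return the fallback only when the value is NULL
    return fallback if value is None else value

def concat(value, suffix):
    # Mimic Tableau/SQL NULL propagation: NULL + anything is NULL
    return None if value is None else value + suffix

def event_label(dc_name, ds_name, wb_name, vw_name):
    # Appending the separator *before* IFNULL means NULL fields
    # vanish entirely instead of leaving stray semicolons behind
    return (ifnull(concat(dc_name, "; "), "") +
            ifnull(concat(ds_name, "; "), "") +
            ifnull(concat(wb_name, "; "), "") +
            ifnull(vw_name, ""))

print(event_label(None, None, "Server Audit Data", "Event Timeline"))
# -> Server Audit Data; Event Timeline
```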



Manufacturing a polygon from a single lat/lon coordinate

A little while ago I wrote a blog post on how to use pivot and densification to create great arc lines on a map without having to read the source data multiple times. In that example, each record had two fields I could pivot to create the two records – the from/to airport codes. But what if I don’t have a field that I can pivot? What if my data looks like this?


Can I create this? (I think you can probably guess the answer…)


The universal rule is that in order to draw a densified line/polygon we need two records – one marking the start point and the other marking the end point. We then use densification techniques to fabricate the intervening points and calculate their location.

You might think that this data set doesn’t have fields to pivot, but of course it does – lat and lon! In order to convert my single record into two records I pivot on these fields:


We then use the pivot field name to generate our Point ID:

Point ID:
if [Lat&Lon] = "Latitude" then 1 else [NumPoints] end

And we unpivot our lat/lon values:

Unpivot Lat:
if [Lat&Lon] = "Latitude" then [Lat&Lon Value] end

Unpivot Lon:
if [Lat&Lon] = "Longitude" then [Lat&Lon Value] end


We then replicate the lat/lon values to fill in the NULLs via LOD calculations:

{fixed [Record ID] : max([Unpivot Lat])}

{fixed [Record ID] : max([Unpivot Lon])}
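Outside Tableau, the whole pivot → unpivot → replicate pipeline can be sketched like this (a Python sketch with hypothetical names, not the actual pivot implementation):

```python
def two_records(record_id, lat, lon, num_points):
    """Sketch of the pivot -> unpivot -> LOD-replicate pipeline:
    one source row becomes two rows, each carrying the full lat/lon
    and a Point ID of 1 or NumPoints."""
    # Pivot: one row becomes two, keyed by which coordinate it holds
    pivoted = [("Latitude", lat), ("Longitude", lon)]
    rows = []
    for name, value in pivoted:
        rows.append({
            "Record ID": record_id,
            # Point ID: 1 for the Latitude row, NumPoints for the Longitude row
            "Point ID": 1 if name == "Latitude" else num_points,
            "Unpivot Lat": value if name == "Latitude" else None,
            "Unpivot Lon": value if name == "Longitude" else None,
        })
    # LOD replicate: {fixed [Record ID] : max(...)} fills in the NULLs
    rep_lat = max(r["Unpivot Lat"] for r in rows if r["Unpivot Lat"] is not None)
    rep_lon = max(r["Unpivot Lon"] for r in rows if r["Unpivot Lon"] is not None)
    for r in rows:
        r["Lat"], r["Lon"] = rep_lat, rep_lon
    return rows
```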

And that’s the hard work done! We now have two records with a Point ID that ranges from 1 to a parameter-controlled value. Next, we create a bin of width 1 on the Point ID field to drive our densification when used with table calculations like these:

Angle (circle):
(6.28318 / [NumPoints]) * (index()-1)

Circle Lat:
window_avg(min([Lat])) + [SectorLength]*cos([Angle (circle)])

Circle Lon: 
window_avg(min([Lon])) + [SectorLength]*sin([Angle (circle)])
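The circle math is easy to verify outside Tableau – a minimal Python sketch, treating [SectorLength] as a radius in degrees and ignoring the window_avg/min plumbing:

```python
import math

def circle_points(lat, lon, sector_length, num_points):
    """For point i (1..NumPoints): angle = (2*pi / NumPoints) * (i - 1),
    then offset the centre by sector_length along that angle."""
    pts = []
    for i in range(1, num_points + 1):
        angle = (2 * math.pi / num_points) * (i - 1)
        pts.append((lat + sector_length * math.cos(angle),
                    lon + sector_length * math.sin(angle)))
    return pts
```

With NumPoints = 60 this traces a 60-vertex circle around the centre – exactly what the densified bin produces in the viz.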

Now we can assemble our viz:


What started me down this path was revisiting my old mobile towers viz to get it off the previous custom SQL UNION approach. It now runs faster as it only does a single pass over the data. The logic is the same as the example above but the calculations are (obviously) different as we are plotting a sector, not a full circle:

And finally (because I sometimes go a little nuts) I used the technique discussed in this post to eliminate the projection distortion from the circles in my original example, and allow the user to set the radius in real-world measurements.

All the working is available in the workbook, which you can download here.



Best Practices for Designing Efficient Tableau Workbooks, the V10.0 edition

With the impending release of Tableau 10, I have been refreshing my whitepaper on how to author efficient Tableau workbooks.


Once again, I would like to acknowledge that the document is a distillation of materials written by many authors. Much of what I have done is to bring their work together into a single document and apply some structure. Some people reading it will recognise their fingerprints across sections (in fact some will recognise entire swathes of text). To all of you I give thanks because without your continued excellent work and lack of copyright infringement claims, the document would not exist.

I would also like to thank the many people who have reviewed this document for accuracy and readability. Your attention to detail, and patient explanations have made this into a much more legible document than I could ever have produced on my own.

Herewith, I present…

Best Practices for Designing Efficient Workbooks, the V10.0 edition.



You Can Never Be Too Hexy…

Some time ago I wrote a post about how to create hexagonal heat maps. Central to the technique is the need for a “path” field to plot the 6 vertices around the polygon marks. As this is a synthetic value it won’t exist in the source data, so we need to create it. The simplest way would be to just read the data multiple times, once for each PointID value:

SELECT …fields…, 1 as [PointID]
FROM Table
UNION ALL
SELECT …fields…, 2 as [PointID]
FROM Table
UNION ALL
SELECT …fields…, 3 as [PointID]
FROM Table
UNION ALL
SELECT …fields…, 4 as [PointID]
FROM Table
UNION ALL
SELECT …fields…, 5 as [PointID]
FROM Table
UNION ALL
SELECT …fields…, 6 as [PointID]
FROM Table

Clearly this isn’t a great approach as it requires us to read the data 6 times, and for anything other than trivial volumes this becomes an issue. Another way is to simply generate the PointID value endpoints (1 and 6) and use a feature in Tableau called data densification to populate the interpolated values:

SELECT …fields…, 1 as [PointID]
FROM Table
UNION ALL
SELECT …fields…, 6 as [PointID]
FROM Table

This is better as now we only need to read the data twice, but we’re still duplicating data. What if we could do it with just a single pass over the data? That would be even better…

One of my colleagues, Mary Solbrig, commented on the original blog post that if you have a field in your data that has at least 2 values within each hexbin, then you can synthesise the PointID field without multiple passes. At the time I nodded and thought to myself “yeah, that makes sense” but it wasn’t until someone asked me about this in more detail that I actually looked to see how it would be done.

With this approach, the PointID field is a calculation (not an output of custom SQL) and the secret is to use a LOD expression:

IF [ID] = {FIXED [Hex Lat],[Hex Lon] : MIN([ID])} THEN 1 ELSE 6 END

For each hexbin (uniquely identified by its calculated centroid), this produces two values – a 1 and a 6. It does this without needing to duplicate the data, so it’s much faster and more efficient – both highly desirable characteristics.
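The logic of that LOD expression is easy to sanity-check outside Tableau – a Python sketch (hypothetical field names), which relies on each hexbin containing at least two distinct [ID] values so both endpoints appear:

```python
def point_ids(rows):
    """rows: (hexbin_key, record_id) pairs. The row holding the minimum
    ID within its hexbin gets PointID 1 (the densification start point);
    every other row in the bin gets 6 (the end point)."""
    min_id = {}
    for key, rid in rows:
        min_id[key] = rid if key not in min_id else min(min_id[key], rid)
    return [1 if rid == min_id[key] else 6 for key, rid in rows]

print(point_ids([("A", 10), ("A", 11), ("A", 12), ("B", 5), ("B", 7)]))
# -> [1, 6, 6, 1, 6]
```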

We can then build the densification scaffolding by creating a 1 unit wide bin over the PointID calculation, and then it’s the same approach as before to plot the viz:


I’m surprised that it’s taken me this long to internalise this approach and recognise its brilliance. So a belated thank you, Mary – it might be late but it is heartfelt.

You can download an example workbook for this here.


Choosing the right calculation type… encore!

A couple of days ago, Keith Helfrich (@KeithHelfrich) posted a question to Twitter about updating an old calculation-selection diagram that Bethany Lyon and I presented in a hands-on training session at Tableau Conference a couple of years ago. I responded with this diagram from my earlier post, which outlines my current view.

Jonathan Drummey (@jonathandrummey) then jumped in and posted this absolute killer link to a discussion on the Tableau forums. He covers the options in way more detail, looking at the challenge from multiple angles such as performance, data volumes and complexity of maintenance.

This is a must-read article. Thanks for sharing, Jonathan!
