Updated ABS boundaries for 2016

One of the most popular posts on my blog is Australian region boundaries in Tableau. Many people link through it to find the ASGS boundary data files I have converted for use in Tableau. The Australian Bureau of Statistics recently released an update to the ASGS main structure with changes to:

  • Mesh blocks
  • Statistical Area Levels 1 – 4 (SA1 – SA4)
  • Greater Capital City Statistical Areas (GCCSA)

You can read the details about the update here.

I’ve updated the boundary files for these structures and put them in the Dropbox folder, here.

I’ve also generated the boundaries at different resolutions – 10m, 100m and 200m. The finer the resolution, the more data points (e.g. the SA1 file has 781K points @ 200m, 1M points @ 100m, and 2.6M points @ 10m). Use the version that best suits your needs – if you are showing the whole country, use the 200m resolution file; if you are zoomed in on a small area use the 10m resolution file.



Posted in Uncategorized | 5 Comments

Description fields that work across record types

[Today’s post is just a little trick – not that hard to work out. But I liked it, and it’s my blog, so I’m writing about it. Also, after all the efficient workbook and densification stuff I could use the break.]

I’ve recently been working on some dashboards that run against the Tableau Server audit tables so I can see what my troublesome colleagues are up to on my demo server. One of the views I created was a timeline showing what is being done:

Event Timeline

I wanted the tooltip to show the name of the object that was being accessed/created/deleted/modified, but the name is in a different field depending on the object type – e.g. workbook, view, data source, data connection, etc. And only one or two of those fields would be valid for any event record. Additionally, some events require a concatenation of the names – e.g. if I’m accessing a view I want to also know the name of the workbook that it comes from:

Simply concatenating the fields together doesn’t work as some of the values will always be NULL, and any computation against a NULL results in a NULL. So we dutifully pull out the IFNULL() function and get to work:

ifnull([DC-Name], "") + ";" +
ifnull([DS-Name], "") + ";" +
ifnull([WB-Name], "") + ";" +
ifnull([VW-Name], "")

The problem with this is, if we look at a data connection the label shows:

Exasol 1GB TPC H;;;

Or a view shows:

;;Server Audit Data;Event Timeline

So – combining our previous finding (computation with NULL returns NULL) we can modify our calculation to be:

ifnull([DC-Name] + "; ", "") +
ifnull([DS-Name] + "; ", "") +
ifnull([WB-Name] + "; ", "") +
ifnull([VW-Name], "")
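To make the NULL-propagation trick concrete, here is a minimal Python sketch that mimics Tableau's semantics (in Python, `None` stands in for NULL; the function names are illustrative, not Tableau's):

```python
def ifnull(value, fallback):
    # Tableau's IFNULL(): return the fallback when the value is NULL.
    return fallback if value is None else value

def name_part(name):
    # In Tableau, NULL + "; " is NULL, so IFNULL([Name] + "; ", "")
    # drops the separator together with the missing name.
    return "" if name is None else name + "; "

def event_label(dc_name, ds_name, wb_name, vw_name):
    # Equivalent of the improved calculation: each non-NULL name brings
    # its own separator; NULL names contribute nothing at all.
    return (name_part(dc_name) + name_part(ds_name) +
            name_part(wb_name) + ifnull(vw_name, ""))

# A view event: only the workbook and view names are populated.
print(event_label(None, None, "Server Audit Data", "Event Timeline"))
# → Server Audit Data; Event Timeline
```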

And voilà! We have what we want.


Posted in Uncategorized | Leave a comment

Manufacturing a polygon from a single lat/lon coordinate

A little while ago I wrote a blog post on how to use pivot and densification to create great arc lines on a map without having to read the source data multiple times. In that example, each record had two fields I could pivot to create the two records – the from/to airport codes. But what if I don’t have a field that I can pivot? What if my data looks like this?


Can I create this? (I think you can probably guess the answer…)


The universal rule is that in order to draw a densified line/polygon we need two records – one marking the start point and the other marking the end point. We then use densification techniques to fabricate the intervening points and calculate their location.

You might think that this data set doesn’t have fields to pivot, but of course it does – lat and lon! In order to convert my single record into two records I pivot on these fields:


We then use the pivot field name to generate our Point ID:

Point ID:
if [Lat&Lon] = "Latitude" then 1 else [NumPoints] end

And we unpivot our lat/lon values:

Unpivot Lat:
if [Lat&Lon] = "Latitude" then [Lat&Lon Value] end

Unpivot Lon:
if [Lat&Lon] = "Longitude" then [Lat&Lon Value] end


We then replicate the lat/lon values to fill in the NULLs via LOD calculations:

{fixed [Record ID] : max([Unpivot Lat])}

{fixed [Record ID] : max([Unpivot Lon])}
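Outside of Tableau, the pivot-and-replicate steps can be sketched in Python (the field names follow the calculations above; the sample coordinates and parameter value are made up for illustration):

```python
# One source record with a single lat/lon pair.
record = {"Record ID": 1, "Latitude": -33.87, "Longitude": 151.21}
num_points = 60  # the parameter-controlled [NumPoints] value

# Pivot: turn the two coordinate fields into two records.
rows = [
    {"Record ID": record["Record ID"], "Lat&Lon": "Latitude",
     "Lat&Lon Value": record["Latitude"]},
    {"Record ID": record["Record ID"], "Lat&Lon": "Longitude",
     "Lat&Lon Value": record["Longitude"]},
]

# Point ID and the unpivoted lat/lon columns (NULL -> None).
for row in rows:
    is_lat = row["Lat&Lon"] == "Latitude"
    row["Point ID"] = 1 if is_lat else num_points
    row["Unpivot Lat"] = row["Lat&Lon Value"] if is_lat else None
    row["Unpivot Lon"] = None if is_lat else row["Lat&Lon Value"]

# LOD replication: {fixed [Record ID] : max(...)} fills the NULLs on
# both records with the single non-NULL value per Record ID.
lat = max(r["Unpivot Lat"] for r in rows if r["Unpivot Lat"] is not None)
lon = max(r["Unpivot Lon"] for r in rows if r["Unpivot Lon"] is not None)
for row in rows:
    row["Lat"], row["Lon"] = lat, lon
```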

And that’s the hard work done! We now have two records with a Point ID that ranges from 1 to a parameter-controlled value. Next, we create a bin of width 1 on this Point ID field to drive our densification when used with table calculations like these:

Angle (circle):
(6.28318 / [NumPoints]) * (index()-1)

Circle Lat:
window_avg(min([Lat])) + [SectorLength]*cos([Angle (circle)])

Circle Lon: 
window_avg(min([Lon])) + [SectorLength]*sin([Angle (circle)])
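The densification logic can be mirrored in Python, computing the vertices Tableau fabricates between the two records (a simple lat/lon offset circle, ignoring projection distortion just as the original example does; names are illustrative):

```python
import math

def circle_points(center_lat, center_lon, radius, num_points):
    """Return num_points vertices around the centre, mirroring the calcs:
    angle = (2*pi / num_points) * (index - 1), with radius playing the
    role of [SectorLength]."""
    points = []
    for index in range(1, num_points + 1):
        angle = (2 * math.pi / num_points) * (index - 1)
        points.append((center_lat + radius * math.cos(angle),
                       center_lon + radius * math.sin(angle)))
    return points

vertices = circle_points(-33.87, 151.21, 0.1, 60)
```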

Now we can assemble our viz:


What started me down this path was revisiting my old mobile towers viz to get it off the previous custom SQL UNION approach. It now runs faster as it only does a single pass over the data. The logic is the same as the example above but the calculations are (obviously) different as we are plotting a sector, not a full circle:

And finally (because I sometimes go a little nuts) I used the technique discussed in this post to eliminate the projection distortion from the circles in my original example, and allow the user to set the radius in real-world measurements.

All the working is available in the workbook, which you can download here.


Posted in Uncategorized | 3 Comments

Best Practices for Designing Efficient Tableau Workbooks, the V10.0 edition

With the impending release of Tableau 10, I have been refreshing my whitepaper on how to author efficient Tableau workbooks.


Once again, I would like to acknowledge that the document is a distillation of materials written by many authors. Much of what I have done is to bring their work together into a single document and apply some structure. Some people reading it will recognise their fingerprints across sections (in fact some will recognise entire swathes of text). To all of you I give thanks because without your continued excellent work and lack of copyright infringement claims, the document would not exist.

I would also like to thank the many people who have reviewed this document for accuracy and readability. Your attention to detail, and patient explanations have made this into a much more legible document than I could ever have produced on my own.

Herewith, I present…

Best Practices for Designing Efficient Workbooks, the V10.0 edition.


Posted in Uncategorized | 3 Comments

You Can Never Be Too Hexy…

Some time ago I wrote a post about how to create hexagonal heat maps. Central to the technique is the need for a “path” field to plot the 6 vertices around the polygon marks. As this is a synthetic value it won’t exist in the source data, so we need to create it. The simplest way would be to just read the data multiple times, once for each PointID value:

SELECT …fields…, 1 as [PointID]
FROM Table
UNION ALL
SELECT …fields…, 2 as [PointID]
FROM Table
UNION ALL
SELECT …fields…, 3 as [PointID]
FROM Table
UNION ALL
SELECT …fields…, 4 as [PointID]
FROM Table
UNION ALL
SELECT …fields…, 5 as [PointID]
FROM Table
UNION ALL
SELECT …fields…, 6 as [PointID]
FROM Table

Clearly this isn’t a great approach as it requires us to read the data 6 times, and for anything other than trivial volumes this becomes an issue. Another way is to simply generate the PointID value endpoints (1 and 6) and use a feature in Tableau called data densification to populate the interpolated values:

SELECT …fields…, 1 as [PointID]
FROM Table
UNION ALL
SELECT …fields…, 6 as [PointID]
FROM Table

This is better as now we only need to read the data twice, but we’re still duplicating data. What if we could do it with just a single pass over the data? That would be even better…

One of my colleagues, Mary Solbrig, commented on the original blog post that if you have a field in your data that has at least 2 values within each hexbin, then you can synthesise the PointID field without multiple passes. At the time I nodded and thought to myself “yeah, that makes sense” but it wasn’t until someone asked me about this in more detail that I actually looked to see how it would be done.

With this approach, the PointID field is a calculation (not an output of custom SQL) and the secret is to use a LOD expression:

IF [ID] = {FIXED [Hex Lat],[Hex Lon] : MIN([ID])} THEN 1 ELSE 6 END

What this does, for each hexbin (uniquely identified by the calculated hexbin centroid), is produce the two required values – a 1 and a 6. It does this without needing to duplicate the data, so it’s much faster and more efficient – both of which are highly desirable characteristics.
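A Python sketch of the same idea (the hexbin keys and IDs are illustrative; the logic matches the FIXED LOD expression above, and like it, assumes at least two records per hexbin):

```python
def assign_point_ids(rows):
    """Single-pass PointID synthesis: within each hexbin the record with
    the smallest ID becomes vertex 1, every other record becomes vertex 6,
    and densification fills in vertices 2-5."""
    # First pass: find the minimum ID per hexbin (the FIXED LOD).
    min_id = {}
    for r in rows:
        key = (r["Hex Lat"], r["Hex Lon"])
        if key not in min_id or r["ID"] < min_id[key]:
            min_id[key] = r["ID"]
    # Second pass: 1 for the minimum-ID record, 6 for everything else.
    for r in rows:
        key = (r["Hex Lat"], r["Hex Lon"])
        r["PointID"] = 1 if r["ID"] == min_id[key] else 6
    return rows
```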

We can then build the densification scaffolding by creating a 1 unit wide bin over the PointID calculation, and then it’s the same approach as before to plot the viz:


I’m surprised that it’s taken me this long to internalise this approach and to recognise the brilliance. So a belated thank you, Mary – it might be late but it is heartfelt.

You can download an example workbook for this here.

Posted in Uncategorized | 11 Comments

Choosing the right calculation type… encore!

A couple of days ago, Keith Helfrich (@KeithHelfrich) posted a question to Twitter about updating an old calculation selection diagram that Bethany Lyon and I presented in a hands-on training session at Tableau Conference a couple of years ago. I responded with this diagram from my earlier post that outlines my current view.

Jonathan Drummey (@jonathandrummey) then jumped in and posted this absolute killer link to a discussion on the Tableau forums. He covers the options in way more detail, looking at the challenge from multiple angles such as performance, data volumes and complexity of maintenance.

This is a must-read article. Thanks for sharing, Jonathan!

Posted in Uncategorized | Leave a comment

45 degree reference line with dynamic axes

I read a great blog post today from Andy Kriebel where he shows how to create a 45-degree reference line on a scatter plot:


His solution is very elegant, using a trend line against a secondary axis to create the reference line. However, one limitation he points out in his video is that it doesn’t work if the X and Y axes have different scales.

You either need to live with the visual confusion or fix the axes. However, fixing the axes isn’t a good solution when a filter can cause the range of the data to change significantly. A better solution is to ensure that both axes always show the same range, and one way to do this is via reference lines.

First, we need to determine the maximum value on either of the two axes – we can do this with the MAX() function as follows:

MAX(
  WINDOW_MAX(SUM([Selected Year Sales])),
  WINDOW_MAX(SUM([Prev Year Sales]))
)
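The same combined maximum can be sketched in Python (plain lists stand in for the partitioned measure values; the names are illustrative):

```python
def shared_axis_max(selected_year_sales, prev_year_sales):
    """Equivalent of MAX(WINDOW_MAX(...), WINDOW_MAX(...)): the largest
    value seen on either measure, used as the (invisible) reference line
    value on both axes so they always share the same range."""
    return max(max(selected_year_sales), max(prev_year_sales))

axis_max = shared_axis_max([10, 25, 40], [55, 5])  # → 55
```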

We can then use this to plot a reference line on both axes – as you can see this forces them to have the same range:

By formatting the reference lines to have no line and no label they are effectively invisible, but they still control the maximum range of the axes:

You can download the workbook with an example of this technique here.



Posted in Uncategorized | 1 Comment