Letting go of custom SQL

I thought it might be helpful to share a recent experience where I was helping a customer who was complaining about slow-performing workbooks. When I arrived, their Tableau analyst showed me a couple of workbooks that were performing slowly. What do we mean by slowly? Well, for one of the workbooks, just opening it took ~5 mins!


OK – time to start looking for problems, but where to begin? Fortunately, I’m presenting a session at our customer conference next week (Designing Efficient Workbooks: Live on Stage) and in it I propose a framework for identifying performance issues.

Basically, the top layers are where you are most likely to find performance issues, so you should start by looking at the design of the dashboards and calculations (which are easy to fix) before you look at the data model or the underlying platform (where you often have less ability to change things).

With this framework in mind, I started reviewing the workbook.

Presentation

I can’t include a screenshot of the report, but let me describe it for you:

It was a simple design – a single sheet (not even a dashboard!) with a straightforward layout. But… it was a great big crosstab!

[GIF: frustrated reaction – me when I see great big crosstabs…]

That could be part of the problem, as big crosstabs are not the most efficient viz type to render. However, I noticed that the main delay when opening the workbook was in “executing query” rather than “computing layout”, so this probably wasn’t the main issue we needed to fix.


Analytics

Note: I’ve created a mock-up of the data, anonymised to avoid any embarrassment for the client. All the screenshots you will see from here on in reflect the original situation but are totally synthetic.

The analytics layer is concerned with the complexity of calculations and how you combine and manipulate the data (e.g. blending). Surprisingly this was extremely straightforward for the report:

[Screenshot: the report’s calculations – nothing complex to be found]


Data

The next layer down is concerned with the data sources and data connections – how you are retrieving the data from the original location. When I opened up the very simple data source I saw this:

[Screenshot: the data source – a single custom SQL connection]

OK – that explains why there were no calculations in the data source and why it was so simple: all the logic is wrapped up in a custom SQL query. It’s also a bit of a red flag… a live connection using custom SQL isn’t a recommended design pattern. Let’s take a look inside and see what we’re dealing with.

[Screenshot: the custom SQL query itself]


Oh my! I think we have found the problem.

The customer’s database development team had provided the Tableau analyst with several SQL statements like this from their existing reporting solution. It’s not optimal SQL and I’m sure it could be improved if we tried, but the analyst (like many Tableau users) didn’t have the time or the SQL knowledge to do so, and simply cut and pasted the query into a custom SQL connection. The result? The whole query needed to run before we could sub-select off the interim result set, and the experience was very slow.
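To see why that hurts, here is a sketch (with invented table and column names) of the shape of query the database receives from a live custom SQL connection – each viz’s query selects from the pasted SQL as a derived table:

-- Illustrative only: Tableau wraps the analyst's pasted SQL as a
-- derived table, so the entire inner query may need to be evaluated
-- before the outer aggregation can run
SELECT sub."CATEGORY",
       SUM(sub."MEASURE") AS "sum:MEASURE"
FROM (
    -- ...the analyst's long hand-written report query goes here...
    SELECT t.category AS "CATEGORY", t.amount AS "MEASURE"
    FROM transactions t
    JOIN reference_codes r ON r.id = t.ref_id
) sub
GROUP BY sub."CATEGORY"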

What to do? Well, we could follow basic best practice and make this an extract instead of a live connection so we only run the slow query once – after that everything points to the extract. But the customer wanted to maintain up-to-date views of their data so a live connection was preferable. Even more so once we determined that the extract would require frequent intra-day refreshes and was potentially very large.

So, working with the analyst, we managed to reverse-engineer the relationships between the tables as follows:

[Diagram: the reverse-engineered table relationships]

We then captured the logic for the CATEGORY and the MEASURE fields into a couple of Tableau calculated fields:

[Screenshot: the CATEGORY calculated field]

[Screenshot: the MEASURE calculated field]
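Since the real screenshots are anonymised, here is a purely hypothetical sketch of the kind of logic that moved into the CATEGORY calculated field – the field name and values below are invented, not the customer’s:

// Hypothetical sketch only - the real field names and values are anonymised
CASE [Transaction Code]
    WHEN 'S' THEN 'Sales'
    WHEN 'R' THEN 'Returns'
    ELSE 'Other'
END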

And the result? We reduced the time taken to render the dashboard from ~5 mins to <10 secs – a 30x improvement, which the customer was absolutely delighted with. They are now going to take the learnings from this exercise and apply them to their other workbooks.


The moral of the story? It reinforces that custom SQL can be a serious impediment to performance and should be used with caution. Leveraging the native capabilities of Tableau allows us to generate much more efficient queries, which is good for response times and also good for the query load on the DBMS. If you must use custom SQL, wrap it in a view in the DBMS if you can, or pull it into an extract as a last resort.
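For the “wrap it in a view” option, the pattern is to move the hand-written query into the database once and point Tableau at the view like any other table – a minimal sketch with hypothetical names:

-- Hypothetical sketch: the pasted report query becomes a DBMS view
CREATE VIEW reporting_sales AS
SELECT t.category, t.amount, t.purchase_date
FROM transactions t
JOIN reference_codes r ON r.id = t.ref_id;
-- Tableau then connects live to reporting_sales, and the DBMS
-- optimiser can treat each viz's query as a whole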


I am sooo not worthy…

[Sarah Battersby has on previous occasions shown how outrageously smart she is, but she has blown my mind today. I am sooo not worthy…]

I’ve been asked on occasion whether we can show point data from one data source (containing explicit lat/lon fields) alongside custom shapes drawn from a shapefile. My response in the past has been that you can, provided you convert the explicit lat/lon fields to a point shape so you can simply dual-axis two GEOMETRY fields. The reason for this thinking was that I believed you needed to use the Latitude (generated) and Longitude (generated) fields to plot the GEOMETRY field, and that you couldn’t dual-axis these with other lat/lon fields.

Well, it turns out I was wrong. I found this out when reading Sarah’s blog post about dual axis mapping (and specifically the section on combining a shapefile with a CSV):

https://community.tableau.com/people/sarah.battersby.0/blog/2017/06/24/dual-axis-mapping-many-ways

There were two particular tricks she used in that section that caught my eye. The first was the “faux union” done via a full outer join on the criterion 1=0. What a neat trick – I’d never thought of doing that before! But the real winner was how she used ZN(Latitude) and ZN(Longitude) to let us plot the GEOMETRY fields when the lat/lon values were NULL. I had no idea you could do that. Forest for the trees, or something like that…
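Expressed as SQL (the table and column names here are mine, not Sarah’s), the “faux union” looks something like this – because 1=0 never matches, every output row carries either the CSV’s lat/lon or the shapefile’s geometry, with NULLs on the other side, which is exactly why the ZN() wrapper is needed to turn those NULL coordinates into plottable zeros:

-- Sketch of the "faux union": a full outer join whose condition can
-- never be true, effectively stacking the two tables with NULLs in
-- the non-matching columns (all names are hypothetical)
SELECT p.latitude, p.longitude, s.geometry
FROM points_csv p
FULL OUTER JOIN shapes_shp s ON 1 = 0;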

Anyhow – the short of it is that I now know that it’s relatively easy to create a viz like this:

[Screenshot: Dots&Polys.png – lat/lon points and shapefile polygons on a dual-axis map]

Thanks, Sarah! Genius!


Tableau Data Day Out – Recorded Sessions

As a follow-up to my last post, we now have all the presentation recordings from our Data Day Out event in Melbourne available for viewing.

Ideas that Should Die (32 min)
Scribbles and Lines (20 min)
Playing it Safe: Freedom and Governance in a Self-Service Analytics Environment (26 min)
More Than Dots on a Map (26 min)
So You Want to be a Data Scientist? (30 min)
Traditional BI versus Modern BI – Does it Matter? (29 min)
What’s the Story? (28 min)

Congratulations and thanks to all the presales team members who helped make this event such a success!


More than Dots on a Map

At the recent Tableau Data Day Out in Melbourne, I presented a session entitled “More than Dots on a Map”. The idea was to show how Tableau could allow users to do spatial analysis of their data – irrespective of whether it had implicit or explicit location fields. By this I mean to go beyond just plotting shapes and empower the user to interact in powerful ways, asking lots of different questions that require the data to be shaped and visualised differently.

Below is a recording I made of the session – enjoy!


More on spatial file rendering

Following from my post yesterday I had a colleague ask me: “So what happens when you force client-side rendering at the detailed level? Does it blow up with too much data?”

Interesting question. Let’s find out.

(For those of you who don’t want to read the rest of the post… the answer is “no – we’re too clever for that!”)

As outlined in the documentation here, you can adjust the complexity threshold at which we switch between client-side and server-side rendering, or you can force either mode using “:render=true” or “:render=false” on the URL. So here is my workbook from yesterday with the “:render=true” flag set at the raw level – notice that the map is being rendered in the browser:

Clearly it didn’t blow up. This is because Tableau Server does dynamic simplification of the polygons based on the viewer’s zoom level. We reduce the complexity of the data we deliver to the browser, removing data points that can’t be seen. Clever, huh? This means that for the above map we only ship 7.2MB of data to the browser (see the bootstrap package in the debug window), so there is no danger of the runtime exploding due to too many points.

Performance was acceptable for the initial view, but bear in mind I’m local to my server on gigabit Ethernet. As previously documented, it would be slower on a low-bandwidth connection, but once the map is rendered, responsiveness is great! You can test it yourself here:

https://tab.databender.net/views/ShapefileRenderingTest/GCCMap?:embed=y&:render=true&:jsdebug=true

The simplification of the polygons is done on the server, and this becomes apparent when you zoom. From the above map, I zoomed right in to Port Melbourne – a much closer view. Initially the map renders with fuzzy polygon outlines – it has simply zoomed the existing client-side rendered image:

However, the server is busy in the background computing the new model. Watching the trace, the pane-zoom event sits in “pending” state for a few seconds, then it starts to stream down the more granular polygon definitions and the boundaries sharpen up:

Additionally, to manage the data we ship to the browser, we clip the data to the viewport so we only transfer the vertices we need to render. You can see the pane-pan-server event in pending state after I pan:

The viewport is updated once the data has arrived:

So – you can safely force client-side rendering even with very complex shapefiles; however, there are performance trade-offs, as the server must compute simplified polygons for each visual model. The upside is that you can zoom in a long way and still have detailed boundaries.

Kudos to our product and development teams for developing this feature in such a robust way!


Spatial files and client/server rendering

In the pre-10.3 world, if we wanted to show custom regions on a map we had to use custom polygons. This approach was less than optimal for a number of reasons (probably the most painful of which was changing the LOD of the viz), but a little-known one was that the polygon mark type forced server-side rendering when the workbook was published to Tableau Server. See here for the reference.

In the post-10.3 world we can now use spatial files to display custom regions which is a great big bucket of awesome sauce. However, the question came to my mind – “these are still effectively polygons, so do they trigger server-side rendering too?”

The answer, I’m pleased to report, is not necessarily.

Check out this workbook:
https://tab.databender.net/views/ShapefileRenderingTest/GCCMap?:embed=y&:jsdebug=true

This workbook shows Greater Capital City statistical areas for Australia – selected because it only has a few polygons but the source data is very detailed (~1.96M vertices). When you view the raw data from the shapefile it causes server-side rendering:

I generated (via Alteryx) some generalised versions of the same polygons – at 250m, 100m, 50m, 10m and 1m resolution:

This results in polygons with fewer vertices, which means a simpler data set to render. The number of vertices in the data set for each resolution was:

Detail (km)     Number of vertices
0.25            50,905
0.1             110,332
0.05            188,466
0.01            483,807
0.001           1,256,083
raw             1,960,032

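(Incidentally, Alteryx isn’t required for the generalisation step – here’s a minimal sketch of the same idea using GeoPandas, with hypothetical file names, assuming the package is installed and the shapefile uses a metre-based CRS.)

# Sketch: generating generalised copies of a shapefile with GeoPandas
# (an alternative to the Alteryx workflow; file names are hypothetical
# and the tolerances assume a metre-based coordinate system)
import geopandas as gpd

gdf = gpd.read_file("GCC_raw.shp")
for tolerance in (250, 100, 50, 10, 1):   # metres, matching the post
    simplified = gdf.copy()
    simplified["geometry"] = gdf.geometry.simplify(tolerance)
    simplified.to_file("GCC_{0}m.shp".format(tolerance))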
When we select the simpler polygons for our map, we see that the rendering mode flips over to client-side rendering, which is awesome, as it gives a much smoother experience for tooltips, selections and highlighting.

In this exercise I found that when viewing all of the polygons (i.e. the whole of Australia) the cutover between client-side and server-side rendering was between 10m and 1m resolution – i.e. between 483K and 1.25M vertices. However, I also noticed that when viewing the raw shapefile resolution, if I filtered the data set to reduce the number of polygons (e.g. by selecting Victoria only – ~222K vertices) this brought me back under the threshold and allowed rendering to go back to client-side:

So the short of all this is that dashboards using shapefile regions behave like any other dashboard when it comes to client/server-side rendering. If the complexity of the viz is over the complexity threshold, we use server-side rendering; if below, client-side. The takeaway from a performance perspective: if you are working with shapefiles and find yourself stuck with server-side rendering, consider filtering the number of polygons or using a lower-resolution version of the shapefile.

As I also pointed out in my previous blog post about filled maps and low bandwidth connections, the polygon data for client-side rendering can add significantly to the size of the bootstrap package, so in low-bandwidth environments it might be preferable to trade responsiveness for rendering time. For the above dashboard, the client-side rendering bootstrap package was 2.3MB for all of Australia @ 10m resolution, vs. 5.3KB for the bootstrap and 182KB for the image tile when using server-side rendering.

Enjoy!


Beyond shape files…

[This is one of those moments when you realise you haven’t been seeing the big picture. Digging around the edges of a new concept you suddenly see the foundations are much deeper than you thought. So – hats off to our wonderful dev team for being several steps ahead…]

I finally had a few moments of spare time the other day, so I got to watching some internal training videos for Tableau 10.2. These particular videos are what we call WINK (what I need to know) training and are deep dive sessions on new features we have released. One of them immediately caught my eye with the following abstract:

“Extract API supports geospatial data”

Wait… what!?!

Sure enough – when I went looking I found that one of the new features in 10.2 is that the extract API now supports the spatial data type. You can find more about this feature in the Tableau SDK Reference. The really cool part is that it’s super simple to use – all you have to do is insert spatial data in WKT format. This means you can easily fabricate your own spatial data or import it from a spatial database using a function like ST_AsText().
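For example, from a PostGIS database (the table and column names below are hypothetical) the WKT can come straight out of a query:

-- PostGIS: ST_AsText() returns a geometry as WKT, ready to pass to
-- the extract API's setSpatial()
SELECT region_name, ST_AsText(geom) AS wkt
FROM regions;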

It’s been a long time since I flexed my coding muscles but my Google-fu is mighty, so without too much hassle I was able to install Python, install the Tableau SDK and fiddle with the SDK sample. The code was easy (once I realised that indentation is apparently important in Python 🙂) and the relevant lines are the row.setSpatial() calls:

# Insert Data (schema and table are defined earlier in the SDK sample:
# schema is the TableDefinition, table is the extract's table)
row = Row( schema )
row.setDateTime( 0, 2012, 7, 3, 11, 40, 12, 4550 )  # Purchased
row.setCharString( 1, 'Beans' )                     # Product
row.setString( 2, u'uniBeans' )                     # Unicode Product
row.setDouble( 3, 1.08 )                            # Price
row.setDate( 6, 2029, 1, 1 )                        # Expiration Date
row.setCharString( 7, 'Bohnen' )                    # Produkt
for i in range( 10 ):
    row.setInteger( 4, i * 10 )                     # Quantity
    row.setBoolean( 5, i % 2 == 1 )                 # Taxed
    inner = str( i * 3 )
    outer = str( i * 5 )
    # Spatial: a square ring in WKT - note the ring must close back
    # on its first point (inner inner)
    row.setSpatial( 8, 'POLYGON ((' + inner + ' ' + inner + ', ' +
                    inner + ' ' + outer + ', ' + outer + ' ' + outer +
                    ', ' + outer + ' ' + inner + ', ' + inner + ' ' +
                    inner + '))' )
    table.insert( row )

The result was this:

I was also able to load the file with mixed spatial types:

# Insert Data - this time mixing POINT and POLYGON values in the
# same spatial column
row = Row( schema )
row.setDateTime( 0, 2012, 7, 3, 11, 40, 12, 4550 )  # Purchased
row.setCharString( 1, 'Beans' )                     # Product
row.setString( 2, u'uniBeans' )                     # Unicode Product
row.setDouble( 3, 1.08 )                            # Price
row.setDate( 6, 2029, 1, 1 )                        # Expiration Date
row.setCharString( 7, 'Bohnen' )                    # Produkt
for i in range( 10 ):
    row.setInteger( 4, i * 10 )                     # Quantity
    row.setBoolean( 5, i % 2 == 1 )                 # Taxed
    inner = str( i * 3 )
    outer = str( i * 5 )
    if i % 2 == 0:
        row.setSpatial( 8, 'POINT(' + inner + ' ' + inner + ')' )
    else:
        row.setSpatial( 8, 'POLYGON ((' + inner + ' ' + inner + ', ' +
                        inner + ' ' + outer + ', ' + outer + ' ' + outer +
                        ', ' + outer + ' ' + inner + ', ' + inner + ' ' +
                        inner + '))' )
    table.insert( row )

Note that mixing spatial types isn’t fully supported in Tableau, but you can use them if you are careful not to combine different types in the same mark. Get it wrong and you’ll see this:

Get it right and you’ll see this:

The result of all this is that we are not limited to just bringing in spatial data from spatial files – we can bring it from anywhere with a little bit of effort. This is very exciting and I look forward to seeing what you all create.
