I thought it might be helpful to share with you a recent experience where I was helping a customer who was complaining of slow performing workbooks. When I arrived their Tableau analyst showed me a couple of workbooks that were performing slowly. What do we mean by slowly? Well, for one of the workbooks, just opening it took ~5 mins!
OK – time to start looking for problems, but where to begin? Fortunately, I’m presenting a session at our customer conference next week (designing efficient workbooks: live on stage) and in it I propose a framework for identifying performance issues.
Basically the top layers are where you are most likely to find performance issues, so you should start looking at the design of the dashboards and calculations (which are easy to fix) before you start looking at the data model or the underlying platform (where often you have less ability to change things).
With this framework in mind, I started reviewing the workbook.
I’m can’t include a screenshot of the report, but let me describe it for you:
It was a simple design – a single sheet (not even a dashboard!) with a straightforward layout. But… it was a great big crosstab!
(Me when I see great big crosstabs…)
That could be part of the problem as big crosstabs are not the most efficient viz type to render. However, I noticed that the main delay when opening the workbook was with “executing query” and not “computing layout” so this probably wasn’t the main issue we needed to fix.
Note: I’ve created a mock-up of the data, anonymised to avoid any embarrassment for the client. All the screenshots you will see from here on in reflect the original situation but are totally synthetic.
The analytics layer is concerned with the complexity of calculations and how you combine and manipulate the data (e.g. blending). Surprisingly this was extremely straightforward for the report:
The next layer down is concerned with the data sources and data connections – how you are retrieving the data from the original location. When I opened up the very simple data source I saw this:
OK – that explain why there were no calculations in the data source and why it was so simple. All the logic is wrapped up in a custom SQL query. It’s also a bit of a red flag… a live connection using custom SQL isn’t a recommended design pattern. Let’s take a look inside and see what we’re dealing with.
Oh my! I think we have found the problem.
The customer’s database development team had provided the Tableau analyst with several SQL statements like this from their existing reporting solution. It’s not optimal SQL and I’m sure it could be improved if we tried, but the analyst (like many Tableau users) didn’t have the time or the SQL knowledge to do this so simply cut&paste the query into a custom SQL connection. The result? The whole query needed to run before we could then sub-select off the interim result set and the experience was very slow.
What to do? Well, we could follow basic best practice and make this an extract instead of a live connection so we only run the slow query once – after that everything points to the extract. But the customer wanted to maintain up-to-date views of their data so a live connection was preferable. Even more so once we determined that the extract would require frequent intra-day refreshes and was potentially very large.
So, working with the analyst we managed to reverse engineer the relationship between the tables as this:
We then captured the logic for the CATEGORY and the MEASURE fields into a couple of Tableau calculated fields:
And the result? We reduced the time taken to render the dashboard from ~5 mins
to < 10 secs! A 30x improvement which the customer was absolutely delighted with. They are now going to take the learnings from this exercise and apply them to their other workbooks.
The moral of the story? It reinforces that custom SQL can be a serious impediment to performance and should be used with caution. Leveraging the native capabilities of Tableau allows us to generate much more efficient queries which is good for response times and also good for the query load on the DBMS. If you must use custom SQL, try to wrap it in a view in the DBMS if you can or pull it into an extract as a last resort.