I thought it might be helpful to share with you a recent experience where I was helping a customer who was complaining of slow performing workbooks. When I arrived their Tableau analyst showed me a couple of workbooks that were performing slowly. What do we mean by slowly? Well, for one of the workbooks, just opening it took ~5 mins!
OK – time to start looking for problems, but where to begin? Fortunately, I’m presenting a session at our customer conference next week (designing efficient workbooks: live on stage) and in it I propose a framework for identifying performance issues.
Basically the top layers are where you are most likely to find performance issues, so you should start looking at the design of the dashboards and calculations (which are easy to fix) before you start looking at the data model or the underlying platform (where often you have less ability to change things).
With this framework in mind, I started reviewing the workbook.
I’m can’t include a screenshot of the report, but let me describe it for you:
It was a simple design – a single sheet (not even a dashboard!) with a straightforward layout. But… it was a great big crosstab!
(Me when I see great big crosstabs…)
That could be part of the problem as big crosstabs are not the most efficient viz type to render. However, I noticed that the main delay when opening the workbook was with “executing query” and not “computing layout” so this probably wasn’t the main issue we needed to fix.
Note: I’ve created a mock-up of the data, anonymised to avoid any embarrassment for the client. All the screenshots you will see from here on in reflect the original situation but are totally synthetic.
The analytics layer is concerned with the complexity of calculations and how you combine and manipulate the data (e.g. blending). Surprisingly this was extremely straightforward for the report:
The next layer down is concerned with the data sources and data connections – how you are retrieving the data from the original location. When I opened up the very simple data source I saw this:
OK – that explain why there were no calculations in the data source and why it was so simple. All the logic is wrapped up in a custom SQL query. It’s also a bit of a red flag… a live connection using custom SQL isn’t a recommended design pattern. Let’s take a look inside and see what we’re dealing with.
Oh my! I think we have found the problem.
The customer’s database development team had provided the Tableau analyst with several SQL statements like this from their existing reporting solution. It’s not optimal SQL and I’m sure it could be improved if we tried, but the analyst (like many Tableau users) didn’t have the time or the SQL knowledge to do this so simply cut&paste the query into a custom SQL connection. The result? The whole query needed to run before we could then sub-select off the interim result set and the experience was very slow.
What to do? Well, we could follow basic best practice and make this an extract instead of a live connection so we only run the slow query once – after that everything points to the extract. But the customer wanted to maintain up-to-date views of their data so a live connection was preferable. Even more so once we determined that the extract would require frequent intra-day refreshes and was potentially very large.
So, working with the analyst we managed to reverse engineer the relationship between the tables as this:
We then captured the logic for the CATEGORY and the MEASURE fields into a couple of Tableau calculated fields:
And the result? We reduced the time taken to render the dashboard from ~5 mins
to < 10 secs! A 30x improvement which the customer was absolutely delighted with. They are now going to take the learnings from this exercise and apply them to their other workbooks.
The moral of the story? It reinforces that custom SQL can be a serious impediment to performance and should be used with caution. Leveraging the native capabilities of Tableau allows us to generate much more efficient queries which is good for response times and also good for the query load on the DBMS. If you must use custom SQL, try to wrap it in a view in the DBMS if you can or pull it into an extract as a last resort.
Thanks for the article on performance. This is one of the few issues with Tableau. If custom SQL is a “serious impediment to performance”, perhaps Tableau is not the tool for us. All our data is in databases and needs to be queried. It’s not practical to create a view in the DBMS for every report. Extracts are always used because of the length of time the queries take. I would have liked to have heard more evidence that Tableau can calculate and manipulate data faster than SQL, and more efficiency tips when using custom SQL as a data source. Thx again.
Hi KJoyce – thanks for your comment. The intent of the article was not to say that live connections to databases (i.e. SQL connections) are bad. Quite the contrary! Live connections to databases (when created like I show in the article) are highly efficient as Tableau will create an optimised query for each element that needs to read the database (e.g. a chart, a filter, etc).
The message here is that if you use custom SQL as the basis of a database connection we will not decompose it. This means Tableau’s ability to optimise the queries passed to the DBMS is limited – all we do is wrap it in an outer SELECT-FROM-WHERE. If the final query takes a long time to run (like in the example I discuss) then we will potentially run it multiple times which can lead to slow performance.
So what can you do? The easiest solution is to simply make the connection an extract instead of a live connection. This means the query is run once, the results are persisted into a TDE (soon to be Hyper) file and all subsequent interactions are optimal queries against the extract. This is the one-click, magic go-fast button approach, but if you have many millions of records or if you need real-time data (as this customer did) then this can be complex to maintain compared to a live connection.
The approach I outline in this post is to reinterpret the custom SQL into a native Tableau connection. This allows you to keep the live connection because Tableau can now create optimal queries, not just SELECT-FROM-WHERE statements wrapped around the inner custom SQL. The recommendation for DBMS views would be more applicable if a) the logic in the custom SQL statement was not able to be replicated as a native Tableau data connection; or b) you wanted to reuse this logic in reporting/analytic workflows outside of Tableau.
For more information on the topic of efficient workbooks, I refer you to my whitepaper on the subject. It provides detailed information on best practices, including working with custom SQL.
(Note this is the V10 edition of the whitepaper and I’m in the process of updating it for all the new features we’ve delivered since last year.)
I hope this helps better explain the intent of the article.
that was super helpful 🙂
Alan, will this Best Practices workbook still be updated? Since you’ve gone to Snowflake, I’m wondering if you’ll finish adding to it, or who will take this project on?
Very good piece and most helpful. I’ve contracted for a few clients who have gone down the whole custom SQL route, with painfully slow dashboards as a result. Ideally, in a big data environment the data engineers will be aligned with your aims on Tableau; so they should create views in SQL.
Pingback: Letting go of custom SQL | The Last Data Bender | The Data Diaries
Perfect timing, Alan, sending to a number off my clients.
Can i convince you to join the USCA team? 🙂
Only if you move it to the other side of the Pacific. The commute would kill me! 🙂
Thanks for this post. It has already helped me think through our live connections and how to create calculated fields.
Reblogged this on datatoffee.