## Hexbin Scatterplot in Tableau

An interesting tweet came across my Twitter-stream the other day, showing a hexbin scatterplot chart type for Power BI:

Having just presented a session at TC17 on working with dense data where Sarah Battersby and I covered (among other things) hexbinning in Tableau, I was intrigued by this viz type and wondered if it could be created in Tableau. I was a little wary as mixing polygons and points together can be complicated, but I hoped it could be done.

Let’s just say that I’m glad I was bald when I started this exercise because it involved quite a bit of hair-pulling. But after a few hours of trial and error and a well-timed break to go sit in the sun and ruminate, I managed to produce this little beauty:

I started with Alberto Cairo’s Datasaurus dataset – a group of datasets that behave similarly to Anscombe’s quartet. Really I was just being lazy as I had it lying around and therefore didn’t need to mock up my own sample scatterplots. The source data looks like this:

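| dataset | record id | x | y |
| --- | --- | --- | --- |
| dino | 1 | 55.3846 | 97.1795 |
| dino | 2 | 51.5385 | 96.0256 |
| dino | 3 | 46.1538 | 94.4872 |
| dino | 4 | 42.8205 | 91.4103 |
| dino | 5 | 40.7692 | 88.3333 |
| dino | 6 | 38.7179 | 84.8718 |
| dino | 7 | 35.641 | 79.8718 |
| … | … | … | … |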

With the data in this format there are two approaches for generating the hexbins – one uses densification to generate the polygon vertex records, and the other generates them through a join to a scaffolding table. I opted to use the scaffolding approach as a) I have a manageable amount of data and b) it makes life easier when you have hexbins that contain just a single point. The scaffold table looks like this:

| Point ID |
| --- |
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |

And the join of these tables in Tableau looks like this (the join simulates a Cartesian product of the two tables):

The result of this is 7 rows of data for each point on the scatterplot:
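
For anyone who thinks better in code, here is a minimal Python sketch of what that join produces. It is only an illustration of the Cartesian product idea, not how Tableau performs the join; the variable names (`points`, `scaffold`, `joined`) are made up for the example, and only the first three rows of the source data are used.

```python
from itertools import product

# A few rows from the source table: (dataset, record id, x, y)
points = [
    ("dino", 1, 55.3846, 97.1795),
    ("dino", 2, 51.5385, 96.0256),
    ("dino", 3, 46.1538, 94.4872),
]

# The scaffold table: Point ID 0 through 6
scaffold = range(7)

# Cartesian product: every data point is paired with every Point ID
joined = [(*point, point_id) for point, point_id in product(points, scaffold)]

# Each original point now appears once per scaffold row
rows_per_point = len(joined) // len(points)
```

With three source rows and seven scaffold rows, `joined` holds 21 rows, seven for each point.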

I’ll use one of these (PointID=0) to plot the actual point location, and the other 6 to plot the hexagon shape. I’ve blogged on several occasions on how to generate a dynamic hexbin polygon and we’re going to use the same techniques here:

Generate the hexbin center point:
```
[HexbinX]: HEXBINX([X]/[Hexbin Size], [Y]/[Hexbin Size]) * [Hexbin Size]
[HexbinY]: HEXBINY([X]/[Hexbin Size], [Y]/[Hexbin Size]) * [Hexbin Size]
```

Generate a unique identifier for each hexbin. As you may know, I’m an advocate for efficiency so I use a numeric function for this (based on Cantor’s pairing function) instead of a string function:
`[HexbinID]: ([HexbinX]^2 + 3*[HexbinX] + 2*[HexbinX]*[HexbinY] + [HexbinY] + [HexbinY]^2)/2`
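
As a quick sanity check on the pairing idea, here is a small Python sketch. It compares the expanded polynomial form of Cantor's pairing function with the more familiar `(x + y)(x + y + 1)/2 + x` form and confirms that the two agree and give distinct IDs on a small grid. The function names are hypothetical. The uniqueness guarantee strictly applies to non-negative integers, so treating it as safe for real hexbin centre coordinates is an assumption.

```python
def hexbin_id(hx, hy):
    """Cantor-style pairing written out as an expanded polynomial."""
    return (hx**2 + 3*hx + 2*hx*hy + hy + hy**2) / 2


def cantor_form(hx, hy):
    """The same pairing in its compact form: (x+y)(x+y+1)/2 + x."""
    s = hx + hy
    return s * (s + 1) / 2 + hx


# Check that both forms agree, and that IDs are unique, on a 20 x 20 integer grid
grid = [(x, y) for x in range(20) for y in range(20)]
forms_agree = all(hexbin_id(x, y) == cantor_form(x, y) for x, y in grid)
ids_unique = len({hexbin_id(x, y) for x, y in grid}) == len(grid)
```

The appeal of the numeric form is that the ID is a single number computed with plain arithmetic, with no string concatenation involved.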

Generate the actual plot points keeping the original location when PointID=0 and using trigonometry to generate the hexagon vertices when PointID=(1..6):
```
[PointType]: IF [Point ID] = 0 THEN 0 ELSE 1 END
[Angle]: (1.047198 * INDEX())
[PlotX]: IF MIN([PointType]) = 0 THEN MIN([X]) ELSE WINDOW_AVG(MIN([HexbinX])) + [Hexbin Size]*COS([Angle]) END
[PlotY]: IF MIN([PointType]) = 0 THEN MIN([Y]) ELSE WINDOW_AVG(MIN([HexbinY])) + [Hexbin Size]*SIN([Angle]) END
```
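
The trigonometry is easier to see outside of table calculations. The Python sketch below walks around a hexbin centre in 60-degree steps (π/3 ≈ 1.047198 radians) and places a vertex at each step. It assumes the index runs from 1 to 6 across the vertex rows; the function name and its parameters are hypothetical and stand in for the Tableau fields.

```python
import math


def hexagon_vertices(center_x, center_y, size):
    """Return the six vertices of a hexagon around (center_x, center_y)."""
    vertices = []
    for index in range(1, 7):            # assumed to mirror INDEX() = 1..6
        angle = index * math.pi / 3      # 60 degrees per step
        vertices.append((
            center_x + size * math.cos(angle),
            center_y + size * math.sin(angle),
        ))
    return vertices


# Example: a hexagon of size 2 centred on (10, 20)
example = hexagon_vertices(10.0, 20.0, 2.0)
```

Every vertex sits exactly `size` away from the centre, which is why the same [Hexbin Size] parameter can drive both the binning and the polygon drawing.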

We can now start plotting our viz – first let’s just get the points up:

You can see that the blue marks are the original data points and the orange marks are the vertices for the hexagons. Because we want two mark types (a polygon and a point) we need a dual axis chart:

We need to isolate the orange marks on one side and the blue marks on the other. We can’t filter them, so we have to make some clever use of the “hide” function. I duplicated the [PointType] calculation from before so I can use one to colour one axis and the other to colour the other:

We then hide the marks we don’t need on each axis (right-click on the colour swatch in each legend and select “Hide”):

We can now make the hexagon marks on one axis, and circle marks on the other. Tidy up the colours and other formatting:

Finally, we set the axis to be “dual axis”, synchronise and hide the unwanted top axis, and voila:

The last couple of steps I put in were to a) colour the hexbins by the number of points they contain, b) tidy up the tooltips for each mark type, and c) set up a hover action to highlight the elements in a hexbin:

This ended up being quite a challenging viz and required quite a few techniques to get it done. But being able to do it at all reinforces for me that an expressive presentation model that allows you to natively create complex chart types (i.e. the Tableau approach) is faster and more reliable than a model where you are reliant on a developer to write a custom chart widget (i.e. the Power BI model). Even accounting for the trial and error needed to nut out the final successful method, Tableau allowed me to achieve the result much faster than a solution based on coding.

And of course, now that I know how, I can reproduce this solution in minutes.

PS. I couldn’t help myself. The workbook now includes solution examples using both the scaffolding and the densification approaches.

It was a mental itch that needed scratching.

Hi. I'm Alan. By day I manage the APAC sales engineering team for Snowflake Computing. By night, I'm a caped crusader. Or sleeping. Most often it's sleeping.

### 6 Responses to Hexbin Scatterplot in Tableau

1. David says:

Love the depth of explanation for this. Thanks for sharing

2. Andrew says:

Thanks Alan, I don’t know when you had the time to do this during TC17, but it's good to know this type of viz can be done in Tableau!

3. Hi Alan, glad to see you pushing ever forward!! And happy to see the dinosaur 🙂

I’m curious in which real-world scenarios is the hexbin scatterplot useful? The hex are useful for binning density. And the circles show that density, in all the dense detail. David’s tweet also doesn’t really answer, to what useful end do we marry the two together? Apart from the fact that densification is fun 🙂 Is there a real-world use case for which this chart type prevails?

• Hey Keith. Thanks for stopping by. It’s a good question.

I think it’s a useful chart type as it allows you to simultaneously see the summary view of the data (via the hexbins) as well as the detail view (via the points). This allows you to quickly identify areas of interest (either dark hexes for lots of data or light hexes for outliers, depending on your interest) where you could then zoom in to examine the data in more detail via the individual point marks. This combination allows you the benefit of both sub chart types at the same time.

My \$0.02.

Cheers,
Alan

4. Yes, agree. And I suppose it also depends upon the density of the underlying marks? When that density is just right, it seems like a lovely chart type. And also easy, I can imagine, to wind up in the spot where data is so dense we can’t see it and this is the reason for binning in the first place?

Anyway, thank you for continuing to pioneer us forward! 🙂