Even Hexier than Before…

[This blog post is a result of working with Sarah Battersby (Spatial Overlord and all-around Crazy Map Lady) from our Seattle office. We worked together for a TC15 presentation “Go Deep: Interpreting Dense Data with Tableau” where she presented the technique for creating uniformly spaced hexbins. Thanks for doing all the hard work, Sarah! 🙂 ]

At TC15 I had the pleasure of presenting a session with Sarah Battersby, a research scientist from the Seattle Tableau office who is a specialist in all things hex. We talked about dealing with dense data, and in her session she covered heat maps, which naturally led to a discussion around hexbins.

Back in March I wrote a couple of posts that showed how to create polygons that wrapped around the center points of hexbins calculated using the HEXBINX() and HEXBINY() functions. Here’s the output:

If you look closely, you can see that the shape of the polygons is gradually distorting as we move away from the Equator. By the time we reach Tasmania they are visibly taller than they are wide. This is due to the distortion effects of the Web Mercator projection we use in Tableau.
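The stretching has a simple closed form: relative to the Equator, Mercator exaggerates linear scale by a factor of 1/cos(latitude), which is why the hexagons grow taller the further south you go. A minimal Python sketch (the place names and latitudes are illustrative values, not taken from the workbook):

```python
import math

def mercator_scale(lat_deg):
    """Linear scale exaggeration of the (Web) Mercator projection
    at a given latitude, relative to the Equator: 1 / cos(lat)."""
    return 1.0 / math.cos(math.radians(lat_deg))

# Exaggeration grows as you move away from the Equator:
for place, lat in [("Equator", 0.0), ("Sydney", -33.87), ("Hobart", -42.88)]:
    print(f"{place:>8}: x{mercator_scale(lat):.2f}")
```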

For some customers, it may be preferable to have polygons that are uniformly sized no matter where they are on the map. I’m not going to go into the pros/cons of doing this and how you might be distorting your data as the hexbins will not actually cover the same amount of area of the Earth’s surface – I’ll leave the details of that for Sarah if/when she ever finds this post. In any case, Sarah presented an elegant solution to this problem:

• convert your lat/lon data into Web Mercator projection coordinates
• create the hexbins in this coordinate system
• calculate the vertex coordinates in WM coordinates
• convert back to lat/lon and draw on the Tableau map
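The formulas themselves are in Sarah's slides, but the forward and inverse conversions in steps 1 and 4 are the standard spherical Web Mercator equations. A minimal Python sketch of those two steps, assuming the usual Web Mercator sphere radius:

```python
import math

R = 6378137.0  # Web Mercator sphere radius in metres

def to_web_mercator(lon_deg, lat_deg):
    """Forward Web Mercator: lon/lat in degrees -> x/y in metres."""
    x = R * math.radians(lon_deg)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

def to_lon_lat(x, y):
    """Inverse Web Mercator: x/y in metres -> lon/lat in degrees."""
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lon, lat
```

The hexbinning (steps 2 and 3) then happens entirely in the projected x/y space, so every bin has the same size in metres before being converted back for display.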

She outlines the maths for this in her slides:

Go Deep: Interpreting Dense Data with Tableau

You can see the results here – notice the hexagons are now uniformly sized no matter how far south we travel:

You can download the workbook from here to see the solution in action. Enjoy!

Hi. I'm Alan. By day I manage the APAC sales engineering team for Snowflake Computing. By night, I'm a caped crusader. Or sleeping. Most often it's sleeping.

15 Responses to Even Hexier than Before…

1. Alan,

Great work! Thanks for this contribution to the community. Tableau has the best community in software. I wanted to see your presentation at TC15, but there were too many other options at that time.

• Thanks for the kind comments, Kris. And no worries that you couldn’t make the session – it was recorded and is available on the TC15 website.

2. Stuart Dunlap says:

Alan,

Thanks for the post and the great supporting example! Your numbering of the different formulas made it very easy to follow a complex process. I saw the TC15 recorded session you and Sarah presented. (I had to go back to catch some things I missed – and it was well worth the time.)

I have a few questions if you have time to elaborate on this approach…

1. In the ‘Map Pins’ worksheet of the sample workbook you posted, if I have Scale set to ‘Km’ and ‘Hexbin Side Length’ set to 200, the top of the range of SUM(Num Volunteers (bin)) calculates to 2,784. When I change the ‘Hexbin Side Length’ to something smaller, say 150 Km, then the top of the range of SUM(Num Volunteers (bin)) actually increases to 3,012. My first thought was that a smaller Hexbin Side Length would translate to bins with fewer volunteers. I hope my question makes sense??? It just seems like if the bin represents a smaller area of geography, then you should have fewer volunteers for all bins. (unless the hexbin centroids shift and you end up with a hexbin that is positioned to capture more volunteers).

2. When I try to make a dual axis map – using a hexbin for one of the maps and geography pie charts on another map, when I synch the axes, the viz is smaller. It’s as if the scale is changed. Have you ever tried to synch axes for a map with hexbins?

Thanks again for a great post.

• Hi Stuart,

In response to your first question, it’s just that as you change the hexbin size, the boundaries move across a high-density region (Sydney), which means that some volunteers shift between bins. If you watch, you’ll go from having two adjacent bins with moderate user counts at the 200 km scale to one bin with a high count surrounded by bins with low counts. It seems odd at first glance, but it makes sense.

For your second question, I haven’t messed around with it too much. I’m happy to have a look at your workbook if that would help… you can email me directly at aeldridge at tableau dot com.

Cheers,
Alan

3. Pingback: Custom Map – Data & Web

4. Ross says:

Hi Stuart,
Thanks…this is all new to me so forgive me if these questions appear stupid.

I’ve been trying to create my own based on the “Binned in Projection Space” dashboard, but I’m getting confused as you’ve linked two copies of the same data to generate the bins. Is this duplication really necessary?

I understand that the PointID (bin) needs 6 segments to make the polygon, but I’m unsure how this is created from the source. The only other datasets I have seen using polygons have a lat/long for each PointID, hence the join-the-dots produces the polygon.

Is there any way to simplify this by using only a single dataset but still having the Hex side length & scale?

• Hi Ross,

There are two connections in the workbook but only one is used at a time. Each one is used for a different example/scenario.

This workbook uses a data densification technique to manufacture the polygon vertices – because the PointID (bin) is a Tableau bin field it gets populated with the values between 1 and 6. We can then use math to calculate the 6 vertex points around the centerpoint of each hex.
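To make the vertex math concrete, here’s a rough Python equivalent of those calculations. The pointy-top orientation and the 30° angle offset are illustrative assumptions for this sketch; check the workbook’s calculated fields for the exact angles used there:

```python
import math

def hex_vertex(cx, cy, side, point_id):
    """Vertex `point_id` (1..6) of a pointy-top hexagon with side length
    `side`, centred at (cx, cy). For a regular hexagon the side length
    equals the centre-to-vertex distance."""
    angle = math.radians(60 * (point_id - 1) + 30)  # 30 deg offset -> pointy-top
    return cx + side * math.cos(angle), cy + side * math.sin(angle)

# The six vertices that the densified PointID (bin) values 1..6 produce:
vertices = [hex_vertex(0.0, 0.0, 1.0, p) for p in range(1, 7)]
```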

So to your final question, the example does only use one dataset, and the hex size is scalable (controlled by a parameter).

Cheers,
Alan

• Ross says:

Thanks Alan,

So I duplicated the data by outer joining it with a points column containing the values 1 and 6 so I could create the bins.
Traps I found:
1. I needed to muck around with the number format for the lat/lon, especially to generate a proper unique hex ID.
2. The virtual point IDs were hidden. All I needed to do was place PointID (bin) onto the Rows shelf and select “Show Missing Values”.

Would be great if there was a way where I didn’t need to double the size of the data as the extracts are getting quite big, otherwise this is really good.

Regards
Ross

5. Hi Ross,

There are techniques that don’t require you to duplicate the data, but they only work if you know that there will be more than 1 unique value per hexbin. They don’t work for scenarios where you only have one record in a hexbin.

It’s outlined here: https://blog.databender.net/2016/05/10/you-can-never-be-too-hexy/

Cheers,
Alan

6. aseemrehman says:

Hi,
Thank you for super work shared for Hexbin with tableau,

I have a query. I have two different sets of lat/long coordinates: one for customers and one for my locations. I have already created hexbins for customers and it’s working great. I want to create another layer over the customer polygons with my locations, similar to what you did in the “Hexbins with Map Pin” sheet.

I tried to do the same but it’s not appearing on a single sheet as two layers. Can you help with that?

Regards

• Right now in Tableau it is very hard to plot multiple layers. Getting different mark types (the polygon and the pin) requires a dual-axis map approach where you selectively show/hide the relevant points on each axis. This is a workaround because today you cannot have layer-specific filters. I believe multiple layers are something Tableau is working on for a future release.

Cheers,
Alan

7. CM says:

Hi, can you explain PointID? I don’t have a PointID in my dataset, but I do have a distinct ID.

• Hi CM. The PointID is a synthetic field that is needed as the field on the Path shelf. This tells Tableau which order to plot the points of the polygon. It’s also used in the calculations to generate the lat/lon or X/Y of the interpolated values. It is not normally in your source data… in the examples, it is generated using data densification (you start with a min and a max value, and “Show Missing Values” generates the values in between).

Hope this helps.