Better Data Visualization Using Beeswarm Chart

Let’s create an interactive Beeswarm chart using D3.js to better visualize your data.

Martin Heinz

Jul 10 · 9 min read

A single dataset can convey a lot of different information to the viewer. It all depends on how you visualize the data, in other words, on which kind of chart or plot you choose. Most of the time people just grab a bar chart or pie chart. There are, however, more interesting charts you can use to communicate information from your data to your audience, one of them being the beeswarm chart.

Note: All the source code (including documentation) from this article can be found at https://github.com/MartinHeinz/charts.

Live demo is available at https://martinheinz.github.io/charts/beeswarm/

Bee-what?

First time hearing about beeswarm chart? Alright, let’s first talk about what it actually is:

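A beeswarm chart is a one-dimensional chart (or plot), in other words, a chart that shows all the information on a single axis (usually the X axis). It displays values as a collection of points, similar to a scatter plot. The points are spread out along the other axis only to keep them from overlapping, so their position on that axis carries no meaning.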

This kind of chart is very useful when you want to display a lot of data points at once — e.g. one node for each country — which would be a problem with a bar chart or pie chart. Just imagine a pie chart with 150 wedges — no thanks.

Additionally, it makes it easy to spot outliers, as they will not be part of the swarm.

Another feature of this chart is that you can nicely visualize different scales (linear and logarithmic) and transition between them, as well as color the points to add an additional dimension (e.g. the continent of each country).

Enough talking though, let’s see an example:

What dataset are we going to use here? It's the WHO Suicide Statistics data, which can be found on kaggle.com. Odd choice maybe, eh? Well, it's real data that fits this type of chart quite well. So, let's see how well we can use it!

What We Will Need

Before diving into the code, let’s look at the libraries that we will use:

For all the plotting and visualization we will use D3.js and plain old JavaScript. In case you are not familiar with D3.js, it stands for Data-Driven Documents and is a JavaScript library for manipulating documents based on data. The main advantage of D3.js is its flexibility: rather than ready-made charts, all it gives you are functions to manipulate data efficiently.

In this article, we will use D3.js version 5. All you need to start using it is to include <script src="https://d3js.org/d3.v5.min.js"></script> in your HTML (the complete code listing is in the repository linked above).

Apart from D3.js we will also use Material Design Lite (MDL) to bring a nicer user experience. This is very much optional, but everybody likes some fancy material design buttons and dropdowns, right?

As with D3.js, we just need to include one script tag to start using it: <script defer src="https://code.getmdl.io/1.3.0/material.min.js"></script> (the complete code listing is in the repository linked above).

The Code

Setting The Stage

Before we start manipulating any data, we first need to do some initial setup:

First, we define some global variables for width, height and margin, as well as 3 data structures for the scale, the measure of data and the plot legend, which we will use throughout the rest of the code. We also use those to define the initial state of the chart, which is stored in the chartState variable.
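A minimal sketch of what such a setup could look like. The text only names width, height, margin and chartState; the concrete numbers, the names Scales, Count and Legend, and the legend wording are assumptions made for illustration.

```javascript
// Assumed dimensions of the drawing area, in pixels.
const width = 1000;
const height = 400;
const margin = { top: 0, right: 40, bottom: 34, left: 40 };

// Names of the D3 scale constructors we can switch between.
const Scales = { lin: "scaleLinear", log: "scaleLog" };

// Names of the dataset columns that can be plotted.
const Count = { total: "total", perCap: "perCapita" };

// Legend text shown for each measure (wording is hypothetical).
const Legend = { total: "Total Deaths", perCap: "Per Capita Deaths" };

// Initial state: total numbers on a linear scale.
const chartState = {
  measure: Count.total,
  scale: Scales.lin,
  legend: Legend.total,
};
```

Keeping everything that can change in one chartState object means a redraw only has to read that object to know what to render.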

Next, we define colors for all the nodes (circles) of the plot:

To create a coloring scheme we use d3.scaleOrdinal, which creates a mapping from a domain (continent names) to a range (color codes). Then we apply these colors to CSS IDs, which are given to checkboxes in the HTML GUI.
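The coloring scheme described above could be sketched roughly like this. The continent keys, the hex colors and the convention that each checkbox label has an ID equal to its continent key are placeholders, not the article's actual values; d3 is passed in as a parameter so the functions do not rely on a global.

```javascript
// Placeholder continent keys and colors (assumptions, not the original palette).
const CONTINENTS = ["asia", "africa", "northAmerica", "europe", "southAmerica", "oceania"];
const PALETTE = ["#D81B60", "#1976D2", "#388E3C", "#FBC02D", "#E64A19", "#455A64"];

// Builds an ordinal scale that maps each continent key to one color.
function createColorScale(d3) {
  return d3.scaleOrdinal().domain(CONTINENTS).range(PALETTE);
}

// Colors the element whose ID equals the continent key (assumed ID convention).
function colorLegend(d3, colors) {
  CONTINENTS.forEach((continent) => {
    d3.select("#" + continent).style("color", colors(continent));
  });
}
```

In a page that loads D3 via the script tag, this would be called as colorLegend(d3, createColorScale(d3)).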

Now we are getting into the code for the actual chart. The following lines prepare the SVG which will be our drawing area:

The first call, which creates the svg variable, finds the <div> with the svganchor ID and appends an SVG element to it with the width and height we defined earlier. Next, we create a function called xScale. It is very similar to the d3.scaleOrdinal used earlier: it also creates a mapping between a domain and a range, but with a continuous domain rather than a discrete one. You probably already noticed that we didn't specify a domain here. That's because we don't know the extent of our dataset yet, so we leave it at its default ([0, 1]) for the time being.

After that, we append a <g> element to the existing SVG element. This element will be used as a container for the X axis and its ticks, which will be appended later when we actually render the axis. We can, however, set its CSS styles and move it to the bottom of the SVG now, so that we don't have to deal with it later.

The final part of this snippet creates a line that connects a node with its point on the X axis while the user hovers over that circle.
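The setup described above could be sketched as follows. This is an illustration rather than the original listing: the function name setupChart, the class name "x axis" and the stroke styling of the guide line are assumptions, while the svganchor ID and the use of a linear scale without a domain come from the text.

```javascript
// Creates the drawing area, X scale, axis container and hover guide line.
// `d3` is the D3 v5 namespace; width, height and margin are pixel sizes.
function setupChart(d3, width, height, margin) {
  // Append an <svg> to the <div id="svganchor"> element.
  const svg = d3.select("#svganchor")
    .append("svg")
    .attr("width", width)
    .attr("height", height);

  // Continuous scale; the domain stays at its default [0, 1] until data arrives.
  const xScale = d3.scaleLinear()
    .range([margin.left, width - margin.right]);

  // Container for the X axis, moved to the bottom of the SVG.
  svg.append("g")
    .attr("class", "x axis")
    .attr("transform", "translate(0," + (height - margin.bottom) + ")");

  // Guide line between a hovered circle and the axis; invisible initially.
  const xLine = svg.append("line")
    .attr("stroke", "rgb(96,125,139)")
    .attr("stroke-dasharray", "1,2")
    .style("opacity", 0);

  return { svg, xScale, xLine };
}
```

Returning the three handles lets later steps (axis rendering, hover handling) reuse them without querying the DOM again.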

The last thing that we want to do before we jump into manipulating the dataset is to create a simple node tooltip:

For the time being, the tooltip is just a <div> that we put into the anchor of our chart. We also make it invisible for now, as we will dynamically set its content and opacity when we deal with mouse move events (hovering).

Loading The Data

Now it is finally time to load the data. We do that using the d3.csv function. This function uses the fetch API to get a CSV file from a URL and returns a Promise, which requires the following structure of code:

All our remaining code belongs in the body of the above anonymous function, as that is where the loaded data is available to us.
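A hedged sketch of that Promise structure. The helper names parseRow and loadData are hypothetical, and the column names are assumed from the columns mentioned in this article; the optional row-conversion argument of d3.csv is used here to turn numeric strings into numbers, which the original code may or may not do.

```javascript
// Converts one CSV row (all values arrive as strings) into the shape used by the chart.
function parseRow(row) {
  return {
    country: row.country,
    continent: row.continent,
    total: +row.total,         // unary plus converts "1234" to 1234
    perCapita: +row.perCapita,
  };
}

// Loads the CSV and hands the parsed rows to `onLoaded`.
// Errors from loading (and from `onLoaded` itself) end up in the catch handler.
function loadData(d3, url, onLoaded) {
  return d3.csv(url, parseRow)
    .then(onLoaded)
    .catch((error) => console.error("Could not load " + url, error));
}
```

The rest of the chart code would then live inside the callback, for example loadData(d3, url, (data) => { /* listeners, redraw, ... */ }).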

Here are also examples of the data before and after it is loaded to better visualize its structure:

Before:

After:

Listeners

Before processing the data any further, let's first set up listeners that will react to button clicks in the GUI. We want the user to be able to switch between the "total" and "per capita" measurements, as well as between the linear and logarithmic scales.

Our HTML GUI (source can be found in the repository linked above) contains 2 sets of buttons. The first set, responsible for switching between the "total" and "per capita" visualizations, has the CSS class .measure attached. We use this class to query this group of buttons. When one of these 2 buttons is clicked, we take the value of the clicked button and change the chart state accordingly, as well as the legend text, which shows the type of measure used.

The second set (pair) of buttons, which switches between the linear and logarithmic scales, also has a CSS class attached (called .scale) and, similarly to the previous one, updates the state of the chart based on which button is clicked.

Both of these listeners also trigger the redrawing of the whole chart to reflect the configuration change. This is performed using the redraw function, which we will go over in the next section.
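The two button listeners could look roughly like this. It is a sketch under assumptions: the button values are assumed to equal the chartState values ("total"/"perCapita" and "scaleLinear"/"scaleLog"), the legend element ID legend-text is hypothetical, and the state updates are split into small pure functions only to make them easy to test.

```javascript
// Pure state updates, kept separate from the DOM wiring.
function setMeasure(chartState, legends, measure) {
  chartState.measure = measure;
  chartState.legend = legends[measure];
}

function setScale(chartState, scale) {
  chartState.scale = scale;
}

// Wires both button groups; `redraw` is the drawing function described in the next section.
function registerButtonListeners(d3, chartState, legends, redraw) {
  d3.selectAll(".measure").on("click", function () {
    setMeasure(chartState, legends, this.value); // `this` is the clicked button
    d3.select("#legend-text").text(chartState.legend); // hypothetical element ID
    redraw();
  });

  d3.selectAll(".scale").on("click", function () {
    setScale(chartState, this.value);
    redraw();
  });
}
```

Regular functions (not arrow functions) are used for the handlers because D3 binds this to the DOM element that received the event.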

Apart from those 4 buttons, we also have a few checkboxes in the GUI. Clicking on those filters which continents’ countries are displayed.

Handling these checkbox clicks is the responsibility of a separate listener. All it does is trigger the filter function, which adds or removes nodes from the selection based on which checkboxes are checked.
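One possible shape for this filtering, as a sketch rather than the original filter function: the .continent selector, the convention that each checkbox value is a continent key, and passing the filtered rows to redraw are all assumptions.

```javascript
// Returns only the rows whose continent is currently checked.
function filterByContinent(dataSet, checkedContinents) {
  const allowed = new Set(checkedContinents);
  return dataSet.filter((d) => allowed.has(d.continent));
}

// Collects the checked boxes on every click and redraws with the filtered rows.
function registerCheckboxListener(d3, dataSet, redraw) {
  d3.selectAll(".continent").on("click", function () {
    const checked = [];
    d3.selectAll(".continent").each(function () {
      if (this.checked) checked.push(this.value);
    });
    redraw(filterByContinent(dataSet, checked));
  });
}
```

Because the full dataSet is kept untouched, unticking and re-ticking a checkbox brings the same nodes back.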

Last event listener we have here is a big one. It takes care of showing and hiding the tooltips when hovering over country circles:

The listener might look complicated, but it's actually pretty straightforward. We first select all the nodes using the .countries CSS class. We then bind the mousemove event to all of these nodes. During the event we set the HTML of the tooltip to show information about this node (country name, death count). We also change its opacity so that it's visible while the user points at the circle, and we set its position to the right of the mouse cursor.

The rest of the body of this function renders the dashed line connecting the circle and the X axis, to highlight where the value belongs on the scale.

We also need to handle the event of the mouse moving out of a circle; otherwise, the tooltip and line would always be visible. This is what the mouseout event handler takes care of: it sets the opacity of these elements to 0 to make them invisible.
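A rough sketch of such hover handling with D3 v5, where the current DOM event is exposed as d3.event. The tooltip markup, the pixel offsets and the opacity values are assumptions; tooltip and xLine stand for the D3 selections created during setup, and d.x and d.y are the node positions written by the force simulation.

```javascript
// Builds the tooltip markup for one node; field names are assumed.
function tooltipHtml(d, measure) {
  return "<strong>" + d.country + "</strong><br>Deaths: " + d[measure];
}

function registerHoverListeners(d3, tooltip, xLine, chartState, height, margin) {
  d3.selectAll(".countries")
    .on("mousemove", function (d) {
      // Show the tooltip slightly to the right of the cursor.
      tooltip.html(tooltipHtml(d, chartState.measure))
        .style("top", (d3.event.pageY - 12) + "px")
        .style("left", (d3.event.pageX + 25) + "px")
        .style("opacity", 0.9);

      // Stretch the guide line from the circle straight down to the axis.
      xLine.attr("x1", d.x).attr("x2", d.x)
        .attr("y1", d.y).attr("y2", height - margin.bottom)
        .style("opacity", 1);
    })
    .on("mouseout", function () {
      // Hide both elements again once the cursor leaves the circle.
      tooltip.style("opacity", 0);
      xLine.style("opacity", 0);
    });
}
```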

These event listeners are nice and all, but we need to actually process and draw the data to make any use of them. So, let’s do just that!

Drawing It All

The majority of the data processing is done in one function called redraw, which we invoke when the page is loaded for the first time and during the various events we saw in the previous section.

This function uses chartState to decide how it should draw the chart. At the beginning, it sets the type of scale to linear or logarithmic based on chartState.scale, and decides the extent of the chart domain by finding the min/max value in the dataset's total or perCapita column, based on the value of chartState.measure:
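This step could be sketched as below, assuming chartState.scale holds the name of a D3 scale constructor ("scaleLinear" or "scaleLog") and chartState.measure holds a column name. The function name configureScale is hypothetical.

```javascript
// Picks the scale type from the chart state and fits its domain to the data.
function configureScale(d3, dataSet, chartState, width, margin) {
  // d3[chartState.scale] resolves to d3.scaleLinear or d3.scaleLog.
  const xScale = d3[chartState.scale]()
    .range([margin.left, width - margin.right]);

  // d3.extent returns [min, max]; unary plus converts CSV strings to numbers.
  xScale.domain(d3.extent(dataSet, (d) => +d[chartState.measure]));

  return xScale;
}
```

One thing to keep in mind when switching to the logarithmic scale: a log scale's domain must not include zero, so rows with a value of 0 would need to be filtered out or clamped beforehand.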

Another thing we need to create based on chartState is the X axis. Considering the orientation of the chart, we will use a bottom axis (axisBottom) and give it 10 ticks. If we are visualizing total numbers, we go with a format that uses decimal notation with an SI prefix (s) and 1 significant digit (.1). Otherwise, it will be fixed-point notation (f), where .1 means one digit after the decimal point.

When the axis and scale are prepared, we execute a transition that takes 1 second. During this second, .call(xAxis) invokes the axisBottom generator, which renders the bottom axis.
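A sketch of the axis step, assuming the axis container carries the classes x and axis and that the measure for total numbers is named "total". The format specifiers ".1s" and ".1f" are the ones described above; tickSizeOuter(0) is an optional extra that hides the end ticks.

```javascript
function drawAxis(d3, xScale, chartState) {
  // SI-prefix format for totals, fixed-point format for per capita values.
  const tickFormat = chartState.measure === "total" ? ".1s" : ".1f";

  const xAxis = d3.axisBottom(xScale)
    .ticks(10, tickFormat)
    .tickSizeOuter(0);

  // Animate the axis container for 1 second while the generator redraws it.
  d3.select(".x.axis")
    .transition()
    .duration(1000)
    .call(xAxis);

  return xAxis;
}
```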

What follows is the simulation that moves the nodes along the X and Y axes to their desired positions:

This is one of the more complicated snippets in this article, so let's go over it line by line. On the first line, we create a simulation with the specified dataset. To this simulation, we apply a positioning force that pushes nodes towards their desired position along the X axis. This desired position is returned by the xScale function, which calculates it by mapping the "total" or "perCapita" column to the physical size (range) of the chart. After that, we raise the strength of this force using the strength function, so that nodes are pulled towards their target position more quickly.

The same way we applied a force along the X axis, we also need to apply a force along the Y axis, this time pushing nodes towards the middle line of the chart. The last force we apply is a collision force, which keeps the nodes from colliding; more specifically, it keeps their centers 9 pixels apart. Finally, we call the stop function to stop the simulation from running automatically, and instead execute it in the for loop on the lines below it.
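The simulation described above might be sketched like this. The strength value of 2, the default of 120 iterations and the vertical center formula are assumptions; the collision radius is set to half the spacing because d3.forceCollide takes a per-node radius, so two nodes end up at least twice that radius apart.

```javascript
// Computes x/y positions for every row in dataSet (D3 writes d.x and d.y onto each row).
function runSimulation(d3, dataSet, xScale, chartState, height, margin,
                       iterations = 120, spacing = 9) {
  const simulation = d3.forceSimulation(dataSet)
    // Pull each node towards the X position of its value.
    .force("x", d3.forceX((d) => xScale(+d[chartState.measure])).strength(2))
    // Pull each node towards the horizontal middle line of the chart.
    .force("y", d3.forceY(height / 2 - margin.bottom / 2))
    // Keep node centers at least `spacing` pixels apart.
    .force("collide", d3.forceCollide(spacing / 2))
    // Do not start the internal timer; we step the simulation manually below.
    .stop();

  for (let i = 0; i < iterations; ++i) {
    simulation.tick();
  }

  return simulation;
}
```

Running the ticks synchronously means the final positions are known before anything is drawn, so the circles can be animated straight to their resting place instead of jittering while the simulation settles.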

We created and ran the simulation, but against what? Against the nodes (circles) created by the following code:

Here, we begin by querying all the nodes and joining the country names from the dataset to them. The next 2 calls, to the exit and enter selections respectively, deal with the situation when nodes are removed from or added to the selection (e.g. when checkboxes are ticked or unticked, or when the page is loaded). First, for the exit selection, we create a transition that takes 1 second and sets the center point on the X axis to zero and the center point on the Y axis to the middle of the chart. This way, when these nodes are added back into the chart, they come out from a single point, as you can see when clicking the checkboxes in the demo. After the transition finishes, the nodes are removed.

The remainder of the snippet, the enter selection, is what actually sets all the attributes of the nodes. We set each node's CSS class, its X and Y center points and its radius, and fill it with a color based on the continent it belongs to. Then we merge this selection into the rest of the nodes (circles) and create a transition that moves them to the desired X and Y coordinates over the next 2 seconds.
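The enter/exit handling described above follows the usual D3 v5 data-join pattern, which could be sketched as follows. The circle radius of 4, the function name updateNodes and the vertical center formula are assumptions; svg is the SVG selection, colors is the ordinal color scale, and d.x and d.y are the positions produced by the simulation.

```javascript
function updateNodes(svg, dataSet, colors, height, margin) {
  const centerY = height / 2 - margin.bottom / 2;

  // Join rows to circles, using the country name as the key.
  const circles = svg.selectAll(".countries")
    .data(dataSet, (d) => d.country);

  // Removed nodes travel back to a single point and are then deleted.
  circles.exit()
    .transition()
    .duration(1000)
    .attr("cx", 0)
    .attr("cy", centerY)
    .remove();

  // New nodes start at that same point, then all nodes move to their positions.
  circles.enter()
    .append("circle")
    .attr("class", "countries")
    .attr("cx", 0)
    .attr("cy", centerY)
    .attr("r", 4)
    .attr("fill", (d) => colors(d.continent))
    .merge(circles)
    .transition()
    .duration(2000)
    .attr("cx", (d) => d.x)
    .attr("cy", (d) => d.y);
}
```

The key function passed to data() matters here: without it, D3 would match rows to circles by index, and filtering out a continent would reshuffle which circle represents which country.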

Conclusion

In this article, we went deep into implementing a beeswarm chart with D3.js. The takeaway shouldn't be this specific implementation, though, but the fact that you might want to consider non-traditional types of charts and plots next time you are visualizing your data, as they might help you better communicate the desired information to your audience.

If you want to check out the complete code listing from this article, please visit my repository here: https://github.com/MartinHeinz/charts. In this repo you can also find the datasets and sources used, as well as other charts and plots implemented with D3.js, like the parallel coordinates chart covered in the next article.
