D3.js

Datawrapper’s policy on bad graphs

Datawrapper is a tool that lets you turn a dataset into a decent-looking chart within minutes. In an interview, co-founder Mirko Lorenz said Datawrapper is designed to prevent people from making misleading graphs:

With Datawrapper, we try to make it as hard as possible to take data and create misleading charts with it. For example, it’s not possible to create bar charts with cropped axes. From time to time, users ask us to add this feature, but we never have and we never will. (via)

This may sound a bit paternalistic but it makes sense: Datawrapper’s philosophy is to offer a simple, robust way to quickly create a chart. If you don’t like the limitations, learn to code D3.js.

But Lorenz’ remark made me curious: are there other design options, besides bar charts with cropped axes, that Datawrapper deems unacceptable? And are these limited to chart designs that are outright misleading, or does Datawrapper more generally ban designs that result in ineffective or inaccurate data communication? Here’s an exploration of Datawrapper’s bad graph policy.

Y-axis not starting at zero
Datawrapper disapproves of y-axes that don’t start at zero in bar and column charts, but it allows them in line charts. I think this is consistent with the consensus on the topic.[1]
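To make the rule concrete, here is a minimal sketch in JavaScript of how a charting tool might pick the y-axis domain. The function names barDomain and lineDomain are hypothetical and this is not Datawrapper’s code: for bars the domain is forced to include zero (extending below zero when there are negative values), while for lines it simply spans the data. In D3, the result could be passed to d3.scaleLinear().domain(...).

```javascript
// Hypothetical helper: y-domain for a bar or column chart that always
// includes zero, so bar lengths stay proportional to the values.
// Negative values extend the domain below zero instead of hiding them.
function barDomain(values) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  return [Math.min(0, min), Math.max(0, max)];
}

// Hypothetical helper: for a line chart, cropping to the data range
// is generally considered acceptable.
function lineDomain(values) {
  return [Math.min(...values), Math.max(...values)];
}

console.log(barDomain([12, 15, 14]));  // [ 0, 15 ]
console.log(barDomain([-4.2, 3, 5]));  // [ -4.2, 5 ]
console.log(lineDomain([12, 15, 14])); // [ 12, 15 ]
```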

Spaghetti chart

I’m using the term spaghetti chart in the non-technical sense, meaning a chart with many lines that create an indecipherable mess.[2] Datawrapper doesn’t ban spaghetti charts.

Pie chart
Long the chart type we all loved to hate, the pie chart has recently been sort of rehabilitated. I think many people would now agree that pie charts are a legitimate way to represent proportions. That said, 3D and exploding pie charts are still suspect. Datawrapper allows pie (and donut) charts, but doesn’t seem to allow 3D or exploding pie charts.

3D
Using perspective to create a 3D effect will make it difficult to compare the sizes of elements in a chart. Fortunately, Datawrapper doesn’t seem to allow any type of 3D chart.

Stacked bar chart
The rehabilitation of the pie chart coincided with a renewed critique of stacked bar charts: «basic bar charts are clearly better than pie charts, but stack them and they’re worse!». Which, by the way, doesn’t mean that it’s always wrong to use stacked bar charts.[3] Datawrapper allows them.

Dual y-axes
Some charts have a secondary y-axis, so that different scales can be used in one chart (here’s an awkward example, source). There may be situations where this is defensible, but in general it shouldn’t be considered good practice. Datawrapper doesn’t seem to allow this.

Pictograms instead of bars
Some designers try to jazz up bar charts using pictograms instead of bars, forgetting to take into account that if you double the height of the pictogram, its area increases fourfold. The distortion is even worse when the pictograms are drawn to appear three-dimensional. Datawrapper doesn’t seem to allow replacing bars with pictograms.
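A quick worked example of that distortion, as a JavaScript sketch with hypothetical helper names: scaling a pictogram uniformly by a factor k multiplies its area by k². To make the area rather than the height proportional to the value, each linear dimension would have to be scaled by the square root of the value ratio.

```javascript
// Scaling a pictogram uniformly by a linear factor k multiplies its area by k^2.
function areaRatio(linearScale) {
  return linearScale * linearScale;
}

// To make area (not height) proportional to the value, scale each
// linear dimension by the square root of the value ratio.
function linearScaleForValueRatio(valueRatio) {
  return Math.sqrt(valueRatio);
}

console.log(areaRatio(2));                // 4: double the height, four times the area
console.log(linearScaleForValueRatio(4)); // 2
console.log(areaRatio(linearScaleForValueRatio(3)).toFixed(6)); // 3.000000
```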


  1. The best-known examples of y-axes not starting at zero are cropped or truncated axes, which start at a value higher than zero, but there are also axes that start at a negative value. Edward Tufte points this out in The Visual Display of Quantitative Information, using a chart from an annual report as an illustration: «A careful look at the middle panel reveals a negative income in 1970, which is disguised by having the bars begin at the bottom at approximately minus $4.200.000».  ↩

  2. You can make a spaghetti chart interactive, for example let users click a label and the corresponding line will be highlighted. But this may still be an awkward solution, especially on mobile.  ↩

  3. «They can be useful when the point is to show that a value is the sum of other values, but you’re only interested in comparing the totals. They also work if you only need to show one section and can make that the one on the bottom. Then the bars are comparable and work well. But just throwing values into a stacked bar chart is a bad idea», Robert Kosara argued. Here’s how Dutch minister Jeroen Dijsselbloem messed up.  ↩

Is it still ok to ridicule pie charts?

Workers without job security as a percentage of all working people in the Netherlands. The pink slice shows the percentage in 2003; the red slice shows how much this has increased since. Data: Statistics Netherlands, chart: dirkmjk.nl.

In a series of articles that caused a bit of a commotion among chart geeks, Robert Kosara summarised the findings of a number of studies on pie charts. In one of the articles, he observes:

Pie charts are generally looked down on in visualization, and many people pride themselves on saying mean things about them and the people who use them.

I guess I’m one of those people who look down on pie charts. Sure, I’m not as outspoken as the respected Edward Tufte, who famously wrote that «the only worse design than a pie chart is several of them». I’m not always against pie charts and I’ve even experimented with animated pie charts to illustrate change in a proportion. But I’m not above making lame jokes about pie charts either. My rule of thumb would be: don’t use pie charts - unless you can come up with a good reason why you should use one in a particular situation.

Kosara describes a number of studies in which he measured how accurately people interpret pie charts and other charts showing a proportion, e.g. 27%. According to his findings, exploded pie charts are doing worse than regular pie charts (phew!) and square pie charts are doing better. Interestingly, a stacked bar chart appears to be doing worse than a regular pie chart (note that a stacked bar chart depicting a single proportion amounts to something that looks like a progress bar).
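For readers unfamiliar with square pie charts: a proportion is shown as filled cells in a 10 by 10 grid, so 27% becomes 27 of 100 cells. Below is a minimal layout sketch, assuming cells are filled row by row from the top left (a design choice made here, not something taken from Kosara’s studies); squarePieCells is a hypothetical name. The resulting objects could then be bound to SVG rect elements with D3’s data join.

```javascript
// Sketch: map a proportion onto a size x size "square pie" (waffle) grid.
// With the default 10 x 10 grid, each cell stands for 1%.
// Cells are filled row by row, starting at the top left.
function squarePieCells(proportion, size = 10) {
  const total = size * size;
  const filled = Math.round(proportion * total);
  const cells = [];
  for (let i = 0; i < total; i++) {
    cells.push({
      row: Math.floor(i / size),
      col: i % size,
      filled: i < filled,
    });
  }
  return cells;
}

const cells = squarePieCells(0.27);
console.log(cells.filter(c => c.filled).length); // 27
```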

It’ll be interesting to see how this holds up in future studies. But for now, the finding that (stacked) bar charts are doing worse than pie charts may come as a bit of a shock, for there appears to be a sort of consensus that bar charts are generally better than pie charts. Question is, better at what?

Workers without job security as a percentage of all working people in the Netherlands. Data: Statistics Netherlands, chart: dirkmjk.nl.

A bar chart is quite good at showing that the share of workers without job security in the Netherlands was higher in 2015 than in 2014. But which chart type is better at showing how much the share has increased between 2003 and 2015? Until recently I would have said «the bar chart» without hesitation, but now I’m not so sure anymore.

That said - I think it’s still ok to ridicule 3D exploded pie charts.

Robert Kosara summarises his findings here and here. The recent studies were done in collaboration with Drew Skau; an older study in collaboration with Caroline Ziemkiewicz. The Tufte quote is from his book The Visual Display of Quantitative Information. The charts above show workers with permanent jobs and a fixed number of hours per week, as a percentage of all working people in the Netherlands (not just employees), source CBS.

Embedding D3.js charts in a responsive website

UPDATE - better approach here.

For a number of reasons, I like to use D3.js for my charts. However, I’ve been struggling for a while to get them to behave properly on my blog which has a responsive theme. I’ve tried quite a few solutions from Stack Overflow and elsewhere but none seemed to work.

I want to embed the chart using an iframe. The width of the iframe should adapt to the column width and the height to the width of the iframe, maintaining the aspect ratio of the chart. The chart itself should fill up the iframe. Preferably, when people rotate their phone, the size of the iframe and its contents should update without the need to reload the entire page.

Styling the iframe

Smashing Magazine has described a solution for embedding videos. You enclose the iframe in a div and use css to add a padding of, say, 40% to that div (the percentage depending on the aspect ratio you want). You can then set both width and height of the iframe itself to 100%. Here’s an adapted version of the code:

<style>
.container_chart_1 {
    position: relative;
    padding-bottom: 40%;
    height: 0;
    overflow: hidden;
}
 
.container_chart_1 iframe {
    position: absolute;
    top:0;
    left: 0;
    width: 100%;
    height: 100%;
}
</style>
 
<div class ='container_chart_1'>
<iframe src='https://dirkmjk.nl/2016/embed_d3/chart_1.html' frameborder='0' scrolling = 'no' id = 'iframe_chart_1'>
</iframe>
</div>

Making the chart adapt to the iframe size

The next question is how to make the D3 chart adapt to the dimensions of the iframe. Here’s what I thought might work but didn’t: in the chart, obtain the dimensions of the iframe using window.innerWidth and window.innerHeight (minus 16px - something to do with scrollbars apparently?) and use those to define the size of your chart.

Using innerWidth and innerHeight seemed to work - until I tested it on my iPhone. Upon loading a page it starts out OK, but then the update function increases the size of the chart until only a small detail is visible in the iframe (rotate your phone to replicate this). Apparently, iOS returns not the dimensions of the iframe but something else when innerWidth and innerHeight are used. I didn’t have that problem when I tested on an Android phone.

Adapt to the iframe size: Alternative solution

Here’s an alternative approach for making the D3 chart adapt to the dimensions of the iframe. Set width to the width of the div that the chart is appended to (or to the width of the body) and set height to width * aspect ratio. Here’s the relevant code:

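var aspect_ratio = 0.4; // must match padding-bottom: 40% in the css of the parent page
var frame_width = $('#chart_2').width(); // width of the div the chart is appended to
var frame_height = aspect_ratio * frame_width;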

The disadvantage of this approach is that you’ll have to set the aspect ratio in two places: both in the css for the div containing the iframe and in the html-page that is loaded in the iframe. So if you decide to change the aspect ratio, you’ll have to change it in both places. Other than that, it appears to work.
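One possible way around the two-places problem, sketched here under assumptions rather than as a tested solution: pass the ratio to the chart page in the iframe url (for example chart_1.html?ratio=0.4) and read it there, so the chart page no longer hardcodes it. The query parameter name ratio and the helper functions are hypothetical. The ratio is still written twice in the embed snippet (in the css padding and in the url), but both now live in the same place.

```javascript
// Hypothetical sketch: keep the aspect ratio in the embed snippet only,
// passing it to the chart page as a query parameter, e.g. ?ratio=0.4.

// Parent page: css padding-bottom value for a given ratio (0.4 -> "40%").
function paddingBottomFor(ratio) {
  return `${+(ratio * 100).toFixed(4)}%`;
}

// Chart page: read the ratio from the query string, falling back to a
// default when it is missing or invalid.
function readRatio(search, fallback = 0.4) {
  const value = parseFloat(new URLSearchParams(search).get("ratio"));
  return Number.isFinite(value) && value > 0 ? value : fallback;
}

// Chart page: derive chart dimensions from the container width.
function chartSize(containerWidth, ratio) {
  return { width: containerWidth, height: ratio * containerWidth };
}

console.log(paddingBottomFor(0.4));   // 40%
console.log(readRatio("?ratio=0.56")); // 0.56
console.log(readRatio(""));            // 0.4

// In the chart page, this could be used as:
// var size = chartSize($('#chart_2').width(), readRatio(window.location.search));
```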

Reloading the chart upon window resize

Then write a function that reloads the iframe content upon window resize, so as to adapt the size of the chart when people rotate their phone. Note that on mobile devices, scrolling may trigger the window resize. You don’t want to reload the contents of the iframe each time someone scrolls the page. To prevent this, you may add a check whether the window width has changed (a trick I picked up here). Also note that with Drupal, you need to use jQuery instead of $.

var width = jQuery(window).width();
jQuery(window).resize(function(){
    // Scrolling on mobile can fire resize, so only reload when the width changed
    if(jQuery(window).width() != width){
        var iframe = document.getElementById('iframe_chart_1');
        iframe.src = iframe.src; // reassigning src reloads the iframe
        width = jQuery(window).width();
    }
});
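The same check can also be written without jQuery. The sketch below factors it into a plain function; makeWidthChangeHandler and its two callbacks are hypothetical names, and document.documentElement.clientWidth is used as an approximation of what jQuery(window).width() returns.

```javascript
// Sketch: call onWidthChange only when the width reported by getWidth
// has actually changed since the last call. Resize events caused by
// scrolling on mobile typically change the height, not the width.
function makeWidthChangeHandler(getWidth, onWidthChange) {
  let lastWidth = getWidth();
  return function handleResize() {
    const current = getWidth();
    if (current !== lastWidth) {
      lastWidth = current;
      onWidthChange(current);
    }
  };
}

// In a browser, this could be wired up as:
// window.addEventListener("resize", makeWidthChangeHandler(
//   () => document.documentElement.clientWidth,
//   () => {
//     const iframe = document.getElementById("iframe_chart_1");
//     iframe.src = iframe.src; // reload the chart
//   }
// ));
```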

In case you know a better way - do let me know!

FYI, here’s the chart used as illustration in its original context.

My entry for the Best Worst Viz competition

Number of tweets with hashtag #BestWorstViz, per date of the month April 2016 and time of the day. Times are UTC, 18 April is the deadline. Data updates every hour; clear browser history to refresh. Entry for Best Worst Viz competition, created by dirkmjk.

I love to hate bad graphs (who doesn’t), and I think Andy Kirk’s idea to organise a Best Worst Viz competition is quite brilliant. As he explains, there’s something fair about creating your own bad graph rather than criticising somebody else’s:

[..] picking on bad visualisation involves work by other people who we might never meet or have a chance to learn about what the true circumstances and intent of a project were. The essence of this challenge is based on your best worst visualisation - the best worst visualisation you can possibly make.

I had to give it a try. But how? An exploding 3D pie chart, truncated y-axis, out-of-control spaghetti chart - it all seemed a bit too obvious. I aimed for something different, drawing inspiration from the blink element of the early days of web design. The shifting colours of the stacked bar chart pointlessly illustrate the direction of time - or whatever. I think it’s pretty bad.

Standalone version of graph here.

Power and buzz: Analysing trade union HQ locations by closeness to power and by convenience store score

When Hans Spekman ran for chairman of the Dutch Social-Democrat party in 2011, he said he wanted to move the party’s headquarters from the posh office at the Herengracht in Amsterdam to a «normal district, a neighbourhood where things happen, like Bos en Lommer». Bos en Lommer is a multicultural neighbourhood in the west of the city, in transition from deprived to gentrified.

I agree with Spekman (at least on this matter) and I think his ideas about locations should also apply to trade union headquarters. Out of curiosity I decided to analyse the headquarters locations of European trade unions, using two criteria. First: closeness to power, operationalised pragmatically as the walking distance from the union office to the national parliament. And second: the liveliness of the neighbourhood. For measuring this I propose the convenience store score, which assumes that the number of convenience stores within half a kilometer gives a rough indication of how lively a neighbourhood is. Examples of convenience stores are 7-Eleven and AH to go stores; some ethnic shops will also be classified as convenience stores.

The chart below shows the scores for each union. You can also see the locations of union offices, parliaments and convenience stores on an interactive map, but note that the map may take a while to load - it’s not very suitable for viewing on a smartphone.

The median union headquarters is within 2km walking distance from parliament. For about three-quarters of unions, the distance is below 5km. The general pattern thus seems to be that unions have their national offices close to the institutions of political power. There are exceptions though. Officials of the major Dutch federations FNV and CNV would have to walk 15 to 68km to reach parliament. And sometimes the distance is even longer: a Basque union has its HQ in Bilbao; a Turkish union in Istanbul and Polish union Solidarność has its HQ near the port of Gdansk, where it originated. But all in all, the large Dutch unions are quite exceptional in that they don’t have their headquarters near the centre of political power.

As for liveliness: the median number of convenience stores within half a kilometer from union headquarters is 2, but about one in three unions have no convenience stores nearby at all. Some of the most lively union office locations are in countries like Romania, Hungary and Bulgaria. Other examples are CFDT (France), TUC (UK), SAK (Finland) and UGT (Spain). Dutch unions are at the other end of the spectrum and have rather dull headquarters locations - judging by the convenience store score.

So where should a union be? I’d say that influencing the government is one of the tasks unions should be doing, and an important one at that. However, this doesn’t depend on having a headquarters close to parliament, but rather on the ability to mobilise workers. I’d argue that the convenience store score is a far better criterion to judge headquarters locations by.

In case you were wondering: Spekman was successful in his bid for the chairmanship of the Social-Democrat party. The party’s headquarters is still at the Herengracht, though: it turned out the lease doesn’t expire until 2018.

Full disclosure: I work at the FNV, at the former FNV Bondgenoten location.

Method

This analysis turned out to be quite a bit more challenging than I initially thought, but it was very instructive. I’m especially happy that I now have a basic understanding of the Overpass API that you can use to retrieve Open Street Map data. OSM has always been a bit of a black box to me but the Overpass API turns out to be a valuable tool.

Measuring neighbourhood characteristics

Initially I wanted to use Eurostat regional stats to analyse neighbourhood characteristics, but Eurostat doesn’t have data beyond the NUTS 3 level (I should’ve known). Level 3 areas may comprise entire cities and are useless for analysing neighbourhoods, so I had to look for alternatives.

Subsequently, I tried getting the name of the smallest area a location is in using the Mapit tool (based on Open Street Map). I thought I might then be able to construct a Wikipedia url by adding the name to https://en.wikipedia.org/wiki/. This turned out to work pretty well, not least because Wikipedia is quite good at handling different variants of geographical names. However, while Wikipedia articles tend to be informative, they do not contain a lot of uniform statistical information. Often population, area and population density will be included, but not much beyond that. In addition, the fact that the size of the areas varies poses problems. For example, the population density of a small area cannot be meaningfully compared to the density of a large area. In the end I did add the Wikipedia links to the popups on the map, but I continued looking for other ways to analyse neighbourhood characteristics.

One of the measures I ended up using is closeness to power, operationalised as the walking distance to the national parliament (in countries with a bicameral parliament, I used the location of the lower house). This was a pragmatic choice. An alternative would have been to use the location of ministries, but then I’d have to come up with a way to pick the relevant ministry.

For measuring the liveliness of a neighbourhood, I used the number of convenience stores within half a kilometer, using data from Open Street Map. Obviously there are some limitations to this method. For example, some countries will be mapped in more detail than others. Also, there will be inconsistencies in how shops are classified (cf. this discussion in Dutch about how to classify stores of chains like Blokker).
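As an illustration of what counting stores within half a kilometer involves, here is a sketch that counts stores within a straight-line radius using the haversine formula. It assumes the store coordinates are already available; the actual counts in this analysis were based on Overpass API queries (see Getting the data). The function names and the coordinates in the example are made up for illustration.

```javascript
// Sketch: count convenience stores within radiusM metres of an office,
// using great-circle (haversine) distance on a spherical Earth.
const EARTH_RADIUS_M = 6371000; // mean Earth radius in metres

function haversineMeters(a, b) {
  const toRad = deg => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h));
}

function convenienceStoreScore(office, stores, radiusM = 500) {
  return stores.filter(s => haversineMeters(office, s) <= radiusM).length;
}

// Made-up coordinates, not actual union or store locations.
const office = { lat: 52.37, lon: 4.89 };
const stores = [
  { lat: 52.371, lon: 4.89 }, // roughly 110 m north
  { lat: 52.37, lon: 4.895 }, // roughly 340 m east
  { lat: 52.38, lon: 4.89 },  // roughly 1.1 km north
];
console.log(convenienceStoreScore(office, stores)); // 2
```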

Obviously, the convenience store score has not been properly validated. I’m not even sure whether objective measures of a neighbourhood’s liveliness exist. I checked this list of «coolest» neighbourhoods in Europe and all but one (Amsterdam Noord) have convenience stores nearby, but then again coolness isn’t the same as liveliness (I guess a neighbourhood can be uncool yet lively). Furthermore, being on a list of cool neighbourhoods isn’t necessarily an indicator of coolness.

Ideally I think a proper assessment of the convenience store score should include a comparison with measurements of criteria derived from Jane Jacobs’s The Death and Life of Great American Cities: mixed primary uses, short blocks, buildings of various ages and density. I guess it should be possible to measure some of these with OSM data (especially the first two). However, that would require a deeper understanding of OSM classifications than I currently have.

Getting the data

While some of the data was obtained by good old-fashioned googling, some of it could be automated.

The starting point for the analysis was the list of affiliates of the European Trade Union Confederation (ETUC). Note that this includes unions in non-EU countries such as Turkey. Also note that I use the word union but most are in fact union federations (the FNV is a bit more complicated; a recent merger has partly done away with the federation structure).

The ETUC doesn’t seem to have a list of addresses on their website. They do provide urls for most of their affiliates. Still, looking up addresses was a bit of an adventure, especially for countries which use non-Latin alphabets (let me know if you find any errors).

For walking distances I used the Bing API. In a number of cases Bing couldn’t find a walking route or the distance seemed wrong. In those cases I manually looked up the distance in Google Maps. Here’s a sample url for getting information from the Bing API (replace KEY with API key).

I used the Overpass API (demo) of Open Street Map to get all nodes within 500m from the union HQs, which I used for counting the number of convenience stores. I also used the API for getting the coordinates of all convenience stores in all countries where the ETUC has affiliates. Here’s a sample url for getting all nodes within 500m of a location, and here for getting all convenience stores in a country.
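A request of this kind could look like the sketch below, which uses the around filter of the Overpass query language. The public endpoint at overpass-api.de is an assumption (other instances exist), and the helper names are hypothetical. Unlike the approach described above, which fetched all nodes and counted convenience stores afterwards, this query filters on the OSM tag shop=convenience directly; stores mapped as ways rather than nodes would be missed by a node-only query.

```javascript
// Sketch: build an Overpass API url for convenience stores (as nodes)
// within radiusM metres of a point.
const OVERPASS_ENDPOINT = "https://overpass-api.de/api/interpreter";

function convenienceStoreQuery(lat, lon, radiusM = 500) {
  return `[out:json];node["shop"="convenience"](around:${radiusM},${lat},${lon});out;`;
}

function overpassUrl(query) {
  return `${OVERPASS_ENDPOINT}?data=${encodeURIComponent(query)}`;
}

const query = convenienceStoreQuery(52.37, 4.89);
console.log(query);
// [out:json];node["shop"="convenience"](around:500,52.37,4.89);out;

// In a browser (or Node 18+), the results could be counted like this:
// fetch(overpassUrl(query))
//   .then(res => res.json())
//   .then(data => console.log(data.elements.length));
```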

A few unions are missing in the final results because of missing data. For example, I couldn’t figure out what the main office of the Belgian ACV is and I couldn’t find the exact location of the parliament of Malta (somewhere along Republic Street, Valletta).

Calculating scores

I calculated scores as either walking distance to parliament in kilometers or the number of nearby convenience stores. In both cases I took the log10 of the value + 1. To arrive at a 0 to 10 scale, I multiplied by 10 and divided by the maximum score for each variable. For the distance to power measure I converted the score to 10 minus the score, so that a higher score means closer to power.
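The scoring steps can be sketched as follows. This is a reconstruction from the description above, not the original script; the function names and example inputs are hypothetical. A guard returns zeros if every value is zero, to avoid dividing by zero.

```javascript
// Sketch: log10(value + 1), rescaled so the maximum maps to 10.
function logScores(values) {
  const logs = values.map(v => Math.log10(v + 1));
  const max = Math.max(...logs);
  if (max === 0) return logs.map(() => 0); // all values were zero
  return logs.map(l => (10 * l) / max);
}

// Distance to parliament: flip the score so that higher means closer to power.
function closenessToPowerScores(distancesKm) {
  return logScores(distancesKm).map(s => 10 - s);
}

// Hypothetical inputs, not the actual union data.
const storeCounts = [0, 2, 9];  // convenience stores within 500 m
const distancesKm = [0, 2, 99]; // walking distance to parliament in km
console.log(logScores(storeCounts).map(s => s.toFixed(2)));
// [ '0.00', '4.77', '10.00' ]
console.log(closenessToPowerScores(distancesKm).map(s => s.toFixed(2)));
// [ '10.00', '7.61', '0.00' ]
```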

Mapping

I used Leaflet and D3.js to map the locations of HQs, parliaments and convenience stores. There are over 60,000 convenience stores in the dataset. This turned out to be a bit too much and the browser all but crashed. I found this script that deals with exactly this problem. While I managed to figure out what I needed to change to make the script work with my data, I’m afraid I don’t fully understand how it works. It’s still too slow for mobile, though.
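The sketch below is not that script. It shows one common, simpler way to reduce the load: only draw the points that fall inside the current map view, and redraw when the view changes. The function name pointsInBounds is hypothetical; map.getBounds(), the getSouth/getNorth/getWest/getEast methods and the moveend event are part of Leaflet’s API. The simple bounds check ignores the antimeridian, which shouldn’t matter for a map of Europe.

```javascript
// Sketch: keep only the points inside a bounding box given in degrees.
function pointsInBounds(points, bounds) {
  return points.filter(
    p =>
      p.lat >= bounds.south && p.lat <= bounds.north &&
      p.lon >= bounds.west && p.lon <= bounds.east
  );
}

// With Leaflet, the bounds could come from the map, e.g.
// const b = map.getBounds();
// const visible = pointsInBounds(stores, {
//   south: b.getSouth(), north: b.getNorth(),
//   west: b.getWest(), east: b.getEast(),
// });
// and the D3 layer could be redrawn on the map's "moveend" event.
```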
