champagne anarchist | armchair activist

Bad Graphs

Dutch government drops 3D pie charts

The Dutch government has replaced pie charts with bar charts in it’s annual reports, someone noted on Twitter (via @bokami). Pie charts aren’t always a bad choice - contrary to the view of some adherents of the stricter school in data visualisation. But 3D pie charts are really hard to justify and it’s a bit awkward they were still used one year ago.

The charts are from the 2016 and 2017 reports of the Department of Social Affairs and Employment.

bullshit #dataviz

Donald Trump won because Hillary Clinton failed to get the vote out. At least, that’s the story this heavily retweeted chart seems to tell (click the chart for a larger version). But according to data visualisation expert Alberto Cairo it’s an example of the kind of bullshit #dataviz we need to fight against. In fact, many people have criticised the chart, for a number of reasons, including:

  • The y-axis, obviously.[1] The chart suggests Clinton got about half as many votes in 2016 as Obama in 2012, which of course isn’t true. Some have argued that truncating the y-axis is justifiable in this case because otherwise small differences wouldn’t show. However, with a y-axis starting at zero, you can still see what’s going on.
  • Why is it showing only the latest three presidential elections? Add data for elections before 2008, and the picture becomes quite different.
  • Not all votes have been counted yet. At some point, Nate Cohn of the NYT has predicted that Trump will get 61.2 million votes and Clinton 63.4 million, when all votes are counted. That would also change the picture considerably.

So who created this bullshit #dataviz and why? The earliest version I could find is by Economics Professor D Yanagizawa-Drott.[2] My guess is that he created the chart as a quick-and-dirty attempt to understand what happened on 8 November, never expecting it to go viral, and that he never gave much thought to its execution.[3] While the chart design is problematic, the idea behind it - explore how turnout affected the outcome of the election - makes sense.

Meanwhile, the post-election dataviz deluge highlighted another problem. People post charts without indicating the source of the data they used. To make matters worse, other people will simply copy and post that chart without saying who they got it from. There should be a rule that if you post a chart, you should indicate the data source and who created the chart - or at least where you found it.

  1. Jonathan Webber, who was among the people to make the chart popular, has a bio that says Trolling y-axis mavens since 2016 (I assume he added this line in response to criticism of the chart).  ↩

  2. I wonder whether it’s possible to systematically search for images on Twitter?  ↩

  3. He introduced the chart as «A quick look at turnout data». When someone said the y-axis should start at zero, he responded: «True. Also contact Microsoft Excel, let them know the default y-axis is simply unacceptable; lazy people like me need nudging.»  ↩

Datawrapper’s policy on bad graphs

Datawrapper is a tool that lets you turn a dataset into a decent-looking chart within minutes. In an interview, co-founder Mirko Lorenz said Datawrapper is designed to prevent people making misleading graphs:

With Datawrapper, we try to make it as hard as possible to take data and create misleading charts with it. For example, it’s not possible to create bar charts with cropped axes. From time to time, users ask us to add this feature, but we never have and we never will. (via)

This may sound a bit paternalistic but it makes sense: Datawrapper’s philosophy is to offer a simple, robust way to quickly create a chart. If you don’t like the limitations, learn to code D3.js.

But Lorenz’ remark made me curious: would there be more design options, besides bar charts with cropped axes, that Datawrapper deems unacceptable? And are they limited to chart designs that are outright misleading, or do they more generally ban designs that result in ineffective or inaccurate data communication? Here’s an exploration of Datawrapper’s bad graph policy.

Y-axis not starting at zero
Datawrapper disapproves of y-axes that don’t start at zero in bar and column charts, but it allows them in line charts. I think this is consistent with the consensus on the topic.[1]

Spaghetti chart

I’m using the term spaghetti chart in the non-technical sense, meaning a chart with many lines that create an indecipherable mess.[2] Datawrapper doesn’t ban spaghetti charts.

Pie chart
Long the chart type we all loved to hate, the pie chart has recently been sort of rehabilitated. I think many people would now agree that pie charts are a legitimate way to represent proportions. That said, 3D and exploding pie charts are still suspect. Datawrapper allows pie (and donut) charts, but doesn’t seem to allow 3D or exploding pie charts.

Using perspective to create a 3D effect will make it difficult to compare the sizes of elements in a chart. Fortunately Datawrapper doesn’t seem to allow any type of 3D chart.

Stacked bar chart
The rehabilitation of the pie chart coincided with a renewed critique of stacked bar charts: «basic bar charts are clearly better than pie charts, but stack them and they’re worse!». Which, by the way, doesn’t mean that it’s always wrong to use stacked bar charts.[3] Datawrapper allows them.

Dual y-axes
Some charts have have a secondary y-axis, so different scales can be used in one chart (here’s an awkward example, source). There may be situations where this is defensible, but in general it shouldn’t be considered good practice. Datawrapper doesn’t seem to allow this.

Pictograms instead of bars
Some designers try to jazz up bar charts using pictograms instead of bars, forgetting to take into account that if you double the height of the pictogram, its area increases fourfold. The distortion is even worse when the pictograms are drawn to appear three-dimensional. Datawrapper doesn’t seem to allow replacing bars with pictograms.

  1. The most well-known example of y-axes not starting at zero are cropped or truncated axes which start at a value higher than zero, but there are also examples of axes starting at a negative value. Edward Tufte points this out in The Visual Display of Quantitative Information, using a chart from an annual report as an illustration: «A careful look at the middle panel reveals a negative income in 1970, which is diguised by having the bars begin at the bottom at approximately minus $4.200.000».  ↩

  2. You can make a spaghetti chart interactive, for example let users click a label and the corresponding line will be highlighted. But this may still be an awkward solution, especially on mobile.  ↩

  3. «They can be useful when the point is to show that a value is the sum of other values, but you’re only interested in comparing the totals. They also work if you only need to show one section and can make that the one on the bottom. Then the bars are comparable and work well. But just throwing values into a stacked bar chart is a bad idea», Robert Kosara argued. Here’s how Dutch minister Jeroen Dijsselbloem messed up.  ↩

Is it still ok to ridicule pie charts

Workers without job security as a percentage of all working people in the Netherlands. The pink slice shows the percentage in 2003; the red slice how much this has increased since. Data Statistics Netherlands, chart Relaunch animation.

In a series of articles that caused a bit of a commotion among chart geeks, Robert Kosara summarised the findings of a number of studies on pie charts. In one of the articles, he observes:

Pie charts are generally looked down on in visualization, and many people pride themselves on saying mean things about them and the people who use them.

I guess I’m one of those people who look down on pie charts. Sure, I’m not as outspoken as the respected Edward Tufte, who famously wrote that «the only worse design than a pie chart is several of them». I’m not always against pie charts and I’ve even experimented with animated pie charts to illustrate change in a proportion. But I’m not above making lame jokes about pie charts either. My rule of thumb would be: don’t use pie charts - unless you can come up with a good reason why you should use one in a particular situation.

Kosara describes a number of studies in which he measured how accurately people interpret pie charts and other charts showing a proportion, e.g. 27%. According to his findings, exploded pie charts are doing worse than regular pie charts (phew!) and square pie charts are doing better. Interestingly, a stacked bar chart appears to be doing worse than a regular pie chart (note that a stacked bar chart depicting a single proportion amounts to something that looks like a progress bar).

It’ll be interesting to see how this holds up in future studies. But for now, the finding that (stacked) bar charts are doing worse than pie charts may come as a bit of a shock, for there appears to be a sort of consensus that bar charts are generally better than pie charts. Question is, better at what?

Workers without job security as a percentage of all working people in the Netherlands. Data Statistics Netherlands, chart

A bar chart is quite good at showing that the level of workers without job security in the Netherlands was higher in 2015 than in 2014. But which chart type is better at showing how much the share has increased between 2003 and 2015? Until recently I would have said «the bar chart» without hesitation, but now I’m not so sure anymore.

That said - I think it’s still ok to ridicule 3D exploded pie charts.

Robert Kosara summarises his findings here and here. The recent studies were done in collaboration with Drew Skau; an older study in collaboration with Caroline Ziemkiewicz. The Tufte quote is from his book The Visual Display of Quantitative Information. The charts above show workers with permanent jobs and a fixed number of hours per week, as a percentage of all working people in the Netherlands (not just employees), source CBS.

My entry for the Best Worst Viz competition

Number of tweets with hashtag #BestWorstViz, per date of the month April 2016 and time of the day. Times are UTC, 18 April is the deadline. Data updates every hour; clear browser history to refresh. Entry for Best Worst Viz competition, created by dirkmjk.

I love to hate bad graphs (who doesn’t), and I think Andy Kirk’s idea to organise a Best Worst Viz competition is quite brilliant. As he explains, there’s something fair about creating your own bad graph rather than criticising somebody else’s:

[..] picking on bad visualisation involves work by other people who we might never meet or have a chance to learn about what the true circumstances and intent of a project were. The essence of this challenge is based on your best worst visualisation - the best worst visualisation you can possibly make.

I had to give it a try. But how? An exploding 3D pie chart, truncated y-axis, out-of-control spaghetti chart - it all seemed a bit too obvious. I aimed for something different, drawing inspiration from the blink element of the early days of web design. The shifting colours of the stacked bar chart pointlessly illustrate the direction of time - or whatever. I think it’s pretty bad.

Standalone version of graph here.