champagne anarchist | armchair activist


Inequality in elections

There’s been a bit of fuss about turnout in the American presidential election, but turnout inequality is an issue in the Netherlands too. Young people, people with less education and people with lower incomes are less likely to vote, possibly because they have little faith that politicians will take their interests to heart.

Income, turnout and voting behaviour vary across neighbourhoods as shown by the confetti plot below, which uses the Amsterdam results of the 2012 Lower House election as an illustration.

The picture is clear: in rich neighbourhoods, more people vote, and they’re more likely to vote VVD or D66 - parties that favour free-market economics. In poorer neighbourhoods, the social-democrat PvdA and the socialist SP are more popular, but fewer people turn out to vote.

Given the large differences in turnout, it’s surprising that hardly any serious turnout campaigns have been run in the Netherlands. There’s ample scientific research on the effectiveness of such campaigns.

Click the links below the chart to show turnout, left-wing votes or liberal votes. Here is a larger version of the chart, although this may not make much difference on a mobile screen.


Comparing neighbourhood-level election results with income data on the residents of these neighbourhoods is somewhat problematic because voters aren’t required to vote in their own neighbourhood. I have excluded a few neighbourhoods, including Station-Zuid WTC en omgeving, because they have polling stations at railway stations where relatively many people from other neighbourhoods vote.

The correlations are pretty robust. You’ll also find them by analysing voting behaviour in Amsterdam neighbourhoods in the 2014 city council election, or differences between municipalities across the Netherlands in the 2012 Lower House election (in the latter case, correlations are somewhat weaker). Data and scripts here.
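For readers who want to check this kind of relationship themselves, here is a minimal sketch of a Pearson correlation between neighbourhood income and turnout. It is not the script linked above; the function name `pearson` and the numbers are made-up placeholders, not the Amsterdam data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)

# Made-up placeholder values, NOT the Amsterdam data:
# average income (x 1,000 euro) and turnout (%) for five neighbourhoods.
income = [18, 22, 25, 31, 40]
turnout = [55, 61, 66, 74, 82]

r = pearson(income, turnout)
print(f"r = {r:.2f}")
```

A value of r close to 1 would mean that turnout rises almost linearly with income. As noted above, voters can vote outside their own neighbourhood, so any such figure should be read with some caution.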

bullshit #dataviz

Donald Trump won because Hillary Clinton failed to get the vote out. At least, that’s the story this heavily retweeted chart seems to tell (click the chart for a larger version). But according to data visualisation expert Alberto Cairo it’s an example of the kind of bullshit #dataviz we need to fight against. In fact, many people have criticised the chart, for a number of reasons, including:

  • The y-axis, obviously.[1] The chart suggests Clinton got about half as many votes in 2016 as Obama in 2012, which of course isn’t true. Some have argued that truncating the y-axis is justifiable in this case because otherwise small differences wouldn’t show. However, with a y-axis starting at zero, you can still see what’s going on.
  • Why is it showing only the latest three presidential elections? Add data for elections before 2008, and the picture becomes quite different.
  • Not all votes have been counted yet. At one point, Nate Cohn of the NYT predicted that Trump would get 61.2 million votes and Clinton 63.4 million once all votes are counted. That would also change the picture considerably.
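To get a feel for how much a cropped y-axis can exaggerate a difference, here is a rough sketch that compares the drawn heights of two values for different axis baselines. The function name `bar_height_ratio` and the baseline of 60 million are assumptions for illustration (I haven't measured the baseline of the original chart); the vote totals are the projection by Nate Cohn quoted above.

```python
def bar_height_ratio(a, b, baseline=0.0):
    """Ratio of the drawn heights of values a and b when the axis starts at baseline."""
    if baseline >= min(a, b):
        raise ValueError("baseline must be below both values")
    return (a - baseline) / (b - baseline)

# Projected vote totals in millions, as quoted from Nate Cohn above.
clinton, trump = 63.4, 61.2

# Axis starting at zero: the drawn ratio equals the real ratio.
true_ratio = bar_height_ratio(clinton, trump)

# Hypothetical cropped axis starting at 60 million (an assumption,
# not the baseline used in the criticised chart).
cropped_ratio = bar_height_ratio(clinton, trump, baseline=60)

print(f"axis from 0:  {true_ratio:.2f}")
print(f"axis from 60: {cropped_ratio:.2f}")
```

With the axis starting at zero, the two values look almost equal, which is what the numbers say. With the cropped axis, one appears nearly three times as tall as the other.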

So who created this bullshit #dataviz and why? The earliest version I could find is by Economics Professor D Yanagizawa-Drott.[2] My guess is that he created the chart as a quick-and-dirty attempt to understand what happened on 8 November, never expecting it to go viral, and that he never gave much thought to its execution.[3] While the chart design is problematic, the idea behind it - exploring how turnout affected the outcome of the election - makes sense.

Meanwhile, the post-election dataviz deluge highlighted another problem. People post charts without indicating the source of the data they used. To make matters worse, other people will simply copy and post that chart without saying who they got it from. There should be a rule that if you post a chart, you should indicate the data source and who created the chart - or at least where you found it.

  1. Jonathan Webber, who was among the people to make the chart popular, has a bio that says «Trolling y-axis mavens since 2016» (I assume he added this line in response to criticism of the chart).  ↩

  2. I wonder whether it’s possible to systematically search for images on Twitter?  ↩

  3. He introduced the chart as «A quick look at turnout data». When someone said the y-axis should start at zero, he responded: «True. Also contact Microsoft Excel, let them know the default y-axis is simply unacceptable; lazy people like me need nudging.»  ↩

Illustrating Amsterdam on Wikipedia

Wikipedia has pages on Amsterdam in 197 languages, 149 of which have at least one image. The editors of different language versions do not generally use the same images to illustrate their page. The graph below shows how often images are used.[1]

Most images appear on only one language page and only a few appear on more than 10 pages.
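A count like this could be produced along the following lines. This is a sketch, not the script I actually used: it assumes the image filenames have already been collected per language page, and the dictionary `images_by_language` holds placeholder entries rather than scraped data.

```python
from collections import Counter

# Hypothetical input: language code -> set of jpg filenames found on that
# language's Amsterdam page. Placeholder entries, not scraped data.
images_by_language = {
    "en": {"canal.jpg", "dam_square.jpg", "skyline.jpg"},
    "nl": {"canal.jpg", "skyline.jpg"},
    "fr": {"canal.jpg", "beer_bottles.jpg"},
}

# Number of language pages each image appears on.
pages_per_image = Counter(
    image for images in images_by_language.values() for image in images
)

# Distribution: how many images appear on exactly n pages.
images_per_page_count = Counter(pages_per_image.values())

print(pages_per_image.most_common(1))
print(sorted(images_per_page_count.items()))
```

Using a set per page means an image that appears twice on the same page is counted only once for that language.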

Popular images

Here are the most popular images and the number of pages they appear on:

  • amsterdamluchtfotobmz: 29
  • amsterdam airphoto: 26
  • keizersgrachtreguliersgrachtamsterdam: 26
  • amsterdam canals - july 2006: 20
  • river amstel by night - frans koppelaar: 17
  • zuidasamsterdamthenetherlands: 15
  • canals of amsterdam - jordaan area: 15
  • amsterdamdamsquar: 14
  • amsterdam333: 13
  • amsterdam 4.89943e 52.37109n: 12
  • sights in amsterdam: 10
  • cornelis anthonisz vogelvluchtkaart amsterdam: 10
  • amsterdam red light district 24–7–2003: 10
  • view of amsterdam: 10
  • sint-nicolaaskerk (amsterdam): 9

The filenames do not always reflect the subject matter of the images, but they may still give an impression. Of the 149 language pages with images, 72 have at least one image with canal in its filename. Other terms include museum (29), church (26), the Zuidas business district (25), red light district (15), bicycle (10) and hash, marihuana or coffee shop (9).[2] This suggests a preference for traditional topics.
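The keyword counts can be approximated with simple substring matching on filenames. The sketch below is an assumption about how one might do it: the «bicycle» variants are the ones mentioned in the footnote, «gracht» is an illustrative guess at a Dutch variant, and the filenames are placeholders.

```python
# Hypothetical keyword groups. The "bicycle" variants are the ones named in
# the footnote; "gracht" is an illustrative guess at a Dutch variant.
KEYWORD_VARIANTS = {
    "canal": ("canal", "gracht"),
    "bicycle": ("bicycle", "fiets", "cycle", "bike"),
}

# Placeholder filenames per language page, not the real Wikipedia data.
images_by_language = {
    "en": {"Amsterdam Canals.jpg", "Bikes at Centraal.jpg"},
    "nl": {"Keizersgracht.jpg"},
    "fr": {"Brouwerij bottles.jpg"},
}

def pages_mentioning(pages, variants):
    """Count pages with at least one filename containing any of the variants."""
    return sum(
        any(variant in name.lower() for name in names for variant in variants)
        for names in pages.values()
    )

counts = {
    topic: pages_mentioning(images_by_language, variants)
    for topic, variants in KEYWORD_VARIANTS.items()
}
print(counts)
```

Substring matching is crude: it will miss images whose filenames say nothing about the subject, and it may produce false hits for short variants.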

Unique images

More interesting than the most popular photos are unique photos - photos used on only one version of the Amsterdam page. Here is a list of language versions with the number of unique photos they contain:

  • Français: 19
  • Lëtzebuergesch: 16
  • Italiano: 13
  • Limburgs: 12
  • Brezhoneg: 12
  • English: 10
  • Nederlands: 9
  • Čeština: 7
  • ქართული: 7
  • Polski: 7
  • Deutsch: 7
  • Svenska: 7
  • Slovenčina: 6
  • Español: 6
  • Ελληνικά: 5
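Counting unique photos per language version comes down to keeping only the images that are used on exactly one page. A minimal sketch, again with placeholder data and hypothetical variable names:

```python
from collections import Counter

# Placeholder data: language code -> image filenames on its Amsterdam page.
images_by_language = {
    "fr": {"canal.jpg", "photochrom_dam.jpg", "beer_bottles.jpg"},
    "en": {"canal.jpg", "skyline.jpg"},
    "nl": {"canal.jpg", "skyline.jpg", "tram.jpg"},
}

# How many language pages use each image.
usage = Counter(img for imgs in images_by_language.values() for img in imgs)

# Per language: number of images that appear on no other language page.
unique_counts = {
    language: sum(1 for img in imgs if usage[img] == 1)
    for language, imgs in images_by_language.items()
}

for language, n in sorted(unique_counts.items(), key=lambda item: -item[1]):
    print(f"{language}: {n}")
```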

Overview of the range of craft beers from Brouwerij ’t IJ, in 2008. By Aloxe, CC BY-SA 3.0

The French page is carefully edited and has some cool photos, including a photochrome of Dam Square and a selection of beer bottles of Brouwerij ’t IJ;[3] the latter illustrates a section on «Beer and breweries: from craft to multinationals». As for the Luxembourg page - it appears that one editor has simply dumped his or her holiday pics there.

  1. I only looked at jpg images because svg’s and png’s tend to be pictograms, maps and flags.  ↩

  2. I checked for variations, e.g. fiets, cycle or bike.  ↩

  3. As of 2008; both the bottle and the labels have changed since.  ↩

Datawrapper’s policy on bad graphs

Datawrapper is a tool that lets you turn a dataset into a decent-looking chart within minutes. In an interview, co-founder Mirko Lorenz said Datawrapper is designed to prevent people making misleading graphs:

With Datawrapper, we try to make it as hard as possible to take data and create misleading charts with it. For example, it’s not possible to create bar charts with cropped axes. From time to time, users ask us to add this feature, but we never have and we never will. (via)

This may sound a bit paternalistic but it makes sense: Datawrapper’s philosophy is to offer a simple, robust way to quickly create a chart. If you don’t like the limitations, learn to code D3.js.

But Lorenz’ remark made me curious: are there more design options, besides bar charts with cropped axes, that Datawrapper deems unacceptable? And are they limited to chart designs that are outright misleading, or does Datawrapper more generally ban designs that result in ineffective or inaccurate data communication? Here’s an exploration of Datawrapper’s bad graph policy.

Y-axis not starting at zero
Datawrapper disapproves of y-axes that don’t start at zero in bar and column charts, but it allows them in line charts. I think this is consistent with the consensus on the topic.[1]

Spaghetti chart

I’m using the term spaghetti chart in the non-technical sense, meaning a chart with many lines that create an indecipherable mess.[2] Datawrapper doesn’t ban spaghetti charts.

Pie chart
Long the chart type we all loved to hate, the pie chart has recently been sort of rehabilitated. I think many people would now agree that pie charts are a legitimate way to represent proportions. That said, 3D and exploding pie charts are still suspect. Datawrapper allows pie (and donut) charts, but doesn’t seem to allow 3D or exploding pie charts.

Using perspective to create a 3D effect will make it difficult to compare the sizes of elements in a chart. Fortunately Datawrapper doesn’t seem to allow any type of 3D chart.

Stacked bar chart
The rehabilitation of the pie chart coincided with a renewed critique of stacked bar charts: «basic bar charts are clearly better than pie charts, but stack them and they’re worse!». Which, by the way, doesn’t mean that it’s always wrong to use stacked bar charts.[3] Datawrapper allows them.

Dual y-axes
Some charts have a secondary y-axis, so different scales can be used in one chart (here’s an awkward example, source). There may be situations where this is defensible, but in general it shouldn’t be considered good practice. Datawrapper doesn’t seem to allow this.

Pictograms instead of bars
Some designers try to jazz up bar charts using pictograms instead of bars, forgetting to take into account that if you double the height of the pictogram, its area increases fourfold. The distortion is even worse when the pictograms are drawn to appear three-dimensional. Datawrapper doesn’t seem to allow replacing bars with pictograms.
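The arithmetic behind this distortion is simple enough to sketch. The function name `perceived_scale` is made up for this illustration; it assumes the pictogram is scaled uniformly, so width (and apparent depth, for a 3D look) grow along with height.

```python
def perceived_scale(height_factor, dimensions=2):
    """Growth of a pictogram's area (dimensions=2) or apparent volume
    (dimensions=3) when it is scaled uniformly by height_factor."""
    if dimensions not in (2, 3):
        raise ValueError("dimensions must be 2 (area) or 3 (volume)")
    return height_factor ** dimensions

for factor in (2, 3):
    area = perceived_scale(factor)
    volume = perceived_scale(factor, dimensions=3)
    print(f"height x{factor}: area x{area}, apparent volume x{volume}")
```

So a value that is twice as large ends up being drawn with four times the area, or with eight times the apparent volume if the pictogram looks three-dimensional. A plain bar avoids this because only its height changes.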

  1. The best-known examples of y-axes not starting at zero are cropped or truncated axes, which start at a value higher than zero, but there are also examples of axes starting at a negative value. Edward Tufte points this out in The Visual Display of Quantitative Information, using a chart from an annual report as an illustration: «A careful look at the middle panel reveals a negative income in 1970, which is disguised by having the bars begin at the bottom at approximately minus $4,200,000».  ↩

  2. You can make a spaghetti chart interactive, for example let users click a label and the corresponding line will be highlighted. But this may still be an awkward solution, especially on mobile.  ↩

  3. «They can be useful when the point is to show that a value is the sum of other values, but you’re only interested in comparing the totals. They also work if you only need to show one section and can make that the one on the bottom. Then the bars are comparable and work well. But just throwing values into a stacked bar chart is a bad idea», Robert Kosara argued. Here’s how Dutch minister Jeroen Dijsselbloem messed up.  ↩