champagne anarchist | armchair activist

Data

bullshit #dataviz

Donald Trump won because Hillary Clinton failed to get the vote out. At least, that’s the story this heavily retweeted chart seems to tell. But according to data visualisation expert Alberto Cairo, it’s an example of the kind of bullshit #dataviz we need to fight against. In fact, many people have criticised the chart, for a number of reasons, including:

  • The y-axis, obviously.[1] The chart suggests Clinton got about half as many votes in 2016 as Obama in 2012, which of course isn’t true. Some have argued that truncating the y-axis is justifiable in this case because otherwise small differences wouldn’t show. However, with a y-axis starting at zero, you can still see what’s going on.
  • Why is it showing only the latest three presidential elections? Add data for elections before 2008, and the picture becomes quite different.
  • Not all votes have been counted yet. At one point, Nate Cohn of the NYT predicted that Trump would end up with 61.2 million votes and Clinton with 63.4 million once all votes are counted. That would also change the picture considerably.
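
To get a feel for how much a cropped y-axis can distort a comparison, here is a minimal sketch. The vote totals and the axis starting point are hypothetical numbers chosen for illustration, not the data behind the chart, and `apparent_ratio` is just an illustrative name.

```python
def apparent_ratio(value_a: float, value_b: float, axis_start: float = 0.0) -> float:
    """Ratio of the drawn bar heights when the y-axis starts at axis_start."""
    if axis_start >= min(value_a, value_b):
        raise ValueError("axis_start must be below both values")
    return (value_a - axis_start) / (value_b - axis_start)

# Hypothetical vote totals in millions, for illustration only.
earlier, later = 66.0, 60.0

true_ratio = apparent_ratio(later, earlier)                   # axis starts at zero
cropped_ratio = apparent_ratio(later, earlier, axis_start=54.0)  # axis cropped at 54

print(f"zero-based axis: {true_ratio:.2f}, cropped axis: {cropped_ratio:.2f}")
# zero-based axis: 0.91, cropped axis: 0.50
```

With these made-up numbers, a decline of about 9% is drawn as a bar half the height of its neighbour, which is the kind of impression the chart gives.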

So who created this bullshit #dataviz and why? The earliest version I could find is by Economics Professor D Yanagizawa-Drott.[2] My guess is that he created the chart as a quick-and-dirty attempt to understand what happened on 8 November, never expecting it to go viral, and that he never gave much thought to its execution.[3] While the chart design is problematic, the idea behind it - exploring how turnout affected the outcome of the election - makes sense.

Meanwhile, the post-election dataviz deluge highlighted another problem. People post charts without indicating the source of the data they used. To make matters worse, other people will simply copy and post that chart without saying who they got it from. There should be a rule that if you post a chart, you should indicate the data source and who created the chart - or at least where you found it.


  1. Jonathan Webber, who was among the people to make the chart popular, has a bio that says «Trolling y-axis mavens since 2016» (I assume he added this line in response to criticism of the chart).  ↩

  2. I wonder whether it’s possible to systematically search for images on Twitter?  ↩

  3. He introduced the chart as «A quick look at turnout data». When someone said the y-axis should start at zero, he responded: «True. Also contact Microsoft Excel, let them know the default y-axis is simply unacceptable; lazy people like me need nudging.»  ↩

Illustrating Amsterdam on Wikipedia

Wikipedia has pages on Amsterdam in 197 languages, 149 of which have at least one image. The editors of different language versions do not generally use the same images to illustrate their page. The graph below shows how often images are used.[1]

Most images appear on only one language page and only a few appear on more than 10 pages.

Popular images

Here are the most popular images and the number of pages they appear on:

  • amsterdamluchtfotobmz: 29
  • amsterdam airphoto: 26
  • keizersgrachtreguliersgrachtamsterdam: 26
  • amsterdam canals - july 2006: 20
  • river amstel by night - frans koppelaar: 17
  • zuidasamsterdamthenetherlands: 15
  • canals of amsterdam - jordaan area: 15
  • amsterdamdamsquar: 14
  • amsterdam333: 13
  • amsterdam 4.89943e 52.37109n: 12
  • sights in amsterdam: 10
  • cornelis anthonisz vogelvluchtkaart amsterdam: 10
  • amsterdam red light district 24–7–2003: 10
  • view of amsterdam: 10
  • sint-nicolaaskerk (amsterdam): 9

The filenames do not always reflect the subject matter of the images, but still they may give an impression. Of the 149 language pages with images, 72 have at least one image with canal in its filename. Other terms include museum (29), church (26), the Zuidas business district (25), red light district (15), bicycle (10) and hash, marihuana or coffee shop (9).[2] This suggests a preference for traditional topics.
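
A keyword count like this could be sketched as follows. The `pages` mapping is hypothetical toy data and `pages_with_term` is an illustrative helper, not the code behind the numbers above. Note that plain substring matching can over-match (for instance, `cycle` also matches `bicycle`).

```python
# Hypothetical toy data: language edition -> image filenames on its Amsterdam page.
pages = {
    "en": ["Amsterdam Canals - July 2006.jpg", "Amsterdam airphoto.jpg"],
    "nl": ["Fietsen bij station.jpg"],
    "fr": ["Brouwerij bottles.jpg"],
}

def pages_with_term(pages: dict[str, list[str]], terms: tuple[str, ...]) -> int:
    """Count pages with at least one filename containing any of the terms."""
    return sum(
        any(term in filename.lower() for filename in files for term in terms)
        for files in pages.values()
    )

print(pages_with_term(pages, ("canal",)))                           # 1
print(pages_with_term(pages, ("bicycle", "fiets", "cycle", "bike")))  # 1
```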

Unique images

More interesting than the most popular photos are unique photos - photos used on only one version of the Amsterdam page. Here is a list of language versions with the number of unique photos they contain:

  • Français: 19
  • Lëtzebuergesch: 16
  • Italiano: 13
  • Limburgs: 12
  • Brezhoneg: 12
  • English: 10
  • Nederlands: 9
  • Čeština: 7
  • ქართული: 7
  • Polski: 7
  • Deutsch: 7
  • Svenska: 7
  • Slovenčina: 6
  • Español: 6
  • Ελληνικά: 5
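
Counting unique images comes down to counting, per image, the number of language pages that use it, and then keeping the images with a count of one. A minimal sketch, again with hypothetical toy data rather than the real Wikipedia pages:

```python
from collections import Counter

# Hypothetical toy data: language edition -> image filenames on its Amsterdam page.
pages = {
    "fr": ["photochrome.jpg", "bottles.jpg", "airphoto.jpg"],
    "lb": ["holiday.jpg", "airphoto.jpg"],
    "en": ["airphoto.jpg"],
}

# On how many language pages does each image appear?
usage = Counter(img for imgs in pages.values() for img in set(imgs))

# An image is unique when exactly one language page uses it.
unique_per_page = {
    lang: sum(1 for img in set(imgs) if usage[img] == 1)
    for lang, imgs in pages.items()
}

# Rank language editions by their number of unique images, highest first.
ranking = sorted(unique_per_page.items(), key=lambda item: item[1], reverse=True)
print(ranking)  # [('fr', 2), ('lb', 1), ('en', 0)]
```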

Overview of the range of craft beers of Brouwerij ’t IJ, in 2008. By Aloxe, CC BY-SA 3.0

The French page is carefully edited and has some cool photos, including a photochrome of Dam Square and a selection of beer bottles of Brouwerij ’t IJ;[3] the latter illustrates a section on «Beer and breweries: from craft to multinationals». As for the Luxembourgish page - it appears that one editor has simply dumped their holiday pics there.


  1. I only looked at JPG images because SVGs and PNGs tend to be pictograms, maps and flags.  ↩

  2. I checked for variations, e.g. fiets, cycle or bike.  ↩

  3. As of 2008; both the bottle and the labels have changed since.  ↩

Datawrapper’s policy on bad graphs

Datawrapper is a tool that lets you turn a dataset into a decent-looking chart within minutes. In an interview, co-founder Mirko Lorenz said Datawrapper is designed to prevent people making misleading graphs:

With Datawrapper, we try to make it as hard as possible to take data and create misleading charts with it. For example, it’s not possible to create bar charts with cropped axes. From time to time, users ask us to add this feature, but we never have and we never will. (via)

This may sound a bit paternalistic but it makes sense: Datawrapper’s philosophy is to offer a simple, robust way to quickly create a chart. If you don’t like the limitations, learn to code D3.js.

But Lorenz’ remark made me curious: would there be more design options, besides bar charts with cropped axes, that Datawrapper deems unacceptable? And are they limited to chart designs that are outright misleading, or do they more generally ban designs that result in ineffective or inaccurate data communication? Here’s an exploration of Datawrapper’s bad graph policy.

Y-axis not starting at zero
Datawrapper disapproves of y-axes that don’t start at zero in bar and column charts, but it allows them in line charts. I think this is consistent with the consensus on the topic.[1]

Spaghetti chart

I’m using the term spaghetti chart in the non-technical sense, meaning a chart with many lines that create an indecipherable mess.[2] Datawrapper doesn’t ban spaghetti charts.

Pie chart
Long the chart type we all loved to hate, the pie chart has recently been sort of rehabilitated. I think many people would now agree that pie charts are a legitimate way to represent proportions. That said, 3D and exploding pie charts are still suspect. Datawrapper allows pie (and donut) charts, but doesn’t seem to allow 3D or exploding pie charts.

3D
Using perspective to create a 3D effect will make it difficult to compare the sizes of elements in a chart. Fortunately Datawrapper doesn’t seem to allow any type of 3D chart.

Stacked bar chart
The rehabilitation of the pie chart coincided with a renewed critique of stacked bar charts: «basic bar charts are clearly better than pie charts, but stack them and they’re worse!». Which, by the way, doesn’t mean that it’s always wrong to use stacked bar charts.[3] Datawrapper allows them.

Dual y-axes
Some charts have a secondary y-axis, so different scales can be used in one chart (here’s an awkward example, source). There may be situations where this is defensible, but in general it shouldn’t be considered good practice. Datawrapper doesn’t seem to allow this.

Pictograms instead of bars
Some designers try to jazz up bar charts using pictograms instead of bars, forgetting to take into account that if you double the height of the pictogram, its area increases fourfold. The distortion is even worse when the pictograms are drawn to appear three-dimensional. Datawrapper doesn’t seem to allow replacing bars with pictograms.
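
The arithmetic behind the pictogram problem is simple: if a pictogram is scaled uniformly, its area grows with the square of its height. The sketch below illustrates this; the function names are illustrative, and scaling by area instead of height is one common workaround, not something Datawrapper offers.

```python
import math

def area_factor(height_factor: float) -> float:
    """Area multiplier when a pictogram is scaled uniformly (width grows with height)."""
    return height_factor ** 2

def height_factor_for_value(value_factor: float) -> float:
    """Height multiplier that makes the pictogram's area, not its height, track the value."""
    return math.sqrt(value_factor)

print(area_factor(2.0))                        # 4.0
print(round(height_factor_for_value(2.0), 3))  # 1.414
```

So a value that doubles should make the pictogram only about 1.4 times as tall if the area is meant to carry the comparison.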


  1. The best-known examples of y-axes not starting at zero are cropped or truncated axes, which start at a value higher than zero, but there are also examples of axes starting at a negative value. Edward Tufte points this out in The Visual Display of Quantitative Information, using a chart from an annual report as an illustration: «A careful look at the middle panel reveals a negative income in 1970, which is disguised by having the bars begin at the bottom at approximately minus $4,200,000».  ↩

  2. You can make a spaghetti chart interactive, for example by letting users click a label to highlight the corresponding line. But this may still be an awkward solution, especially on mobile.  ↩

  3. «They can be useful when the point is to show that a value is the sum of other values, but you’re only interested in comparing the totals. They also work if you only need to show one section and can make that the one on the bottom. Then the bars are comparable and work well. But just throwing values into a stacked bar chart is a bad idea», Robert Kosara argued. Here’s how Dutch minister Jeroen Dijsselbloem messed up.  ↩

Left-wing cooperation in the Amsterdam city council

Last week, the Amsterdam city council adopted an initiative proposal to tackle tax avoidance. On the same day, a motion was adopted instructing the city government to realise 30% social housing at the Zuidas. Both are good initiatives, and both are the result of left-wing cooperation in the council.

The Amsterdam city government is generally relatively right-wing, which is why it is important that left-wing parties in the council adjust its policies. But do left-wing parties manage to work together, and does the SP, as a coalition party, dare to distance itself from the city government? Part of the answer to those questions can be found by looking at motions and amendments.

The chart shows how many motions and amendments were adopted even though at least one coalition party voted against. The grey columns show the totals; the dark red ones show how many of those proposals were adopted thanks to left-wing cooperation.

I must say it is better than I expected. After the election of March 2014, the new council spent a few months waiting to see which way the wind would blow. Since late 2014, the council has taken a more critical stance and left-wing parties more often work together to adjust the city government’s policies.

I also looked at which parties take the initiative for left-wing cooperation.[1] For the current council term, this results in the following picture:

  • GroenLinks: 62
  • SP: 30
  • PvdA: 28
  • D66: 18
  • PvdD: 10
  • CDA: 1

So GroenLinks often seems to take the lead. Incidentally, during the previous council term, when GroenLinks was in a coalition with PvdA and VVD, the picture was not much different.

Method

For convenience, I operationalised left-wing cooperation as proposals that were adopted with the support of SP, PvdA and GroenLinks and with at least one coalition party voting against.
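
The definition above can be expressed as a simple check per proposal. This is a minimal sketch, not the script used for the analysis: the function name, the data layout and the example votes are hypothetical, and the coalition set is an assumption passed in as a parameter.

```python
LEFT_PARTIES = {"SP", "PvdA", "GroenLinks"}

def is_left_cooperation(
    adopted: bool,
    votes_for: set[str],
    votes_against: set[str],
    coalition: set[str],
) -> bool:
    """True if adopted with SP, PvdA and GroenLinks in favour and at least
    one coalition party voting against."""
    return adopted and LEFT_PARTIES <= votes_for and bool(coalition & votes_against)

# Assumed coalition for the example; only SP's membership is stated in the text.
coalition = {"D66", "VVD", "SP"}

# Hypothetical vote on a single motion.
example = is_left_cooperation(
    adopted=True,
    votes_for={"SP", "PvdA", "GroenLinks", "PvdD"},
    votes_against={"VVD", "D66"},
    coalition=coalition,
)
print(example)  # True
```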

The municipality has published voting results from early 2013 up to and including mid-September 2016. It is therefore possible that some motions and amendments from the second half of September 2016 are missing. The results were published in an Excel file that I have used before. At the time, I wrote about it:

On the one hand, it is fantastic that this information is made available. On the other hand, this file is a beast that can only be tamed with quite a few lines of code. […] Given the complexity of the file, it cannot be ruled out with complete certainty that something may have gone wrong at some point in classifying the proposals.

That still applies.[2]


  1. At least, the party of the first sponsor of the motion or amendment. Perhaps this does not quite do justice to council members who play a role behind the scenes in bringing about cooperation.  ↩

  2. Here is an updated version of the script I used to process the data.  ↩

Amsterdam houses on the Chinese market

In the Guardian, London mayor Sadiq Khan has announced the launch of an investigation into the effect of foreign investment on the London housing market. In Amsterdam, concerns have been voiced over super-rich Russians and Chinese buying up property, although this phenomenon is probably in its infancy compared to London. Newspaper het Parool reported that 15 expensive houses were sold to rich Chinese and Russians in 2014 and identified a canal house, asking price 6.7 million euros, that had been sold to an ‘international investor’.[1]

The Guardian article mentions Juwai.com, «a website that aims to pair Chinese investors with property developers overseas». If Amsterdam property is offered for sale to Chinese buyers, it might be listed there, although houses may also be sold through less transparent channels.

It turns out the site currently contains some thirty Amsterdam houses, offered for sale by Christie’s, Sotheby’s and other agents. The median asking price is about 1.6 million euros, with a maximum of 7.9 million euros. Unsurprisingly, they tend to be located in the posh areas of town: Canal Belt, Vondelpark, Zuidas. Incidentally, the website also contains quite a few houses in affluent villages like Aerdenhout.

Juwai has published lists of most-viewed cities. In Q4 2015 Amsterdam was the 4th most popular European city among Chinese prospective buyers. In Q1 2016 it dropped to position 8, which suggests the ranking is rather volatile.

Mind you, I have no problem per se with Chinese or Russians buying Amsterdam houses. I do think it’s a problem when rich people - be they Dutch or foreign - use houses as an investment object and drive up housing prices. But this is part of a broader problem, to do with issues like wealth inequality and the social housing sell-off.

While the scale of speculation and unoccupied houses isn’t anywhere near what’s happening in London, the Amsterdam city government warns that «it cannot be ruled out that such developments will also take place in Amsterdam.» But as De Groene argued, that doesn’t depend on Chinese and Russians buying up canal houses, but on what we’re willing to do about our housing market.

Method

A practical issue I ran into was how to search a website in Chinese. In a variation on a trick I learned from Henk van Ess, I used Google Translate to look up the Simplified Chinese translation of ‘Amsterdam’. Then I searched for 阿姆斯特丹 site:juwai.com. One of the first search results was the page NLproperty, which, as you’d expect, lists property in the Netherlands. From there it was easy to find the property in Amsterdam.
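
The same query can also be turned into a search URL programmatically. A minimal sketch: `site_search_url` is an illustrative helper name, and the code only builds and decodes the URL, it doesn’t fetch any results.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def site_search_url(term: str, site: str) -> str:
    """Build a Google search URL restricted to one site via the site: operator."""
    query = f"{term} site:{site}"
    # urlencode percent-encodes the non-ASCII characters as UTF-8.
    return "https://www.google.com/search?" + urlencode({"q": query})

url = site_search_url("阿姆斯特丹", "juwai.com")
print(url)

# Decoding the query string again shows the original search terms.
decoded_query = parse_qs(urlsplit(url).query)["q"][0]
print(decoded_query)
```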


  1. Parool. According to city government data (pdf), 6 houses were sold to foreigners at a price above 1 million euros in 2014, which suggests the sources quoted by het Parool have a lower threshold for expensive housing.  ↩
