DuckDuckGo shows code examples

Because of Google’s new privacy warning, I finally changed my default search engine to DuckDuckGo.[1] So far, I’m quite happy with it. I was especially pleased when I noticed they sometimes show code snippets or excerpts from documentation on the results page.

Apparently, DDG has decided that it wants to be «the best search engine for programmers». One feature they’re using is the instant answers that are sometimes shown in addition to the ‘normal’ search results. These instant answers may get their contents from DDG’s own databases - examples include cheat sheets created for the purpose - or they may use external APIs, such as the Stack Overflow API. Currently, volunteers are working to improve search results for the top 15 programming languages, including JavaScript, Python and R.

One could argue that instant answers promote the wrong kind of laziness - copying code from the search results page rather than visiting the original post on Stack Overflow. But for quickly looking up trivial stuff, I think this is perfect.


  1. I assume the contents of the privacy warning could have been a reason to switch search engines, but what triggered me was the intrusive warning that Google shows in each new browser session - basically punishing you for having your browser throw away cookies.  ↩

Embedding D3.js charts in a responsive website - a better solution

I often use D3.js to create charts which I embed on my website (the chart below is included merely as an illustration; it was copied from here). Normally you set the width and height of the embedded page in the embed code, but with a responsive layout it’s not so simple. The challenge is to adapt the iframe width to varying screen sizes and change the height so that the chart still fits.

After struggling with this issue for quite a while I thought I had come across the solution and wrote an article about it. However, this solution has two problems:

  • You have to define the aspect ratio in both the embed code and the D3 code; ideally, you shouldn’t have to do that in more than one place;
  • More importantly, it doesn’t take into account the height of any title and captions that are not part of the D3-created svg. You could handle this by making the title and caption part of the svg itself, but this is a bit awkward, especially with multiline captions.

A while ago, I came across a different approach which uses HTML5’s postMessage. The embedded page posts a message containing its own height to the parent page. The parent page picks up the message and changes the iframe height accordingly.

A smart variant has the embedded page not only send its height, but also its url to the parent page. That way, you can identify the corresponding iframe by its src attribute and thus make sure the right iframe gets updated - which is nice if you have more than one iframe on a web page.

Here’s how it works. In the D3 code, set the width of the chart to the width of the div the svg is attached to and use the aspect ratio to calculate the chart height. Also add the following code to the embedded page. It will send its height and url to the parent page:

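// Post the height and url of this (embedded) page to the parent page
function sendHeight() {
  var height = $('body').height();
  window.parent.postMessage({
    'height': height,
    'location': window.location.href
  }, "*");
}

// Send the height on every resize, and once right away
$(window).on('resize', function() {
  sendHeight();
}).resize();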
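The sizing step mentioned above (chart width taken from the container, height derived from the aspect ratio) could be sketched roughly as follows. This is only an illustration: the function name chartSize, the 16:9 ratio and the #chart selector in the comment are assumptions, not part of the original D3 code.

```javascript
// Hypothetical helper: derive the chart dimensions from the container
// width and one aspect ratio (width divided by height), so the ratio
// only has to be defined in a single place.
function chartSize(containerWidth, aspectRatio) {
  var width = Math.max(0, containerWidth);
  return {
    width: width,
    height: Math.round(width / aspectRatio)
  };
}

// In the embedded page it might be used like this (assumes the svg is
// attached to an element with id "chart"):
//   var size = chartSize($('#chart').width(), 16 / 9);
//   svg.attr('width', size.width).attr('height', size.height);
```

Because the height follows from the width, the title and caption around the svg can take whatever space they need; the body height that is posted to the parent page includes them.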

And here’s the code for the parent page. It will pick up the message, identify the corresponding iframe and update its height (note that Drupal requires jQuery instead of $):

window.addEventListener('message', function(event) {
    if (event.origin !== 'https://dirkmjk.nl') return;
    var data = event.data;
    var height = data.height + 32;
    jQuery('iframe[src^="' + data.location + '"]').css('height', height + 'px');
}, false);

In the second line, the domain should be replaced with the domain where the embedded page is hosted (the line checks for the origin of the posted message for security reasons).
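The handler above assumes that every message from the right origin has the expected shape. A slightly more defensive variant might look like the sketch below. The function name heightFromMessage and the padding parameter are hypothetical; the checks are a suggestion, not something the original code does.

```javascript
// Hypothetical helper: work out which height to apply for an incoming
// message event. Returns null if the message should be ignored.
function heightFromMessage(event, allowedOrigin, padding) {
  // Ignore messages that were not posted from the expected domain
  if (event.origin !== allowedOrigin) return null;
  var data = event.data;
  // Ignore messages that lack a numeric height or a string location
  if (!data || typeof data.height !== 'number' || !isFinite(data.height)) return null;
  if (typeof data.location !== 'string') return null;
  return { location: data.location, height: data.height + padding };
}

// Possible use on the parent page (jQuery, as in the handler above):
//   window.addEventListener('message', function(event) {
//     var result = heightFromMessage(event, 'https://dirkmjk.nl', 32);
//     if (!result) return;
//     jQuery('iframe[src^="' + result.location + '"]').css('height', result.height + 'px');
//   }, false);
```

Keeping the decision in a plain function means it can be tested without a browser.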

I haven’t extensively checked this but it works on iOS and Android. Since it uses postMessage, it will not work on some older browsers. Then again, D3.js won’t work on some older browsers either.

Credits go to thomax and Jan Werkhoven.


Exploring traffic lights with location data from cyclists’ phones

In 2006, Amsterdammers voted Frederiksplein the location with the most irritating traffic light. Now, ten years later, data from the Fietstelweek (Bicycle Counting Week) offer a unique opportunity to map how much time cyclists lose at traffic lights. During Fietstelweek, over 40,000 Dutch cyclists shared their location data using a smartphone app. Some of the findings are summarised on the map above, which shows quite a few red dots - locations where cyclists lose on average 30 seconds or more.

Some of those bottlenecks also featured in the 2006 top ten of irritating traffic lights, including the ‘winner’ at the time, the Frederiksplein. And many red dots are on the Plusnet Fiets, a network of essential cycling routes where the municipality would prefer an average delay of at most 20 or 30 seconds.[1]

The data only allows for a general exploration of cycling bottlenecks. In order to understand more precisely what’s going on, one would have to analyse each crossing separately. At a few locations, average delays of over two minutes have been observed - perhaps traffic lights are not the sole explanation of those delays.

The data from the Fietstelweek were collected in September. The situation may well have changed at some locations since. A good example is the Muntplein, where cycling is pretty smooth now - thanks to Alderman Litjens who banned most cars and removed traffic lights. A change that occurred before the Fietstelweek is the removal of traffic lights at the Alexanderplein. And it shows: all dots are green there.

Cyclists’ organisation Fietsersbond wants traffic lights adjusted to create shorter waiting times for cyclists. Research has shown this to be a measure that is very effective and relatively easy and cheap to implement. But it’s not just about technical improvements; future policies should make ‘radical choices’ in favour of bicycle and pedestrian traffic, in order to prevent the city coming to a standstill due to congestion.

This seemed like a good occasion to organise a follow-up poll on traffic lights. Click here to vote for Amsterdam’s most irritating traffic light - 2016 edition.

Method

The Fietstelweek is an initiative of cyclists’ organisation Fietsersbond and a number of consultancies and research organisations. Between 19 and 25 September 2016, over 40,000 cyclists used an app to share their location data. The Fietstelweek data has been made available (thanks!) on condition that derived products are also made available as open data. The processed data of my analysis is here and the code for processing the data here and here.

The Fietstelweek data is available in the form of routes, links (intensity and speed) and nodes (delays). The nodes data contains a variable tijd (time). This is the delay along the trajectory between 50m before and 50m after the node, relative to the time the cyclist would normally take to cycle 100m (thanks to Dirk Bussche of NHTV Breda University of Applied Sciences for details on how the data was processed).

The dataset contains over 750,000 nodes. I filtered them in three steps: only nodes that are within a square around Amsterdam; only nodes near traffic lights and only nodes with at least 50 observations. This resulted in 1,845 nodes with almost 400,000 observations. For details see the scripts.
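The three filter steps could be sketched as below. This is not the code from the linked scripts: the node shape ({ lat, lon, observations }), the function names, the bounding box and the maximum distance to a traffic light are all assumptions made for the illustration. Only the minimum of 50 observations comes from the text.

```javascript
// Assumed shapes (hypothetical): a node is { lat, lon, observations },
// a traffic light is { lat, lon }, a box is { south, north, west, east }.
var EARTH_RADIUS_M = 6371000;

// Great-circle distance in metres between two points (haversine formula)
function distanceMetres(a, b) {
  var toRad = function (deg) { return deg * Math.PI / 180; };
  var dLat = toRad(b.lat - a.lat);
  var dLon = toRad(b.lon - a.lon);
  var h = Math.sin(dLat / 2) * Math.sin(dLat / 2) +
          Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) *
          Math.sin(dLon / 2) * Math.sin(dLon / 2);
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h));
}

function filterNodes(nodes, box, lights, maxDistance, minObservations) {
  return nodes
    // Step 1: only nodes within a rectangle around the city
    .filter(function (n) {
      return n.lat >= box.south && n.lat <= box.north &&
             n.lon >= box.west && n.lon <= box.east;
    })
    // Step 2: only nodes within maxDistance metres of some traffic light
    .filter(function (n) {
      return lights.some(function (l) {
        return distanceMetres(n, l) <= maxDistance;
      });
    })
    // Step 3: only nodes with enough observations
    .filter(function (n) { return n.observations >= minObservations; });
}
```

Running the cheap bounding-box test first keeps the more expensive distance calculation limited to nodes that are in the area anyway.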

Data on traffic lights is from the municipality.


  1. In a new policy to be decided early 2017, the municipality indicates that the average waiting time for cyclists, measured at the busiest hour, should not exceed 45 seconds. At the Plusnet Fiets, it is further deemed desirable that the maximum delay doesn’t exceed 20 seconds at busy crossings and 30 seconds elsewhere. Delay times include the effect of slowing down and accelerating.  ↩

Time on the y-axis

Normally, charts have time on the x-axis, moving from left to right. Earlier this year, Alberto Cairo wrote an article on charts that have time on the y-axis. This may make practical sense if you want to show developments over time on a political left-right scale. He also pointed to the use of mobile screens:

As a final note, here’s a prediction: as a majority of readers are accessing their news through smartphones […] which are usually held upright and navigated by scrolling vertically, vertical time-series charts with time on the Y-axis will become more common in the next few years. Will we witness a new visual convention being born?

Now Kaiser Fung discusses a few charts by the Washington Post (aptly described as troll hair charts) and the New York Times that also have time on the y-axis. They’ve made different choices regarding the direction of time: «The Post’s choice of top to bottom seems more natural to me than the Times’s reverse order but I am guessing some of you may have different inclinations.» Which suggests that the convention of showing time on the y-axis hasn’t crystallised yet.

Based on the connection with scrolling on mobile screens, the Washington Post’s top-to-bottom approach may well emerge as the standard approach.


Inequality in elections

There’s been a bit of fuss about turnout in the American presidential election, but turnout inequality is an issue in the Netherlands too. Young people, low-educated people and people with lower incomes are less likely to vote, possibly because they have little faith politicians will take their interests to heart.

Income, turnout and voting behaviour vary across neighbourhoods as shown by the confetti plot below, which uses the Amsterdam results of the 2012 Lower House election as an illustration.

The picture is clear: in rich neighbourhoods, more people vote, and they’re more likely to vote VVD or D66 - parties that favour free-market economics. In poorer neighbourhoods, the social-democrat PvdA and the socialist SP are more popular, but fewer people turn out to vote.

Given the large differences in turnout, it’s surprising that hardly any serious turnout campaigns have been run in the Netherlands. There’s ample scientific research on the effectiveness of such campaigns.

Click the urls below the chart to show turnout, left votes or liberal votes. Here is a larger version of the chart - even though this may not make much difference on a mobile screen.

Method

Comparing neighbourhood-level election results with income data on the residents of these neighbourhoods is somewhat problematic because voters aren’t required to vote in their own neighbourhood. I have excluded a few neighbourhoods, including Station-Zuid WTC en omgeving, because they have polling stations at railway stations where relatively many people from other neighbourhoods vote.

The correlations are pretty robust. You’ll also find them by analysing voting behaviour in Amsterdam neighbourhoods in the 2014 city council election, or differences between municipalities across the Netherlands in the 2012 Lower House election (in the latter case, correlations are somewhat weaker). Data and scripts here.
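For readers who want to check such a correlation themselves, here is a minimal sketch of a correlation coefficient. It assumes a plain Pearson coefficient on two equally long arrays (for example income and turnout per neighbourhood); the linked scripts may well use a different measure or tool, and the function name is hypothetical.

```javascript
// Pearson correlation coefficient of two equally long arrays of numbers.
// Returns NaN if one of the arrays has no variation.
function pearson(xs, ys) {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error('need two arrays of equal length with at least 2 values');
  }
  var n = xs.length;
  var meanX = xs.reduce(function (a, b) { return a + b; }, 0) / n;
  var meanY = ys.reduce(function (a, b) { return a + b; }, 0) / n;
  var sxy = 0, sxx = 0, syy = 0;
  for (var i = 0; i < n; i++) {
    var dx = xs[i] - meanX;
    var dy = ys[i] - meanY;
    sxy += dx * dy;   // co-variation
    sxx += dx * dx;   // variation in x
    syy += dy * dy;   // variation in y
  }
  return sxy / Math.sqrt(sxx * syy);
}

// Example call with made-up numbers (not the Amsterdam data):
//   pearson([20, 25, 30, 40], [55, 60, 70, 80]);
```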


bullshit #dataviz

Donald Trump won because Hillary Clinton failed to get the vote out. At least, that’s the story this heavily retweeted chart seems to tell (click the chart for a larger version). But according to data visualisation expert Alberto Cairo it’s an example of the kind of bullshit #dataviz we need to fight against. In fact, many people have criticised the chart, for a number of reasons, including:

  • The y-axis, obviously.[1] The chart suggests Clinton got about half as many votes in 2016 as Obama in 2012, which of course isn’t true. Some have argued that truncating the y-axis is justifiable in this case because otherwise small differences wouldn’t show. However, with a y-axis starting at zero, you can still see what’s going on.
  • Why is it showing only the latest three presidential elections? Add data for elections before 2008, and the picture becomes quite different.
  • Not all votes have been counted yet. At one point, Nate Cohn of the NYT predicted that Trump will get 61.2 million votes and Clinton 63.4 million, when all votes are counted. That would also change the picture considerably.

So who created this bullshit #dataviz and why? The earliest version I could find is by Economics Professor D Yanagizawa-Drott.[2] My guess is that he created the chart as a quick-and-dirty attempt to understand what happened on 8 November, never expecting it to go viral, and that he never gave much thought to its execution.[3] While the chart design is problematic, the idea behind it - explore how turnout affected the outcome of the election - makes sense.

Meanwhile, the post-election dataviz deluge highlighted another problem. People post charts without indicating the source of the data they used. To make matters worse, other people will simply copy and post that chart without saying who they got it from. There should be a rule that if you post a chart, you should indicate the data source and who created the chart - or at least where you found it.


  1. Jonathan Webber, who was among the people to make the chart popular, has a bio that says Trolling y-axis mavens since 2016 (I assume he added this line in response to criticism of the chart).  ↩

  2. I wonder whether it’s possible to systematically search for images on Twitter?  ↩

  3. He introduced the chart as «A quick look at turnout data». When someone said the y-axis should start at zero, he responded: «True. Also contact Microsoft Excel, let them know the default y-axis is simply unacceptable; lazy people like me need nudging.»  ↩


Illustrating Amsterdam on Wikipedia

Wikipedia has pages on Amsterdam in 197 languages, 149 of which have at least one image. The editors of different language versions do not generally use the same images to illustrate their page. The graph below shows how often images are used.[1]

Most images appear on only one language page and only a few appear on more than 10 pages.

Popular images

Here are the most popular images and the number of pages they appear on: amsterdamluchtfotobmz: 29; amsterdam airphoto: 26; keizersgrachtreguliersgrachtamsterdam: 26; amsterdam canals - july 2006: 20; river amstel by night - frans koppelaar: 17; zuidasamsterdamthenetherlands: 15; canals of amsterdam - jordaan area: 15; amsterdamdamsquar: 14; amsterdam333: 13; amsterdam 4.89943e 52.37109n: 12; sights in amsterdam: 10; cornelis anthonisz vogelvluchtkaart amsterdam: 10; amsterdam red light district 24–7–2003: 10; view of amsterdam: 10; sint-nicolaaskerk (amsterdam): 9.

The filenames do not always reflect the subject matter of the images, but still they may give an impression. Of the 149 language pages with images, 72 have at least one image with canal in its filename. Other terms include museum (29), church (26), the Zuidas business district (25), red light district (15), bicycle (10) and hash, marihuana or coffee shop (9).[2] This suggests a preference for traditional topics.

Unique images

More interesting than the most popular photos are unique photos - photos used on only one version of the Amsterdam page. Here is a list of language versions with the number of unique photos they contain: Français: 19; Lëtzebuergesch: 16; Italiano: 13; Limburgs: 12; Brezhoneg: 12; English: 10; Nederlands: 9; Čeština: 7; ქართული: 7; Polski: 7; Deutsch: 7; Svenska: 7; Slovenčina: 6; Español: 6; Ελληνικά: 5.

Aperçu de la gamme de bières artisanales de la Brouwerij ’t IJ, en 2008. By Aloxe, CC BY-SA 3.0

The French page is carefully edited and has some cool photos, including a photochrome of Dam Square and a selection of beer bottles of Brouwerij ’t IJ;[3] the latter illustrates a section on Beer and breweries: from craft to multinationals. As for the Luxembourg page - it appears that one editor has simply dumped his or her holiday pics there.


  1. I only looked at jpg images because svgs and pngs tend to be pictograms, maps and flags.  ↩

  2. I checked for variations, e.g. fiets, cycle or bike.  ↩

  3. As of 2008; both the bottle and the labels have changed since.  ↩


Datawrapper’s policy on bad graphs

Datawrapper is a tool that lets you turn a dataset into a decent-looking chart within minutes. In an interview, co-founder Mirko Lorenz said Datawrapper is designed to prevent people making misleading graphs:

With Datawrapper, we try to make it as hard as possible to take data and create misleading charts with it. For example, it’s not possible to create bar charts with cropped axes. From time to time, users ask us to add this feature, but we never have and we never will. (via)

This may sound a bit paternalistic but it makes sense: Datawrapper’s philosophy is to offer a simple, robust way to quickly create a chart. If you don’t like the limitations, learn to code D3.js.

But Lorenz’ remark made me curious: would there be more design options, besides bar charts with cropped axes, that Datawrapper deems unacceptable? And are they limited to chart designs that are outright misleading, or do they more generally ban designs that result in ineffective or inaccurate data communication? Here’s an exploration of Datawrapper’s bad graph policy.

Y-axis not starting at zero
Datawrapper disapproves of y-axes that don’t start at zero in bar and column charts, but it allows them in line charts. I think this is consistent with the consensus on the topic.[1]

Spaghetti chart

I’m using the term spaghetti chart in the non-technical sense, meaning a chart with many lines that create an indecipherable mess.[2] Datawrapper doesn’t ban spaghetti charts.

Pie chart
Long the chart type we all loved to hate, the pie chart has recently been sort of rehabilitated. I think many people would now agree that pie charts are a legitimate way to represent proportions. That said, 3D and exploding pie charts are still suspect. Datawrapper allows pie (and donut) charts, but doesn’t seem to allow 3D or exploding pie charts.

3D
Using perspective to create a 3D effect will make it difficult to compare the sizes of elements in a chart. Fortunately Datawrapper doesn’t seem to allow any type of 3D chart.

Stacked bar chart
The rehabilitation of the pie chart coincided with a renewed critique of stacked bar charts: «basic bar charts are clearly better than pie charts, but stack them and they’re worse!». Which, by the way, doesn’t mean that it’s always wrong to use stacked bar charts.[3] Datawrapper allows them.

Dual y-axes
Some charts have a secondary y-axis, so different scales can be used in one chart (here’s an awkward example, source). There may be situations where this is defensible, but in general it shouldn’t be considered good practice. Datawrapper doesn’t seem to allow this.

Pictograms instead of bars
Some designers try to jazz up bar charts using pictograms instead of bars, forgetting to take into account that if you double the height of the pictogram, its area increases fourfold. The distortion is even worse when the pictograms are drawn to appear three-dimensional. Datawrapper doesn’t seem to allow replacing bars with pictograms.
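The arithmetic behind the pictogram problem is simple enough to spell out. The sketch below assumes the pictogram keeps its proportions when it is scaled, and that a pictogram drawn to look three-dimensional is read as a volume; the function names are made up for the illustration.

```javascript
// Scaling the height by a factor k also scales the width by k,
// so the area on screen grows by k * k.
function areaFactor(heightFactor) {
  return heightFactor * heightFactor;
}

// If the pictogram is read as a 3D object, depth scales along as well
// and the suggested volume grows by k * k * k.
function volumeFactor(heightFactor) {
  return Math.pow(heightFactor, 3);
}

// A value that is twice as large gets a pictogram with 4 times the
// area, or 8 times the apparent volume:
//   areaFactor(2);   // 4
//   volumeFactor(2); // 8
```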


  1. The best-known examples of y-axes not starting at zero are cropped or truncated axes, which start at a value higher than zero, but there are also examples of axes starting at a negative value. Edward Tufte points this out in The Visual Display of Quantitative Information, using a chart from an annual report as an illustration: «A careful look at the middle panel reveals a negative income in 1970, which is disguised by having the bars begin at the bottom at approximately minus $4,200,000».  ↩

  2. You can make a spaghetti chart interactive, for example let users click a label and the corresponding line will be highlighted. But this may still be an awkward solution, especially on mobile.  ↩

  3. «They can be useful when the point is to show that a value is the sum of other values, but you’re only interested in comparing the totals. They also work if you only need to show one section and can make that the one on the bottom. Then the bars are comparable and work well. But just throwing values into a stacked bar chart is a bad idea», Robert Kosara argued. Here’s how Dutch minister Jeroen Dijsselbloem messed up.  ↩

Amsterdam houses on the Chinese market

In the Guardian, London mayor Sadiq Khan has announced the launch of an investigation into the effect of foreign investment on the London housing market. In Amsterdam, concerns have been voiced over super-rich Russians and Chinese buying up property, although this phenomenon is probably in its infancy compared to London. Newspaper het Parool reported that 15 expensive houses were sold to rich Chinese and Russians in 2014 and identified a canal house, asking price 6.7 million euros, that had been sold to an ‘international investor’.[1]

The Guardian article mentions Juwai.com, «a website that aims to pair Chinese investors with property developers overseas». If Amsterdam property is offered for sale to Chinese buyers, it might be listed there, although houses may also be sold through less transparent channels.

It turns out the site currently contains some thirty Amsterdam houses, offered for sale by Christie’s, Sotheby’s and other agents. The median asking price is about 1.6 million euros, with a maximum of 7.9 million euros. Unsurprisingly, they tend to be located in the posh areas of town: Canal Belt, Vondelpark, Zuidas. Incidentally, the website also contains quite a few houses in affluent villages like Aerdenhout.

Juwai has published lists of most-viewed cities. In Q4 2015 Amsterdam was the 4th most popular European city among Chinese prospective buyers. In Q1 2016 it dropped to position 8, which suggests the ranking is rather volatile.

Mind you, I have no problem per se with Chinese or Russians buying Amsterdam houses. I do think it’s a problem when rich people - be they Dutch or foreign - use houses as an investment object and drive up housing prices. But this is part of a broader problem, to do with issues like wealth inequality and the social housing sell-off.

While the scale of speculation and unoccupied houses isn’t anywhere near what’s happening in London, the Amsterdam city government warns that «it cannot be ruled out that such developments will also take place in Amsterdam.» But as De Groene argued, that doesn’t depend on Chinese and Russians buying up canal houses, but on what we’re willing to do about our housing market.

Method

A practical issue I ran into was how to search a website in Chinese. In a variation on a trick I learned from Henk van Ess, I used Google Translate to look up the Simplified Chinese translation of ‘Amsterdam’. Then I searched for 阿姆斯特丹 site:juwai.com. One of the first search results was the page NLproperty, which, as you’d expect, lists property in the Netherlands. From there it was easy to find the property in Amsterdam.


  1. Parool. According to city government data (pdf), 6 houses were sold to foreigners at a price above 1 million euros in 2014, which suggests the sources quoted by het Parool have a lower threshold for expensive housing.  ↩


Exploring tax haven Amsterdam

In 2012, I mapped the geographical evolution of Amsterdam’s trust offices. Since 2006, many had changed their name or moved to a different location, resulting in four major concentrations: Zuidoost, Prins Bernhardplein, Zuidas and Naritaweg.

The other day I ran into a new map (pdf) of tax haven Amsterdam. Judging by this map, Amsterdam’s tax avoidance geography hasn’t changed much since 2012. The map was posted online by Action Aid on the occasion of their AMSTERDAM TAX TOUR - THE BIKE EDITION. Sounds like fun and I would have loved to cycle along, but unfortunately it coincides with a previous appointment.

Incidentally, Wired did a global tax avoidance map, which also features Amsterdam.
