The paste0 function in R is one of the things we learned about in this week’s videos for the Data Analysis course. I didn’t think much of it at the time, but I was wrong! I just learned about statistical computing’s most influential contribution of the 21st century!

Blind followers on Twitter


On 30 September, I posted the last article on Nieuws uit Amsterdam (News from Amsterdam). The website has been inactive since, apart from a message on 28 October formally announcing that the site is no longer active. As expected, the number of new followers of @nieuwsamsterdam on Twitter dropped in October. Intriguingly, it started to rise again after that.

The list of new followers has been compiled from ‘You have new followers’ emails and may be incomplete. Graph may not work in older versions of Internet Explorer.

‘Trade unions should take a much tougher stance’

Dutch trade unions have a reputation for constructive dialogue, but that’s not necessarily what people expect of them. In the LISS Political Values study, some 6,000 panel members have been asked a number of times whether they agree with the statement ‘Trade unions should take a much tougher political stance, if they wish to promote the workers’ interests’. In the latest edition of the study, those who agree with this statement outnumber those who disagree by 2.6 to 1. This support for tougher unions holds for most subgroups (but not the self-employed and people earning more than 4,500 euros per month).

Support for tougher unions over time

Percentage of respondents who agree or disagree with the statement ‘Trade unions should take a much tougher political stance, if they wish to promote the workers’ interests’. Graph may not work with older versions of Internet Explorer. Source LISS, graph dirkmjk.

Support for tougher unions, by group


Values higher than 1 mean that within that group, those in favour of tougher unions outnumber those who disagree. For example, among people with paid employment, the number of respondents in favour of tougher unions is 3.5 times as high as the number who disagree. Hover mouse over bar to see percentages. Graph may not work with older versions of Internet Explorer. Source LISS, results for December 2011, graph dirkmjk.
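A minimal sketch of how such a ratio is computed. The function name and the percentages are made up for illustration; they are not LISS results.

```python
def support_ratio(pct_agree, pct_disagree):
    """Return how many respondents agree for each one who disagrees."""
    if pct_disagree <= 0:
        raise ValueError("pct_disagree must be positive")
    return pct_agree / pct_disagree


# Hypothetical percentages, for illustration only (not LISS results).
ratio = support_ratio(52.0, 20.0)
print(round(ratio, 1))  # a value above 1 means supporters outnumber opponents
```

Respondents who neither agree nor disagree are left out of the ratio, so it only compares the two groups with an opinion.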

My first D3 graph

I’m trying to master D3, a JavaScript library for creating (interactive) web graphics. As an exercise, I redid this graph, which uses Eurostat data on the percentage of the population who have ever written a computer program.

I can’t say it’s a very good graph: some of the most intriguing aspects of the data have to do with changes over time (decline in some countries, rather large growth in Finland, implausible fluctuations in the Netherlands), which don’t show very well in my graph. Nevertheless, it feels good to have coded my first interactive D3 graph.

P.S. The graph may not be visible in older versions of Internet Explorer.


Submitted by DIRKMJK on

D3 uses SVG. I’m not familiar with Android but if I understand correctly, you could install Firefox which should render SVG.

ATMs and cycle paths

The habits of cyclists are shaping cities like Amsterdam. “There are many ATMs along the main bicycle path network”, urban planner Marco te Brömmelstroet told Vogelvrije Fietser, the magazine of cyclists’ organisation Fietsersbond.

The map above shows Amsterdam’s main cycle path network (provided by the city as open data) and the location of ATMs. It appears that many ATMs are indeed located near cycle paths. Exceptions include shopping areas such as the Kalverstraat, Gelderlandplein and Bijlmerplein. (I tried to calculate the distance between ATMs and cycle paths but I couldn’t get this to work in QGIS.)
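For what it’s worth, the distance calculation could in principle be done outside QGIS. Below is a rough sketch, not something that was run on the Amsterdam data. It assumes the cycle paths are available as lists of (x, y) points in a projected coordinate system measured in metres, and the ATMs as (x, y) points in the same system. All names and coordinates are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        # Degenerate segment: a and b are the same point.
        return math.hypot(px - ax, py - ay)
    # Position of the projection of p on the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_network(point, paths):
    """Shortest distance from a point to any segment of any path."""
    return min(
        point_segment_distance(point, a, b)
        for path in paths
        for a, b in zip(path, path[1:])
    )


# Hypothetical data: one straight cycle path and two ATMs (metres).
cycle_paths = [[(0.0, 0.0), (100.0, 0.0)]]
atms = [(50.0, 30.0), (-30.0, 40.0)]
for atm in atms:
    print(atm, distance_to_network(atm, cycle_paths))
```

This only works with projected coordinates. With latitude and longitude in degrees, the result would not be a distance in metres.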

Data viz course assignment: bailout and votes

The fourth assignment of the data visualisation course was to do something with this data on unemployment in US states, published by the Guardian Data Blog. My project could be summarised as ‘It’s the unions, stupid’.

P.S. I didn’t post my work for the third assignment on this blog. I’m afraid it wasn’t any good.

Update - Elsewhere, the impact of the bailout on the election is questioned as well.


Clint Eastwood won the LAUGHTER contest

I’m not sure what this says about the audiences at US national party conventions, but among a sample of 16 speeches, Clint Eastwood’s was the one that elicited the most laughter (Rand Paul’s got the most applause). Among the presidential candidates, Obama won the applause contest, while being about as funny as Romney.

For the second lesson of Alberto Cairo’s online data visualisation course, we were asked to comment on and perhaps redesign this convention word count tool created by the NYT. I wouldn’t be able to do such a cool interactive thing myself (I got stuck in the jQuery part of Codeyear), so I decided to focus on differences between individual speeches instead.

First I needed the transcripts – preferably from one single source to make sure the transcription had been done in a uniform way. As far as I could find, Fox News has the largest collection of transcripts online. As a result, Republican speakers are overrepresented in my sample, but that’s OK because the key Democratic speakers are included as well.

I wrote a script to do the word count (I’m sure this could be done in a more elegant way). One problem with my script was that HTML code got included in the total word count. I thought I could correct this by subtracting 1,000 from each word count, but this didn’t work so well, so I had to make some corrections.
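One way to keep the HTML out of the count is to strip the markup before counting. The sketch below is not the original script. It uses Python’s standard html.parser module and assumes that audience reactions appear in the transcript as "(LAUGHTER)" and "(APPLAUSE)"; the sample text is made up.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text of a page, skipping tags, scripts and styles."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def count_speech(html):
    """Count spoken words and audience reactions in an HTML transcript."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    # Assumed marker format; real transcripts may use other conventions.
    laughter = len(re.findall(r"\(LAUGHTER\)", text))
    applause = len(re.findall(r"\(APPLAUSE\)", text))
    # Remove the markers so they are not counted as spoken words.
    spoken = re.sub(r"\((?:LAUGHTER|APPLAUSE)\)", " ", text)
    words = len(re.findall(r"[A-Za-z0-9'’]+", spoken))
    return {"words": words, "laughter": laughter, "applause": applause}


# Made-up transcript fragment.
sample = "<p>Thank you. <b>(APPLAUSE)</b> I have a joke. (LAUGHTER)</p>"
print(count_speech(sample))
```

Dividing the laughter and applause counts by the word count would make speeches of different lengths comparable.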

This assignment was a bit of a rush job so I hope I didn’t make any stupid mistakes.

Data visualisation course assignment

As part of Alberto Cairo’s data visualisation course, we’ve been asked to take a look at this graphic of social media use in selected countries and see how it can be improved. What struck me most (although this probably would not surprise social media experts) is the high level of activity in emerging economies. Above is my reinterpretation of the data. As a general indicator of social media use, I calculated the average of the listed types of social media use (upload photos; upload videos; manage profile; blogging; microblogging). Note that the data are from 2009.
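A minimal sketch of that averaging step. The percentages and the country are made up, not taken from the 2009 data, and the function name is just for illustration.

```python
from statistics import mean

# The five types of social media use listed above.
ACTIVITIES = [
    "upload photos",
    "upload videos",
    "manage profile",
    "blogging",
    "microblogging",
]


def social_media_index(usage):
    """Unweighted average of the percentages for the five activities."""
    return mean(usage[activity] for activity in ACTIVITIES)


# Hypothetical percentages for one country, for illustration only.
example_country = {
    "upload photos": 50,
    "upload videos": 30,
    "manage profile": 40,
    "blogging": 20,
    "microblogging": 10,
}
print(social_media_index(example_country))
```

An unweighted average treats all five activities as equally important, which is a simplification.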



Submitted by Francis on

I had done something similar, although without averaging the services. You made me realize that I had made a mistake in my data. My conclusion is similar to yours, also showing how China lags in management of social network profiles. In the discussion forums, a student from Hong Kong made interesting observations on this.

Submitted by DIRKMJK on

Thanks for pointing me to that discussion - as well as your own graph!

How did Tinkebell obtain all that personal information

Almost three years ago, artists Tinkebell and Coralie Vogelaar published the book Dearest Tinkebell, in which they revealed the identity, photos, addresses and all sorts of embarrassing personal information about people who had sent hate mail to ‘cat murderer’ Tinkebell. The book is again drawing attention because of an article in the Guardian.

How did Tinkebell go about investigating the people who had made threats against her? “By checking whether the email addresses were registered at other websites as well, she could easily discover the identity of many of the people who had made threats against her”, the Volkskrant wrote. In this way, she got access to ‘Facebook profiles, Amazon wish lists and YouTube accounts’.

Of course, it wasn’t as easy as the Volkskrant suggests. In a supplement to the book, Vogelaar describes five steps to find out the identity of a mailer. Step 1 simply consists in googling the email address. “Often this only resulted in comments on blogs and sometimes a small profile but rarely in a full name.”

Apparently, the interesting information didn’t usually surface until step 2, in which the email addresses were linked to the Rapleaf database (steps 3 to 5 are mainly about verifying the information). When Tinkebell and Vogelaar published their book, nobody had heard about that company. That changed in 2010, when the Wall Street Journal created a bit of a fuss with a series of articles on the trade in personal information, under the title ‘What they know’.

One of the main companies active on this market is Rapleaf, which at the time claimed it had one billion email addresses at its disposal. These addresses are linked with data on your social network activity, your purchases and other information. In this way, the company builds a detailed profile of you. A spokesperson said at the time that Rapleaf never reveals people’s names to clients, but Vogelaar and Tinkebell had already shown that you can easily obtain someone’s identity with the data provided by the company – and much more.


In 2010, the WSJ caused a bit of a stir by describing how companies like Rapleaf deal in very detailed personal information, gathered online. A year and a half earlier, artists Tinkebell and Vogelaar had already demonstrated how Rapleaf’s databases can be used to expose the identity, photos, addresses and embarrassing personal details of people who had sent threatening emails to ‘cat murderer’ Tinkebell (see also the Guardian on their project).