champagne anarchist | armchair activist

Can Twitter predict the new Dutch trade union president?

Number of tweets in which candidates are mentioned


According to an American study, you can predict the outcome of elections by simply counting how often the names of the candidates are mentioned on Twitter. Members of the Dutch union confederation FNV are currently voting for their new president (it has been claimed this is the first time in the world union members get to directly elect their confederation president). Would it be possible to predict who will be the new FNV president using Twitter?

Since last Friday, I’ve been collecting the tweets containing the term ‘FNV’; so far, there are over 2,500. In those tweets, the incumbent Ton Heerts is mentioned 204 times, whereas his challenger Corrie van Brenk is mentioned 146 times. In short, if Twitter is a good predictor (which of course is a matter for debate), the contest is tighter than one might have expected.
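The counting step itself is simple. Below is a minimal sketch in Python, assuming each tweet is available as a plain text string and that a candidate counts as mentioned when their surname appears anywhere in the text, ignoring case. The sample tweets and the `count_mentions` helper are made up for illustration; this is not the code behind the numbers above.

```python
import re

# Assumed matching rule: a surname anywhere in the tweet text, ignoring case.
CANDIDATES = {
    "Heerts": re.compile(r"heerts", re.IGNORECASE),
    "Van Brenk": re.compile(r"van\s+brenk", re.IGNORECASE),
}


def count_mentions(tweets, patterns=CANDIDATES):
    """Count the tweets in which each candidate is mentioned at least once."""
    counts = {name: 0 for name in patterns}
    for text in tweets:
        for name, pattern in patterns.items():
            if pattern.search(text):
                counts[name] += 1
    return counts


# Made-up example tweets, all containing the search term 'FNV'.
sample_tweets = [
    "Ton Heerts bij Jinek over de toekomst van de FNV",
    "Corrie van Brenk wil een strijdbare FNV",
    "Debat FNV: Heerts en Van Brenk over pensioenen",
    "FNV-leden kunnen vanaf vandaag stemmen",
]

print(count_mentions(sample_tweets))
```

Note that a tweet naming the same candidate twice is counted once here, and a tweet naming both candidates counts for both; whether that is the right rule is a judgment call.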

The graph above shows the results for the days for which complete data is available. On Saturday, Van Brenk got some attention because something she had said had been fact checked (and found to be correct). On Sunday, Heerts was mentioned because he appeared on a TV show hosted by Eva Jinek. On 1 May, it was officially announced who the candidates are and they had a debate.

Update: the data now include 13 May, the final voting day. In total, Van Brenk was mentioned 497 times and Heerts 631. It has since been announced that Heerts has won the election (of course, this doesn’t necessarily mean that the method is sound; in order to make such claims one would need to evaluate a fair number of predictions).
Influences reflected in the graph include: Factcheck confirms Van Brenk statement (27 April); Heerts in Eva Jinek TV show (28 April); candidates officially announced (1 May); debate in Buitenhof TV show (5 May); problems at tax authorities that Van Brenk’s Abvakabo FNV had warned about (6 May); Van Brenk interview at Nu.nl (9 May); Van Brenk in radio show (10 May); Heerts at presentation of initiative to train technical staff (13 May); EenVandaag TV show poll predicts Heerts will win (13 May).

Method

I collected tweets using the Twitter Streaming API, in the way described here. I prepared the data using Python and analysed it using R (find the code on GitHub). The graph was created with D3.js.
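As a rough illustration of the preparation step (not the code in the repository), the sketch below assumes the collected tweets are stored one JSON object per line, with the `created_at` and `text` fields of Twitter's tweet objects, and tallies mentions per candidate per day. The sample lines and the `daily_mentions` helper are invented for the example, and days are bucketed in UTC rather than Dutch local time.

```python
import json
from collections import Counter
from datetime import datetime

# Timestamp layout of the `created_at` field, e.g. "Sat Apr 27 10:15:00 +0000 2013".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"

# Assumed matching rule: lower-cased surname as a substring of the tweet text.
SURNAMES = {"Heerts": "heerts", "Van Brenk": "van brenk"}


def daily_mentions(lines):
    """Return a Counter keyed by (ISO date, candidate) from JSON-encoded tweets."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        if "text" not in tweet or "created_at" not in tweet:
            continue  # skip records that are not tweets
        created = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)
        day = created.date().isoformat()
        text = tweet["text"].lower()
        for name, needle in SURNAMES.items():
            if needle in text:
                counts[(day, name)] += 1
    return counts


# Made-up records for illustration.
sample_lines = [
    json.dumps({"created_at": "Sat Apr 27 10:15:00 +0000 2013",
                "text": "Factcheck: Van Brenk (FNV) heeft gelijk"}),
    json.dumps({"created_at": "Sun Apr 28 21:30:00 +0000 2013",
                "text": "Heerts bij Eva Jinek over de FNV"}),
    json.dumps({"created_at": "Sun Apr 28 21:45:00 +0000 2013",
                "text": "Ton Heerts (FNV) op tv"}),
    "",
]

for (day, name), n in sorted(daily_mentions(sample_lines).items()):
    print(day, name, n)
```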
I looked into how influential twitterers are (how many followers, how often listed) and into their backgrounds (e.g., do they mention ‘fnv’ in their profile). The most important finding is that twitterers who mention Van Brenk more often mention ‘abva’ or ‘akf’ in their profile. This is not surprising, since Van Brenk is currently president of Abvakabo FNV, the public sector union affiliated to the FNV.
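The profile comparison can be sketched along these lines, assuming each tweet carries the author's screen name and profile description (the `user.screen_name` and `user.description` fields of a tweet object). The records and the `profile_keyword_share` helper are illustrative assumptions, not the actual analysis.

```python
# Keywords taken from the post; matching is a case-insensitive substring test.
PROFILE_KEYWORDS = ("abva", "akf")


def profile_keyword_share(tweets, surname, keywords=PROFILE_KEYWORDS):
    """Share of distinct users mentioning `surname` whose profile contains a keyword."""
    profiles = {}
    for tweet in tweets:
        if surname.lower() in tweet["text"].lower():
            user = tweet["user"]
            # Keep one profile per user so prolific twitterers are not over-counted.
            profiles[user["screen_name"]] = (user.get("description") or "").lower()
    if not profiles:
        return 0.0
    hits = sum(any(k in desc for k in keywords) for desc in profiles.values())
    return hits / len(profiles)


# Made-up records for illustration.
sample = [
    {"text": "Van Brenk heeft gelijk #FNV",
     "user": {"screen_name": "a", "description": "Bestuurder Abvakabo FNV"}},
    {"text": "Stem Van Brenk! FNV",
     "user": {"screen_name": "b", "description": "AKF-lid, zorg"}},
    {"text": "Heerts bij Jinek #FNV",
     "user": {"screen_name": "c", "description": "Journalist"}},
    {"text": "Heerts of Van Brenk? FNV",
     "user": {"screen_name": "d", "description": None}},
]

print(profile_keyword_share(sample, "Van Brenk"))
print(profile_keyword_share(sample, "Heerts"))
```

Counting distinct users rather than tweets is a design choice made here; counting tweets instead would give more weight to the most active accounts.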
The American study on Twitter as a predictor of election outcomes was done by DiGrazia et al. and can be found here. Some remarks on their study:

  • Yes, twitterers are only a small part of the population and no, they’re not representative of the entire population. Likely, Twitter is dominated by a small, active in-crowd. It’s also correct that tweets mentioning a candidate need not endorse them; they may just as well be critical. Despite all this, DiGrazia et al. found that mentions on Twitter consistently predict election outcomes. Perhaps they are an indicator of something else, such as media attention or how actively people are campaigning for a candidate.
  • Of course, this method doesn’t provide any certainty about who will win. It’s possible for a candidate to get almost 100% of the tweet share and still lose (at least, that’s what the scatterplots of DiGrazia et al. suggest).
  • It’s unclear to what extent the conclusions of the American study can be generalised to other situations. It’s therefore a bit of a gamble to use this method to predict who will be the next president of the FNV.
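To put the counts from this post in terms of tweet share, here is a small sketch. It assumes that tweet share simply means a candidate's mentions divided by the mentions of all candidates combined; the `tweet_share` helper is a name made up for this example.

```python
def tweet_share(counts):
    """Return each candidate's fraction of all counted mentions."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mentions counted")
    return {name: n / total for name, n in counts.items()}


# Counts reported in this post.
first_days = {"Heerts": 204, "Van Brenk": 146}
final = {"Heerts": 631, "Van Brenk": 497}

for label, counts in (("first days", first_days), ("through 13 May", final)):
    shares = tweet_share(counts)
    print(label, {name: round(share, 3) for name, share in shares.items()})
```

On these numbers, Heerts’ share comes to roughly 58% in the first days and roughly 56% over the whole period.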

Comments

Submitted by Karissa McKelvey on

I am second author on the study, and I wanted to clarify - we only looked at names, such as "John Boehner," and did not also restrict to other strings like "FNV" in your case. The more parameters you add, it is possible you are eliminating larger portions of the sample.

Submitted by DIRKMJK on

Thanks for clarifying, Karissa. It’s a bit of a puzzle, how to include messages like ‘we want ton’ (a slogan used by Ton Heerts supporters) yet exclude all irrelevant tweets containing the string ‘ton’ (e.g., retweets of ‘@transportonline’). So I guess you’re right, filtering by ‘fnv’ is practical but not necessarily the optimal approach. Incidentally, I know you used a huge sample of tweets collected over a much longer period; I was wondering what the range was of the number of times candidates were mentioned in your study?
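One way to approach that puzzle, sketched here as an assumption rather than a tested solution: match ‘ton’ only as a whole word, so that ‘we want ton’ counts but ‘@transportonline’ does not. The pattern and the example tweets are made up. A whole-word match would still catch unrelated uses of the Dutch word ‘ton’, so it narrows the problem rather than solving it.

```python
import re

# \b marks a word boundary, so 'ton' inside 'transportonline' is not matched.
TON_AS_WORD = re.compile(r"\bton\b", re.IGNORECASE)

examples = [
    "We want Ton!",
    "RT @transportonline: file op de A2",
    "Ton Heerts in debat",
    "Een ton subsidie voor de haven",  # unrelated use of the word, still matched
]

for text in examples:
    print(bool(TON_AS_WORD.search(text)), text)
```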
