Open Data

Fourth place in the NPO app competition

For some time now, programme data from the Dutch public broadcaster (NPO) has been available through an API, and an app competition was organised to promote what the API can do. A full-blown app was aiming a bit high, but a web app was allowed too. So when this article reminded me of the competition, I thought: what the heck, I'll just send something in.

The app can be found here. And this is the description I submitted with it:

The app is about the role of the public broadcaster in society. What does it say that topics such as TTIP or the basic income only receive attention from the NPO relatively late, while search behaviour on Google suggests those topics had already been attracting interest for some time? Does the NPO distance itself sufficiently from established frames of thought? And what can be said about topics the NPO actually paid attention to relatively early, such as the Arab Spring? Of course, the app cannot answer these questions about the role of the NPO. It is nothing more and nothing less than a tool to support the discussion.

OK, it's not earth-shattering, but getting the web page to work was a fun job (using a combination of PHP and D3.js). The results have since been announced: number 4. Cool. And especially cool that the NPO and Hack de Overheid are actively promoting open data.
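The app's own PHP and D3.js code isn't shown here. As a rough illustration of the kind of timing comparison it supports, here is a minimal Python sketch, assuming you have monthly search-interest values for a topic (e.g. exported from Google Trends) and monthly counts of NPO broadcasts about it. The 25%-of-peak threshold and all function names are hypothetical; this is not necessarily how the app itself measures when attention starts.

```python
from datetime import date


def first_month_above(series, fraction=0.25):
    """First month (in date order) in which `series` reaches `fraction` of its peak.

    `series` maps datetime.date objects (first day of the month) to numbers.
    """
    threshold = fraction * max(series.values())
    # With 0 < fraction <= 1 the peak month itself always qualifies.
    for month in sorted(series):
        if series[month] >= threshold:
            return month


def months_between(start, end):
    """Whole calendar months from `start` to `end` (negative if `end` is earlier)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def coverage_lag(search_interest, broadcasts, fraction=0.25):
    """Months between search interest taking off and NPO coverage taking off.

    A positive result means the NPO picked up the topic later than Google users.
    """
    return months_between(
        first_month_above(search_interest, fraction),
        first_month_above(broadcasts, fraction),
    )


# Made-up numbers for illustration only, not real Google Trends or NPO data.
trends = {date(2014, m, 1): v for m, v in zip(range(1, 7), [5, 10, 40, 80, 100, 90])}
npo = {date(2014, m, 1): v for m, v in zip(range(1, 7), [0, 0, 0, 1, 6, 8])}
print(coverage_lag(trends, npo))  # → 2
```

A threshold relative to each series' own peak keeps the comparison independent of the very different scales of a search index and a broadcast count.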

Amsterdam: council member wants to make information about decision-making accessible

A while ago I did an analysis of voting on motions and amendments in Amsterdam's city council. Getting hold of the data was easy: on the city council's website you can download an Excel document with results from 2013 onwards. But as soon as you try to analyse the data, you run into problems. The way votes are described is anything but consistent, so the data need a lot of cleaning. I do enjoy tinkering with regex, but this is really getting out of hand.

But today there is good news: council member Zeeger Ernsting has submitted a proposal for «radical transparency». He points out that much municipal information is public in principle, but that it is often hidden in PDF documents that are hard to find unless you know exactly what you are looking for.

That has to change. Ernsting therefore proposes making both council information and other municipal information available as easily searchable open data.

The Python script I used to clean the data on motions and amendments is available here.
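That script isn't reproduced here. As an illustration of the kind of regex normalisation that inconsistent vote descriptions call for, here is a minimal sketch; the Dutch keywords, their English labels and the function name are assumptions, not taken from the council's spreadsheet.

```python
import re

# Hypothetical mapping from Dutch outcome keywords to canonical labels;
# the real spreadsheet's variants are not reproduced here.
OUTCOMES = {
    "aangenomen": "adopted",
    "verworpen": "rejected",
    "ingetrokken": "withdrawn",
    "aangehouden": "postponed",
}
# One case-insensitive pattern that matches any keyword as a whole word.
OUTCOME_RE = re.compile(r"\b(" + "|".join(OUTCOMES) + r")\b", re.IGNORECASE)


def normalize_outcome(raw):
    """Map a free-text vote description to a canonical label, or None."""
    match = OUTCOME_RE.search(raw)
    return OUTCOMES[match.group(1).lower()] if match else None


for raw in ["  Motie AANGENOMEN.", "verworpen\n(20 stemmen voor)", "zie bijlage"]:
    print(repr(raw), "->", normalize_outcome(raw))
```

Returning None for unrecognised descriptions makes it easy to list the leftovers and add patterns for them one by one.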

Using Strava tweets to analyse cycling patterns

A recent report by traffic research institute SWOV analyses accidents reported by cyclists on racing bikes in the Netherlands. Among other things, the data show an early summer dip in accidents: 53 in May, 38 in June and 51 in August. A bit of googling revealed this is a common phenomenon, although the dip appears to occur earlier than elsewhere (cf. this analysis of cycling accidents in Montréal).

Below, I discuss a number of possible explanations for the pattern.

Statistical noise

Given the relatively small number of reported crashes in the SWOV study, the pattern could be due to random variation. Also, respondents were asked in 2014 about crashes they had had in 2013, so memory effects may have had an influence on the reported month in which accidents took place. On the other hand, the fact that similar patterns have been found elsewhere suggests it may well be a real phenomenon.
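One quick way to gauge the random-variation explanation is a simple binomial test on the May and June counts. This is a minimal sketch under strong assumptions: if both months had the same underlying accident rate, each of the 91 accidents in those two months would be equally likely to fall in either, ignoring that June has one day fewer and leaving August out. It is not the analysis used in the SWOV report.

```python
from math import comb


def two_sided_binomial_p(k, n, p=0.5):
    """Exact two-sided binomial test: total probability of all outcomes
    that are no more likely than observing k successes out of n."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # A tiny relative tolerance keeps floating-point ties on both tails.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


may, june = 53, 38
# Null hypothesis: May and June share the same accident rate, so given the
# 91 accidents in those two months, June's count is Binomial(91, 0.5).
p_value = two_sided_binomial_p(june, may + june)
print(f"two-sided p = {p_value:.2f}")
```

With these numbers the p-value comes out above the conventional 0.05 threshold, which is consistent with random variation but doesn't prove it.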

Holidays

An OECD report says the summer accident dip is specific to countries with «a high level of daily utilitarian cycling» such as Belgium, Denmark and the Netherlands. The report argues the drop is «most likely linked to a lower number of work-cycling trips due to annual holidays».

If you look at the data presented by the OECD, this explanation seems plausible. However, holidays can’t really explain the data reported by SWOV. Summer holidays started between 29 June and 20 July (there’s regional variation), so the dip should have occurred in August instead of June.

Further, you’d expect a drop in bicycle commuting during the summer, but surely not in people riding racing bikes? I guess the best way to find out would be to analyse Strava data, but unfortunately Strava isn’t as forthcoming with its data as one might wish (in terms of open data, it would rank somewhere between Twitter and Facebook).

A possible way around this is to count tweets in which people boast about their Strava achievements. Of course, this approach has several limitations (I discuss some in the Method section below). Despite these limitations, I think Strava tweets could serve as a rough indicator of road cycling patterns. An added bonus is that tweets often include the length of the ride.

The chart above shows Dutch-language Strava tweets for the period April 2014 - March 2015. Whether you look at the number of rides or the total distance, there’s no early summer drop in cycling. There’s a peak in May, but none in August - September.
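Monthly ride counts and total distance can be derived from the tweets along these lines. This is a minimal sketch, assuming tweets are available as (timestamp, text) pairs and that distances appear as a number followed by "km" with a Dutch decimal comma; the example wording ("Ik heb net een rit van ... km voltooid", i.e. "I just completed a ... km ride") is hypothetical, not Strava's actual template.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches a number followed by "km", e.g. "65,3 km" or "120km".
# Dutch text usually uses a decimal comma, so both "," and "." are accepted.
DISTANCE_RE = re.compile(r"(\d+(?:[.,]\d+)?)\s*km\b", re.IGNORECASE)


def ride_distance_km(text):
    """Return the first distance (km) mentioned in a tweet, or None."""
    match = DISTANCE_RE.search(text)
    return float(match.group(1).replace(",", ".")) if match else None


def monthly_totals(tweets):
    """Per-month ride count and total distance from (posted_at, text) pairs.

    Tweets without a recognisable distance still count as a ride but add
    nothing to the distance total.
    """
    counts = defaultdict(int)
    distance = defaultdict(float)
    for posted_at, text in tweets:
        month = (posted_at.year, posted_at.month)
        counts[month] += 1
        km = ride_distance_km(text)
        if km is not None:
            distance[month] += km
    return {month: (counts[month], distance[month]) for month in sorted(counts)}


# Hypothetical tweet wording, not Strava's actual template.
tweets = [
    (datetime(2014, 5, 3, 17, 20), "Ik heb net een rit van 82,5 km voltooid met @Strava"),
    (datetime(2014, 5, 10, 12, 5), "Mooie rit vandaag #strava"),
    (datetime(2014, 6, 1, 9, 45), "Ik heb net een rit van 60 km voltooid met @Strava"),
]
print(monthly_totals(tweets))
# → {(2014, 5): (2, 82.5), (2014, 6): (1, 60.0)}
```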

Sunset

According to the respondents of the SWOV study, 96% of accidents happened in daylight. Of course, this doesn’t rule out that some accidents happened at dusk, and there may be a seasonal pattern to this.

Many tweets contain the time at which they were tweeted. This is a somewhat problematic indicator of the time at which trips took place, if only because it’s unclear how much time elapsed between the ride and the moment it was tweeted. But let’s take a look at the data anyway.

I think tweets tend to be posted rather early in the day. Also, the median posting time doesn’t show the effect of the switches between summer and winter time (perhaps Twitter converts the times to the current local time).

That said, the data suggest that rides take place closer to sunset during the winter, not during May and August, the months that show a rise in accidents. So, while no firm conclusions should be drawn on the basis of this data, there are no indications that daylight patterns can explain accident patterns.
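The median posting time per month can be computed roughly as follows. This minimal sketch assumes the timestamps have already been parsed into datetime objects (and corrected for any offset); comparing the result with sunset would need a separate table of sunset times, which isn't included. Note that a minutes-after-midnight median treats a tweet at 00:30 as very early rather than very late.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def median_post_time_by_month(timestamps):
    """Median time of day ('HH:MM') at which tweets were posted, per month.

    Times are converted to minutes after midnight. With an even number of
    tweets the median can fall between two minutes; it is rounded down.
    """
    minutes = defaultdict(list)
    for ts in timestamps:
        minutes[(ts.year, ts.month)].append(ts.hour * 60 + ts.minute)
    result = {}
    for month in sorted(minutes):
        m = int(median(minutes[month]))
        result[month] = f"{m // 60:02d}:{m % 60:02d}"
    return result


stamps = [
    datetime(2014, 6, 1, 10, 0), datetime(2014, 6, 2, 12, 0), datetime(2014, 6, 3, 20, 30),
    datetime(2014, 12, 6, 15, 0), datetime(2014, 12, 7, 16, 1),
]
print(median_post_time_by_month(stamps))
# → {(2014, 6): '12:00', (2014, 12): '15:30'}
```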

Weather

Perhaps more accidents happen when many people cycle and there’s a lot of rain. In 2013, there was a lot of rain in May; subsequently the amount of rain declined, and there was a peak again in September (pdf). So at first sight, it seems that the weather could explain the accident peak in May, but not the one in August.

Conclusion

None of the explanations for the early summer drop in cycling accidents seem particularly convincing. It’s not so difficult to find possible explanations for the peak in May, but it’s unclear why this is followed by a decline and a second peak in August. This remains a bit of a mystery.

Method

Unfortunately, the Twitter API won’t let you access old tweets, so you have to use the advanced search option (sample url) and then scroll down (or hit CMD and the down arrow) until all tweets have been loaded. This takes some time. I used rit (ride) and strava as search terms; this appears to be a pretty robust way to collect Dutch-language Strava tweets.

It seems that Strava started offering a standard way to tweet rides as of April 2014. Before that date, the number of Strava tweets was much smaller and the wording of the tweets wasn’t uniform. So there’s probably little use in analysing tweets from before April 2014.

I removed tweets containing terms suggesting they were about running (even though I searched for tweets containing the term rit, some were still obviously about running), as well as tweets referring to mountain biking. I ended up with 9,950 tweets posted by 2,258 accounts; 1,153 people tweeted only once about a Strava ride. Perhaps the analysis could be improved by removing these.
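The filtering step and the account counts could look like this. A minimal sketch: the exclusion terms ("hardlopen" is Dutch for running) and the function name are hypothetical, since the actual term list isn't given, and tweets are assumed to be (account, text) pairs.

```python
import re
from collections import Counter

# Hypothetical exclusion terms: "hardlo..." (hardlopen = running), "run...",
# "mtb" and "mountainbik..." (mountain biking). The real list isn't given.
EXCLUDE_RE = re.compile(r"\b(hardlo\w*|run\w*|mtb|mountainbik\w*)\b", re.IGNORECASE)


def filter_tweets(tweets, drop_one_timers=False):
    """Remove running and mountain-bike tweets from (account, text) pairs.

    Returns the kept tweets, the number of accounts and the number of
    accounts with exactly one remaining tweet (both counted before any
    one-timers are dropped). With drop_one_timers=True, tweets from those
    single-tweet accounts are removed as well.
    """
    kept = [(acc, text) for acc, text in tweets if not EXCLUDE_RE.search(text)]
    per_account = Counter(acc for acc, _ in kept)
    one_timers = {acc for acc, n in per_account.items() if n == 1}
    if drop_one_timers:
        kept = [(acc, text) for acc, text in kept if acc not in one_timers]
    return kept, len(per_account), len(one_timers)


sample = [
    ("a", "rit van 80 km #strava"), ("a", "rit 50 km"),
    ("b", "Hardlopen: rit 10 km"), ("c", "MTB rit 30 km"), ("d", "rit 100 km"),
]
kept, n_accounts, n_once = filter_tweets(sample)
print(len(kept), n_accounts, n_once)  # → 3 2 1
```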

I had to add 9 hours to the tweet times, probably because I had been using a VPN when I downloaded the data.
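Applying such a fixed correction is easiest with a timedelta, which also handles rollover to the next day. A minimal sketch; the timestamp format and function name are hypothetical, and the 9-hour offset only applies to that particular download.

```python
from datetime import datetime, timedelta

# The 9-hour offset was specific to one download (probably due to a VPN);
# it should be re-checked rather than reused blindly on other data.
TIME_OFFSET = timedelta(hours=9)


def corrected_time(raw, fmt="%Y-%m-%d %H:%M"):
    """Parse a tweet timestamp (hypothetical format) and apply the offset."""
    return datetime.strptime(raw, fmt) + TIME_OFFSET


print(corrected_time("2014-05-03 20:15"))  # → 2014-05-04 05:15:00
```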

A relevant question is how representative Strava tweets are of the amount of road cycling. According to the SWOV report, about two in three Dutch cyclists on racing bikes almost never use apps like Strava or Runkeeper; the percentage is similar for men and women. The average distance in Strava tweets is 65 km; in the SWOV report, most respondents put their average ride distance at 60 to 90 km.

In any case, not all road cyclists use Strava and not all who use Strava consistently post their rides on Twitter (fortunately, one might add). Perhaps people who tweet their Strava rides are a bit more hardcore and perhaps more impressive rides are more likely to get tweeted.

Edit - the numbers reported above are for tweets containing the time they were posted; this information is missing in about one-third of the tweets.

Here’s the script I used to clean the Twitter data.

Scooters often faster than cars

Minister Schultz wants to allow Amsterdam to ban scooters from cycle paths, so that scooter riders would have to use the road and wear a helmet. This should make cycle paths safer for cyclists and reduce their exposure to air pollution. However, car and scooter lobbyists argue that, with motorists driving at 50 kmph, the speed difference between scooters and cars is too large for scooters to ride safely on the road.

So do motorists really make 50 kmph in Amsterdam? «Cycling professor» Marco te Brömmelstroet has tweeted a map showing rush hour speeds far below 50 kmph.

As part of its open data initiative, Amsterdam has released some 5 million speed measurements on the «Hoofdnet Auto» (the network of major roads for cars) for the month of January 2014. The histogram above shows that even on these main roads, the majority of measurements recorded a speed below 50 kmph, with a median speed of 31 kmph. Average speeds during the afternoon rush hour were about 5 kmph lower than at night.
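The summary statistics mentioned here can be computed along these lines. The dataset's actual layout isn't described, so this minimal sketch assumes the measurements have been reduced to (hour of day, speed in kmph) pairs; the rush-hour window (16:00 to 19:00) and night window (00:00 to 06:00) are assumptions, not definitions from the data.

```python
from statistics import mean, median


def speed_summary(measurements, limit=50.0):
    """Summarise (hour_of_day, speed_kmph) pairs.

    Returns the median speed, the share of measurements below `limit`, and
    the difference between mean afternoon-rush and mean night-time speed.
    """
    speeds = [speed for _, speed in measurements]
    # Assumed windows: afternoon rush 16:00-19:00, night 00:00-06:00.
    rush = [speed for hour, speed in measurements if 16 <= hour < 19]
    night = [speed for hour, speed in measurements if hour < 6]
    return {
        "median": median(speeds),
        "share_below_limit": sum(s < limit for s in speeds) / len(speeds),
        "rush_minus_night": mean(rush) - mean(night) if rush and night else None,
    }


sample = [(2, 40.0), (3, 44.0), (17, 30.0), (18, 34.0), (12, 60.0)]
print(speed_summary(sample))
# → {'median': 40.0, 'share_below_limit': 0.8, 'rush_minus_night': -10.0}
```

With some 5 million rows, a streaming or array-based approach would be more practical, but the logic stays the same.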

A 2011 study by cyclists’ organisation Fietsersbond found an average speed for scooters on Amsterdam’s cycle paths of 36.9 kmph. The map shows roads where motorists drive on average at least 36.9 kmph (thin red line) or 50 kmph (thick red line). Note that the method by which the Fietsersbond measured scooter speed may be different from the method used to measure car speed.
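The classification behind the map legend can be sketched as follows, assuming each road segment (identified by a hypothetical id) comes with a list of measured speeds; averaging per segment in this way is an assumption, not necessarily how the map was made.

```python
SCOOTER_SPEED = 36.9  # kmph, Fietsersbond (2011) average for scooters on cycle paths
SPEED_LIMIT = 50.0    # kmph


def line_style(mean_speed):
    """Line style for a road segment, following the map's legend."""
    if mean_speed >= SPEED_LIMIT:
        return "thick red"
    if mean_speed >= SCOOTER_SPEED:
        return "thin red"
    return None  # slower on average than scooters on cycle paths: not drawn


def segment_styles(speeds_by_segment):
    """Map each (hypothetical) segment id to a style based on its mean speed."""
    return {
        segment: line_style(sum(speeds) / len(speeds))
        for segment, speeds in speeds_by_segment.items()
    }


print(segment_styles({"A": [30.0, 45.0], "B": [50.0, 54.0], "C": [20.0, 25.0]}))
# → {'A': 'thin red', 'B': 'thick red', 'C': None}
```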

There have been jokes that scooter riders don’t want to use the road because this would force them to reduce their speed. The data of the Amsterdam government show there’s actually some truth to this.

Scripts for processing the data can be found here.

Are parked cars really dominating Amsterdam’s public space?

In an intriguing opinion article in Thursday’s NRC Handelsblad, an author named Fred Feddes suggests banning parked cars from Amsterdam’s city centre. He argues that the current 15,000 parking spaces in the inner city take up 18ha, amounting to as much as 40% of the 45ha public space.

Sure, parked cars use lots of space, but 40%? Apparently, I wasn’t the only one to find that figure incredible. Council member Zeeger Ernsting tweeted:

As much as I endorse the viewpoint, the figure of 40% parking can’t possibly be right. But indeed, cars [are] still far too dominant

I couldn’t immediately trace Feddes’ source and I’m sure there will be more debate on the issue. For now, here’s a quick and dirty calculation:

  • According to this (pdf) document of the Centrum district, «traffic areas» and green areas amount to 86ha. That’s more than Feddes’ 45ha, although I think the green areas may include some non-public space.
  • The district’s open data site has data on parking spaces (dating from 2010). All types combined, there were some 16,000 of them, slightly more than Feddes’ estimate.
  • Assuming that one parking space takes up 12 to 14 m², this would amount to 19 to 22ha; again slightly more than Feddes’ 18ha.
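The arithmetic behind these estimates is easy to check. A minimal sketch (the function name is hypothetical) that reproduces Feddes' figures, which imply 12 m² per space, and the range above:

```python
def parking_area(spaces, m2_per_space, public_space_ha):
    """Area of parking in hectares and its share of public space."""
    parking_ha = spaces * m2_per_space / 10_000  # 1 ha = 10,000 m²
    return parking_ha, parking_ha / public_space_ha


# Feddes' figures: 15,000 spaces in 18 ha implies 12 m² per space, 40% of 45 ha.
print(parking_area(15_000, 12, 45))  # → (18.0, 0.4)

# District figures: about 16,000 spaces at 12 to 14 m² each,
# compared with Feddes' 45 ha and the district's 86 ha.
for m2 in (12, 14):
    for public_ha in (45, 86):
        ha, share = parking_area(16_000, m2, public_ha)
        print(f"{m2} m² per space vs {public_ha} ha: {ha:.1f} ha ({share:.0%})")
```

Against the district's 86 ha, 16,000 spaces at 12 to 14 m² come to roughly 22 to 26% of the area.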

Perhaps Ernsting could ask the local government to shed some more light on this issue. Meanwhile, my provisional conclusion is that Feddes’ estimate doesn’t seem as incredible as I initially thought. And even if parked cars use only about 25% of public space, that’s still an enormous amount of space if you think about it.

Update 3 January 2015 - in a new article on the issue, Feddes provides more detail on the data he uses. The 45ha public space refers to «traffic terrain» (verkeersterrein) in 2009. CBS data for 2008 also put that number at 45ha. A more recent table (xlsx) indicates that this has since grown to 58ha. Interestingly, these more recent data also differentiate between types of traffic space. Apparently, railways take up 19ha (and according to this pdf, tram and metro tracks haven’t even been included in that category since 1993), leaving only 40ha for road traffic. On the basis of that number, the share of space dominated by (parked) cars would be even larger. Amazing.
