My first Python package

As a self-taught programmer, I sometimes feel a bit uneasy about the code I write. Sure, it may work, but there’s probably a more efficient and more elegant way to do it. These doubts notwithstanding, I’ve just published my first Python package: limepy.

Its purpose is simple: it helps you process and summarise LimeSurvey data. LimeSurvey is a survey tool, somewhat similar to SurveyMonkey. It’s different in that it’s open source, and probably more versatile.

If you download survey data as a CSV file, the answers to question types such as multiple choice questions or blocks of questions (‘arrays’) will be spread out over multiple columns. One task of limepy is to make sure all the data for a specific item end up in one table.
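
To illustrate the idea (this is a minimal sketch, not limepy’s actual API), here is how the columns for one question could be gathered with pandas. It assumes the export uses question-code headings, where each subquestion gets its own column named like Q1[SQ001], and that a ticked multiple choice option is stored as "Y". The function name collect_item and the example data are made up for illustration.

```python
import pandas as pd


def collect_item(df: pd.DataFrame, code: str) -> pd.DataFrame:
    """Gather all columns that belong to one question into a single table.

    Assumption: subquestion columns are named like 'Q1[SQ001]', i.e. the
    question code followed by the subquestion code in square brackets.
    """
    prefix = f"{code}["
    cols = [c for c in df.columns if c.startswith(prefix) and c.endswith("]")]
    item = df[cols].copy()
    # Keep only the subquestion code as the column name
    item.columns = [c[len(prefix):-1] for c in cols]
    return item


# Hypothetical export: Q1 is a multiple choice question, Q2 a single choice
data = pd.DataFrame({
    "id": [1, 2, 3],
    "Q1[SQ001]": ["Y", None, "Y"],
    "Q1[SQ002]": [None, "Y", "Y"],
    "Q2": ["A1", "A2", "A1"],
})

item = collect_item(data, "Q1")
# Number of respondents who ticked each option (assumes ticked = "Y")
counts = (item == "Y").sum()
print(counts)
```

Once the answers for an item sit in one table, summaries such as counts per option become one-liners.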

Limepy will also help you with a number of other tasks, like downloading survey data, creating a codebook, printing answers to open-ended questions and printing the answers of an individual respondent.

Find the package on GitHub and PyPI. Install with pip install limepy. Feedback welcome.

Delete Facebook

This is becoming a bit of a tradition: me writing about people who make a New Year’s resolution to quit Facebook. The story is simple: around the turn of the year, there’s a peak in people googling how to quit smoking, but there’s an even larger peak in people trying to figure out how to delete their Facebook account.

But this year, the story is a bit more complicated (and more interesting).

Google Trends data isn’t available yet for the last days of the year, so there’s no new peak in searches for “quit smoking” yet. Other than that, the yearly pattern is dwarfed by a huge peak in search volume for “delete Facebook” in the week starting on 18 March. What happened?

The Guardian has helpfully created an overview of Facebook-related incidents during 2018; I’ve added a few stories that also seemed relevant (for sources, see Method below; thanks to Vicki Boykis for the suggestion to annotate the Google Trends chart).

No surprise: the largest peak in “delete Facebook” searches happened a few days after the publication of the Cambridge Analytica story on 17 March. The news resulted in a veritable #deletefacebook campaign, although according to Mark Zuckerberg, «I don’t think we’ve seen a meaningful number of people act on that.»

Arwa Mahdawi has argued that deleting your Facebook account isn’t a bad New Year’s resolution, even though it probably won’t change how the company operates: «Facebook’s abuse of power isn’t a problem that we can solve as individuals. Technology giants must be regulated.»

So how much impact did the controversy have on Facebook? One way to try and answer this is to look at the share price.

The pattern for Facebook is rather interesting. The share price dropped after the publication of the Cambridge Analytica story, but quickly picked up again. But then it took a plunge on 25 July, resulting in ‘the biggest-ever one-day wipeout in U.S. stockmarket history’.

One possible interpretation is that investors initially thought the Cambridge Analytica story wasn’t going to harm Facebook’s profits. But when Facebook published its Q2 earnings report, they were shocked to learn that user growth had stalled.

But the chart also shows that all major tech companies saw their share prices go down. This suggests there’s more going on than users leaving Facebook. In addition to broader economic trends, a likely explanation is that investors fear more government regulation of major tech companies in response to the controversies they are involved in (and also to their dominant market position). While this may not be the whole story, it does seem to support Mahdawi’s view about the key role of regulation.

Method

Note that Google Trends data should be interpreted with caution because Google doesn’t provide much detail on the methodology used to produce the data.

For periods longer than three months, only weekly data can be downloaded. For the 2018 chart I wanted daily data. As suggested here, I downloaded three-month batches with overlapping data and then used the overlapping dates to calculate a ratio to adjust the scales. Here’s the code:

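import pandas as pd
import numpy as np

def stitch(df1, df2):
    # Index both batches by date (assumes each batch has a 'date' column)
    df1.index = df1['date']
    df2.index = df2['date']
    # Dates covered by both batches
    overlapping = [d for d in df1.index if d in list(df2.index)]
    # Ratio between the two scales on each overlapping date
    ratios = [df1.loc[d, 'delete facebook'] /
              df2.loc[d, 'delete facebook']
              for d in overlapping]
    # The median is less sensitive to outliers than the mean
    ratio = np.median(ratios)
    # Bring the later batch onto the scale of the earlier one
    for var in ['delete facebook', 'quit smoking']:
        df2[var] *= ratio
    # Append only the dates that aren't already in df1
    df = pd.concat([df1, df2[~df2.index.isin(overlapping)]])
    return df

# dfs: list of three-month batches, in chronological order
df = dfs[0]
for df2 in dfs[1:]:
    df = stitch(df, df2)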
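
To check the stitching logic without downloading anything, here is a self-contained variant of the same overlap-ratio approach, run on toy data. It uses an index intersection instead of a list comprehension and skips overlapping dates where the later batch is zero (to avoid dividing by zero); both are my own choices rather than part of the script used for the chart. It assumes both frames are indexed by date and contain only numeric columns, since every column of the later batch is rescaled.

```python
import numpy as np
import pandas as pd


def stitch(earlier: pd.DataFrame, later: pd.DataFrame,
           anchor: str = "delete facebook") -> pd.DataFrame:
    """Rescale `later` to the scale of `earlier` and append its new dates.

    Assumes both frames are indexed by date, share their (numeric) columns,
    and have at least one overlapping date with a non-zero anchor value.
    """
    overlap = earlier.index.intersection(later.index)
    a = earlier.loc[overlap, anchor]
    b = later.loc[overlap, anchor]
    # Per-date ratio between the two scales; zeros in `later` are skipped
    ratio = np.median(a[b > 0] / b[b > 0])
    rescaled = later * ratio
    # Keep the earlier values on overlapping dates, add the new dates
    return pd.concat([earlier, rescaled.loc[~rescaled.index.isin(overlap)]])


# Toy batches: the second one is on a scale 2.5 times larger
dates = pd.date_range("2018-01-01", periods=6, freq="D")
batch1 = pd.DataFrame({"delete facebook": [10, 20, 40, 20]}, index=dates[:4])
batch2 = pd.DataFrame({"delete facebook": [100, 50, 30, 10]}, index=dates[2:])

result = stitch(batch1, batch2)
print(result)
```

On the two overlapping days the ratio is 0.4, so the new days come out as 12 and 4 on the scale of the first batch.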

I used this Guardian article as my main source on Facebook-related incidents in 2018. I added a few from other sources: in April, Facebook announced 87 million people had been affected by the Cambridge Analytica scandal. Subsequently, it announced that it would notify people who had been affected. Dutch comedian Arjen Lubach organised a Bye Bye Facebook event (reminiscent of the 2015 Facebook Farewell Party). In September, Pew found that one in four American Facebook users had deleted the Facebook app from their phone; and later that month a Chinese hacker threatened to delete Mark Zuckerberg’s Facebook account.


Who uses the Dafne Schippers bicycle bridge

If all goes well, the Dafne Schippers bicycle bridge in Utrecht should reopen on Monday, after a short closure for maintenance. I have a special affinity with this bridge: it opened on the day I started working in Leidsche Rijn, west of the Amsterdam-Rhine Canal, and it’s part of my favourite cycle route to work.

Who else uses this bridge? With the usual caveats, data from the Fietstelweek can provide some insights. The charts below show, for each direction of traffic, at what time cyclists use the bridges across the canal.

There’s a morning peak in cyclists crossing the canal from Leidsche Rijn (west) to the city centre (east), and a peak in cyclists going the opposite direction around 5 pm. This suggests that the bridges are popular among commuters from Leidsche Rijn. That doesn’t really come as a surprise: if you cycle to Leidsche Rijn during the morning rush hour, you ride past huge numbers of cyclists going in the opposite direction.

The map below shows the routes of cyclists using the bridges. From top to bottom: Hogeweidebrug (or Yellow Bridge), Dafne Schippers bridge and De Meern bridge.

It appears that many cyclists use the bridges to go to the area around Central Station. Users of the De Meern and Dafne Schippers bridges tend to use nice routes that converge along the Leidseweg. Users of the Yellow Bridge use the not-so-nice route along Vleutenseweg, or the slightly better route along the railway track.

Research has shown that cyclists don’t always prefer the shortest route to their destination; the quality of the cycle tracks also plays a role.

Yet the map suggests that many cyclists opt for the shortest route, even if a nicer alternative is available. For example, few cyclists from the northern part of Leidsche Rijn seem to use the Dafne Schippers bridge, or the route along Keulsekade (the latter avoids long waits at traffic lights).

See also this analysis by DUIC, which shows that the bridge is not only popular among cyclists, but also among runners, which is fitting given the name of the bridge.


Logos of rider unions

A nice map circulating on Twitter (here, here and here, via) shows where food delivery workers are organising. Many of their logos proudly feature bicycle parts. The Finland-based Foodora campaign is the exception; their logo appears to have been inspired by Alexander Rodchenko’s КНИГИ poster. Also note the elegant logo of Collectif des coursier-e-s / KoersKollectief.

While their fight is about the future of work, some of these groups are independent of established trade unions - and some don’t consider themselves trade unions in the first place. Riders have used wildcat strikes and other forms of direct action, as well as initiatives such as crowdfunding a strike fund. With employers like Deliveroo trying to «disrupt» the labour market, it makes sense that their workers don’t play by the rules either, it has been argued.

Unfortunately, I couldn’t find an example of the Swiss fiery backpack logo.

UPDATE - added logos from Scotland and Finland

Which Amsterdam neighbourhoods might qualify for an Airbnb ban

Last week, city council member Sofyan Mbarki (Social-Democrats) proposed a motion to ban holiday rentals in Amsterdam neighbourhoods such as the Haarlemmerbuurt, the Kinkerbuurt and the Wallen. A concentration of holiday rentals results in rising house prices, lower social cohesion, increasing pressure on the housing market and inequality, he argued. The motion has support from a majority of the council.

The city government is inclined to implement the motion, but alderman Laurens Ivens (Socialist Party) wants to study the legal aspects. He considers the neighbourhoods mentioned in the motion good candidates for a ban on holiday rentals, but he doesn’t rule out that other neighbourhoods may be selected.

So what neighbourhoods might qualify? One criterion might be Airbnb density, which is shown on the map below (for caveats see Method below).

Unsurprisingly, neighbourhoods with high Airbnb density overlap with areas where residents complain about holiday rentals: Centrum-West, Centrum-Oost, Westerpark, Oud-West/De Baarsjes and De Pijp/Rivierenbuurt (source).

Airbnb frequently claims that it contributes to tourist dispersion because many hosts are located outside the city centre. However, the map suggests that Airbnb is in fact heavily concentrated in neighbourhoods such as the Wallen, the Jordaan, the Pijp and the Kinkerbuurt. While some of these neighbourhoods are outside the city centre, the pattern appears to be concentration rather than dispersion.

While these neighbourhoods would be likely candidates for a ban on holiday rentals, Ivens may also want to anticipate future developments. A number of neighbourhoods still have a relatively low Airbnb density, but have seen their density double or even almost triple over the past three years: Transvaalbuurt, Hoofdweg e.o., Van Galenbuurt and Westindische Buurt.

UPDATE - It was rightly pointed out that Airbnb density partly reflects housing density. An alternative measure would be the Airbnb indicator relative to the number of addresses or residents. However, this would result in high values for some sparsely populated areas, where holiday rentals don’t appear to be perceived as much of a problem as in some of the more densely populated areas.

See also:

  • Is tourist dispersion working? An analysis of Lonely Planet maps
  • Airbnb’s agreement with Amsterdam: some insights from scraped data

Method

Both Murray Cox’s Inside Airbnb and Tom Slee provide data collected by scraping the Airbnb website. While this data has some limitations, it’s probably the best publicly available data source on Airbnb. Since Tom Slee stopped collecting data last year, I used Inside Airbnb data for the current article. A discussion of methodological aspects related to that data is here.

In addition, I used land surface data from Statistics Netherlands (CBS). This data is for 2017.

I calculated an indicator for Airbnb density in the following way:

  • I assigned each listing to a neighbourhood (note that coordinates for listings aren’t 100% accurate as discussed by Cox);
  • For each listing, I calculated an indicator for the number of stays as: reviews per month (an indicator of the number of rentals) * the minimum length of stay (capped at 3 nights following this study) * the number of beds (an indicator for the number of guests, capped at 4 because that’s the maximum number of guests allowed by local regulations);
  • I summed that number for each neighbourhood and divided that by the land surface of the neighbourhood (ha).
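
The three steps above can be sketched in pandas as follows. This is a minimal illustration, not the script used for the map: it assumes the listings have already been assigned to neighbourhoods and use Inside Airbnb-style column names (neighbourhood, reviews_per_month, minimum_nights, beds), and that land surface is available as a Series mapping neighbourhood to hectares. The function name and the toy data are hypothetical.

```python
import pandas as pd


def airbnb_density(listings: pd.DataFrame, surface_ha: pd.Series) -> pd.Series:
    """Indicator of Airbnb stays per hectare, by neighbourhood.

    Assumed columns: 'neighbourhood', 'reviews_per_month',
    'minimum_nights', 'beds'. surface_ha: neighbourhood -> hectares.
    """
    stays = (
        listings["reviews_per_month"]                  # proxy for rentals
        * listings["minimum_nights"].clip(upper=3)     # nights, capped at 3
        * listings["beds"].clip(upper=4)               # guests, capped at 4
    )
    # Listings with missing data give NaN and are left out of the sum
    df = listings.assign(stays=stays).dropna(subset=["stays"])
    per_neighbourhood = df.groupby("neighbourhood")["stays"].sum()
    # Neighbourhoods without usable listings come out as NaN
    return per_neighbourhood / surface_ha


# Hypothetical toy input
listings = pd.DataFrame({
    "neighbourhood": ["A", "A", "B"],
    "reviews_per_month": [2.0, 1.0, None],
    "minimum_nights": [5, 2, 1],
    "beds": [2, 6, 1],
})
surface = pd.Series({"A": 10.0, "B": 5.0})

result = airbnb_density(listings, surface)
print(result)
```

In neighbourhood A the capped values give 2 × 3 × 2 = 12 and 1 × 2 × 4 = 8, so 20 stays over 10 ha, a density of 2.0.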

Note that the indicator for the number of stays will not be equal to the actual number of stays, for a number of reasons:

  • It’s possible that not all beds are occupied;
  • Not all guests write a review (Cox suggests the number of rentals could be twice as high as the number of reviews);
  • People may stay longer than the minimum number of nights;
  • Sometimes more than four people may stay in an Airbnb, despite the fact that that’s not allowed;
  • For some listings, the indicator could not be calculated because of missing data (about 11.4%).

According to Airbnb, the number of stays in Amsterdam is 2.5 million. Based on that number, the actual number of stays would be about 3 times as high as the indicator for the number of stays I calculated. Given the considerations listed above, that’s more or less what one would expect.
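
A rough check on that expectation, using only the two adjustments above that come with a number: Cox’s suggestion that rentals may be about twice the number of reviews, and the 11.4% of listings for which the indicator could not be calculated. The sketch assumes the two corrections are independent and simply multiply, and that listings with missing data are otherwise similar to the rest; the names are made up for this example.

```python
# The two adjustments quantified in the list above
REVIEW_TO_RENTAL = 2.0   # Cox: rentals may be about twice the number of reviews
MISSING_SHARE = 0.114    # share of listings without a calculable indicator


def combined_factor(review_to_rental: float, missing_share: float) -> float:
    """Multiply the two corrections, assuming they are independent."""
    # Scale up for listings that are missing, assuming they resemble the rest
    coverage = 1.0 / (1.0 - missing_share)
    return review_to_rental * coverage


factor = combined_factor(REVIEW_TO_RENTAL, MISSING_SHARE)
print(f"{factor:.2f}")  # 2.26
```

These two corrections alone give a factor of roughly 2.26. Stays longer than the minimum and extra guests would push it further towards 3, while unoccupied beds work in the opposite direction.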

Python script here.