champagne anarchist | armchair activist

Dutch governments consider using Strava data

Strava is a popular app to record bicycle rides. For some years, the company has been trying to sell its data to local governments for traffic planning. NDW, a platform of Dutch governments including the city of Amsterdam, has bought six months’ worth of Strava data to give it a try.

The switch to Strava may mean the end of the Fietstelweek, an annual one-week effort to collect bicycle data from thousands of volunteers. In the past, I’ve used Fietstelweek data to analyse waiting times at traffic lights. The Fietstelweek received funding from the same governments that are now experimenting with Strava data.

One reason why they are looking for alternatives is that the number of Fietstelweek participants is lower than they’d like. They seem to have a point. Consider for example the map below, which shows bicycle routes to and from Amsterdam Central Station.

As such, it’s an interesting map. Unsuprisingly, it seems that intensity is highest near the bicycle parking facilities. Main access routes appear to be the Geldersekade (with the sometimes chaotic crossing with Prins Hendrikkade) and the Piet Heinkade. It seems that people cycling to and from Central Station are somewhat more likely to live in the eastern part of the city.

There’s one caveat though: the numbers are small. Even the busiest segments represent at most 40 rides. One loyal Fietstelweek participant recording her commute during the entire week could literally change the map.

Strava has far larger numbers, but its data raises different kinds of questions. Strava calls itself ‘the social network for athletes’ and wants to know if you use a road bike, a mountain bike, a TT bike or a cyclocross bike (no option ‘other’ available). So how representative is Strava data of people who use their city bike for commutes and other practical purposes?

Strava’s response to such questions is that they’re trying to make the app less competition-focused and more social, with Facebook-like features. This should help them collect data about ‘normal’ bike rides. They have also argued that «especially in cities, those with the app tended to ride the same routes as everyone else».

But is that really true? Strava’s heatmap (choose red and rides) for Amsterdam could perhaps be interpreted as a combination of recreational rides (Vondelpark, Amstel) and cyclists trying to get in or out of the city as quickly as possible (plus quite a few people who recorded their laps at the Jaap Eden ice skating rink as bicycle rides).

Perhaps you could find a way to filter out ‘lycra’ rides and end up with a sufficient number of ‘normal’ rides. Then again, almost three-quarters of bicycle rides in the Netherlands are under 3.7 km, and I suspect very few of those short rides end up on Strava.

There’s also a socio-economic aspect. It has been argued that Strava is used most by people living in wealthier neighbourhoods, which aren’t necessarily the neighbourhoods most in need of better cycling infrastructure.

Of course, bicycle use is unequal in the first place, which is also reflected in Fietstelweek data. The map below shows the start and end points of rides for Amsterdam.

Density is highest in the area within the ring road and south of the IJ. The number of trips per 1,000 residents also correlates with house values: more bicycle trips start or end in affluent neighbourhoods. As said, this probably reflects actual patterns in bicycle use and not a problem of the data.

To summarise, Fietstelweek has smaller numbers than one would like, while Strava data raises questions about representativeness. One way for Strava to help answer these questions would be to make a subset of its Amsterdam data available as open data.

This Python script shows how the analysis was done.

The Digital City

Amsterdam has a new coalition agreement. The paragraph on democratisation and the Digital City was well received - a London-based researcher from Amsterdam liked the plans so much she decided to translate them into English.

The new coalition wants to create a democratic version of the smart city. Citizens should be in control of their data. The city will support co-operations that provide an alternative to platform monopolists. An information commissioner will see to it that the principles ‘open by default’ and ‘privacy by design’ are implemented.

The agreement also lists a number of issues the city is (or was) already working on:

  • City council information will be opened up. In 2015, the city council asked to make documents such as council meeting reports, motions, written questions etc available as open data. Since, some of that data has been made available through Open Raadsinformatie, but as yet no solution has been found for offering all council information in a machine readable form.
  • Freedom of information requests (Wob requests) will be published. Amsterdam started publishing decisions on Wob requests earlier this year; so far three have been [published][wob].
  • Interestingly, Amsterdam wants to use open source software whenever possible. Over ten years ago, Amsterdam planned a similar move, and its plans were sufficiently serious to needle Microsoft. In 2010, the plans foundered on an uncooperative IT department.

All in all, a nice combination of new ambitions and implementation of ‘old’ plans.

Dutch government drops 3D pie charts

The Dutch government has replaced pie charts with bar charts in it’s annual reports, someone noted on Twitter (via @bokami). Pie charts aren’t always a bad choice - contrary to the view of some adherents of the stricter school in data visualisation. But 3D pie charts are really hard to justify and it’s a bit awkward they were still used one year ago.

The charts are from the 2016 and 2017 reports of the Department of Social Affairs and Employment.

Tags: 

How to use Python and Selenium for scraping election results

A while ago, I needed the results of last year’s Lower House election in the Netherlands, by municipality. Dutch election data is available from the website of the Kiesraad (Electoral Board). However, it doesn’t contain a table of results per municipality. You’ll have to collect this information from almost 400 different web pages. This calls for a webscraper.

The Kiesraad website is partly generated using javascript (I think) and therefore not easy to scrape. For this reason, this seemed like a perfect project to explore Selenium.

What’s Selenium? «Selenium automates browsers. That’s it!» Selenium is primarily a tool for testing web applications. However, as a tutorial by Thiago Marzagão explains, it can also be used for webscraping:

[S]ome websites don’t like to be webscraped. In these cases you may need to disguise your webscraping bot as a human being. Selenium is just the tool for that. Selenium is a webdriver: it takes control of your browser, which then does all the work.

Selenium can be used with Python. Instructions to install Selenium are here. You also have to download chromedriver or another driver; you may store it in /usr/local/bin/.

Once you have everything in place, this is how you launch the driver and load a page:

from selenium import webdriver
 
URL = 'https://www.verkiezingsuitslagen.nl/verkiezingen/detail/TK20170315'
 
browser = webdriver.Chrome()
browser.get(URL)

This will open a new browser window. You can use either xpath or css selectors to find elements and then interact with them. For example, find a dropdown menu, identify the options from the menu and select the second one:

XPATH_PROVINCES = '//*[@id="search"]/div/div[1]/div'
element = browser.find_element_by_xpath(XPATH_PROVINCES)
options = element.find_elements_by_tag_name('option')
options[1].click()

If you’d check the page source of the web page, you wouldn’t find the options of the dropdown menu; they’re added afterwards. With Selenium, you needn’t worry about that - it will load the options for you.

Well, actually, there’s a bit more to it: you can’t find and select the options until they’ve actually loaded. Likely, the options won’t be in place initially, so you’ll need to wait a bit and retry.

Selenium comes with functions that specify what it should wait for, and how long it should wait and retry before it throws an error. But this isn’t always straightforward, as Marzagão explains:

Deciding what elements to (explicitly) wait for, with what conditions, and for how long is a trial-and-error process. […] This is often a frustrating process and you’ll need patience. You think that you’ve covered all the possibilities and your code runs for an entire week and you are all happy and celebratory and then on day #8 the damn thing crashes. The servers went down for a millisecond or your Netflix streaming clogged your internet connection or whatnot. It happens.

I ran into pretty similar problems when I tried to scrape the Kiesraad website. I tried many variations of the built-in wait parameters, but without any success. In the end I decided to write a few custom functions for the purpose.

The example below looks up the options of a dropdown menu. As long as the number of options isn’t greater than 1 (the page initially loads with only one option, a dash, and other options are loaded subsequently), it will wait a few seconds and try again - until more options are found or until a maximum number of tries has been reached.

MAX_TRIES = 15
 
def count_options(xpath, browser):
 
    time.sleep(3)
    tries = 0
    while tries < MAX_TRIES:
 
        try:
            element = browser.find_element_by_xpath(xpath)
            count = len(element.find_elements_by_tag_name('option'))
            if count > 1:
                return count
        except:
            pass
 
        time.sleep(1)
        tries += 1
    return count

Here’s a script that will download and save the result pages of all cities for the March 2017 Lower House election, parse the html, and store the results as a csv file. Run it from a subfolder in your project folder.

Notes

Dutch election results are provided by the Kiesraad as open data. In the past, the Kiesraad website used to provide a csv with the results of all the municipalities, but this option is no longer available. Alternatively, a download is available of datasets for each municipality, but at least for 2017, municipalities use different formats.

Scraping the Kiesraad website appears to be the only way to get uniform data per municipality.

Since I originally wrote the scraper, the Kiesraad website has been changed. As a result, it would now be possible to scrape the site in a much easier way, and there would be no need to use Selenium. The source code of the landing page for an election contains a dictionary with id numbers for all the municipalities. With those id numbers, you can create urls for their result pages. No clicking required.

Tags: 

The pay gap between CEO and workers

The Dutch Corporate Governance Code has been revised. As a result, Dutch listed companies have started to report their internal pay ratios. These pay ratios are often thought of as the gap between CEO pay and what other workers in the firm get paid.

How can you use these ratios? One way is to create a Fat Cat Calendar. The inspiration here is the British phenomenon of Fat Cat Day: the day when CEOs have earned as much as their employees will earn in an entire year.

The calendar shows that by 2 January at quarter to five in the afternoon, the Heineken CEO had earned more than his workers over all of 2017.

Some firms are missing from the calendar. For example, Euronext says it’s complicated to calculate a pay ratio, because they operate in different countries (but then, this would apply to most listed companies). Shell has calculated a pay ratio, but they don’t report how high it is - only how it compares to the pay ratios of a selection of other companies.

Note that a pay ratio is not a simple and straightforward figure. Below is an exploratory analysis of some of the issues involved (see Method section below for caveats).

Heineken

Heineken, which has by far the highest reported pay ratio in my sample (215), argues this is a consequence of their business model.

First, Heineken does a lot of business in ‘emerging markets with widely different pay levels and structures compared to the Netherlands and Europe’. In other words, many workers are in low-wage countries. The underlying suggestion seems to be that you don’t have to pay these workers the same wages as their colleagues in Europe.

Second, Heineken has ‘a large number of breweries and sales forces in-house worldwide, which adds to the variety of pay within the Company’ (the treatment of Heineken sales staff in Africa is controversial, but that’s a different matter). Aside from the question whether Heineken are paying their brewery and sales workers enough, they do have a point here.

Many companies have simply outsourced their low-paid workers. It’s a bit arbitrary to compare CEO pay to employees of the firm but not to the outsourced workers who clean their offices, serve their lunches, fix their computers, and manufacture and sell their products.

Third, Heineken says that pay ratios can be very volatile because of the substantial bonuses that go up and down. One might argue that this is not a problem of the pay ratio, but a problem of the composition of CEO pay.

Anyone can calculate a ratio

According to the Corporate Governance Code, firms should report the ‘ratio between the remuneration of the management board members and that of a representative reference group determined by the company’. This leaves ample room for firms to decide how to calculate the ratio. For that reason, comparing ratios between firms is a bit tricky.

In practice, many firms calculate the pay ratio as the ratio between total CEO pay and total staff costs per FTE. This is information that is normally included in the annual report, so you can calculate the pay ratio yourself.

I calculated pay ratios for a sample of companies listed in Amsterdam and compared these to the pay ratios these companies reported in their annual report. The chart below shows how my calculated pay ratios compare to the reported pay ratios.

In many cases, my calculated pay ratios are quite similar to the pay ratios reported by the companies. It’s interesting to look into some of the examples where this is not the case, and why this might be:

  • Some CEOs were appointed during 2017 and therefore weren’t paid an entire year’s remuneration. I didn’t correct for this, so my calculated pay ratio is too low in these cases. An example is AkzoNobel, which correctly reported a higher pay ratio than the one I calculated.
  • Randstad probably used only corporate employees and not ‘candidates’ (temp workers) to calculate their pay ratio. This would explain why they arrived at a smaller gap between CEO and workers than the one I calculated.
  • Other companies also use a subset of their employees for their calculation. Assuming these are relatively well-paid employees, the gap between CEO and workers will appear smaller that way. An example is OCI, which used only employees in Europe and North-America as a reference group. Unilever takes this approach one step further by comparing CEO pay to their various UK and Dutch management work levels. This resulted in a range of pay ratios, each far smaller than the one I calculated.
  • A few companies use the average remuneration of the entire executive board, rather than CEO remuneration, to calculate the pay ratio. Assuming the CEO earns more than the other board members, this will also make the gap with workers appear smaller. An example is AMG, which argues that using average board remuneration is appropriate ‘given the collective management responsibility of the Management Board members’ (this is somewhat ironic, for ‘collective responsibility’ was apparently not a key consideration when they decided to pay the CEO 50% more than the other board members).

Discussion

I think it would be fair to say that the requirement to report pay ratios is a failed attempt at transparency. Since firms can use whatever method they want to calculate the ratio, comparisons across firms are problematic. Meanwhile, anyone can calculate ratios from data that is already available. While this isn’t entirely unproblematic (see Method section below), the ratios you calculate will probably be more consistent across firms than the reported ratios.

Instead of requiring firms to report a pay ratio, it would make more sense to require them to report CEO pay, staff costs and staff numbers in a more consistent and transparent way.

Further, it’s somewhat arbitrary to compare CEO pay only to employees of the firm, and not outsourced workers such as cleaners. A fair measure of inequality should look beyond the employees of the firm. One option is to compare CEO pay to the median income in a country; another is the norm which states that CEO pay should not be higher than 20 times the legal minimum wage. Of course, a limitation of these approaches is that they don’t capture international inequality.

The highest CEO-to-minimum wage ratio for the firms in my sample is 577 (Unilever). By 1 January 2017 at quarter past three in the afternoon, the CEO of Unilever had earned a the minimum wage for an entire year. If you want to narrow this gap, there are broadly two ways to do it: show more restraint in CEO remuneration, and raise the minimum wage.

Method

I calculated ‘Fat Cat Day’ by simply adding one year, divided by the pay ratio, to 1 January 2017:

d1 = dt.datetime.strptime('2017-01-01', '%Y-%m-%d')
d2 = dt.datetime.strptime('2018-01-01', '%Y-%m-%d')
year = d2 - d1
 
dates = defaultdict(list)
for i, row in df.iterrows():
    if pd.notnull(row.ratio_reported_17):
        ratio = row.ratio_reported_17
        fcd = d1 + year / ratio
        date = dt.datetime.strftime(fcd, '%Y-%m-%d')
        time = dt.datetime.strftime(fcd, '%H:%M')
        company = row.company
        item = {
            'time': time,
            'company': company
        }
        dates[date].append(item)

Note that the British High Pay Centre has a far more elaborate method to calculate Fat Cat Day, which considers how many hours CEOs work, whether or not they work weekends, etcetera. While this is very conscientious, I don’t think it’s necessary. CEOs are paid on an annual basis, and if they work less than a full year, they’ll generally receive a (more or less) proportional share of their annual pay, regardless of the actual number of hours worked.

I googled for annual reports of companies listed on the Amsterdam stock exchange (in a few cases, I also looked up a separate remuneration report). I didn’t always find one, which can be for a number of reasons: perhaps I didn’t look hard enough; perhaps companies hadn’t filed their report yet; or perhaps they filed it with the company register but didn’t publish it online. As a quick filter, I disregarded reports that don’t contain the term pay ratio. All in all, this is a rather pragmatic sample, which may not be representative of all Amsterdam-listed companies (for example, it’s conceivable that firms with stronger roots in the Netherlands are more inclined to comply with the Corporate Governance Code). For exploratory purposes, I think that’s ok.

I calculated the pay ratio dividing total CEO pay by the average total staff costs per FTE. This may sound straightforward, but it isn’t:

  • I had to manually copy data from pdfs, so errors can’t be excluded;
  • Different methods may be used to calculate CEO pay (I used the total amount as reported by the company, without checking the method they used to calculate it);
  • It’s not always clear which categories of workers are included in staff costs and staff numbers;
  • If possible I used FTE for staff numbers, but sometimes only headcount is reported and some reports don’t specify what unit they used;
  • If possible I used the average number of staff, but sometimes only the number of staff at the end of the year is reported;
  • I didn’t annualise CEO pay for CEOs who were appointed during 2017. It might seem simple to do so (total pay * 365 / number of days worked), but in some cases CEO pay appears to contain elements that are not dependent on the number of days worked (e.g., the annual incentive at Philips Lighting).

All but two companies in my sample use EUR as presentation currency. I pragmatically used the average exchange rate for 2017 as reported by OCI to convert USD to EUR.

For the minimum wage, I used the average of the rates per 1 January and 1 July 2017, and added 8% holiday pay.

The ‘20 times minimum wage’ norm has been ascribed to trade union FNV, but that’s not technically correct.

The data I used can be found in this csv file. Please let me know if you find any errors.

Tags: 

Pages