Gentrification mapped

The map makers of the City of Amsterdam have created a map that shows the Neighbourhood Street Quota (BSQ, from the Dutch buurtstraatquote). The BSQ plays a key role in a highly controversial reform that is eroding the city’s social ground lease policy, but that’s not the topic of this article. For now, I’m interested in the BSQ as an indicator of land value.

As the city government puts it, «the high BSQs are found at popular locations in the city and the low BSQs at less popular locations in the city» (for details see Method, below). Unsurprisingly, the centrally located Centrum and Zuid districts have high BSQs and the peripheral areas have low BSQs.

More interesting is how the BSQ has changed. The city government has provided data for thousands of streets or street segments, for 2014 and 2016. Of course, this is a short time period and the patterns may or may not reflect longer-term developments.

The chart below shows the distribution of BSQs for flats (as opposed to single-family dwellings) for 2014 and 2016.

The peak has moved to the right, as the median value has risen from 28 to 38. For political reasons, the BSQ can never be lower than 5 or higher than 49, which explains the large number of streets with a value of 5 or 49. This implies that rises in BSQ don’t fully reflect how much land values have risen.

The map below shows how much BSQs for flats have risen in different parts of Amsterdam. I omitted streets with low or high BSQs where substantial changes in BSQ may have been hidden by the upper and lower limits. At the high end, this applies to the Canal Belt and much of the Zuid District. At the lower end, this applies to many peripheral areas including almost the entire Zuidoost District.

Red streets indicate an increase of the BSQ by more than half; orange streets an increase by less than half; and the rare green streets a decrease of the BSQ. There are some red areas outside the ring road: mainly the IJburg expansion to the east; some parts of Nieuw-West; and Buitenveldert. Buitenveldert is a neighbourhood south of the Zuidas business district with a growing number of expats and students among its residents.

Within the ring road, BSQs are rising in areas that are often associated with gentrification, such as the Kolenkit in West, the Vogelbuurt in Noord and the Indische Buurt in Oost. Perhaps more surprising is Betondorp, a low-income area with many older residents, described in 2015 as «one of the few neighbourhoods in Amsterdam not yet affected by the advance of gentrification». If the BSQ is an indication, that may be about to change.

Method

A list (pdf) of BSQs for 2016 and 2014 was recently sent to the city council. The BSQs are referred to as 2018 and 2017, but are based on data from 2016 and 2014 respectively (or to be more precise: the ‘2017 BSQ’ uses data from 2015 or 2014, whichever is lower). The map created by the City of Amsterdam uses the ‘2017 BSQ’.

For each house, the municipality calculates an individual land quota using the formula: land value / (land value + theoretical cost of rebuilding the house). The land value is obtained by subtracting the rebuilding cost from the total value of the house (WOZ).

Subsequently, BSQs are calculated as the average land quota per street (or street segment if a street traverses multiple neighbourhoods). This is done separately for single-family dwellings and flats.
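
In Python terms, the calculation might look something like the sketch below. This is just an illustration: the column names and numbers are my own assumptions, not the municipality’s.

import pandas as pd

# Hypothetical input: one row per dwelling, with the WOZ value and the
# theoretical rebuilding cost (column names and values are made up)
houses = pd.DataFrame({
    'street': ['Street A', 'Street A', 'Street B'],
    'woz_value': [300_000, 350_000, 250_000],
    'rebuilding_cost': [180_000, 190_000, 170_000],
})

# Land value: total value (WOZ) minus the rebuilding cost
houses['land_value'] = houses['woz_value'] - houses['rebuilding_cost']

# Individual land quota: land value as a share of the total value
houses['land_quota'] = houses['land_value'] / (
    houses['land_value'] + houses['rebuilding_cost'])

# BSQ: average land quota per street, expressed as a percentage and
# clipped to the political bounds of 5 and 49
bsq = (houses.groupby('street')['land_quota']
       .mean()
       .mul(100)
       .round()
       .clip(5, 49))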

The interpretation of the BSQ is a bit tricky: one would expect higher land values to be reflected in higher BSQs, but the exact relationship depends on the value of the building and on whether that also responds to changes in land value (for example, because more expensive materials are used). To illustrate: if the land value rises from 100,000 to 150,000 euro while the rebuilding cost stays at 200,000 euro, the land quota rises from 33 to 43 percent; if the rebuilding cost rises as well, the quota rises less.

In my analysis, I only used BSQs for flats, and only the streets or street segments for which a BSQ is available for both 2014 and 2016 (thus excluding new urban expansions).

For the map, I also excluded streets where an increase of the BSQ by less than half may be hidden by the lower or upper limit of the BSQ: those with a 2014 value of 5 and a 2016 value of less than 8; and those with a 2014 value above 32 and a 2016 value of 49.
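
In pandas terms, the exclusion could look something like this (a sketch, assuming a dataframe with columns bsq_2014 and bsq_2016):

# Exclude streets where a less-than-half increase may be hidden by the
# lower bound (5) or the upper bound (49) of the BSQ
hidden_low = (streets['bsq_2014'] == 5) & (streets['bsq_2016'] < 8)
hidden_high = (streets['bsq_2014'] > 32) & (streets['bsq_2016'] == 49)
streets = streets[~(hidden_low | hidden_high)]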

In creating the map I also ignored long streets that traverse multiple neighbourhoods and that therefore have been separated into multiple segments. Constructing street segments from line geometries representing the entire street seemed like a lot of work (perhaps there’s a simple way to do this, but I couldn’t find it).

I used Tabula to extract data from the original pdf; this Python script to process the data, create a csv for the chart and create a shapefile for the map; D3.js for the chart; and QGIS to create the map (using OpenStreetMap data and Stamen Toner Lite for the background).

Dutch governments consider using Strava data

Strava is a popular app to record bicycle rides. For some years, the company has been trying to sell its data to local governments for traffic planning. NDW, a platform of Dutch governments including the city of Amsterdam, has bought six months’ worth of Strava data to give it a try.

The switch to Strava may mean the end of the Fietstelweek, an annual one-week effort to collect bicycle data from thousands of volunteers. In the past, I’ve used Fietstelweek data to analyse waiting times at traffic lights. The Fietstelweek received funding from the same governments that are now experimenting with Strava data.

One reason why they are looking for alternatives is that the number of Fietstelweek participants is lower than they’d like. They seem to have a point. Consider for example the map below, which shows bicycle routes to and from Amsterdam Central Station.

It’s an interesting map in its own right. Unsurprisingly, intensity seems to be highest near the bicycle parking facilities. The main access routes appear to be the Geldersekade (with its sometimes chaotic crossing with Prins Hendrikkade) and the Piet Heinkade. It also seems that people cycling to and from Central Station are somewhat more likely to live in the eastern part of the city.

There’s one caveat though: the numbers are small. Even the busiest segments represent at most 40 rides. One loyal Fietstelweek participant recording her commute during the entire week could literally change the map.

Strava has far larger numbers, but its data raises different kinds of questions. Strava calls itself ‘the social network for athletes’ and wants to know if you use a road bike, a mountain bike, a TT bike or a cyclocross bike (no option ‘other’ available). So how representative is Strava data of people who use their city bike for commutes and other practical purposes?

Strava’s response to such questions is that they’re trying to make the app less competition-focused and more social, with Facebook-like features. This should help them collect data about ‘normal’ bike rides. They have also argued that «especially in cities, those with the app tended to ride the same routes as everyone else».

But is that really true? Strava’s heatmap (choose red and rides) for Amsterdam could perhaps be interpreted as a combination of recreational rides (Vondelpark, Amstel) and cyclists trying to get in or out of the city as quickly as possible (plus quite a few people who recorded their laps at the Jaap Eden ice skating rink as bicycle rides).

Perhaps you could find a way to filter out ‘lycra’ rides and end up with a sufficient number of ‘normal’ rides. Then again, almost three-quarters of bicycle rides in the Netherlands are under 3.7 km, and I suspect very few of those short rides end up on Strava.

There’s also a socio-economic aspect. It has been argued that Strava is used most by people living in wealthier neighbourhoods, which aren’t necessarily the neighbourhoods most in need of better cycling infrastructure.

Of course, bicycle use is unequal in the first place, which is also reflected in Fietstelweek data. The map below shows the start and end points of rides for Amsterdam.

Density is highest in the area within the ring road and south of the IJ. The number of trips per 1,000 residents also correlates with house values: more bicycle trips start or end in affluent neighbourhoods. As noted, this probably reflects actual patterns in bicycle use rather than a problem with the data.

To summarise, Fietstelweek has smaller numbers than one would like, while Strava data raises questions about representativeness. One way for Strava to help answer these questions would be to make a subset of its Amsterdam data available as open data.

This Python script shows how the analysis was done.

The Digital City

Amsterdam has a new coalition agreement. The paragraph on democratisation and the Digital City was well received - a London-based researcher from Amsterdam liked the plans so much she decided to translate them into English.

The new coalition wants to create a democratic version of the smart city. Citizens should be in control of their data. The city will support co-operatives that provide an alternative to platform monopolists. An information commissioner will see to it that the principles ‘open by default’ and ‘privacy by design’ are implemented.

The agreement also lists a number of issues the city is (or was) already working on:

  • City council information will be opened up. In 2015, the city council asked to make documents such as council meeting reports, motions and written questions available as open data. Since then, some of that data has been made available through Open Raadsinformatie, but as yet no solution has been found for offering all council information in machine-readable form.
  • Freedom of information requests (Wob requests) will be published. Amsterdam started publishing decisions on Wob requests earlier this year; so far, three have been published.
  • Interestingly, Amsterdam wants to use open source software whenever possible. Over ten years ago, Amsterdam planned a similar move, and its plans were sufficiently serious to needle Microsoft. In 2010, the plans foundered on an uncooperative IT department.

All in all, a nice combination of new ambitions and implementation of ‘old’ plans.

Dutch government drops 3D pie charts

The Dutch government has replaced pie charts with bar charts in its annual reports, someone noted on Twitter (via @bokami). Pie charts aren’t always a bad choice - contrary to the view of some adherents of the stricter school in data visualisation. But 3D pie charts are really hard to justify, and it’s a bit awkward that they were still being used a year ago.

The charts are from the 2016 and 2017 reports of the Department of Social Affairs and Employment.

How to use Python and Selenium for scraping election results

A while ago, I needed the results of last year’s Lower House election in the Netherlands, by municipality. Dutch election data is available from the website of the Kiesraad (Electoral Board). However, it doesn’t contain a table of results per municipality. You’ll have to collect this information from almost 400 different web pages. This calls for a webscraper.

The Kiesraad website is partly generated using JavaScript (I think) and therefore not easy to scrape. For this reason, it seemed like a perfect project to explore Selenium.

What’s Selenium? «Selenium automates browsers. That’s it!» Selenium is primarily a tool for testing web applications. However, as a tutorial by Thiago Marzagão explains, it can also be used for webscraping:

[S]ome websites don’t like to be webscraped. In these cases you may need to disguise your webscraping bot as a human being. Selenium is just the tool for that. Selenium is a webdriver: it takes control of your browser, which then does all the work.

Selenium can be used with Python. Instructions to install Selenium are here. You also have to download chromedriver or another driver; you may store it in /usr/local/bin/.

Once you have everything in place, this is how you launch the driver and load a page:

from selenium import webdriver
 
URL = 'https://www.verkiezingsuitslagen.nl/verkiezingen/detail/TK20170315'
 
browser = webdriver.Chrome()
browser.get(URL)

This will open a new browser window. You can use either XPath or CSS selectors to find elements and then interact with them. For example, find a dropdown menu, identify the options in the menu and select the second one:

XPATH_PROVINCES = '//*[@id="search"]/div/div[1]/div'
element = browser.find_element_by_xpath(XPATH_PROVINCES)
options = element.find_elements_by_tag_name('option')
options[1].click()
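
As an aside, Selenium also ships a Select helper for dropdown menus, which does the same job a bit more explicitly. Note that the helper needs the select element itself rather than the surrounding div, so the extra lookup below is an assumption about the page structure:

from selenium.webdriver.support.ui import Select

# Wrap the <select> element and pick the second option by index
dropdown = browser.find_element_by_xpath(XPATH_PROVINCES)
select = Select(dropdown.find_element_by_tag_name('select'))
select.select_by_index(1)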

If you were to check the page source of the web page, you wouldn’t find the options of the dropdown menu; they’re added afterwards. With Selenium, you needn’t worry about that - it will load the options for you.

Well, actually, there’s a bit more to it: you can’t find and select the options until they’ve actually loaded. Likely, the options won’t be in place initially, so you’ll need to wait a bit and retry.

Selenium comes with functions that specify what it should wait for, and how long it should wait and retry before it throws an error. But this isn’t always straightforward, as Marzagão explains:

Deciding what elements to (explicitly) wait for, with what conditions, and for how long is a trial-and-error process. […] This is often a frustrating process and you’ll need patience. You think that you’ve covered all the possibilities and your code runs for an entire week and you are all happy and celebratory and then on day #8 the damn thing crashes. The servers went down for a millisecond or your Netflix streaming clogged your internet connection or whatnot. It happens.
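
For reference, a built-in explicit wait looks something like this, using the province dropdown from the earlier example:

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Wait up to ten seconds for the element to appear in the DOM;
# raises a TimeoutException if it never does
wait = WebDriverWait(browser, 10)
element = wait.until(
    EC.presence_of_element_located((By.XPATH, XPATH_PROVINCES)))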

I ran into pretty similar problems when I tried to scrape the Kiesraad website. I tried many variations of the built-in wait parameters, but without any success. In the end I decided to write a few custom functions for the purpose.

The example below looks up the options of a dropdown menu. As long as the number of options isn’t greater than 1 (the page initially loads with only one option, a dash, and other options are loaded subsequently), it will wait a few seconds and try again - until more options are found or until a maximum number of tries has been reached.

import time

from selenium.common.exceptions import NoSuchElementException

MAX_TRIES = 15

def count_options(xpath, browser):
    """Count the options of a dropdown, retrying until they have loaded."""
    count = 1
    time.sleep(3)
    tries = 0
    while tries < MAX_TRIES:

        try:
            element = browser.find_element_by_xpath(xpath)
            count = len(element.find_elements_by_tag_name('option'))
            if count > 1:
                return count
        except NoSuchElementException:
            # The element isn't in the DOM yet; wait and retry
            pass

        time.sleep(1)
        tries += 1
    return count
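
Used like this, the function simply blocks until the options have loaded, or gives up after MAX_TRIES:

if count_options(XPATH_PROVINCES, browser) > 1:
    element = browser.find_element_by_xpath(XPATH_PROVINCES)
    element.find_elements_by_tag_name('option')[1].click()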

Here’s a script that will download and save the result pages of all cities for the March 2017 Lower House election, parse the html, and store the results as a csv file. Run it from a subfolder in your project folder.

Notes

Dutch election results are provided by the Kiesraad as open data. The Kiesraad website used to provide a csv with the results of all municipalities, but this option is no longer available. Alternatively, datasets can be downloaded for each municipality, but at least for 2017, these use different formats.

Scraping the Kiesraad website appears to be the only way to get uniform data per municipality.

Since I originally wrote the scraper, the Kiesraad website has been changed. As a result, it would now be possible to scrape the site in a much easier way, and there would be no need to use Selenium. The source code of the landing page for an election contains a dictionary with id numbers for all the municipalities. With those id numbers, you can create urls for their result pages. No clicking required.
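
A rough sketch of that approach - to be clear, both the regex and the url pattern below are assumptions on my part, since I haven’t actually rewritten the scraper:

import json
import re

import requests

URL = 'https://www.verkiezingsuitslagen.nl/verkiezingen/detail/TK20170315'

# Hypothetical: extract the embedded dictionary of municipality ids
# from the landing page source (the variable name and regex are guesses)
html = requests.get(URL).text
match = re.search(r'var municipalities = (\{.*?\});', html)
municipalities = json.loads(match.group(1))

# Build result-page urls from the id numbers (url pattern is a guess)
urls = [URL + '/gemeente/' + str(id_) for id_ in municipalities.values()]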
