Data

Visualising Amsterdam’s cyclists

Bike City Amsterdam, a new book by Fred Feddes and Marjolein de Lange, recounts how Amsterdam developed a cycling policy (more on the book below). An important source for the book is the archive of the Amsterdam branch of cyclists’ organisation Fietsersbond. In addition, traffic data was used to analyse trends.

An interesting dataset consists of counts of the number of cyclists, cars and other road users moving into and out of Amsterdam’s city centre, over the years 1980–2009. Most of the locations where traffic was counted are on the Singelgracht, which encircles Amsterdam’s city centre.

The data represents manual counts on a single day, between 7am and 7pm, of traffic in both directions.

I was asked to think about a way to visualise this dataset, which posed an interesting challenge (and was a lot of fun to do). Below, I’ll discuss a few of the options we considered.

Radar chart

Given the geographical distribution of counting locations, it seemed to make sense to try a circular chart design. In fact, that idea had also occurred to the city’s infrastructure department. In a 2007 fact sheet, they used a radar chart (or cobweb chart) to visualise the Singelgracht bicycle counts.

Incidentally, they didn’t use the term radar chart, but called it a fan (waaier). They used a bicycle metaphor to describe how it works: «from the middle, the counting locations around the city centre are connected like spokes in a bicycle wheel».

The chart looks really nice, but this chart type also has a drawback: there’s an implicit suggestion that the area within the purple line represents the number of crossings, which is in fact misleading (see this article for a discussion of a similar problem). Another limitation is that the chart doesn’t show how bicycle traffic changed - although it would be possible to make a version with separate lines representing 1980 and 2009.

Radial lollipop chart

As an alternative, I created what I’ll call a radial lollipop chart (to my knowledge, this chart type didn’t exist yet). The chart library that I use, D3.js, doesn’t seem to have a method to draw the ‘spokes’, or at least I couldn’t find it. Therefore, I wrote a function that calculates the start and end points of the lines. I had long forgotten how to use sine and cosine, so I had to look that up. I’ve published the code here.
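The sine and cosine step can be illustrated with a minimal sketch. It is written in Python rather than D3.js and is not the published code; the function name `spoke`, the clockwise-from-north angle convention and the example radii are all assumptions made for illustration.

```python
import math

def spoke(angle_deg, r_start, r_end, cx=0.0, cy=0.0):
    """Return ((x1, y1), (x2, y2)) for a spoke drawn at angle_deg.

    The angle is measured clockwise from 12 o'clock, like a compass
    bearing. y grows downwards, as in SVG coordinates.
    """
    theta = math.radians(angle_deg)
    # Unit vector pointing from the centre towards the counting location
    dx, dy = math.sin(theta), -math.cos(theta)
    return ((cx + r_start * dx, cy + r_start * dy),
            (cx + r_end * dx, cy + r_end * dy))

# Hypothetical crossing due east of the centre, with made-up
# scaled radii of 40 (for 1980) and 90 (for 2009)
start, end = spoke(90, 40, 90)
```

The line runs from `start` to `end` and the 'lollipop' dot goes at `end`. Using a compass bearing keeps the spokes aligned with where each crossing lies on the map.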

Here’s a radial lollipop chart showing how cycling has increased at virtually all the Singelgracht crossings.

And here’s one showing the opposite effect for cars:

I love it when a chart has data points that break out of the chart area - although this is perhaps a bit extreme. The outliers are due to the fact that a large share of car traffic uses the Wibautstraat - IJtunnel route. I could have changed the scale to include those outliers, but then changes on other routes as well as changes in bicycle use would have become much more difficult to discern.

Area chart

I rather like the radial lollipop chart, but it has a limitation: it shows changes between 1980 and 2009, but not when those changes happened. Car use started to go down before cycling really started to increase, but you can’t tell that from the radial lollipop chart.

This is why the chart used in the book is an area chart, with colours corresponding to the broad geographical orientation of the crossings. Simple, but effective. And if you want to explore the details, click here for a draft version of the charts: bicycle, car.

About the book and exhibition

On 4 April, the Amsterdam branch of cyclists’ organisation Fietsersbond handed over its archive to the Municipal Archive. Marjolein de Lange, who coordinated a volunteer project to prepare the archive, came up with the idea to use the material as input for a book - a project she carried out with author Fred Feddes.

This resulted in a very interesting book about activism versus cooperation; the place of cycling in urban planning; and how the magic power of Amsterdam’s cycling culture decided the epic fight for the right to cycle through the passage under the Rijksmuseum. The book, which contains a wealth of great photos, maps and posters, is a must-read for anyone interested in cycling, Amsterdam, or activist poster design. It’s been published both in Dutch and in English. There’s also an exhibition at the Municipal Archive (until 30 June at Vijzelstraat 32; admission is free).

Using a jagged baseline to indicate a broken y-axis

In an article for the recently created Data Visualisation Society, R.J. Andrews suggests using a jagged baseline to indicate a broken y-axis (i.e., an axis that doesn’t start at zero). The idea - inspired by some beautiful charts dating back to WWI - is to suggest that the bottom part of the chart has been torn off. I like the idea - but I found it isn’t easy to implement.

Contrary to the view of some chart fundamentalists, using a y-axis that doesn’t start at zero can be perfectly ok in some situations. Still, one might want to alert the reader that the zero line is missing. One way is to add a little zigzag or some other symbol to the y-axis, as shown here. And then there’s Andrews’ suggestion to use a jagged baseline.

I tried to implement this in a chart that shows the number of flights at Schiphol Airport. For background: Schiphol has all but reached the cap of 500,000 flights per year, agreed on after negotiations between local residents and the aviation industry. There’s currently a heated debate on whether Schiphol should be allowed to grow further. Experts expect that maintaining the cap will result in more efficient use of the available slots (e.g. fewer short-distance flights, fewer low-cost flights, larger aircraft and fewer empty seats).

Creating a jagged baseline is a bit of a hassle: you have to remove the regular baseline, move the axis labels down a bit and create a new, jagged baseline.
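The last of those steps, creating the jagged line itself, comes down to generating zigzag coordinates. Below is a library-agnostic sketch in Python; the function name `jagged_baseline` and its parameters (`teeth`, `height`) are assumptions for illustration, not the code behind the Schiphol chart.

```python
def jagged_baseline(x_min, x_max, y, teeth=20, height=1.0):
    """Return (xs, ys) for a zigzag line from x_min to x_max.

    Points alternate between y and y + height, so the line looks
    like the torn-off edge of a sheet of paper.
    """
    n_points = 2 * teeth + 1  # one valley and one peak per tooth, plus the end point
    step = (x_max - x_min) / (n_points - 1)
    xs = [x_min + i * step for i in range(n_points)]
    ys = [y + height * (i % 2) for i in range(n_points)]
    return xs, ys

# Example with made-up axis limits: two teeth between x = 0 and x = 10
xs, ys = jagged_baseline(0, 10, 5, teeth=2)
```

If the chart were drawn with matplotlib (an assumption), the other two steps could be handled with `ax.spines['bottom'].set_visible(False)` to remove the regular baseline and `ax.tick_params(axis='x', pad=...)` to move the labels down, after which `ax.plot(xs, ys, clip_on=False)` draws the zigzag.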

And then there are some design issues. Having the baseline and the ‘regular’ chart lines look too similar may cause confusion. In fact, all of Andrews’ examples have very pronounced chart lines, which are clearly distinct from the baseline. If you prefer a more subtle approach, another solution is to use a light colour for the baseline.

Then again, it also matters whether there are gridlines. After some experimenting, I think the jagged baseline only works well with gridlines added; without them it looks a little weird. But see for yourself if you agree.

I’ve written a Python script to download and clean Schiphol Airport traffic data; find it on Github.

How to investigate assets: lessons from The Wire

I’m rewatching The Wire. It’s a great series anyhow, but for researchers, episode 9 of the first season (2002) is especially interesting. It features detective Lester Freamon instructing detectives Roland Pryzbylewski and Leander Sydnor how to investigate the assets of drug kingpin Avon Barksdale.

They use microfilm instead of the Internet. They don’t have databases like Orbis, Companyinfo or OpenCorporates, and they don’t seem to calculate social network metrics. Yet the general principles behind Freamon’s methodology still make perfect sense today:

Start with the nightclub that Barksdale owns. Look up Orlando’s, by address, you match it, and you see it’s owned by - who?

Turns out it’s owned by D & B Enterprises. Freamon tells Prez to take that information to the state office buildings on Preston Street.

Preston Street?

Corporate charter office.

Corporate who?

They have the paperwork on every corporation and LLC licensed to do business in the state. You look up D & B Enterprises on the computer. You’re going to get a little reel of microfilm. Pull the corporate charter papers that way. Write down every name you see. Corporate officers, shareholders or, more importantly, the resident agent on the filing who is usually a lawyer. While they use front names as corporate officers, they usually use the same lawyer to do the charter filing. Find that agent’s name, run it through the computer, find out what other corporations he’s done the filing for, and that way we find other front companies.

This is pretty much the same approach you’d take when investigating shady temp agencies: trace connections via (former) shareholders, board members, company addresses and related party transactions. And, of course, try to figure out where the profits go.

On that aspect, Freamon also has some wisdom to share:

And here’s the rub. You follow drugs, you get drug addicts and drug dealers. But you start to follow the money, and you don’t know where the fuck it’s gonna take you.
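Freamon’s trick of finding front companies through a shared resident agent boils down to grouping filings by agent. Here’s a minimal Python sketch; the company and agent names are made up, and the function names are assumptions, not part of any real registry API.

```python
from collections import defaultdict

# Made-up filings for illustration: (company, resident agent)
filings = [
    ("Company A", "Agent X"),
    ("Company B", "Agent Y"),
    ("Company C", "Agent X"),
    ("Company D", "Agent X"),
]

def companies_by_agent(filings):
    """Map each resident agent to the set of companies they filed for."""
    by_agent = defaultdict(set)
    for company, agent in filings:
        by_agent[agent].add(company)
    return by_agent

def related_companies(start, filings):
    """Return the companies that share a resident agent with `start`."""
    by_agent = companies_by_agent(filings)
    agents = {agent for company, agent in filings if company == start}
    related = set()
    for agent in agents:
        related |= by_agent[agent]
    return related - {start}

candidates = related_companies("Company A", filings)
```

The same grouping works for any shared attribute, such as an address or a board member. Repeating it for each newly found company widens the search one step at a time.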

Delete Facebook

This is becoming a bit of a tradition: me writing about people who make a New Year’s resolution to quit Facebook. The story is simple: around the turn of the year, there’s a peak in people googling how to quit smoking, but there’s an even larger peak in people trying to figure out how to delete their Facebook account.

But this year, the story is a bit more complicated (and more interesting).

Google Trends data isn’t available yet for the last days of the year, so there’s no new peak in searches for “quit smoking” yet. Other than that, the yearly pattern is dwarfed by a huge peak in search volume for “delete Facebook” in the week starting on 18 March. What happened?

The Guardian has helpfully created an overview of Facebook-related incidents during 2018; I’ve added a few stories that also seemed relevant (for sources, see Method below; thanks to Vicki Boykis for the suggestion to annotate the Google Trends chart).

No surprise: the largest peak in “delete Facebook” searches happened a few days after the publication of the Cambridge Analytica story on 17 March. The news resulted in a veritable #deletefacebook campaign, although according to Mark Zuckerberg, «I don’t think we’ve seen a meaningful number of people act on that.»

Arwa Mahdawi has argued that deleting your Facebook account isn’t a bad New Year’s resolution, even though it probably won’t change how the company operates: «Facebook’s abuse of power isn’t a problem that we can solve as individuals. Technology giants must be regulated.»

So how much impact did the controversy have on Facebook? One way to try and answer this is to look at the share price.

The pattern for Facebook is rather interesting. The share price dropped after the publication of the Cambridge Analytica story, but quickly picked up again. But then it took a plunge on 25 July, resulting in ‘the biggest-ever one-day wipeout in U.S. stockmarket history’.

One possible interpretation is that investors initially thought the Cambridge Analytica story wasn’t going to harm Facebook’s profits. But when Facebook published its Q2 earnings report, investors were shocked to learn that user growth had stalled.

But the chart also shows that all major tech companies saw their share prices go down. This suggests there’s more going on than users leaving Facebook. In addition to broader economic trends, a likely explanation is that investors fear more government regulation of major tech companies in response to the controversies they are involved in (and also to their dominant market position). While this may not be the whole story, it does seem to support Mahdawi’s view about the key role of regulation.

Method

Note that Google Trends data should be interpreted with caution because Google doesn’t provide much detail on the methodology used to produce the data.

For periods longer than three months, only weekly data can be downloaded. For the 2018 chart I wanted daily data. As suggested here, I downloaded three-month batches with overlapping data and then used the overlapping dates to calculate a ratio to adjust the scales. Here’s the code:

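import numpy as np
import pandas as pd

TERMS = ['delete facebook', 'quit smoking']

def stitch(df1, df2):
    """Rescale df2 to the scale of df1 via their overlapping dates, then append it."""
    # Index by date on new frames, so the caller's DataFrames are left untouched
    df1 = df1.set_index('date', drop=False)
    df2 = df2.set_index('date', drop=False)
    overlapping = [d for d in df1.date if d in df2.index]
    # Ratio between the two scales on each overlapping day;
    # days where df2 is zero are skipped to avoid dividing by zero
    ratios = [df1.loc[d, 'delete facebook'] /
              df2.loc[d, 'delete facebook']
              for d in overlapping
              if df2.loc[d, 'delete facebook'] > 0]
    ratio = np.median(ratios)
    for var in TERMS:
        df2[var] *= ratio
    # Keep df1's values for the overlapping dates
    return pd.concat([df1, df2[~df2.date.isin(overlapping)]])

# dfs: list of three-month DataFrames in chronological order,
# each with a 'date' column and one column per search term
df = dfs[0]
for df2 in dfs[1:]:
    df = stitch(df, df2)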
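To see why the ratio is needed: Google Trends scales each download relative to the peak within that download, so the same day can have different values in two batches. The arithmetic can be shown with a small self-contained example. The numbers and dates are made up, and it uses plain dictionaries instead of DataFrames.

```python
from statistics import median

# Made-up values for the same search term, as reported in two batches
batch1 = {"2018-03-29": 50, "2018-03-30": 40, "2018-03-31": 30}
batch2 = {"2018-03-29": 25, "2018-03-30": 20, "2018-03-31": 15,
          "2018-04-01": 10}

# Days that appear in both batches
overlap = sorted(set(batch1) & set(batch2))
ratio = median(batch1[d] / batch2[d] for d in overlap)

# Bring the new days of batch 2 onto batch 1's scale;
# overlapping days keep the values from batch 1
rescaled = {d: v * ratio for d, v in batch2.items() if d not in batch1}
stitched = {**batch1, **rescaled}
```

Here every overlapping day is twice as high in the first batch, so the ratio is 2 and the value of 10 for 1 April becomes 20 on the first batch’s scale.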

I used this Guardian article as my main source on Facebook-related incidents in 2018. I added a few from other sources: in April, Facebook announced 87 million people had been affected by the Cambridge Analytica scandal. Subsequently, it announced that it would notify people who had been affected. Dutch comedian Arjen Lubach organised a Bye Bye Facebook event (reminiscent of the 2015 Facebook Farewell Party). In September, Pew found that one in four Americans had deleted the Facebook app from their phone; and later that month a Chinese hacker threatened to delete Mark Zuckerberg’s Facebook account.
