Using a jagged baseline to indicate a broken y-axis

In an article for the recently created Data Visualisation Society, R.J. Andrews suggests using a jagged baseline to indicate a broken y-axis (i.e., an axis that doesn’t start at zero). The idea - inspired by some beautiful charts dating back to WWI - is to suggest that the bottom part of the chart has been torn off. I like the idea - but I found it isn’t easy to implement.

Contrary to the view of some chart fundamentalists, using a y-axis that doesn’t start at zero can be perfectly OK in some situations. Still, one might want to alert the reader that the zero line is missing. One way is to add a little zigzag or some other symbol to the y-axis, as shown here. And then there’s Andrews’ suggestion to use a jagged baseline.

I tried to implement this in a chart that shows the number of flights at Schiphol Airport. For background: Schiphol has all but reached the cap of 500,000 flights per year, agreed on after negotiations between local residents and the aviation industry. There’s currently a heated debate on whether Schiphol should be allowed to grow further. Experts expect that maintaining the cap will result in more efficient use of the available slots (e.g. fewer short-distance flights, fewer low-cost flights, larger aircraft and fewer empty seats).

Creating a jagged baseline is a bit of a hassle: you have to remove the regular baseline, move the axis labels down a bit and create a new, jagged baseline.
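
The three steps above can be sketched roughly as follows. This is only an illustration, not the code behind the charts in this post: it assumes matplotlib as the plotting library, and the function names, the number of teeth and the tooth height (2% of the visible y-range) are arbitrary choices made for the example.

```python
def jagged_baseline(x_min, x_max, y_base, n_teeth=40, height=1.0):
    """Return the x and y coordinates of a zigzag line from x_min to x_max.

    The line alternates between y_base and y_base + height.
    """
    n_points = 2 * n_teeth + 1
    step = (x_max - x_min) / (n_points - 1)
    xs = [x_min + i * step for i in range(n_points)]
    ys = [y_base + (height if i % 2 else 0.0) for i in range(n_points)]
    return xs, ys


def plot_with_jagged_baseline(x, y, y_min):
    """Line chart whose y-axis starts at y_min, with a jagged baseline."""
    # Imported here so jagged_baseline() can be used without matplotlib
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(x, y)
    ax.set_ylim(bottom=y_min)
    x_left, x_right = ax.get_xlim()
    y_top = ax.get_ylim()[1]

    # Step 1: remove the regular baseline (and the top and right frame lines)
    for side in ('bottom', 'top', 'right'):
        ax.spines[side].set_visible(False)

    # Step 2: hide the x tick marks and move the axis labels down a bit
    ax.tick_params(axis='x', length=0, pad=10)

    # Step 3: draw the new, jagged baseline across the full width of the chart
    xs, ys = jagged_baseline(x_left, x_right, y_min,
                             height=0.02 * (y_top - y_min))
    ax.plot(xs, ys, color='black', linewidth=1, clip_on=False)
    ax.set_xlim(x_left, x_right)
    return fig, ax
```

Calling plot_with_jagged_baseline(years, flights, y_min) with your own data returns the figure and axes, so gridlines or a lighter baseline colour can be added afterwards.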

And then there are some design issues. Having the baseline and the ‘regular’ chart lines look too similar may cause confusion. In fact, all of Andrews’ examples have very pronounced chart lines, which are clearly distinct from the baseline. If you prefer a more subtle approach, another solution is to use a light colour for the baseline.

Then again, it also matters whether there are gridlines. After some experimenting, I think the jagged baseline only works well with gridlines added; without them it looks a little weird. But see for yourself if you agree.

I’ve written a Python script to download and clean Schiphol Airport traffic data; find it on GitHub.

How to investigate assets: lessons from The Wire

I’m rewatching The Wire. It’s a great series anyhow, but for researchers, episode 9 of the first season (2002) is especially interesting. It features detective Lester Freamon instructing detectives Roland Pryzbylewski and Leander Sydnor how to investigate the assets of drug kingpin Avon Barksdale.

They use microfilm instead of the Internet. They don’t have databases like Orbis, Companyinfo or OpenCorporates, and they don’t seem to calculate social network metrics. Yet the general principles behind Freamon’s methodology still make perfect sense today:

Start with the nightclub that Barksdale owns. Look up Orlando’s, by address, you match it, and you see it’s owned by - who?

Turns out it’s owned by D & B Enterprises. Freamon tells Prez to take that information to the state office buildings on Preston Street.

Preston Street?

Corporate charter office.

Corporate who?

They have the paperwork on every corporation and LLC licensed to do business in the state. You look up D & B Enterprises on the computer. You’re going to get a little reel of microfilm. Pull the corporate charter papers that way. Write down every name you see. Corporate officers, shareholders or, more importantly, the resident agent on the filing who is usually a lawyer. While they use front names as corporate officers, they usually use the same lawyer to do the charter filing. Find that agent’s name, run it through the computer, find out what other corporations he’s done the filing for, and that way we find other front companies.

This is pretty much the same approach you’d take when investigating shady temp agencies: trace connections via (former) shareholders, board members, company addresses and related party transactions. And, of course, try to figure out where the profits go.

On that aspect, Freamon also has some wisdom to share:

And here’s the rub. You follow drugs, you get drug addicts and drug dealers. But you start to follow the money, and you don’t know where the fuck it’s gonna take you.

My first Python package

As a self-taught programmer, I sometimes feel a bit uneasy about the code I write. Sure, it may work, but there’s probably a more efficient and more elegant way to do it. These doubts notwithstanding, I’ve just published my first Python package: limepy.

Its purpose is simple: it helps you process and summarise LimeSurvey data. LimeSurvey is a survey tool, somewhat similar to SurveyMonkey. It differs in that it’s open source, and probably more versatile.

If you download survey data as a CSV file, the answers to question types such as multiple choice questions or blocks of questions (‘arrays’) will be spread out over multiple columns. One task of limepy is to make sure all the data for a specific item end up in one table.
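
To illustrate the general idea (this is not limepy’s actual API), here is a minimal sketch that groups the columns of an export by question. It assumes that the columns of a multi-column question are named like Q1[SQ001], Q1[SQ002], i.e. a question code followed by a subquestion code in square brackets; the function name group_columns and the example column names are made up for the illustration.

```python
import re
from collections import defaultdict

# Assumed naming pattern: question code, then subquestion code in brackets
SUBQUESTION = re.compile(r'^([^\[\]]+)\[.+\]$')


def group_columns(columns):
    """Map each question code to the list of columns that belong to it."""
    groups = defaultdict(list)
    for col in columns:
        match = SUBQUESTION.match(col)
        # Columns without brackets are treated as single-column questions
        code = match.group(1) if match else col
        groups[code].append(col)
    return dict(groups)


columns = ['id', 'Q1[SQ001]', 'Q1[SQ002]', 'Q2']
print(group_columns(columns))
```

With such a mapping, all the columns for one question can be selected together and summarised in a single table.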

Limepy will also help you with a number of other tasks, like downloading survey data, creating a codebook, printing answers to open-ended questions and printing the answers of an individual respondent.

Find the package on GitHub and PyPI. Install with pip install limepy. Feedback welcome.

Delete Facebook

This is becoming a bit of a tradition: me writing about people who make a New Year’s resolution to quit Facebook. The story is simple: around the turn of the year, there’s a peak in people googling how to quit smoking, but there’s an even larger peak in people trying to figure out how to delete their Facebook account.

But this year, the story is a bit more complicated (and more interesting).

Google Trends data isn’t available yet for the last days of the year, so there’s no new peak in searches for “quit smoking” yet. Other than that, the yearly pattern is dwarfed by a huge peak in search volume for “delete Facebook” in the week starting on 18 March. What happened?

The Guardian has helpfully created an overview of Facebook-related incidents during 2018; I’ve added a few stories that also seemed relevant (for sources, see Method below; thanks to Vicki Boykis for the suggestion to annotate the Google Trends chart).

No surprise: the largest peak in “delete Facebook” searches happened a few days after the publication of the Cambridge Analytica story on 17 March. The news resulted in a veritable #deletefacebook campaign, although according to Mark Zuckerberg, «I don’t think we’ve seen a meaningful number of people act on that.»

Arwa Mahdawi has argued that deleting your Facebook account isn’t a bad New Year’s resolution, even though it probably won’t change how the company operates: «Facebook’s abuse of power isn’t a problem that we can solve as individuals. Technology giants must be regulated.»

So how much impact did the controversy have on Facebook? One way to try and answer this is to look at the share price.

The pattern for Facebook is rather interesting. The share price dropped after the publication of the Cambridge Analytica story, but quickly picked up again. But then it took a plunge on 25 July, resulting in ‘the biggest-ever one-day wipeout in U.S. stockmarket history’.

One possible interpretation is that investors initially thought the Cambridge Analytica story wasn’t going to harm Facebook’s profits. But when Facebook published its Q2 earnings report, they were shocked to learn that user growth had stalled.

But the chart also shows that all major tech companies saw their share prices go down. This suggests there’s more going on than users leaving Facebook. In addition to broader economic trends, a likely explanation is that investors fear more government regulation of major tech companies in response to the controversies they are involved in (and also to their dominant market position). While this may not be the whole story, it does seem to support Mahdawi’s view about the key role of regulation.

Method

Note that Google Trends data should be interpreted with caution because Google doesn’t provide much detail on the methodology used to produce the data.

For periods longer than three months, only weekly data can be downloaded. For the 2018 chart I wanted daily data. As suggested here, I downloaded three-month batches with overlapping data and then used the overlapping dates to calculate a ratio to adjust the scales. Here’s the code:

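import numpy as np
import pandas as pd

TERMS = ['delete facebook', 'quit smoking']

def stitch(df1, df2):
    """Rescale df2 to the scale of df1 and append its non-overlapping rows."""
    # Index copies by date, so the data frames passed in aren't modified
    df1 = df1.set_index('date', drop=False)
    df2 = df2.set_index('date', drop=False)
    # Dates that occur in both batches
    overlapping = [d for d in df1.index if d in df2.index]
    # Ratio between the two scales for each overlapping date;
    # dates with a zero value in df2 are skipped to avoid dividing by zero
    ratios = [df1.loc[d, 'delete facebook'] / df2.loc[d, 'delete facebook']
              for d in overlapping if df2.loc[d, 'delete facebook'] > 0]
    # Use the median so a single odd ratio has little influence
    ratio = np.median(ratios)
    for var in TERMS:
        df2[var] = df2[var] * ratio
    return pd.concat([df1, df2[~df2.date.isin(overlapping)]])

# dfs is the list of downloaded three-month data frames in chronological
# order, each with columns 'date', 'delete facebook' and 'quit smoking'
df = dfs[0]
for df2 in dfs[1:]:
    df = stitch(df, df2)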

I used this Guardian article as my main source on Facebook-related incidents in 2018. I added a few from other sources: in April, Facebook announced 87 million people had been affected by the Cambridge Analytica scandal. Subsequently, it announced that it would notify people who had been affected. Dutch comedian Arjen Lubach organised a Bye Bye Facebook event (reminiscent of the 2015 Facebook Farewell Party). In September, Pew found that one in four Americans had deleted the Facebook app from their phone; and later that month a Chinese hacker threatened to delete Mark Zuckerberg’s Facebook account.

Who uses the Dafne Schippers bicycle bridge

If all goes well, the Dafne Schippers bicycle bridge in Utrecht should reopen on Monday, after a short closure for maintenance. I have a special affinity with this bridge: it opened on the day I started working in Leidsche Rijn, west of the Amsterdam-Rhine Canal, and it’s part of my favourite cycle route to work.

Who else uses this bridge? With the usual caveats, data from the Fietstelweek can provide some insights. The charts below show, for each direction of traffic, at what time cyclists use the bridges across the canal.

There’s a morning peak in cyclists crossing the canal from Leidsche Rijn (west) to the city centre (east), and a peak in cyclists going the opposite direction around 5 pm. This suggests that the bridges are popular among commuters from Leidsche Rijn. That doesn’t really come as a surprise: if you cycle to Leidsche Rijn during the morning rush hour, you ride past huge numbers of cyclists going in the opposite direction.

The map below shows the routes of cyclists using the bridges. From top to bottom: Hogeweidebrug (or Yellow Bridge), Dafne Schippers bridge and De Meern bridge.

It appears that many cyclists use the bridges to go to the area around Central Station. Users of the De Meern and Dafne Schippers bridges tend to use nice routes that converge along the Leidseweg. Users of the Yellow Bridge use the not-so-nice route along Vleutenseweg, or the slightly better route along the railway track.

Research has shown that cyclists don’t always prefer the shortest route to their destination; the quality of the cycle tracks also plays a role.

Yet the map suggests that many cyclists opt for the shortest route, even if a nicer alternative is available. For example, few cyclists from the northern part of Leidsche Rijn seem to use the Dafne Schippersbrug, or the route along Keulsekade (the latter avoids long waits at traffic lights).

See also this analysis by DUIC, which shows that the bridge is not only popular among cyclists, but also among runners, which is fitting given the name of the bridge.
