Are trade unions important? Depends on who you ask

A majority of Dutch employees think trade unions are important or even very important, Statistics Netherlands (CBS) reported. But there are exceptions. For example, very few general managers think trade unions are important. That shouldn’t really come as a surprise.

I combined the data with a previously published dataset on how satisfied employees in different occupational groups are with their salary. The results are shown below.

There’s a moderately strong correlation. General managers can’t complain about their salary and, as indicated, they could do without trade unions. On the other hand, cleaners are less satisfied with their salary, and they overwhelmingly support trade unions.

More interesting perhaps is the question of which groups deviate from the pattern. Nurses appear to have a strong sense of solidarity: they are pretty satisfied with their salary, but they also attach great importance to trade unions.

The opposite applies to personnel officers. Personnel officers are less satisfied with their salary than nurses, but that doesn’t translate into support for trade unions. Perhaps they think their job would be easier if workers didn’t organise.

Sources: opinion on trade unions xlsx, satisfaction with salary xlsx, number of workers xlsx.


Left-wing collaboration in Amsterdam

This weekend, PvdA (social-democrats), SP (socialists) and GroenLinks (green party) announced a left-wing pact. The parties criticize the ‘worthless’ coalition agreement of the new right-wing national government and opt for a city that is sustainable and characterised by solidarity. At the same time, the pact is an indication that the signatories want to form a coalition after the local election in March 2018.

Such a far-reaching form of collaboration is quite remarkable by Amsterdam standards. Have there been signs that such an alliance was in the making? An interesting indicator is collaboration on motions and amendments. Jointly presenting a motion not only requires that you agree on substance, but also that you get along well.

The chart below shows the percentage of motions and amendments that were presented by PvdA, SP and GroenLinks, since the previous election (in order to iron out seasonal effects, the chart shows the 12-month moving average).
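The smoothing step could be sketched as follows. This is a minimal illustration, not the code behind the chart: the monthly counts are made up, and dividing windowed sums (rather than averaging monthly percentages) is an assumption.

```python
def rolling_share(joint, total, window=12):
    """Percentage of joint motions over a trailing window of months.

    joint and total are equal-length lists of monthly counts. The result
    has one value per month from the first full window onwards.
    """
    shares = []
    for end in range(window, len(total) + 1):
        joint_sum = sum(joint[end - window:end])
        total_sum = sum(total[end - window:end])
        # Guard against a window without any motions at all
        shares.append(100 * joint_sum / total_sum if total_sum else 0.0)
    return shares


# Hypothetical monthly counts, for illustration only
total = [40, 35, 50, 45, 30, 60, 20, 10, 55, 50, 45, 60, 50, 40]
joint = [2, 1, 3, 2, 1, 3, 1, 0, 3, 3, 2, 4, 5, 4]
print(rolling_share(joint, total))
```

Summing over the window before dividing gives busy months more weight than quiet ones, which seems reasonable when the monthly numbers are small.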

The numbers aren’t very large, so we shouldn’t draw too firm conclusions from this. That said, it appears that PvdA, SP and GroenLinks have increased their collaboration. This started around May, not very long after the national election in March.

Have there been similar overtures among right-wing parties? The chart below shows joint initiatives of right-wing VVD and christian-democrat CDA.

It appears that VVD and CDA have also increased their collaboration since the national election. The majority of their joint initiatives are from Werner Toonk (VVD) and Diederik Boomsma (CDA), often dealing with education. If the collaboration depends on the people involved, this doesn’t bode well for the future: Toonk has ended his membership of the city council.

D66 (green and pro-market) is in a bit of a quandary. Nationally, they’re part of the coalition with VVD, CDA and ChristenUnie, and locally they have committed to defend the national coalition agreement. On the other hand, in the Amsterdam council, D66 appears to have somewhat intensified its collaboration with GroenLinks and PvdA.

All in all, it appears that the national election has been a catalyst for changes at the Amsterdam level. Left-wing parties have increased their collaboration, which has now resulted in a quite remarkable pact. Right-wing parties also seem to explore closer collaboration, but it’s too early to say how sustainable this will be.

Examples of left-wing collaboration

The council members most actively involved in PvdA-SP-GroenLinks motions are Jorrit Nuijens (GroenLinks), Dennis Boutkan (PvdA) and Tiers Bakker (SP).

Recent motions dealt with topics including a municipal tax on «hot money» (Bakker, Roosma, Boutkan), transparency regarding the remuneration of board members of organisations that receive subsidies (Boutkan, Groot Wassink, Peters) and a cap on insecure jobs at the municipality (Boutkan, Ernsting, Peters).

Motions and amendments filed until 27 September have been published.


How to do fuzzy matching in Python

Statistics Netherlands (CBS) has an interesting dataset containing data at the city, district and neighbourhood levels. However, some names of neighbourhoods have changed, specifically between 2010 and 2011 for Amsterdam. For example, Bijlmer-Centrum D, F en H was renamed Bijlmer-Centrum (D, F, H).

In some of those cases the neighbourhood codes have changed as well, and CBS doesn’t have conversion tables. So this is one of those cases where you need fuzzy string matching.

There’s a good Python library for that job: Fuzzywuzzy. It was developed by SeatGeek, a company that scrapes event data from a variety of websites and needed a way to figure out which titles refer to the same event, even if the names have typos and other inconsistencies.

Fuzzywuzzy will compare two strings and compute a score between 0 and 100 reflecting how similar they are. It can use different methods to calculate that score (e.g. fuzz.ratio(string_1, string_2) or fuzz.partial_ratio(string_1, string_2)). Some of those methods are described in this article, which is worth a read.

Alternatively, you can take a string and have Fuzzywuzzy pick the best match(es) from a list of options (e.g., process.extract(string, list_of_strings, limit=3) or process.extractOne(string, list_of_strings)). Here, too, you could specify the method to calculate the score, but you may want to first try the default option (WRatio), which will figure out which method to use. The default option seems to work pretty well.
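To get a feel for both ideas (scoring a pair of strings and picking the best candidate from a list), here is a rough stand-in that uses difflib from the Python standard library instead of Fuzzywuzzy. The helper names similarity and best_match are made up for this sketch, and the scores will not be identical to those of WRatio.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Score between 0 and 100 reflecting how similar two strings are."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def best_match(query, choices):
    """Return the (choice, score) pair with the highest similarity score."""
    return max(((c, similarity(query, c)) for c in choices), key=lambda pair: pair[1])


old_name = 'Bijlmer-Centrum D, F en H'
# Short list of candidate names, for illustration only
new_names = ['Bijlmer-Centrum (D, F, H)', 'Bijlmer-Oost (E, G, K)', 'Vondelbuurt']
print(best_match(old_name, new_names))
```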

Here’s the code I used to match the 2010 CBS Amsterdam neighbourhood names to those for 2011:

import pandas as pd
from fuzzywuzzy import process
 
# Prepare data
 
colnames = ['name', 'level', 'code']
 
data_2010 = pd.read_excel('../data/Kerncijfers_wijken_e_131017211256.xlsx', skiprows=4)
data_2010.columns = colnames
data_2010 = data_2010[data_2010.level == 'Buurt']
names_2010 = data_2010['name']
 
data_2011 = pd.read_excel('../data/Kerncijfers_wijken_e_131017211359.xlsx', skiprows=4)
data_2011.columns = colnames
data_2011 = data_2011[data_2011.level == 'Buurt']
names_2011 = data_2011['name']
 
# Actual matching
 
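recode = {}
for name in names_2010:
    # Pick the 2011 name that is most similar to the 2010 name
    best_match = process.extractOne(name, names_2011)
    # Print imperfect matches so they can be inspected by hand
    if best_match[1] < 100:
        print(name, best_match)
    recode[name] = best_match[0]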
 

It prints all matches with a score below 100 so you can inspect them in case there are any incorrect matches (with larger datasets this may not be feasible). With the process option I didn’t get any incorrect matches, but with fuzz.partial_ratio, IJplein en Vogelbuurt was matched with Vondelbuurt instead of Ijplein/Vogelbuurt.

PS In case you’re actually going to work with the local CBS data, you should know that Amsterdam’s neighbourhoods (buurten) were reclassified as districts (wijken) in 2016, when a more detailed set of neighbourhoods was introduced. You can translate 2015 neighbourhood codes to 2016 district codes:

def convert_code(x):
    x = 'WK' + x[2:]
    x = x[:6] + x[-2:]
    return x
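
A quick check of what the function does, repeated here so the snippet runs on its own. The example code is hypothetical and assumes 2015 neighbourhood codes consist of the prefix BU followed by eight digits.

```python
def convert_code(x):
    # Replace the two-letter neighbourhood prefix with the district prefix 'WK'
    x = 'WK' + x[2:]
    # Keep the prefix plus the next four characters, then append the last two
    x = x[:6] + x[-2:]
    return x


# Hypothetical 2015 neighbourhood code
print(convert_code('BU03630001'))  # WK036301
```

In other words, the last two digits of the old neighbourhood code end up as the last two digits of the new district code.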

Wikinews is a great idea, but is it viable?

I’ve decided to start reposting relevant Dutch-language articles from Wikinews on News from Amsterdam. Wikinews is a sister project of Wikipedia. Its contributors describe themselves as follows:

We are a group of volunteers whose mission is to present reliable, unbiased and relevant news. All our content is released under a free license. By making our content perpetually available for free redistribution and use, we hope to contribute to a global digital commons. Wikinews stories are written from a neutral point of view to ensure fair and unbiased reporting.

It would seem a bit naive to simply claim that your writing is neutral and unbiased. The strength of Wikinews rather lies in providing a place where the merits of a story and its sourcing can be discussed on the basis of arguments. In times of clickbait, hoaxes and fake news, that’s an interesting concept.

Wikinews operates without ads (which matters). Their website may look a bit austere, but since it’s all open source, anyone can reuse the content with a different layout.

For now, Wikinews mainly consists of syntheses of news published by other media, but the site also invites other types of stories such as investigative reporting, (photo) reports and opinion articles. This could also provide room for another goal of Wikinews, which is to cover stories that are underreported in other media.

But is Wikinews viable? The consensus seems to be that it’s not. As Jonathan Dee of the New York Times put it in July 2007: «Wikinews … has sunk into a kind of torpor; lately it generates just 8 to 10 articles a day on a grab bag of topics that happen to capture the interest of its fewer than 26,000 users worldwide …».

That was ten years ago. Since then, activity on the English-language version of Wikinews has further declined, as the chart below shows. Meanwhile, there’s a remarkable rise in the number of articles on Dutch-language Wikinews.

Perhaps this is a temporary burst of enthusiasm that will fade out after a while. Then again, maybe it’ll last and Wikinews will reinvent itself. It’ll be interesting to see how this develops.


Charts on mobile screens - always tricky

The chart shown here is from Dutch national statistics office CBS and it was created with Highcharts (or at least that name appears a few dozen times in the source code). It shows working people with second jobs and it’s from this page.

There’s a problem with the labels on the x-axis: they have been abbreviated and only show the first digit of the years. This could easily have been avoided by showing fewer labels when the chart is displayed on a narrow screen. In fact, that solution is used in the other graphs in the same article. Apparently, Highcharts doesn’t adjust the number of labels automatically, or it doesn’t do so consistently.

It appears that CBS has chosen Highcharts because it’s an easy way to create charts with added functionality. And some of that functionality seems to make sense: I can imagine people using the option of downloading a PNG to use it in a report or share it on social media.

However, it may not always be a good idea to rely on standard solutions. Here’s another example where the labels have been abbreviated. It’s a bit of a challenge to figure out what they mean.

I know it can be a pain to get charts to work well on different types of screens (let alone network graphs). Apparently, you cannot simply rely on Highcharts to get it right. CBS should probably assign someone to edit each chart individually, making sure they are displayed properly on different screen types.

As for the contents of the charts: here’s why it’s not OK that more and more Dutch workers need to take a second job to make ends meet (in Dutch).


How to make a d3js force layout stay within the chart area - even with multiple components

For a post on analysing networks of corporate control, I wanted to create some network graphs with d3.js. The new edition of Scott Murray’s great book on d3.js, which is updated to version 4, contains a good example to get you started. However, I was still struggling with some practical issues, as the chart below illustrates (reload the page to see the problem develop).

A large part of the graph drifts out of the chart area, and the problem only gets worse on a mobile screen. But I figured out some sort of solution.

As Murray explains, you can vary the strength value of the force layout. Positive values attract, negative values repel. The default value is –30. You could set d3.forceManyBody().strength(-3) to create a more compact graph.

Of course, the ideal setting will depend on screen size. You could vary the strength value according to screen width. While you’re at it, you may also want to vary the radius of nodes and the stroke-width of edges. For example with something like this:

if(w > 380){
    var strength = -3;
    var r = 3;
    var sw = 0.3;
}
else{
    var strength = -1;
    var r = 3;
    var sw = 0.15;
}

Now this may make the graph more compact, but it doesn’t solve one specific problem: components not connected to the rest of the chart will still drift out of the chart area. In my example, there are four components: a large one, and three pairs of nodes that are only connected to each other and not to the rest of the graph.

The way in which I dealt with this was to create four different graphs and attach the small components to a forceCenter at the margin of the chart area. For example, d3.forceCenter().x(0.1 * w).y(0.9 * h) will put one of them in the bottom left corner. Here’s the result:

It’s still a lot of code - I can’t help feeling there should be a more efficient way to do this. Also, it’s slightly weird that the small components immediately freeze, whereas the large one takes its time to develop into its final shape. And the text labels could be improved. But at least it seems to work.

The network of Dutch firms

One of the ways in which firms are linked is through board members who also sit on the boards of other firms. Researchers use these board interlocks to determine which firms occupy a central position in the corporate network. This «is widely considered as an indication of a powerful or at least advantageous position», Frank Takes and Eelke Heemskerk explain in an interesting paper on the subject.
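As a minimal sketch of how such links can be derived, assume the data is a list of (person, organisation) pairs. The names below are fictional and the function name board_interlocks is made up for this illustration.

```python
from collections import defaultdict
from itertools import combinations


def board_interlocks(memberships):
    """Map each pair of organisations to the board members they share.

    memberships is an iterable of (person, organisation) tuples.
    """
    boards = defaultdict(set)
    for person, organisation in memberships:
        boards[person].add(organisation)

    edges = defaultdict(set)
    for person, organisations in boards.items():
        # Every pair of boards a person sits on is linked through that person
        for pair in combinations(sorted(organisations), 2):
            edges[pair].add(person)
    return dict(edges)


# Fictional people and organisations
memberships = [
    ('Person 1', 'Firm A'), ('Person 1', 'Firm B'),
    ('Person 2', 'Firm B'), ('Person 2', 'Firm C'),
    ('Person 3', 'Firm A'),
]
print(board_interlocks(memberships))
```

Each key of the result is an edge in the interlock network; people with a single board membership, like Person 3 here, don’t create any edges.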

Two Dutch newspapers, de Volkskrant and NRC Handelsblad, have published visualisations of the Dutch (corporate) elite and their board memberships. You can use the data from those visualisations to create board interlock networks. Below is an example using data from NRC Handelsblad from 2017:

Darker nodes represent organisations with a more central position in the network, as measured by their betweenness centrality. Below is another example, using data from de Volkskrant from 2013:

The most obvious difference is that the second graph contains far more nodes (organisations) and edges (shared board members) than the 2017 chart. But there’s more. The 2017 dataset contains only nodes that have at least two edges - probably the result of a selection criterion used by NRC Handelsblad because of the type of visualisation they wanted to make. Further, the 2013 dataset consists of multiple components: three sets of organisations only share board members with each other, not with the rest of the network.

Given the differences between the datasets, would it still be meaningful to make comparisons between the two? The table below shows the top 10 of organisations with the highest centrality scores, for 2013 and 2017. The comparison is limited to organisations that are included in both datasets.

2013                  2017
VNO-NCW               VNO-NCW
DNB                   Ahold
Concertgebouw         Concertgebouw
KLM                   KLM
ABN Amro              Schiphol
Aegon NV              FrieslandCampina
Concertgebouw Fonds   Philips
DSM                   DNB
Philips               Rabobank Groep
Heineken NV           Vopak

Organisations like employers’ organisation VNO-NCW, the Concertgebouw concert hall and airline KLM seem to occupy a pretty stable position at the centre of the network. VNO-NCW has a huge non-executive board with representatives from a wide range of industries. The Concertgebouw was described years ago as the living room of the [Dutch] elite.

Aside from these stable elements, there are substantial differences between the two rankings. The rank correlation is only 0.33 and not statistically significant. This may be due to differences in the way the datasets were created, the small size of the overlap between them (only 34 organisations) and other data quality issues.
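
For readers unfamiliar with rank correlation, here is a minimal sketch. It assumes Spearman’s rho (the text above doesn’t say which rank correlation was used) and rankings without ties; the example rankings are made up.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman rank correlation for two rankings of the same items, without ties."""
    n = len(ranks_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical rankings of five organisations in two datasets
ranks_2013 = [1, 2, 3, 4, 5]
ranks_2017 = [1, 3, 2, 5, 4]
print(spearman_rho(ranks_2013, ranks_2017))
```

A value of 1 means the two rankings are identical, -1 means one is the reverse of the other, and values near 0 mean there is little agreement.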

On the other hand, some changes in the ranking appear to reflect genuine changes in the position firms occupy. Two examples:

  • One of the fastest risers is Ahold. Ahold merged with Belgian retailer Delhaize in 2016. It would seem plausible that this has strengthened their position in the corporate network.
  • ABN Amro disappeared from the top 10. The bank used to have a board with well-connected members like Gerrit Zalm and Joop Wijn (both have gone through the revolving door between government and the corporate world), Peter Wakkie and Marjan Oudeman (one of the most influential Dutch women according to various rankings). In 2015, chairman Wakkie stepped down over a commotion caused by excessive executive board remunerations (the bank was still state-owned after having been bailed out with public money in 2008). Subsequently, Oudeman, Zalm and Wijn also left the bank, for reasons partly related to its upcoming flotation. It appears the current board has a lower profile.

This type of analysis could benefit enormously from having a larger dataset available. This is yet another reason why the Dutch Company Register should be opened up as open data: this will allow for better understanding of the networks of corporate control.

Method and data

Both de Volkskrant (2013, 2014) and NRC Handelsblad (2017) have published visualisations of the Dutch (corporate) elite and their board memberships. Note that these board memberships not only include companies, but also employers’ organisations, cultural institutions and other types of organisations the collectors of the data deemed relevant for analysing corporate elite networks.

Before comparisons can be made, the names of the organisations need to be cleaned up. Beyond correcting typos and dealing with additions like N.V. (plc) and B.V. (ltd), this involves deciding when to consider units as part of the same organisation. Pragmatically, I decided to treat businesses that are part of the same corporate structure as identical. This may not always be the ideal approach; on the other hand, it’s not always possible to determine what unit a name refers to (e.g. ING could refer to the holding or to one of its subsidiaries). I did treat foundations (e.g. charities linked to a company) as separate from the company.

There are different ways to measure the centrality of a node in a network. Taking my cue from Takes and Heemskerk, I used betweenness centrality, which is based on how often a node is on the shortest path between two other nodes. I calculated centrality for the entire network, that is, before taking a subgraph. I included endpoints to prevent many nodes having a score of zero.
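
To make the idea of betweenness concrete, below is a small pure-Python sketch for unweighted, undirected graphs, based on Brandes’ algorithm. It is only an illustration and not the code used for the analysis: networkx has its own betweenness_centrality function, and unlike the calculation described above this sketch doesn’t include endpoints or normalise the scores.

```python
from collections import deque


def betweenness(adj):
    """Betweenness centrality for an undirected graph given as {node: set of neighbours}."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma)
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each pair of nodes was counted from both ends
    return {v: c / 2 for v, c in score.items()}


# Fictional network: Firm B sits between Firm A and Firm C
adj = {'Firm A': {'Firm B'}, 'Firm B': {'Firm A', 'Firm C'}, 'Firm C': {'Firm B'}}
print(betweenness(adj))
```

Without endpoints, Firm A and Firm C score zero here, which shows why including endpoints helps avoid many zero scores.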

I used the Python library networkx to analyse the graphs (here’s the code and here’s the accompanying text file for cleaning up organisation names). I used d3.js to visualise the network graphs - here’s a description of the problems I ran into and how I dealt with them.

Scraping LinkedIn profiles could be legal. Is that creepy?

An interesting American court ruling discusses whether the company hiQ Labs may legally scrape public LinkedIn profiles and sell analyses based on that information.

LinkedIn tries to portray this as an attack on its efforts to protect the privacy of its users. Some media seem to buy this line: Having a public profile just got more risky (the Independent); Is your boss checking up on you? Court rules software IS allowed to look for changes to your LinkedIn profile that suggest you’re quitting your job (Daily Mail).

It’s worthwhile to read the actual ruling. Judge Edward M. Chen pretty much trashes LinkedIn’s argument:

LinkedIn’s professed privacy concerns are somewhat undermined by the fact that LinkedIn allows other third-parties to access user data without its members’ knowledge or consent.

LinkedIn specifically referred to users who use the don’t broadcast feature, which prevents the site from notifying other users when these users make profile changes. hiQ could be violating these users’ privacy by informing their employers about profile changes, which may be an indication that they’re looking for another job.

However, hiQ presented marketing materials from LinkedIn suggesting LinkedIn does exactly the same:

Indeed, these materials inform potential customers that when they ‘follow’ another user, «[f]rom now on, when they update their profile or celebrate a work anniversary, you’ll receive an update on your homepage. And don’t worry – they don’t know you’re following them.»

All in all, LinkedIn has credibility issues when it claims it’s protecting its users’ privacy. Perhaps what hiQ does is creepy, but not more creepy than what LinkedIn is doing itself. (It’s a bit reminiscent of Facebook, which limited access to user data through its public API, claiming it was protecting its users’ privacy.)

There’s more to the case than just the privacy issue. For example, can a website owner prohibit the automated retrieval of otherwise public information? Can they sanction specific users for even looking at their website («effectuating the digital equivalence of Medusa»)? And is it ok for LinkedIn to use its dominant position in the professional networking market to stifle competition in a different market? It will be interesting to see how this develops.

Via. Ruling downloadable from the Register


Open company data in the Netherlands

Awkward: according to an Open Corporates ranking, the Netherlands is among the least transparent countries in Europe when it comes to company data. In many countries, the company register has been opened up as open data. Examples include the UK, France, Belgium, Romania, Bulgaria, Finland, Norway and Denmark (according to Open State).

In November 2015, the Dutch Lower House adopted a motion asking if the Dutch Company Register can be opened up. It took a while, but on 17 July this year, the Chamber of Commerce published two datasets. Open State, an organisation that advocates for government transparency, is not impressed. Is their criticism justified?

The data

Two datasets have been published, and they will be updated on a weekly basis. One contains company data from the Company Register, including city, industry, establishment date, etc. The other contains data from annual accounts. The accounts are in a zip file containing 580,000 xml files.

The data has been anonymised. According to the Chamber of Commerce, this is necessary in order to protect the privacy of entrepreneurs. Incidentally, non-anonymised data is still available at a charge from the Chamber of Commerce.

Research institute TNO has also looked into the matter. It agrees that the privacy of entrepreneurs must be protected, but deems the solution (anonymising all data) unnecessarily drastic.

Anonymising data not only makes it impossible to look up data about individual companies, but also restricts the possibilities for data analysis. For example, it’s not possible to track changes over time at the company level.

The annual accounts

The open data contains only those annual accounts that companies have submitted digitally, in the right format. It contains 185,000 annual accounts for 2016, whereas 255,000 companies have filed their annual account for that year with the Chamber of Commerce (according to the Company Register dataset). Especially the accounts of some of the larger companies appear to be missing. For 2015 and before, even more accounts seem to be missing.

This means, among other things, that it’s not really possible to calculate aggregate amounts for industries. However, the Chamber of Commerce expects that more companies will file their annual account in digital form in the future.

Almost all annual accounts in the open data contain at least a few items from the balance sheet, but other essential data is missing:

  • In almost all cases, the income statement is missing (small companies are not required to file their income statement, but this information is also lacking for larger companies).
  • The number of employees is missing.
  • Over half the annual accounts lack an industry code.

Significant step?

Open State has called the publication of the data a «first small step». Given the limitations of the data, I can see their point.

The Chamber of Commerce quoted Minister Henk Kamp, who spoke of a «significant step». His statement was based on a report by the Chamber of Commerce. The report suggested that it would be possible to aggregate data by number of employees, or to analyse concentration ratios.

I’m afraid that’s not possible with the current data. In fact, one may ask whether it’s at all possible to draw conclusions from this data (I’m not the only one who’s asking that question). Hopefully, this is indeed just a first step towards a truly open company register.

Here’s a Python script that will download and unzip the data and store the annual accounts as a csv. This may take a while.
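
The general approach can be sketched with the standard library alone. Note that this is not the script referred to above: it skips the download step, the element names in the example are made up, and it simply flattens every leaf element of each xml file into a column.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile


def accounts_to_rows(zip_file):
    """Yield one dict per xml file in the archive, mapping leaf tags to text."""
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            if not name.endswith('.xml'):
                continue
            root = ET.fromstring(archive.read(name))
            row = {'file': name}
            for element in root.iter():
                # Keep only leaf elements that carry a value
                if len(element) == 0 and element.text and element.text.strip():
                    tag = element.tag.split('}')[-1]  # drop any xml namespace
                    row[tag] = element.text.strip()
            yield row


def write_csv(rows, out):
    """Write the rows to out, using the union of all keys as columns."""
    rows = list(rows)
    fieldnames = sorted({key for row in rows for key in row})
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Tiny in-memory archive with made-up element names, for illustration only
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('a.xml', '<account><Assets>100</Assets><Equity>40</Equity></account>')
    archive.writestr('b.xml', '<account><Assets>250</Assets></account>')
buffer.seek(0)

output = io.StringIO()
write_csv(accounts_to_rows(buffer), output)
print(output.getvalue())
```

With hundreds of thousands of files, collecting all rows in memory before writing may be slow; a real script would probably fix the set of columns in advance and write row by row.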


Airbnb’s agreement with Amsterdam: some insights from scraped data

Airbnb is under fire. The platform is said to harm the liveability of Amsterdam neighbourhoods and drive up house prices. In December last year, Amsterdam and Airbnb signed a Memorandum of Understanding (MOU) to deal with abuses. According to Airbnb, the agreement is already bearing fruit. Airbnb thinks Amsterdam should focus its enforcement efforts on other platforms. But Amsterdam wants to introduce a registration requirement for holiday lettings, a measure Airbnb vehemently opposes.

In this article, I analyse some of the changes that occurred since the announcement of the MOU. I use data from Murray Cox (Inside Airbnb) and Tom Slee, who have scraped the Airbnb website various times between May 2014 and May 2017 (scraping is automated retrieval of data from websites). Note that data about Airbnb is always controversial. Read this article to understand why the data collected by Cox and Slee is an important addition to the data provided by the company itself.

Sixty-day limit

Amsterdam residents may rent out their home sixty days per year. According to the new agreement, Airbnb will block advertisements when they exceed that limit. However, this applies only to entire homes and not to rooms, because these may be B&Bs, where the sixty-day cap doesn’t apply.

According to Airbnb, there has been a substantial decrease in the number of homes offered for rent more than sixty days per year. This would be evidence that the MOU is already reducing illegal offerings.

The chart below shows how many rooms and entire homes were available more than sixty days per year, using data from Murray Cox.

The chart appears to confirm what Airbnb has claimed: the number of entire homes available more than sixty days has gone down. However, this started in the first half of 2016, well before the MOU was signed (let alone implemented). So it appears that it was caused by something else.
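
For a single scrape, the count behind such a chart could be computed along these lines. This is a minimal sketch with made-up listings; the field names room_type and availability_365 and the room type labels are assumptions about how the scraped data is structured.

```python
from collections import Counter


def over_limit_counts(listings, limit=60):
    """Count listings per room type that are available more than limit days a year."""
    return Counter(
        listing['room_type']
        for listing in listings
        if listing['availability_365'] > limit
    )


# Made-up listings; field names are assumptions
listings = [
    {'id': 1, 'room_type': 'Entire home/apt', 'availability_365': 120},
    {'id': 2, 'room_type': 'Entire home/apt', 'availability_365': 45},
    {'id': 3, 'room_type': 'Private room', 'availability_365': 300},
    {'id': 4, 'room_type': 'Entire home/apt', 'availability_365': 61},
]
print(over_limit_counts(listings))
```

Repeating this for each scrape date gives the time series.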

Perhaps it was the threat of stricter enforcement by the government itself. On 16 February 2016, Amsterdam announced that it was creating its own scraper to collect information from home rental platforms such as Airbnb. In March, the Green Party and Social-Democrats filed motions to step up enforcement.

Room type changes

Does this mean the agreement between Amsterdam and Airbnb wasn’t a turning point? Perhaps it was - but in a different way.

Aggregate data about Airbnb listings are the result of a complex interplay of developments. Some listings are taken off the platform, and new ones are created. In addition, some hosts change the type of their listing - from room to entire home, or vice versa. This is shown in the chart below (using data from Tom Slee).

Until recently, few hosts changed the type of their listings. But since the announcement of the MOU, hundreds of listings have been changed from entire home to room. As indicated above, in the MOU, Airbnb promised to block advertisements for homes that have reached the sixty-day limit. Could it be that people changed their listings into «rooms» to evade the sixty-day limit?

I analysed listings that were changed from home to room between early March and early April 2017 (data from Murray Cox). In early April, over three-quarters of these listings were available more than sixty days per year. This would be consistent with the theory that hosts changed these homes into rooms because of the cap.

It’s possible that these hosts actually stopped renting out entire homes and started to rent out a single room instead. In that case, you’d expect them to have lowered the price and changed the description. However, the price was almost never lowered. Often, the host didn’t even change the description. In some cases, the description still explicitly says that guests have the entire home to themselves.

Incidentally, this is not the first time enforcement led to a large-scale conversion of homes into rooms. This has also happened in New York.

Conclusions

With the available data, it’s not possible to know with certainty what exactly happened over the past months. That said, there are indications the agreement between Amsterdam and Airbnb may be less effective than it seemed:

  • There has been a decrease in the number of entire homes offered for rent more than sixty days. However, this started well before the agreement was signed. It could be a result of (the threat of) enforcement by the government itself.
  • After the announcement of the agreement between Amsterdam and Airbnb, hundreds of entire homes were categorised as rooms. This could be a way for hosts to evade the sixty-day cap for entire homes.

Method and data

Both Murray Cox and Tom Slee frequently scrape the Airbnb website. Cox’s data is more detailed (including, for example, the texts used in advertisements and availability information). Slee collects his data more frequently, at least for Amsterdam. Both Cox and Slee have made their data available as open data (thanks!).

Cox and Slee are not the only ones who collect data from the Airbnb website; there are commercial providers as well. In addition, the Amsterdam Municipality has started scraping the websites of Airbnb and other platforms. It appears Amsterdam only shares this data with city council members on a confidential basis.

As for room type changes: the data probably underestimates the actual number of changes, especially for the earlier periods. The reason is that you can only detect changes if an advertisement is in both the old and the new dataset. The longer the period between two measurements, the higher turnover will be (listings disappear, new ones are added) and therefore the higher the chance of missing room type changes.

Therefore, I did an additional calculation, correcting for the amount of overlap between the old and the new measurement. The result can be seen here. The picture is slightly different, but the conclusion stands: as of the end of 2016, there was a clear increase in home-to-room changes.
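
The detection step and one possible correction can be sketched as follows. This is an illustration with made-up data, not the linked code: expressing changes as a share of the listings found in both scrapes is an assumption about what correcting for overlap could look like, and the room type labels are assumed as well.

```python
def home_to_room_changes(old, new):
    """Compare two scrapes, given as dicts mapping listing id to room type.

    Returns the ids that changed from entire home to room, and that number
    as a share of the listings present in both scrapes.
    """
    overlap = old.keys() & new.keys()
    changed = sorted(
        listing_id for listing_id in overlap
        if old[listing_id] == 'Entire home/apt' and new[listing_id] == 'Private room'
    )
    share = len(changed) / len(overlap) if overlap else 0.0
    return changed, share


# Made-up scrapes; listing 4 disappeared and listing 5 is new, so neither can be compared
old = {1: 'Entire home/apt', 2: 'Entire home/apt', 3: 'Private room', 4: 'Entire home/apt'}
new = {1: 'Private room', 2: 'Entire home/apt', 3: 'Private room', 5: 'Private room'}
print(home_to_room_changes(old, new))
```

Dividing by the size of the overlap makes periods with long gaps between scrapes (and therefore little overlap) more comparable to periods with frequent scrapes.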

I used Python for the analysis. Here’s the code. As always, comments regarding the analysis and interpretation of the data are welcome.
