salonanarchist | leunstoelactivist

Data

Scraping LinkedIn profiles could be legal. Is that creepy?

An interesting American court ruling discusses whether the company hiQ Labs may legally scrape public LinkedIn profiles and sell analyses based on that information.

LinkedIn tries to portray this as an attack on its efforts to protect the privacy of its users. Some media seem to buy this line: Having a public profile just got more risky (the Independent); Is your boss checking up on you? Court rules software IS allowed to look for changes to your LinkedIn profile that suggest you’re quitting your job (Daily Mail).

It’s worthwhile to read the actual ruling. Judge Edward M. Chen pretty much trashes LinkedIn’s argument:

LinkedIn’s professed privacy concerns are somewhat undermined by the fact that LinkedIn allows other third-parties to access user data without its members’ knowledge or consent.

LinkedIn specifically referred to users who use the don’t broadcast feature, which prevents the site from notifying other users when these users make profile changes. hiQ could be violating these users’ privacy by informing their employers about profile changes, which may be an indication that they’re looking for another job.

However, hiQ presented marketing materials from LinkedIn suggesting LinkedIn does exactly the same:

Indeed, these materials inform potential customers that when they ‘follow’ another user, «[f]rom now on, when they update their profile or celebrate a work anniversary, you‘ll receive an update on your homepage. And don‘t worry – they don‘t know you‘re following them.»

All in all, LinkedIn has credibility issues when it claims it’s protecting its users’ privacy. Perhaps what hiQ does is creepy, but not more creepy than what LinkedIn is doing itself. (It’s a bit reminiscent of Facebook, which limited access to user data through its public API, claiming it was protecting its users’ privacy.)

There’s more to the case than just the privacy issue. For example, can a website owner prohibit the automated retrieval of otherwise public information? Can they sanction specific users for even looking at their website («effectuating the digital equivalence of Medusa»)? And is it ok for LinkedIn to use its dominant position in the professional networking market to stifle competition in a different market? It will be interesting to see how this develops.

Via. Ruling downloadable from the Register

Open company data in the Netherlands

Awkward: according to an Open Corporates ranking, the Netherlands is among the least transparent countries in Europe when it comes to company data. In many countries, the company register has been opened up as open data. Examples include the UK, France, Belgium, Romania, Bulgaria, Finland, Norway and Denmark (according to Open State).

In November 2015, the Dutch Lower House adopted a motion asking whether the Dutch Company Register could be opened up. It took a while, but on 17 July this year, the Chamber of Commerce published two datasets. Open State, an organisation that advocates for government transparency, is not impressed. Is their criticism justified?

The data

Two datasets have been published, and they will be updated on a weekly basis. One contains company data from the Company Register, including city, industry, establishment date, etc. The other contains data from annual accounts. The accounts are in a zip file containing 580,000 xml files.

The data has been anonymised. According to the Chamber of Commerce, this is necessary in order to protect the privacy of entrepreneurs. Incidentally, non-anonymised data is still available at a charge from the Chamber of Commerce.

Research institute TNO has also looked into the matter. It agrees that the privacy of entrepreneurs must be protected, but deems the solution (anonymising all data) unnecessarily drastic.

Anonymising data not only makes it impossible to look up data about individual companies, but also restricts the possibilities for data analysis. For example, it’s not possible to track changes over time at the company level.

The annual accounts

The open data contains only those annual accounts that companies have submitted digitally, in the right format. It contains 185,000 annual accounts for 2016, whereas 255,000 companies have filed their annual accounts for that year with the Chamber of Commerce (according to the Company Register dataset). In particular, the accounts of some of the larger companies appear to be missing. For 2015 and before, even more accounts seem to be missing.

This means, among other things, that it’s not really possible to calculate aggregate amounts for industries. However, the Chamber of Commerce expects that more companies will file their annual account in digital form in the future.

Almost all annual accounts in the open data contain at least a few items from the balance sheet, but other essential data is missing:

  • In almost all cases, the income statement is missing (small companies are not required to file their income statement, but this information is also lacking for larger companies).
  • The number of employees is missing.
  • Over half the annual accounts lack an industry code.

Significant step?

Open State has called the publication of the data a «first small step». Given the limitations of the data, I can see their point.

The Chamber of Commerce quoted Minister Henk Kamp, who spoke of a «significant step». His statement was based on a report by the Chamber of Commerce. The report suggested that it would be possible to aggregate data by number of employees, or to analyse concentration ratios.

I’m afraid that’s not possible with the current data. In fact, one may ask whether it’s at all possible to draw conclusions from this data (I’m not the only one who’s asking that question). Hopefully, this is indeed just a first step towards a truly open company register.

Here’s a Python script that will download and unzip the data and store the annual accounts as a csv. This may take a while.
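For readers who want a feel for the unzip-and-convert step, here is a minimal sketch, not the script linked above. It assumes only what the text states: a zip file holding one xml file per annual account. The download step is left out because the dataset URL is not given here, and the function names (`accounts_to_rows`, `write_csv`) are hypothetical. Since the element names used in the filings are not described in this post, the sketch simply turns every leaf element with text into a column.

```python
import csv
import xml.etree.ElementTree as ET
import zipfile


def local_name(tag):
    """Strip an XML namespace such as '{urn:x}Assets' down to 'Assets'."""
    return tag.rsplit('}', 1)[-1]


def accounts_to_rows(zip_file):
    """Yield one dict per xml file: the file name plus every leaf element with text."""
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            if not name.lower().endswith('.xml'):
                continue
            root = ET.fromstring(archive.read(name))
            row = {'file': name}
            for element in root.iter():
                text = (element.text or '').strip()
                if len(element) == 0 and text:
                    # Assumption: if a tag occurs more than once, the last value wins.
                    row[local_name(element.tag)] = text
            yield row


def write_csv(rows, out):
    """Write the rows to an open text file, using the union of all keys as columns."""
    rows = list(rows)
    columns = sorted({key for row in rows for key in row})
    writer = csv.DictWriter(out, fieldnames=columns, restval='')
    writer.writeheader()
    writer.writerows(rows)
```

With 580,000 files, collecting all rows in memory before writing is the simple option rather than the fast one; a two-pass version (first collect column names, then write) would use less memory.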

The Chamber of Commerce’s open data

Awkward: the Netherlands is one of the least transparent countries in Europe when it comes to company information. In many countries, the company register has been opened up as open data. Examples include the United Kingdom, France, Belgium, Romania, Bulgaria, Finland, Norway and Denmark (according to Open State).

In November 2015, the Dutch Lower House adopted a motion asking whether the Dutch company register could be opened up. It took a while, but on 17 July this year, the Chamber of Commerce (Kamer van Koophandel) published two datasets. Open State, an organisation that campaigns for transparent government, is not exactly enthusiastic. Is it right?

The data

Two datasets have been published, and they are updated weekly. One contains company data from the company register, such as city, sector, date of establishment, etc. The other contains data from annual accounts. The annual accounts are in a zip file containing 580,000 xml files.

The data has been anonymised. According to the Chamber of Commerce, this is necessary to protect the privacy of entrepreneurs. Incidentally, non-anonymised data is available from the Chamber of Commerce for a fee.

TNO has also looked into this. The research institute considers it right to take the privacy of entrepreneurs into account, but finds the chosen solution (anonymising everything) unnecessarily drastic.

Anonymising not only makes it impossible to look up data about an individual company; it also limits the possibilities for analysing the data. For example, you cannot follow developments over time at the company level.

The annual accounts

The published data contains only annual accounts that companies have submitted digitally and in the required file format. It includes 185,000 annual accounts for 2016, whereas 255,000 companies filed their annual accounts for that year with the Chamber of Commerce (according to the company register dataset). It appears that mainly larger companies are missing. For earlier years, even more annual accounts seem to be missing.

This means, among other things, that you cannot calculate totals per sector. Incidentally, the Chamber of Commerce expects that more companies will submit their annual accounts digitally in the future.

Almost all annual accounts in the open dataset contain at least a few items from the balance sheet, but other essential information is missing:

  • In almost all cases, the income statement is missing (small companies are not required to file an income statement, but this information is also missing for larger companies).
  • The number of employees is missing.
  • Over half the annual accounts lack an industry code (SBI code).

Significant step?

Open State calls the publication of the data merely «a first small step». Given the limitations, I can see their point.

The Chamber of Commerce quotes Minister Henk Kamp, who spoke of a «significant step». His statement was based on a report that the Chamber of Commerce had drawn up itself. That report suggested that it would become possible to aggregate data by number of employees, or to examine, for example, the concentration of certain types of companies.

That is not possible with the current datasets, it seems to me. In fact, the question is whether you can draw any conclusions at all from this data (and I’m not the only one asking). Hopefully, this is indeed just a first step towards a truly open company register.

Here’s a Python script that downloads and unzips the data and stores the annual accounts as a csv. This may take a while.

Municipality: «a parked bicycle takes up a lot of public space»

The photo above could be read as a comment on the new Meerjarenplan Fiets (long-term bicycle plan) of the City of Amsterdam. It says:

Of Amsterdam residents, 43% park their bicycle in public space: almost 350,000 bicycles within the A10 ring road south of the IJ alone. A parked bicycle takes up about two square metres on average. This takes up a lot of public space.

A lot of space? It is mainly car parking spaces that take up a lot of space. Prompted by a tweet from Marco te Brömmelstroet, I once calculated that you could fit 2.1 million bicycle racks in the space currently taken up by car parking spaces. With the municipality’s own figures, you end up even higher: room for 2.65 million bicycle racks (according to the municipality, there are 265,000 on-street car parking spaces that each take up 20 m², and a bicycle rack needs only 2 m²).
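The 2.65 million figure follows directly from the three municipal numbers quoted in the paragraph above. A small sketch of the arithmetic (the function name `racks_that_fit` is made up for illustration):

```python
# Figures quoted from the municipality in the paragraph above.
CAR_SPACES_ON_STREET = 265_000  # on-street car parking spaces
M2_PER_CAR_SPACE = 20           # square metres per car parking space
M2_PER_BIKE_RACK = 2            # square metres per bicycle rack


def racks_that_fit(car_spaces, m2_per_car_space, m2_per_rack):
    """Number of bicycle racks that fit in the area used by car parking spaces."""
    total_area = car_spaces * m2_per_car_space
    return total_area // m2_per_rack


racks = racks_that_fit(CAR_SPACES_ON_STREET, M2_PER_CAR_SPACE, M2_PER_BIKE_RACK)
print(racks)  # 2650000
```

In other words, each car parking space could hold ten bicycle racks.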

Elsewhere in the Meerjarenplan, incidentally, the municipality acknowledges that cars take up far more space than bicycles. In busy neighbourhoods, it wants to explore the options for «a different distribution and smart dual use of the space for pedestrians, bicycles (and bicycle parking), cars (and car parking) and other facilities». That may mean departing from the parking standards, but no real choice has been made (yet).

Amsterdam has been struggling with a shortage of bicycle racks for at least ten years. In recent years, 16,000 places have been added, but that is probably too few to keep up with the growth in cycling.

If you really want to do something about the shortage of bicycle racks, space for them is easy to find. The photo above shows that.

The photo was taken by Marieke de Lange and appears on the front page of the OEK, the members’ magazine of the Fietsersbond Amsterdam (the Amsterdam branch of the cyclists’ union). The OEK can be found here. Members of the Fietsersbond receive it in the mail. The figures come from the Meerjarenplan, except for the number of car parking spaces; that is in the Thermometer Bereikbaarheid.

Airbnb’s agreement with Amsterdam: some insights from scraped data

Airbnb is under fire. Critics say the platform harms the liveability of Amsterdam neighbourhoods and drives up house prices. In December last year, Amsterdam and Airbnb signed a Memorandum of Understanding (MOU) to deal with abuses. According to Airbnb, the agreement is already bearing fruit. Airbnb thinks Amsterdam should focus its enforcement efforts on other platforms. But Amsterdam wants to introduce a registration requirement for holiday lettings, a measure Airbnb vehemently opposes.

In this article, I analyse some of the changes that occurred since the announcement of the MOU. I use data from Murray Cox (Inside Airbnb) and Tom Slee, who have scraped the Airbnb website various times between May 2014 and May 2017 (scraping is automated retrieval of data from websites). Note that data about Airbnb is always controversial. Read this article to understand why the data collected by Cox and Slee is an important addition to the data provided by the company itself.

Sixty-day limit

Amsterdam residents may rent out their home for up to sixty days per year. According to the new agreement, Airbnb will block advertisements when they exceed that limit. However, this applies only to entire homes and not to rooms, because these may be B&Bs, where the sixty-day cap doesn’t apply.

According to Airbnb, there has been a substantial decrease in the number of homes offered for rent more than sixty days per year. Airbnb presents this as evidence that the MOU is already reducing illegal offerings.

The chart below shows how many rooms and entire homes were available more than sixty days per year, using data from Murray Cox.

The chart appears to confirm what Airbnb has claimed: the number of entire homes available more than sixty days has gone down. However, this started in the first half of 2016, well before the MOU was signed (let alone implemented). So it appears that it was caused by something else.
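A count like this can be sketched in a few lines. This is not the code behind the chart; it assumes each scrape is a csv with one row per listing and with columns named `room_type` and `availability_365`, and the helper names are hypothetical. Availability here means the number of days a listing is shown as available, which is what the sixty-day comparison in this article uses.

```python
import csv
from collections import Counter


def read_listings(path):
    """Read one scrape (a csv with one row per listing) into a list of dicts."""
    with open(path, newline='', encoding='utf-8') as f:
        return list(csv.DictReader(f))


def count_over_limit(rows, limit=60):
    """Count listings per room type that are available more than `limit` days."""
    counts = Counter()
    for row in rows:
        # Assumption: a missing availability value is treated as zero days.
        days = int(row['availability_365'] or 0)
        if days > limit:
            counts[row['room_type']] += 1
    return counts
```

Running `count_over_limit` on each scrape in turn gives one data point per room type per scrape date, which is enough to see whether a decline started before or after a given date.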

Perhaps it was the threat of stricter enforcement by the government itself. On 16 February 2016, Amsterdam announced that it was creating its own scraper to collect information from home rental platforms such as Airbnb. In March, the Green Party and Social-Democrats filed motions to step up enforcement.

Room type changes

Does this mean the agreement between Amsterdam and Airbnb wasn’t a turning point? Perhaps it was - but in a different way.

Aggregate data about Airbnb listings are the result of a complex interplay of developments. Some listings are taken off the platform, and new ones are created. In addition, some hosts change the type of their listing - from room to entire home, or vice versa. This is shown in the chart below (using data from Tom Slee).

Until recently, few hosts changed the type of their listings. But since the announcement of the MOU, hundreds of listings have been changed from entire home to room. As indicated above, in the MOU, Airbnb promised to block advertisements for homes that have reached the sixty-day limit. Could it be that people changed their listings into «rooms» to evade the sixty-day limit?

I analysed listings that were changed from home to room between early March and early April 2017 (data from Murray Cox). In early April, over three-quarters of these listings were available more than sixty days per year. This would be consistent with the theory that hosts changed these homes into rooms because of the cap.
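The comparison described above boils down to matching listings by identifier across two scrapes. A minimal sketch, assuming each row has an `id`, a `room_type` and an `availability_365` field, and assuming the room type labels shown in the constants; these names and labels are assumptions, and this is not the original analysis code.

```python
ENTIRE_HOME = 'Entire home/apt'              # assumed label for entire homes
ROOM_TYPES = {'Private room', 'Shared room'}  # assumed labels for rooms


def home_to_room_changes(old_rows, new_rows):
    """Return rows from the new scrape whose listing was an entire home in the old scrape
    and is a room in the new one. Listings missing from either scrape are ignored."""
    old_types = {row['id']: row['room_type'] for row in old_rows}
    return [
        row for row in new_rows
        if old_types.get(row['id']) == ENTIRE_HOME and row['room_type'] in ROOM_TYPES
    ]


def share_over_limit(rows, limit=60):
    """Share of the given listings available more than `limit` days (None if empty)."""
    if not rows:
        return None
    over = sum(1 for row in rows if int(row['availability_365'] or 0) > limit)
    return over / len(rows)
```

Calling `share_over_limit(home_to_room_changes(march_rows, april_rows))` on two scrapes would give the kind of share reported above; `march_rows` and `april_rows` stand for whatever two scrapes are being compared.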

It’s possible that these hosts actually stopped renting out entire homes and started to rent out a single room instead. In that case, you’d expect them to have lowered the price and changed the description. However, the price was almost never lowered. Often, the host didn’t even change the description. In some cases, the description still explicitly says that guests have the entire home to themselves.

Incidentally, this is not the first time enforcement led to a large-scale conversion of homes into rooms. This has also happened in New York.

Conclusions

With the available data, it’s not possible to know with certainty what exactly happened over the past months. That said, there are indications the agreement between Amsterdam and Airbnb may be less effective than it seemed:

  • There has been a decrease in the number of entire homes offered for rent more than sixty days. However, this started well before the agreement was signed. It could be a result of (the threat of) enforcement by the government itself.
  • After the announcement of the agreement between Amsterdam and Airbnb, hundreds of entire homes were categorised as rooms. This could be a way for hosts to evade the sixty-day cap for entire homes.

Method and data

Both Murray Cox and Tom Slee frequently scrape the Airbnb website. Cox’s data is more detailed (including, for example, the texts used in advertisements and availability information). Slee collects his data more frequently, at least for Amsterdam. Both Cox and Slee have made their data available as open data (thanks!).

Cox and Slee are not the only ones who collect data from the Airbnb website; there are commercial providers as well. In addition, the Amsterdam Municipality has started scraping the websites of Airbnb and other platforms. It appears Amsterdam only shares this data with city council members on a confidential basis.

As for room type changes: the data probably underestimates the actual number of changes, especially for the earlier periods. The reason is that you can only detect changes if an advertisement is in both the old and the new dataset. The longer the period between two measurements, the higher turnover will be (listings disappear, new ones are added) and therefore the higher the chance of missing room type changes.

Therefore, I did an additional calculation, correcting for the amount of overlap between the old and the new measurement. The result can be seen here. The picture is slightly different, but the conclusion stands: as of the end of 2016, there was a clear increase in home-to-room changes.
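The exact correction is in the linked code, not in this text. One plausible way to correct for overlap, shown here purely as an illustration, is to divide the number of detected changes by the number of listings that appear in both measurements. The field names `id` and `room_type`, the room type labels and the function name are assumptions.

```python
def change_rate(old_rows, new_rows, from_type, to_types):
    """Return (changes, overlap, rate) for listings present in both scrapes.

    changes: listings that went from `from_type` to one of `to_types`
    overlap: listings whose id occurs in both the old and the new scrape
    rate:    changes divided by overlap, or None if there is no overlap
    """
    old_types = {row['id']: row['room_type'] for row in old_rows}
    overlap = 0
    changes = 0
    for row in new_rows:
        before = old_types.get(row['id'])
        if before is None:
            continue  # listing is new, so a change cannot be detected
        overlap += 1
        if before == from_type and row['room_type'] in to_types:
            changes += 1
    rate = changes / overlap if overlap else None
    return changes, overlap, rate
```

Expressed this way, a period with a long gap between scrapes (and therefore little overlap) is not automatically made to look like a period with few changes.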

I used Python for the analysis. Here’s the code. As always, comments regarding the analysis and interpretation of the data are welcome.
