champagne anarchist | armchair activist

Python

Just 13% of my Linkedin connections use buzzwords

Linkedin recently released its newest analysis of overused buzzwords in members’ profiles. Of course, this is just a ploy to get you to volunteer more personal details («Update your profile today!»), but never mind that.

Just in case, I checked whether any of my connections overuse buzzwords. Reassuringly, the majority can’t even be bothered to fill out «summaries» and «specialties» in the first place. Those who do seldom use the top-ten buzzwords for the Netherlands, as the table below shows.

Dutch term          % of connections   English term   % of connections
Verantwoordelijk    0.0                Responsible    0.5
Strategisch         1.8                Strategic      3.2
Expert              3.7                Expert         3.7
Creatief            0.9                Creative       1.8
Innovatief          0.0                Innovative     1.4
Dynamisch           0.0                Dynamic        0.5
Gedreven            0.5                Motivated      0.9
Duurzaam            0.0                Sustainable    1.4
Effectief           0.0                Effective      0.5
Analytisch          0.9                Analytic       0.5

Only two Dutch buzzwords are used by more than one percent of my connections. Interestingly, their English equivalents are slightly more prevalent. 87% of my connections are completely buzzword-free. For what it’s worth, people who use buzzwords also tend to have more connections.

Full disclosure: I may have used the word «strategic» in my own profile.

Update 21 Dec - Don’t push it, Linkedin.

Incidentally, 436,567 people is less than 0.17% of all 259m Linkedin users. Not that impressive.

Method

I used these scripts (in part adapted from Matthew Russell’s Mining the Social Web) to get the «summaries» and «specialties» of connections from the Linkedin API and process them.
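The processing step can be sketched roughly as follows. This is not the original script: it assumes the profile texts have already been retrieved and are available as one string per connection, and it matches whole words only, which is my assumption (so «analytical» would not count as «analytic»). The example profiles are made up.

```python
import re

# Top-ten buzzwords from the table above (Dutch and English).
BUZZWORDS = [
    "verantwoordelijk", "strategisch", "expert", "creatief", "innovatief",
    "dynamisch", "gedreven", "duurzaam", "effectief", "analytisch",
    "responsible", "strategic", "creative", "innovative", "dynamic",
    "motivated", "sustainable", "effective", "analytic",
]


def buzzword_stats(profiles):
    """Return (percentage of connections per buzzword, percentage buzzword-free).

    profiles: list with one string per connection (summary plus specialties);
    an empty string means the connection filled out neither field.
    """
    if not profiles:
        raise ValueError("need at least one profile")
    counts = {word: 0 for word in BUZZWORDS}
    buzzword_free = 0
    for text in profiles:
        # Whole-word, case-insensitive matching (an assumption).
        words = set(re.findall(r"\w+", text.lower()))
        used = [word for word in BUZZWORDS if word in words]
        for word in used:
            counts[word] += 1
        if not used:
            buzzword_free += 1
    n = len(profiles)
    percentages = {word: round(100.0 * c / n, 1) for word, c in counts.items()}
    return percentages, round(100.0 * buzzword_free / n, 1)


# Made-up example profiles, not real data.
example = [
    "Strategic thinker and expert in labour relations",
    "",
    "Researcher",
    "Creatief en gedreven",
]
percentages, free = buzzword_stats(example)
```

Connections with empty profiles count as buzzword-free here, which fits the observation above that most people don’t fill out these fields at all.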

My Facebook friends like FNV Schoongenoeg. And Hans Spekman

Viz3

I’m not really a fan of Facebook, but now that I’m finding out what kind of analyses you can do with it, I’m starting to see the fun of it. Above is my Facebook network. I looked up which pages people like. The most popular is FNV Schoongenoeg, the page of the cleaners’ campaign, which has been liked by 45 people in my network. Rightly so.

Other examples of popular pages are FNV Supermarkt (34), Hans Spekman (22, which surprised me) and the campaign to make 1 May a national holiday (13).

This doesn’t mean that people in my network only like pages related to trade unions or politics. Together they like 631 pages related to music, for example, but they usually don’t like the same music pages. Someone who likes FNV Schoongenoeg shares this like with 44 others; the person who likes Mark E. Smith (The Fall) shares this like with no one else. (The same person turns out to like Iggy And The Stooges as well. Good.)

The graph above shows that there is a pattern to this. Light blue circles are people who, on average, share their preferences with few others (that is, few others within my network). Dark blue circles are people who often like pages that others in my network like too. Many of them have a background in the trade union movement. A lot of campaigning goes on there, which may be why certain pages are liked by many people.

If you want to know which circle you are in the graph above, just let me know.

Method

A good part of the analysis is based on Lada Adamic’s Social Network Analysis course and chapter 2 of Mining the Social Web by Matthew Russell. Both recommended. I used Python to extract data from the Facebook Graph API and to process it (I left people who have hidden their likes in their privacy settings out of the analysis). The scripts can be found here. I made the graph with Gephi.

Incidentally, a so-called modularity analysis shows that the group I had identified as people with a trade union background in fact consists of two clusters (see this graph): one with mainly people involved with my own union, and one with people involved with other unions and social movements. The first of these clusters has the most shared likes.

Why some pages are popular among my Facebook friends - and others not

Viz3

I’m not crazy about Facebook, but now that I’m finding out what kind of analyses can be done with it, I’m starting to appreciate the fun of it. Above is my Facebook network. I’ve looked up which pages people like. Most popular is FNV Schoongenoeg, the page of the Dutch cleaners’ campaign for decent work, which has been liked by 45 people in my network. Rightly so.

Other examples of popular pages include FNV Supermarkt, the trade union page for supermarket workers (34); Hans Spekman, the page of the leader of the social-democrat party (22); and the page of the campaign to make 1 May a national holiday in the Netherlands (13).

This doesn’t mean that people in my network only like pages related to trade unions or politics. For example, they jointly like 631 pages related to music – but they don’t often like the same music pages. Someone who likes FNV Schoongenoeg shares this like with 44 others; the person who likes Mark E. Smith (The Fall) shares this like with no-one else. (It turns out the same person also likes Iggy And The Stooges. Good.)

The graph above shows there’s a pattern to all this. Light blue circles are people who, on average, share their likes with few others (that is, few others within my network). Dark blue circles are people who tend to like pages that are also liked by others in my network. This group includes many people with a trade union background. Quite a bit of campaigning goes on in those circles, which is perhaps why some pages are liked by a lot of people.

If you’d like to know which circle you are in the graph above, let me know.

Method

Much of this analysis is based on Lada Adamic’s Social Network Analysis course and chapter 2 of Mining the Social Web by Matthew Russell. Both highly recommended. I used Python to retrieve the data from the Facebook Graph API and for processing (I excluded people who chose not to display their likes in their privacy settings). The scripts can be found here. I used Gephi to create the graph.
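To give an idea of the processing, here is a minimal sketch of one way to compute how much each person shares their likes with others. It is one plausible reading of the measure behind the light and dark blue circles, not necessarily the exact calculation in the scripts; the function name and the mini network are made up, and the likes are assumed to be already retrieved.

```python
from collections import Counter


def average_shared_likes(likes):
    """For each person, the average number of *other* people who like the same pages.

    likes: dict mapping a person to the set of pages they like.
    People without visible likes are left out of the result.
    """
    # How many people in the network like each page.
    page_counts = Counter(page for pages in likes.values() for page in pages)
    scores = {}
    for person, pages in likes.items():
        if not pages:
            continue
        # Subtract 1 so a person is not counted as sharing a like with themselves.
        shared = [page_counts[page] - 1 for page in pages]
        scores[person] = sum(shared) / len(shared)
    return scores


# Made-up mini network, not real data.
likes = {
    "A": {"FNV Schoongenoeg", "FNV Supermarkt"},
    "B": {"FNV Schoongenoeg", "FNV Supermarkt"},
    "C": {"FNV Schoongenoeg", "Mark E. Smith (The Fall)"},
    "D": set(),
}
scores = average_shared_likes(likes)
```

In this toy network, C scores lower than A and B because the Mark E. Smith like is shared with no one else.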

Incidentally, a modularity analysis shows that the group I interpreted as people with a background in the trade union movement in fact consists of two clusters (see this graph): one consisting mainly of people involved with my own union, and one consisting of people involved with other unions and social movements. The first of those clusters has the highest level of shared likes.

Script to look up the gender of Dutch first names


This script determines the gender of Dutch persons by looking up their first name in a database of the Meertens Institute. The database indicates how often the name occurred as a first name for men and women in 2010. If the name is used for women substantially more often than for men, the name will be interpreted as female – and vice versa.
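The decision rule can be sketched as below. This is an illustration, not the script itself: it leaves out the lookup in the Meertens database and takes the two counts as input, and the `ratio` threshold for «substantially more often» is a hypothetical value, since the post doesn’t say which one the script uses. The counts in the example are made up.

```python
def guess_gender(male_count, female_count, ratio=5.0):
    """Guess gender from how often a first name occurs for men and for women.

    ratio is a hypothetical threshold for "substantially more often".
    Returns "female", "male", or None if the name is unknown or ambiguous.
    """
    if female_count > ratio * male_count:
        return "female"
    if male_count > ratio * female_count:
        return "male"
    # Covers names that were not found (both counts zero) and names
    # that are used for men and women at comparable rates.
    return None


# Made-up counts, not taken from the Meertens database.
mostly_female = guess_gender(male_count=10, female_count=1000)
mostly_male = guess_gender(male_count=5000, female_count=12)
ambiguous = guess_gender(male_count=300, female_count=400)
unknown = guess_gender(male_count=0, female_count=0)
```

Returning None for ambiguous names means those participants end up in the group whose gender couldn’t be determined, rather than being guessed at.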

The reason I wrote the script has to do with this article on how the performance of women professional road cyclists is improving. I wanted to check whether a similar trend is going on among amateur riders, more specifically, participants in the Gerrie Knetemann Classic (incidentally, the script would take Knetemann for a woman – it’s not foolproof). The results of the ride are available online, but pre-2012 editions lack information on the gender of participants. So that’s what the script was for.

Speed of participants in Knetemann Classic

The results of the analysis aren’t exactly clear-cut. The number of women participants in the 150km ride varied from 36 to 46, or 5 to 8% of the participants whose gender could be determined (the percentage for 2013 was 6%). The (median) speed of women participants rose in 2013, and more so than for men, but this is rather thin evidence to speak of a trend.

Cycling: Garmin altimeter compared to elevation databases

During a very rainy ride in Scotland, my Garmin altimeter appeared to be off: on some of the steepest climbs it failed to register any gradient. Afterwards, I tried the «elevation correction» feature on the Garmin website, which generously added over 750m to the total ascent the device had measured. This was certainly more satisfying, but it left me wondering. Can the weather affect the Garmin altimeter? And how accurate is the recalculated ascent?

Garmin’s recalculation service basically works by looking up the GPS locations of your ride in an elevation database. Strava offers a similar service. Below, I analyse the Garmin and Strava recalculations for a number of rides. Note that this is only an exploratory analysis and that no firm conclusions can be drawn on the basis of this rather small set of observations. That said, here are some preliminary conclusions:

  • If you want to boost your ego, let Garmin recalculate your ascent: chances are it will add (quite) a few metres. Strava’s recalculations tend to stay closer to the original measurement. When it does make changes, it frequently lowers the number of metres you’re supposed to have climbed, especially on relatively flat rides.
  • In theory, you’d expect weather changes to affect the ascent measured by the device, because the altimeter is basically a barometer. In practice, weather changes don’t seem to have much effect on the altimeter.
  • It appears plausible that heavy rain does in fact mess with the altimeter.

In the graphs below, the colour of the dots represents the region of the ride. Red dots represent the Ronde Hoep, a flat ride to the south of Amsterdam. Blue ones represent the Kopje van Bloemendaal (north, south), the closest thing to a climb near Amsterdam (it’s not high but quite steep). Green dots represent the central area of the country and include the Utrechtse Heuvelrug, Veluwezoom, Rijk van Nijmegen and Kreis Kleve (the latter in Germany).

General

By default, the graph above shows how much the Garmin recalculation differs from the ascent measured by the device (graphs may not show in older versions of Internet Explorer). The closer a dot is to the dashed line, the closer the recalculated ascent is to the original measurement.

For rides shown on the left part of the graph, where the device measured less than 500m ascent, Garmin’s recalculation often adds about 50 to 100% or more. With higher ascents, the recalculated ascent is closer to the original measurement, although it still tends to add about 30 to 50%. The highest dot to the far right of the graph is the rainy ride in Scotland; here Garmin’s recalculation added over 35%.

With the selector above the graph, you can select the Strava recalculation. You’ll notice the scale on the y axis changes (and the dashed line moves up). Also, a few red dots enter the graph. These are rides along the Ronde Hoep, which is a flat ride. For these rides, Garmin’s recalculation added up to 750% to the ascent measured by the device; therefore these dots were initially outside the graph area.

The Strava recalculations are similar to the Garmin ones in that the correction is larger for relatively flat rides. Unlike Garmin, Strava lowers the ascent in these cases, often by 15 to 50%. For rides where the device measured a total ascent of over 500m, the Strava recalculation tends to be pretty close to the original measurement.

Weather changes

It has been suggested that changes in the weather may affect elevation measurements. This makes sense, since the Garmin altimeter is in fact a barometer. Wikipedia says that pressure decreases by about 1.2 kPa for every 100 metres in ascent. In other words, if net atmospheric pressure rose by 6 mbar during a ride, this would cause the device to underestimate total ascent by about 50 metres, so the theoretical effect wouldn’t seem to be huge.
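The arithmetic behind that estimate, as a small sketch. It uses only the figure of 1.2 kPa per 100 metres quoted above and the fact that 1 kPa equals 10 mbar; the function name is made up, and a linear relation between pressure and height is assumed.

```python
KPA_PER_100M = 1.2    # pressure drop per 100 m of ascent, as quoted above
MBAR_PER_KPA = 10.0   # unit conversion: 1 kPa = 10 mbar


def ascent_error_m(pressure_change_mbar):
    """Ascent (in metres) that a barometric altimeter would miss
    if net air pressure rose by the given amount during the ride."""
    mbar_per_100m = KPA_PER_100M * MBAR_PER_KPA
    return pressure_change_mbar * 100.0 / mbar_per_100m


error = ascent_error_m(6.0)  # roughly 50 metres
```

A drop in pressure works the other way round: a negative input gives a negative result, meaning the device would overestimate the ascent.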

The graph above shows how much recalculations differed from the original measurement, with change in pressure on the x axis. Note that the effect of recalculations is here in metres, not percent. I tried different combinations of pressure measures and recalculations, and only in one case, the Garmin recalculation shown above, was the correlation statistically significant (and the regression line much steeper than the Wikipedia data would suggest). So this is not exactly firm evidence for an effect of weather change on elevation measurement.

Heavy rain

It has been suggested that heavy rain may block the sensor hole and thus affect elevation measurement. This may sound a bit weird, but I have seen the device stop registering any ascent during very heavy rain. Among the rides considered here, there are two that saw really heavy rainfall (the Scottish ride and a ride in Utrechtse Heuvelrug on 27 July). These do show some of the largest corrections, especially in the Strava recalculation. So it does seem plausible that rain does in fact affect elevation measurement.

In the spirit of true pseudoscientific enquiry, I tried to replicate the effect of heavy rain by squirting water from my bidon onto the device during a ride in Utrechtse Heuvelrug. This didn’t yield straightforward results. At first, the device registered implausibly steep gradients and it turned out it had interpreted the hump between Maarn and Doorn as 115m high, more than twice its real height. About halfway, unpredicted rain started to fall, mocking my experiment. Strava’s recalculation didn’t change the total ascent much, but it did correct the height of the bit between Maarn and Doorn, so it must have added some 50+ metres elsewhere. Be that as it may, the «experiment» does seem to confirm that water can do things to the altimeter.

Method

I took total ascent data measured by my Garmin Edge 800 and obtained a recalculation from the Garmin Connect and Strava websites. Subsequently, I looked up weather data from Weather Underground (as an armchair activist I do appreciate their slightly subversive name). Weather Underground offers historical weather data by location, with numerous observations per day. I wrote a Python script that looks up the data for the day and location of the ride and then selects the observations that roughly overlap with the duration of the ride. There turned out to be two limitations to the data. First, it appears that only data at the national level are available (the Scottish ride yielded data for London and all Dutch ones data for Amsterdam). Second, for the day / location combinations I tried there was no time-specific data for precipitation available, only for the entire day.
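The selection step can be sketched as follows. This is not the original script and it makes no calls to Weather Underground: it assumes the observations for the day have already been downloaded as (time, pressure) pairs. The function name, the 30-minute margin for «roughly overlap» and the example values are all made up.

```python
from datetime import datetime, timedelta


def pressure_change_during_ride(observations, start, end,
                                margin=timedelta(minutes=30)):
    """Net pressure change (mbar) over the observations that roughly
    overlap with the ride.

    observations: list of (datetime, pressure_mbar) tuples for the day.
    margin: hypothetical tolerance around the start and end of the ride.
    Returns None if fewer than two observations fall within the window.
    """
    selected = sorted(
        (time, pressure) for time, pressure in observations
        if start - margin <= time <= end + margin
    )
    if len(selected) < 2:
        return None
    # Last observation minus first observation within the window.
    return selected[-1][1] - selected[0][1]


# Made-up observations, not real weather data.
observations = [
    (datetime(2013, 7, 27, 8, 0), 1012.0),
    (datetime(2013, 7, 27, 9, 0), 1011.5),
    (datetime(2013, 7, 27, 11, 0), 1010.0),
    (datetime(2013, 7, 27, 13, 0), 1009.0),
    (datetime(2013, 7, 27, 16, 0), 1008.0),
]
change = pressure_change_during_ride(
    observations,
    start=datetime(2013, 7, 27, 9, 15),
    end=datetime(2013, 7, 27, 12, 45),
)
```

With these numbers the window runs from 8:45 to 13:15, so the 9:00, 11:00 and 13:00 observations are used and the net change is a drop of 2.5 mbar.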

Because of these limitations, I also took an alternative approach, looking up data from the Royal Netherlands Meteorological Institute KNMI. This did yield more fine-grained data, although obviously limited to the Netherlands. In the end it turned out that it didn’t make much difference for the analysis whether KNMI or Weather Underground data is used. Code from the scripts I used for looking up weather data is here.

I tested quite a few correlations so a couple of ‘false positives’ may be expected. I didn’t statistically correct for this. Instead, I took a rather pragmatic approach: I’m cautious when there’s simply a significant correlation between two phenomena but I’m more confident when there’s a pattern to the correlations (e.g., Garmin and Strava recalculations are correlated in a similar way to another variable).
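To illustrate why false positives are to be expected, here is a small sketch of the standard arithmetic. The number of tests is hypothetical (I didn’t keep count), and the calculation assumes independent tests, which correlations computed on the same set of rides are not, so it is only a rough indication.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of 'significant' results if there are no real effects."""
    return n_tests * alpha


def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive among n independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-test significance level under a Bonferroni correction."""
    return alpha / n_tests


N_TESTS = 20  # hypothetical number of correlations tested

expected = expected_false_positives(N_TESTS)
at_least_one = familywise_error_rate(N_TESTS)
corrected_alpha = bonferroni_alpha(N_TESTS)
```

With 20 tests at the 5% level, one false positive is expected on average, and the chance of at least one is well over half. A Bonferroni correction would be one way to deal with this; I didn’t apply it here.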
