Step by step: creating an R package

With the help of posts by Hillary Parker and trestletech I managed to create my first R package in RStudio (here’s why). It wasn’t as difficult as I thought and it seems to work. Below is a basic step-by-step description of how I did it (this assumes you have one or more R functions to include in your package, preferably in separate R-script files):

If you want, you can upload the package to GitHub. Other people will then be able to install it:

# install.packages('devtools')  # only needed if devtools isn't installed yet
library(devtools)
install_github('username/repository-name')
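For readers who want to try the basic idea without RStudio, here is a minimal sketch using base R's utils::package.skeleton(), which builds a package directory around existing R-script files. This is a swapped-in technique, not the RStudio/devtools workflow from the posts mentioned above; the package name "mypkg" and the function add_one() are hypothetical.

```r
# Minimal sketch, not the RStudio/devtools workflow: base R's
# utils::package.skeleton() builds a package directory around existing
# R-script files. "mypkg" and add_one() are hypothetical placeholders.
pkg_root <- tempdir()

# Stand-in for one of your existing R-script files
script <- file.path(pkg_root, "add_one.R")
writeLines(c(
  "add_one <- function(x) {",
  "  x + 1",
  "}"
), script)

# Creates mypkg/ with DESCRIPTION, NAMESPACE, R/ and man/
utils::package.skeleton(name = "mypkg", path = pkg_root,
                        code_files = script, force = TRUE)

pkg_dir <- file.path(pkg_root, "mypkg")
print(list.files(pkg_dir, recursive = TRUE))
```

Afterwards you would still edit DESCRIPTION and the .Rd files in man/ (the generated Read-and-delete-me file lists what remains), then build and install with R CMD build and R CMD INSTALL, or from RStudio's Build pane.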


A new balance in Amsterdam’s city council?

Last autumn, Amsterdam politicians discussed on Twitter whether the relations between coalition and opposition have changed since the March 2014 election, which resulted in a new coalition.

One way to look at this is to analyse voting behaviour on motions and amendments over the past two years. From a political perspective, proposals with broad support may not be very interesting:

For example, a party can propose a large number of motions that get very broad support, but materially change little in the stance, let alone the policy, of the government. In the literature, this is sometimes referred to as «hurrah voting»: everybody yells «hurrah!», but is there any real influence? (Tom Louwerse)

In a sense, it could be argued that the same applies to proposals supported by the entire coalition. More interesting are what I’ll call x proposals: proposals that do not have the support of the entire coalition, but are adopted nevertheless. In the Amsterdam situation these are often proposals opposed by the right-wing VVD. The explanation is simple: Amsterdam coalitions tend to lean to the right (relative to the composition of the city council). As a result, left-wing coalition parties have more allies outside the coalition.
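The definition of an x proposal boils down to a simple test. Below is a minimal sketch with made-up votes, not the code linked in the Method section; the list structure and the function name is_x_proposal() are hypothetical.

```r
# Hypothetical sketch: flag "x proposals", i.e. proposals adopted
# without the support of every coalition party. Toy data, made up.
coalition <- c("SP", "D66", "VVD")   # coalition after March 2014

proposals <- list(
  list(id = 1, adopted = TRUE,  supporters = c("SP", "D66", "VVD", "PvdA")),
  list(id = 2, adopted = TRUE,  supporters = c("SP", "D66", "PvdA", "GroenLinks")),
  list(id = 3, adopted = FALSE, supporters = c("PvdA", "GroenLinks"))
)

is_x_proposal <- function(p, coalition) {
  # adopted, but at least one coalition party did not vote in favour
  p$adopted && !all(coalition %in% p$supporters)
}

x_flags <- vapply(proposals, is_x_proposal, logical(1), coalition = coalition)
x_flags
```

Only the second proposal qualifies: it was adopted although the VVD did not support it.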

Let’s start with the situation before the March 2014 election. The social-democrat PvdA was the largest party. The coalition consisted of green party GroenLinks, PvdA and VVD, but the larger left-wing parties PvdA, GroenLinks and socialist party SP had a comfortable majority. The chart below shows the parties that introduced x proposals. The arrows show who they got support from to get these proposals adopted.

The size of the circles corresponds to the size of the parties; pink circles represent coalition parties. The thickness of arrows corresponds to the number of times one party supported another party’s x proposal. The direction of the arrows is not only shown by the arrow heads but also by the curvature: arrows bend to the right.
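The arrow thickness amounts to counting supporter-initiator pairs across all x proposals. A minimal sketch with made-up data, assuming the initiating party's own votes are not drawn as an arrow; the column names from, to and n are hypothetical.

```r
# Hypothetical sketch: count how often each party supported another
# party's x proposal. The toy data below is made up.
x_props <- list(
  list(initiator = "GroenLinks", supporters = c("GroenLinks", "PvdA", "SP")),
  list(initiator = "GroenLinks", supporters = c("GroenLinks", "PvdA")),
  list(initiator = "PvdA",       supporters = c("PvdA", "GroenLinks", "SP"))
)

# One row per supporter -> initiator pair (the initiator itself is left out)
edges <- do.call(rbind, lapply(x_props, function(p) {
  others <- setdiff(p$supporters, p$initiator)
  data.frame(from = others, to = rep(p$initiator, length(others)))
}))

# Arrow thickness: number of times each pair occurs
edges$n <- 1
weights <- aggregate(n ~ from + to, data = edges, FUN = sum)
weights
```

Here PvdA supports two GroenLinks x proposals, so that arrow would be drawn twice as thick as the others.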

The image is clear: PvdA and especially GroenLinks were the main mediators who managed to gain support for x proposals.

And now the situation after March 2014. Neoliberal party D66 is now the largest party, and the coalition consists of SP, D66 and VVD. This means that PvdA and GroenLinks have moved to the opposition, but it turns out they still play a key role in getting x proposals adopted. GroenLinks initiated as many as half of the x proposals.

The most active mediator is Jorrit Nuijens (GroenLinks), followed by Maarten Poorter (PvdA) and Femke Roosma (GroenLinks).

Method

Data is from the archive of the Amsterdam city council. Votes on motions and amendments from January 2013 onwards can be downloaded as an Excel file. The file (downloaded on 31 January 2015) contains data on 1,165 proposals (or versions of proposals) put to a vote up to 17 December 2014.

A few things can be said about the Excel file. On the one hand, it’s great that this information is being made available. On the other hand, the file is a bit of a beast that takes quite a few lines of code to tame. The way in which voting is described varies (e.g., «rejected with the votes of the SP in favour», «adopted with the votes of the council members Drooge and De Goede against»); the structure of the title changed in November 2014; Partij voor de Dieren is sometimes abbreviated and sometimes not; and sometimes the text describing the vote has been truncated, apparently because it didn’t fit into a cell. Given the complexity of the file, it can’t be ruled out completely that some proposals have been classified incorrectly.
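To give an idea of what taming such a file involves, here is a hypothetical sketch that classifies free-text vote descriptions with regular expressions. It is not the code linked below: the strings are English renderings of the patterns quoted above, and the real file's wording (and language) would need its own patterns.

```r
# Hypothetical sketch of classifying free-text vote descriptions.
# The strings are English renderings of the patterns quoted above.
vote_text <- c(
  "rejected with the votes of the SP in favour",
  "adopted with the votes of the council members Drooge and De Goede against",
  "adopted with the votes of the Partij voor de Dieren against",
  "adopted with the votes of the PvdD against"
)

# Harmonise a party name that is sometimes written out in full
vote_text <- gsub("Partij voor de Dieren", "PvdD", vote_text, fixed = TRUE)

# Outcome of the vote, taken from the first word
outcome <- ifelse(grepl("^adopted", vote_text), "adopted",
                  ifelse(grepl("^rejected", vote_text), "rejected", NA))

# Whether the listed votes were in favour or against; a truncated
# description matches neither pattern and ends up as NA for manual review
direction <- ifelse(grepl("in favour$", vote_text), "in favour",
                    ifelse(grepl("against$", vote_text), "against", NA))

data.frame(outcome, direction, vote_text)
```

Flagging unmatched descriptions as NA, rather than guessing, is one way to catch truncated cells.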

The analysis (by necessity) focuses on visible influence. The first name on the list of persons introducing a proposal is considered the initiator. In reality, an initiator will probably sometimes let someone else take credit for a proposal.

The code for cleaning and analysing the data is available here. The D3 code for the network graphs is based on this example.

Deceptive charts - do they work?

Anyone mildly interested in data visualisation must have come across examples of shamelessly deceptive Fox News charts. Truncated y-axes, distorted x-axes, messing with units - nothing’s too bold when it comes to manipulating the audience. But does this kind of deception actually work? Anshul Vikram Pandey and his colleagues at New York University decided (pdf) to find out. They showed subjects either control or deceptive versions of a number of charts.

The deceptive versions were: a bar chart with a truncated y-axis; a bubble chart with one bubble too large relative to the other; a line chart with a different aspect ratio, resulting in a less steep rise than in the control version; and a chart with an inverted y-axis (inspired by Reuters’ famous Gun Deaths in Florida chart - interesting discussion here). In all cases, the correct numbers were included in the chart.

Of course a truncated y-axis can sometimes be defensible and needn’t be deceptive, as long as it is made clear what’s going on. More problematic is the aspect ratio chart. The authors claim the chart to the right is deceptive and the one to the left not, but how can you tell? You can’t. There’s no rule that says what the number of pixels per year on the x-axis should be.
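A bit of arithmetic shows why the aspect ratio is so arbitrary: the same rise produces a very different angle depending on how many pixels are used per year. A minimal sketch; all numbers, and the function apparent_angle(), are hypothetical.

```r
# Illustrative sketch: the same rise looks steeper or flatter depending
# on how many pixels are used per year on the x-axis.
# All numbers here are hypothetical.
rise_units  <- 10   # increase in the plotted variable
years       <- 5    # period over which it rises
px_per_unit <- 20   # vertical scale: pixels per unit on the y-axis

apparent_angle <- function(px_per_year) {
  dy_px <- rise_units * px_per_unit
  dx_px <- years * px_per_year
  atan(dy_px / dx_px) * 180 / pi   # angle of the line in degrees
}

round(apparent_angle(c(40, 160)), 1)
# → 45 14
```

Neither 40 nor 160 pixels per year is more correct than the other, yet one chart shows a 45-degree rise and the other a gentle 14-degree slope.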

Be that as it may, the authors found substantial differences in how the deceptive charts were interpreted compared to the control charts. Note that in most cases, they didn’t measure whether deceptive charts were interpreted incorrectly, just whether they were interpreted differently from the control charts. For example, participants were asked how much better access to drinking water was in Silvatown, represented by the bar to the right of the bar plot, relative to Willowtown, represented by the bar to the left (on a 5-point Likert scale ranging from slightly better to substantially better). When shown the control bar chart, the average score was 1.45; with the truncated y-axis the average score was 2.77.
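Why a truncated axis exaggerates differences is easy to show with two bars. A minimal sketch; the values 86 and 90 and the baseline of 85 are hypothetical, not the figures used in the study.

```r
# Illustrative sketch with hypothetical values: how truncating the y-axis
# inflates the visual difference between two bars.
values <- c(Willowtown = 86, Silvatown = 90)

visual_ratio <- function(values, baseline) {
  heights <- values - baseline   # drawn bar heights
  unname(heights["Silvatown"] / heights["Willowtown"])
}

actual    <- visual_ratio(values, baseline = 0)    # full axis
truncated <- visual_ratio(values, baseline = 85)   # axis starts at 85
round(c(actual = actual, truncated = truncated), 2)
# → actual 1.05, truncated 5.00
```

With the full axis, the Silvatown bar is about 5% taller; with the axis starting at 85, it is drawn five times as tall.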

The authors also tried to find out whether factors such as education and familiarity with charts had an influence on how charts were interpreted. It appears that people who are familiar with charts are less easily fooled by a truncated y-axis. Perhaps this is because truncated y-axes are second on the list of phenomena chart geeks love to hate and criticise (after 3D exploding pie charts, of course).


Peak economist

On Friday, the New York Times published an interesting article by Justin Wolfers about which kinds of experts the paper mentions. Don’t worry, he’s aware of the methodological issues:

While the idea of measuring influence through newspaper mentions will elicit howls of protest from tweed-clad boffins sprawled across faculty lounges around the country, the results are fascinating.

To summarize: by his measure, economists have become the most influential profession among the social sciences, and their influence rises during economic crises. At least, that is the case in the New York Times. I looked up data for the Dutch newspaper NRC Handelsblad, for which data are available from 1990 onwards.

Some conclusions can be drawn:

  • The current ranking is the same as for the NYT, with economists heading the list and demographers at the bottom;
  • Apparently, NRC Handelsblad has always had a pretty high regard for historians, but due to the crisis they lost their top position to economists;
  • There was a peak in mentions of psychologists in 2012, but some of that can be ascribed to reports of scientific fraud by psychologist Diederik Stapel.

For comparison, I tried reproducing Wolfers’ NYT chart for the years 1990 - 2014. Here’s what I got:

The sudden increase for all professions in 2014 is unexpected - see Method for possible explanations. If we leave 2014 aside, «peak economist» (to borrow an expression from Wolfers) seems to have happened earlier in the NYT than in NRC Handelsblad. This may have something to do with the crisis hitting the US earlier than Europe.

Method

The NYT data were downloaded from the NYT Chronicle Tool (I had to separately download the data for each search term). Data from NRC Handelsblad were downloaded using the website’s search function. In order to get the total numbers per year I also did a search using «de» («the») as a search term («de» is the most frequently used word in written Dutch).
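The percentages themselves are a simple ratio: hits for a profession divided by hits for «de», the proxy for the total number of articles. A minimal sketch with made-up counts; the column names are hypothetical.

```r
# Minimal sketch with made-up counts: share of articles mentioning a
# profession, using the number of hits for «de» as a proxy for the
# total number of articles per year.
counts <- data.frame(
  year      = 2010:2012,
  economist = c(1200, 1350, 1300),     # hypothetical hits for economists
  de_hits   = c(40000, 41000, 39000)   # hypothetical hits for «de»
)

counts$pct_economist <- 100 * counts$economist / counts$de_hits
counts
```

Because «de» appears in nearly every Dutch article, its hit count should track the total number of articles closely, although it will slightly undercount.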

As mentioned above, I got a steep rise in the percentages for all professions in the NYT in 2014. I manually checked some of the percentages I got against those in the chart of the NYT Chronicle Tool, and these appear to be correct. The spike is not visible in Wolfers’ chart, but that may be due to the fact that he uses three-year averages.
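A three-year average can easily hide a one-year spike, especially in the final year. A minimal sketch with made-up percentages, assuming a centred average computed with stats::filter(); whether Wolfers used a centred or a trailing average is not stated here.

```r
# Illustrative sketch: a centred three-year moving average, with made-up
# percentages that spike in the last year.
pct <- c(2.0, 2.1, 2.0, 2.2, 6.0)   # hypothetical yearly percentages
names(pct) <- 2010:2014

# stats::filter with equal weights gives a centred moving average;
# the first and last years get NA because they lack a neighbour.
ma3 <- stats::filter(pct, rep(1 / 3, 3), sides = 2)
round(as.numeric(ma3), 2)
# → NA 2.03 2.10 3.40 NA
```

The spike to 6.0 only shows up as a modest rise to 3.4 in 2013, and the final year drops out altogether. A trailing average would keep 2014, but still flatten the spike to 3.4.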

There may be an issue with the denominator, i.e. the total number of articles. The number for total_articles_published in the data I downloaded from the NYT was fairly stable at about 100,000 between 1990 and 2005. It then rose to about 250,000 in 2013 (perhaps something to do with changed archiving practices, or with online publishing?). However, in 2014 it dropped to about one-third of the 2013 level. If the number of articles mentioning each profession did not fall as sharply, the smaller denominator alone would push up the 2014 percentages for all professions.

The NRC Handelsblad data also show some fluctuations in the total number of articles per year, but these are less extreme and, at first sight, don’t seem to coincide with unexpected fluctuations in the percentages of articles mentioning professions.

Code is available here.


Are the social-democrats getting enough seats in the Dutch Senate?

This weekend, the Dutch social-democrat PvdA will decide on the list of candidates for the Senate election this spring. The party isn’t doing too well in the polls, but it may be facing an additional problem, as the charts below illustrate.

Since the beginning of the 1980s, the PvdA has nearly always had a weaker position in the Senate than in the Lower House. The main exception is 2002, when the Lower House election took place within days of the murder of right-wing populist Pim Fortuyn and the PvdA, seen by many as a symbol of the establishment, temporarily lost half its seats.
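Since the Senate has 75 seats and the Lower House 150, the comparison has to be made in terms of seat shares rather than seat counts. A minimal sketch; the party seat numbers are hypothetical.

```r
# Minimal sketch: compare a party's seat share in the two chambers.
# The Senate has 75 seats and the Lower House 150; the party's seat
# counts below are hypothetical.
seats <- data.frame(
  chamber     = c("Senate", "Lower House"),
  party_seats = c(12, 30),
  total_seats = c(75, 150)
)
seats$share_pct <- round(100 * seats$party_seats / seats$total_seats, 1)
seats
```

In this made-up case, the party holds 16% of the Senate seats against 20% in the Lower House, which is the kind of gap the charts show.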

The relatively weak position of the PvdA in the Senate may be a coincidence, but it could also be related to turnout. In elections for the provincial councils, which in turn elect the Senate, almost half the voters stay at home (compared to a 75–80% turnout in Lower House elections). It may well be that the way in which the Senate is elected has a negative impact on the outcome for the PvdA.

Sources

Data from the Election Council (Kiesraad) and Wikipedia (e.g., the pages on the EK and TK, i.e. the Senate and the Lower House). Data and script are available here.

