Assignment 1-4

A little background: I’m using the Outlook on Life Surveys dataset and I’m interested in the relation between union membership and political participation (background here). In my previous assignment, I discussed the variables I’m using: current employment status; union membership (household level); several political participation variables; and a secondary variable indicating whether respondents have engaged in any of these forms of political participation. I’ve created a subset consisting only of respondents with paid jobs, which I’m using for my analyses.

In the fourth assignment, we’re asked to create univariate graphs for the variables to be used and to create a bivariate graph for the association between the independent and dependent variables. Here are some remarks on how I did this assignment:

I’ve posted my Python code here.

Opinion of unionised workers

How would you rate unionized workers on a 0-100 scale? Histogram of answers from OOL respondents with paid jobs

The histogram shows that the mode lies between 50 and 60 (a look at the data reveals it’s exactly 50); the median is 60 and the mean is just above that. These measures of central tendency suggest that American respondents with paid jobs tend to have an indifferent or slightly positive view of unionised workers.[2] The distribution is uneven: a substantial share of respondents have negative or very negative views of unionised workers, a larger share have positive views and about one in four are indifferent.

The data suggests that the view Americans have of unionised workers is less polarised and somewhat more positive than I’d expected. Then again, as I noted earlier, there is a possibility that union members are overrepresented in the sample.
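A minimal sketch of how the mode, median and mean of a 0-100 rating could be computed, and how the answers could be grouped into histogram bins. The ratings are made up for illustration and are not the OOL data; the function names `summarise` and `histogram_bins` and the bin width of 10 are assumptions, not taken from the posted script.

```python
from collections import Counter
from statistics import mean, median, multimode

# Made-up ratings on the 0-100 scale (NOT the OOL data); None marks a missing answer.
ratings = [0, 15, 30, 50, 50, 50, 60, 70, 75, 85, 100, None]


def summarise(values):
    """Return the number of valid answers plus mode(s), median and mean."""
    valid = [v for v in values if v is not None]
    return {
        "n": len(valid),
        "modes": multimode(valid),  # list, because there can be ties
        "median": median(valid),
        "mean": mean(valid),
    }


def histogram_bins(values, width=10):
    """Count valid answers per bin; a rating of 100 is put in the last bin."""
    valid = [v for v in values if v is not None]
    last_bin = 100 - width
    counts = Counter(min(v // width * width, last_bin) for v in valid)
    return dict(sorted(counts.items()))


summary = summarise(ratings)
bins = histogram_bins(ratings)
print(summary)
print(bins)
```

Looking at both the summary and the bin counts makes it easier to see why a histogram can show the mode as "somewhere between 50 and 60" while the raw data pin it down to a single value.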

Union membership

Does anyone in your household currently belong to a union? Percentage for OOL respondents with paid jobs, missing values omitted.

The graph shows that about 20% of respondents indicate that someone in their household is a union member. This is higher than I’d expect; as I just noted, there’s a possibility that union members are overrepresented in the sample.
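A minimal sketch of how percentages with missing values omitted could be computed while still keeping track of how many answers were missing. The answers are made up and the function name `percentages` is an assumption for illustration only.

```python
from collections import Counter

# Made-up answers (NOT the OOL data); None marks a missing value.
answers = ["Yes", "No", "No", None, "No", "Yes", "No", "No", None, "No"]


def percentages(values):
    """Return (percentage per answer among valid answers, number of missing answers)."""
    valid = [v for v in values if v is not None]
    n_missing = len(values) - len(valid)
    counts = Counter(valid)
    pct = {answer: 100 * n / len(valid) for answer, n in counts.items()}
    return pct, n_missing


pct, n_missing = percentages(answers)
print(pct)
print("missing:", n_missing)
```

Reporting the number of missing answers next to the percentages keeps the graph uncluttered without losing sight of how many respondents were left out.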

Political participation

Have you engaged in any of the four forms of political participation in the last 2 years? Percentage for OOL respondents with paid jobs, missing values omitted.

The graph shows a composite measure of political participation. A positive score means the respondent has done at least one of the following in the past 2 years: contacted an official, participated in a protest or a march, or signed a petition. This is the case for a little less than half of the respondents.
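A minimal sketch of how such an "any of the forms" composite could be derived from the individual items. How missing answers are treated here (the composite is missing only when no item is "yes" and at least one item is missing) is an assumption for illustration and may differ from the posted script; the function name `any_participation` and the example data are also made up.

```python
def any_participation(items):
    """Composite score for one respondent.

    items: answers to the individual participation questions,
    each True (yes), False (no) or None (missing).
    Returns True if any answer is yes; otherwise None if at least one
    answer is missing; otherwise False.
    """
    if any(item is True for item in items):
        return True
    if any(item is None for item in items):
        return None  # cannot rule out participation (assumed rule)
    return False


# Made-up respondents, four items each (NOT the OOL data).
respondents = [
    (True, False, False, False),
    (False, False, False, False),
    (False, None, False, False),
    (None, True, False, False),
]
scores = [any_participation(r) for r in respondents]
print(scores)
```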

Association between union membership and political participation

Percentage who have engaged in any of the four forms of political participation in the last 2 years, for OOL respondents with paid jobs, by union membership (at household level; axis label: “Union member in household”).

Finally, the last graph shows the association between union membership (at household level) and political participation. Consistent with the hypothesis I formulated in the first assignment, political participation (at least by this measure) is higher for respondents with a union member in their household than for other respondents. Of course, the graph doesn’t tell us whether the correlation is statistically significant or whether there’s a causal relationship between the two phenomena.
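A minimal sketch of the comparison behind such a bivariate bar chart: the percentage of respondents who participated, computed separately for households with and without a union member, with rows that have a missing value on either variable left out. The data and the function name `participation_by_group` are made up for illustration.

```python
from collections import defaultdict

# Made-up (union_member_in_household, participated) pairs (NOT the OOL data);
# None marks a missing value.
respondents = [
    (True, True), (True, True), (True, False), (True, None),
    (False, True), (False, False), (False, False), (False, False),
    (None, True),
]


def participation_by_group(rows):
    """Return the percentage who participated per group, skipping incomplete rows."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, participated in rows:
        if group is None or participated is None:
            continue  # omit rows with a missing value on either variable
        totals[group] += 1
        positives[group] += participated  # True counts as 1, False as 0
    return {group: 100 * positives[group] / totals[group] for group in totals}


rates = participation_by_group(respondents)
print(rates)
```

A difference between the two percentages is only descriptive; a significance test would be a separate step.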


  1. I’m not yet entirely sure this is the best approach, but here are my considerations. My last graph (the association) is categorical by categorical. I could include missing values using a stacked bar chart with different colours for ‘Yes’, ‘No’ and ‘missing’, but I’m afraid that would come at the expense of clarity. If I omit missing values in the last graph, then the consistent thing to do is to omit them in the other graphs as well. My hesitation is that I think it’s important to keep track of where the missing values are (especially when there are more of them). Perhaps the solution would be to combine a graph with a frequency table and have the latter include missing values. Note that in my Python script I’ve opted for a slightly different approach by showing counts instead of percentages.  ↩

  2. I’m assuming that respondents interpret the scale to mean that 50 stands for indifferent, higher values for positive and lower for negative. I have to admit I’m not sure this is how American respondents would interpret the scale. For comparison, Dutch respondents might associate the scale with the 0–10 scale used in schools, on which a 6 means you pass and a 5 you fail, and correspondingly interpret 50 on a 0–100 scale as negative.  ↩

30 March 2016 | Categories: assignment, dai, data