I’m using the OOL Surveys dataset, and I’m interested in the association between union membership and political participation in the US. More specifically, I look at union membership at the household level and whether the respondent has engaged in at least one of four forms of political participation over the past two years.
In the current assignment, we’re asked to run a chi-square test of independence to find out whether two categorical variables are related. If the result is significant and the explanatory variable has more than two levels, we’re required to carry out and interpret a post-hoc test. This means comparing every pair of categories of the explanatory variable and, to correct for making multiple comparisons (the Bonferroni adjustment), dividing the chosen significance level (for example, 0.05) by the number of comparisons.
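The post-hoc procedure described above can be sketched as follows. This is a minimal illustration, not part of my own analysis: the three-level explanatory variable (levels `A`, `B`, `C`) and its counts are hypothetical. Each pair of levels gets its own 2×2 chi-square test, and each p-value is compared against the Bonferroni-adjusted threshold of 0.05 divided by the number of pairs.

```python
import itertools

import pandas
import scipy.stats

# HYPOTHETICAL counts for an explanatory variable with three levels (A, B, C).
# Rows are the response variable (No/Yes); these are NOT from the OOL data.
ct = pandas.DataFrame(
    {"A": [120, 80], "B": [100, 100], "C": [70, 130]},
    index=["No", "Yes"],
)

# Every pair of explanatory categories: (A, B), (A, C), (B, C)
pairs = list(itertools.combinations(ct.columns, 2))

alpha = 0.05
adjusted_alpha = alpha / len(pairs)  # Bonferroni: 0.05 / 3

results = {}
for a, b in pairs:
    sub_ct = ct[[a, b]]  # 2x2 table restricted to this pair of categories
    chi2, p, dof, expected = scipy.stats.chi2_contingency(sub_ct)
    results[(a, b)] = p
    verdict = "significant" if p < adjusted_alpha else "not significant"
    print(f"{a} vs {b}: chi2={chi2:.2f}, p={p:.4g} -> {verdict}")
```

With three levels there are three comparisons, so each pairwise p-value has to fall below roughly 0.0167 rather than 0.05.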
I’m in a bit of luck this time. First, my original research question concerns the relationship between two categorical variables, so there’s no need to recode quantitative variables into categorical ones or to look for other variables. Second, my explanatory variable has only two levels (respondents either do or don’t have a union member in their household), so there’s no need for a post-hoc test.
The entire Python script for my analysis can be found here. Here’s an excerpt from the script:
import pandas
import scipy.stats

# contingency table of observed counts
ct1 = pandas.crosstab(sub2['ANY'], sub2['W1_P8'])
print(ct1)
print()

# column percentages
colsum = ct1.sum(axis=0)
colpct = 100 * ct1 / colsum
print(colpct)
print()

# chi-square
print('chi-square value, p value, expected counts')
cs1 = scipy.stats.chi2_contingency(ct1)
print(cs1)
And here’s the relevant output:
W1_P8   No   Yes
ANY
No     445    89
Yes    365   125

W1_P8         No        Yes
ANY
No     54.938272  41.588785
Yes    45.061728  58.411215

chi-square value, p value, expected counts
(11.559955638910083, 0.00067387460877846761, 1, array([[ 422.40234375,  111.59765625],
       [ 387.59765625,  102.40234375]]))
Among respondents with a union member in their household, the percentage who have engaged in political participation is higher (58%) than among other respondents (45%). There are 125 respondents who have a union member in their household and who have engaged in political participation; had there been no relationship between the two variables, a lower number (about 102) would have been expected. In the other three cells of the table, the observed counts also differ from the counts that would be expected if the variables were unrelated.
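The expected counts come from a simple formula: under independence, the expected count in a cell is its row total times its column total divided by the total number of respondents. For the union-member/participated cell, that is 490 × 214 / 1024 ≈ 102.4. The sketch below recomputes the expected counts and the column percentages with NumPy, using only the observed counts copied from the output above; it is an illustration, not part of my script.

```python
import numpy as np

# Observed counts copied from the crosstab output:
# rows = ANY (No, Yes), columns = W1_P8 (No, Yes)
observed = np.array([[445, 89],
                     [365, 125]])

row_totals = observed.sum(axis=1)  # [534, 490]
col_totals = observed.sum(axis=0)  # [810, 214]
n = observed.sum()                 # 1024

# Expected count under independence: row total * column total / N
expected = np.outer(row_totals, col_totals) / n

# Column percentages (same as colpct in the script): divide each column by its total
col_pct = 100 * observed / col_totals

print(expected)
print(col_pct)
```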
The chi-square value is 11.6 (with 1 degree of freedom) and the p-value is < 0.001. In other words, the test shows a significant relationship between union membership (at the household level) and political participation.
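One detail worth knowing when reading the output: for a 2×2 table (1 degree of freedom), `scipy.stats.chi2_contingency` applies Yates’ continuity correction by default, so 11.56 is the corrected statistic. The sketch below, again using only the observed counts from the output, reruns the test with and without the correction (`correction=False`). The uncorrected statistic is somewhat larger (about 12.09), and both versions give p < 0.001, so the conclusion is unaffected.

```python
import numpy as np
import scipy.stats

# Observed counts from the crosstab output (rows = ANY, columns = W1_P8)
observed = np.array([[445, 89],
                     [365, 125]])

# Default behaviour: Yates' continuity correction is applied because dof == 1
chi2_c, p_c, dof, expected = scipy.stats.chi2_contingency(observed)

# The same test without the continuity correction
chi2_u, p_u, _, _ = scipy.stats.chi2_contingency(observed, correction=False)

print(f"corrected:   chi2={chi2_c:.2f}, p={p_c:.5f}")
print(f"uncorrected: chi2={chi2_u:.2f}, p={p_u:.5f}")
```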