# Assignment 3-2

In this assignment, we’re required to report on a regression analysis with just one explanatory variable. If the explanatory variable is categorical, it has to be recoded into two categories with the values 0 and 1. I’m using the Outlook On Life surveys dataset and I’ll look into the association between age and interview duration. My hypothesis is that duration will be shorter for younger respondents because they require less explanation.

Note that the mean duration of interviews is 245 minutes or over 4 hours, which is pretty long.

The Python code for the analysis is available here. Here’s a frequency table for the recoded explanatory variable age_cats, where 0 = 18-44 and 1 = 45+:

| age_cats | count |
|---|---|
| 1 | 1048 |
| 0 | 553 |
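A recode like this could be done roughly as in the sketch below. Only the name `age_cats` and the 18-44 / 45+ cut come from this post; the column name `age` and the toy data are assumptions for illustration, not the actual variable names in the dataset.

```python
import pandas as pd

# Toy stand-in for the survey data; the column name "age" is an assumption.
df = pd.DataFrame({"age": [18, 30, 44, 45, 60, 72]})

# 0 = 18-44, 1 = 45+ (the cut used in this post).
df["age_cats"] = (df["age"] >= 45).astype(int)

# Frequency table of the recoded variable.
counts = df["age_cats"].value_counts()
print(counts)
```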

And here’s the output of the regression analysis:

OLS regression model for the association between age and interview duration

OLS Regression Results

| Field | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable | duration | R-squared | 0.005 |
| Model | OLS | Adj. R-squared | 0.004 |
| Method | Least Squares | F-statistic | 7.237 |
| Date | Sun, 22 May 2016 | Prob (F-statistic) | 0.00721 |
| Time | 09:15:58 | Log-Likelihood | -13746. |
| No. Observations | 1601 | AIC | 2.750e+04 |
| Df Residuals | 1599 | BIC | 2.751e+04 |
| Df Model | 1 | | |
| Covariance Type | nonrobust | | |

| Term | coef | std err | t | P>\|t\| | 95.0% Conf. Int. lower | 95.0% Conf. Int. upper |
|---|---|---|---|---|---|---|
| Intercept | 365.2966 | 55.153 | 6.623 | 0.000 | 257.117 | 473.477 |
| age_cats | -183.3910 | 68.169 | -2.690 | 0.007 | -317.100 | -49.682 |

| Diagnostic | Value | Diagnostic | Value |
|---|---|---|---|
| Omnibus | 2283.566 | Durbin-Watson | 2.023 |
| Prob(Omnibus) | 0.000 | Jarque-Bera (JB) | 479121.259 |
| Skew | 8.367 | Prob(JB) | 0.00 |
| Kurtosis | 86.080 | Cond. No. | 3.16 |

Warnings:

[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
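The output above has the layout that statsmodels prints for an OLS fit. A minimal sketch of such a fit is shown below, assuming the formula interface and a DataFrame with columns `duration` and `age_cats`. The numbers are made up for illustration and are not the survey data.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Made-up durations (minutes) for illustration only; not the survey data.
df = pd.DataFrame({
    "age_cats": [0, 0, 0, 0, 1, 1, 1, 1],
    "duration": [300.0, 400.0, 350.0, 350.0, 150.0, 200.0, 190.0, 180.0],
})

# OLS with a single binary explanatory variable.
model = smf.ols("duration ~ age_cats", data=df).fit()
print(model.params)  # model.summary() would print the full results table

# With a 0/1 predictor, the intercept is the mean of group 0 and the
# slope is the difference between the two group means.
group_means = df.groupby("age_cats")["duration"].mean()
intercept = model.params["Intercept"]
slope = model.params["age_cats"]
print(group_means)
```

In this toy data the group means are 350 and 180, so the fitted intercept and slope should come out as 350 and -170. That is the same reading applied to the real coefficients in this post.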

First of all, there is a statistically significant association between age and interview duration (p = 0.007). However, the sign of the beta coefficient is negative, which means that interview duration is *shorter* for older respondents. My hypothesis is wrong. The beta coefficient is -183, which means that interviews for the older age category were on average about 183 minutes, or just over 3 hours, shorter than for the younger category. The intercept is 365, which is the mean duration in minutes for the younger category (age_cats = 0). The mean for the older category is therefore 365 - 183 = 182 minutes, so interviews with younger respondents took about twice as long as those with older respondents. This is quite a large difference that deserves further analysis.
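As a quick sanity check, the group means implied by the coefficients can be combined with the group sizes from the frequency table to see whether they recover the overall mean of about 245 minutes mentioned at the top. The sketch below uses only numbers reported in this post; the variable names are mine.

```python
# Coefficients and group sizes as reported in the regression output and
# the frequency table in this post.
intercept = 365.2966   # predicted duration when age_cats = 0 (18-44)
beta = -183.3910       # change in predicted duration when age_cats = 1 (45+)
n_younger, n_older = 553, 1048

mean_younger = intercept           # mean for the younger category
mean_older = intercept + beta      # mean for the older category
ratio = mean_younger / mean_older  # how many times longer for the younger group

# Weighted average of the two group means, which should be close to the
# overall mean duration.
overall = (n_younger * mean_younger + n_older * mean_older) / (n_younger + n_older)

print(f"younger: {mean_younger:.1f}, older: {mean_older:.1f}")
print(f"ratio: {ratio:.2f}, overall mean: {overall:.1f}")
```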