12 Exercise 4: Test your knowledge

After working through Exercise 4, you’ll…

  • have practiced running simple and multiple linear regression models in R
  • know how to use coefficient standardization to compare effect sizes
  • know how to interpret the results of these models

12.1 Task 1

Let’s use the data set glbwarm again, which you should know well by now. Install / activate the processR package and assign the glbwarm data to a source object.

# Install processR if it is not available yet, then load it
if (!require(processR)) {
  install.packages("processR")
  library(processR)
}

data <- processR::glbwarm

In this task, we want to tackle simple linear regression. More specifically, we want to predict the ideology of our respondents by their age because we assume that older respondents will hold more conservative viewpoints. The higher the values of the ideology variable, the more conservative the respondents are (coded from 1 ‘very liberal’ to 7 ‘very conservative’).

Research question: Do older U.S. Americans hold more conservative viewpoints than younger U.S. Americans?

To answer this question, prepare a visual inspection of this relationship without fitting a regression line. Can you recognize a relationship? What is its nature?

12.2 Task 2

Next, try to quantify the association using Pearson’s r. Interpret the result.

12.3 Task 3

Using your graph from Task 1, fit a regression line to your data points (Hint: You will need to load the ggpubr package). Interpret the parameters of the regression line.

12.4 Task 4

Run a linear model in R using the ideology and age variables. Interpret the results.

12.5 Task 5

Since age alone does not seem to be a good predictor of conservatism, we want to introduce other predictors into the model and run a multiple linear regression. This means that we will estimate the effect of age on conservatism while controlling for the effects of third variables. For example, the respondents’ gender (sex, 0 = female, 1 = male) and their party preference (partyid, 1 = Democrat, 2 = Independent, 3 = Republican) might be good predictors of conservatism.

Note: As predictors, a linear regression model can only take metric variables and binary (0/1) variables. However, partyid is a categorical variable with three levels (1 = Democrat, 2 = Independent, 3 = Republican), so its numeric codes cannot be treated as a metric scale. Therefore, you need to mutate partyid and create two new binary variables, democrat (0/1) and republican (0/1), where 1 indicates that the respondent identifies with that political party. (You don’t need to create a variable independent, since that information would be redundant: someone who has a value of 0 for both democrat AND republican MUST be an independent, so two variables are enough to recover party preference.)
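As a hint, the dummy-coding logic from the note can be sketched on a small toy data frame. The object toy and its values are made up for illustration and are not taken from glbwarm. The sketch uses base R, but the same comparisons would also work inside dplyr::mutate().

```r
# Toy data with hypothetical values (NOT taken from glbwarm)
toy <- data.frame(partyid = c(1, 2, 3, 3, 1, 2))

# A logical comparison returns TRUE/FALSE; as.integer() turns it into 1/0
toy$democrat   <- as.integer(toy$partyid == 1)
toy$republican <- as.integer(toy$partyid == 3)

# Independents (partyid == 2) have 0 on both dummies,
# which makes them the reference group in a regression model
toy
```

Whichever group gets no dummy variable of its own becomes the baseline that the democrat and republican coefficients are compared against.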

Then, run a multiple linear regression model that predicts ideology from sex, democrat, republican, and age. Interpret the results and the meaning of the age coefficient.

12.6 Task 6

Standardize all relevant variables and run the model again (note that binary variables shouldn’t be standardized). How does the interpretation of the age coefficient change?
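As a hint, one possible way to standardize variables is base R's scale(), shown here on a small toy data frame. The object names (toy, age_z, ideology_z, fit_z) and all values are made up for illustration. This is a sketch of the technique, not the solution for glbwarm.

```r
# Toy data with hypothetical values (NOT taken from glbwarm)
toy <- data.frame(
  age      = c(23, 35, 47, 59, 71),
  ideology = c(2, 3, 5, 4, 6),
  sex      = c(0, 1, 0, 1, 1)
)

# scale() subtracts the mean and divides by the standard deviation;
# it returns a one-column matrix, so as.numeric() converts it to a vector
toy$age_z      <- as.numeric(scale(toy$age))
toy$ideology_z <- as.numeric(scale(toy$ideology))

# The binary variable sex is left unstandardized
fit_z <- lm(ideology_z ~ age_z + sex, data = toy)
coef(fit_z)
```

After standardization, each metric variable has a mean of 0 and a standard deviation of 1, so the age_z coefficient is expressed in standard deviations rather than in years and scale points.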