Exercise 3: dplyr

After working through Exercise 3, you’ll…

  • have assessed how well you know dplyr
  • know which dplyr functions and concepts you might want to revisit
  • have managed to apply the dplyr concepts to data

Task 1

Below you will find multiple-choice questions. Please try to identify the correct answers. Each question can have one, two, three, or four correct answers.

1. What are the main characteristics of tidy data?

  • Every cell contains values.
  • Every cell contains a variable.
  • Every observation is a column.
  • Every observation is a row.

2. Which of the following are dplyr functions?

  • summary()
  • describe()
  • mutate()
  • manage()

3. How can you sort the eye_color of Star Wars characters from Z to A?

  • starwars_data %>% arrange(desc(eye_color))
  • starwars_data %>% arrange(eye_color)
  • starwars_data %>% select(arrange(eye_color))
  • starwars_data %>% select(eye_color) %>% arrange(desc(eye_color))

4. Imagine you want to recode the height of these characters into three categories: small, medium, and tall. What is a valid approach?

  • starwars_data %>% mutate(height = case_when(height<=150~"small",height<=190~"medium",height>190~"tall"))
  • starwars_data %>% mutate(height = case_when(height<=150~small,height<=190~medium,height>190~tall))
  • starwars_data %>% recode(height = case_when(height<=150~"small",height<=190~"medium",height>190~"tall"))
  • starwars_data %>% recode(height = case_when(height<=150~small,height<=190~medium,height>190~tall))

5. Imagine you want to provide a systematic overview of all hair colors and of which species frequently have each hair color (not accounting for the skewed sampling of species). What is a valid approach?

  • starwars_data %>% group_by(hair_color) %>% group_by(species) %>% summarize(count = n()) %>% arrange(hair_color)
  • starwars_data %>% group_by(hair_color, species) %>% summarize(count = n()) %>% arrange(hair_color)
  • starwars_data %>% group_by(hair_color & species) %>% summarize(count = n()) %>% arrange(hair_color)
  • starwars_data %>% group_by(hair_color + species) %>% summarize(count = n()) %>% arrange(hair_color)

Task 2

It’s your turn now. Load the starwars data like this:

library(dplyr) # to activate the dplyr package
starwars_data <- starwars # to assign the starwars data set (shipped with dplyr) to an object in our environment
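
Before starting the tasks, it can help to check that the data loaded as expected. The following is a minimal sketch, assuming dplyr is installed; it only inspects the data and does not solve any of the tasks.

```r
library(dplyr)

# Same assignment as above, repeated so that this block runs on its own
starwars_data <- starwars

# glimpse() prints one line per column with its type and first values
glimpse(starwars_data)

# names() lists the column names you can refer to in the tasks
names(starwars_data)
```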

Filter the dataset so that the console shows only characters with a mass greater than 100 kg. Do not save the filtered data back into starwars_data.

Task 3

Arrange the characters by their mass in ascending order.

Task 4

Select only the name, species, mass, and homeworld columns from the dataset.

Task 5

Create a new variable named weight_gram in the dataset that converts the mass of each character from kilograms to grams.

Task 6

Calculate the average mass of the characters in the dataset.

Task 7

Let’s move on to more difficult tasks.

Determine the total number of human characters in the Star Wars dataset. (Hint: use summarize(count = n()) or count())
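
To illustrate the hint, here is a small sketch on a made-up table (the `pets` tibble and its columns are hypothetical and not part of the exercise). It suggests that both functions return the same row count, just under different column names.

```r
library(dplyr)

# Hypothetical toy data, used only to illustrate the two counting functions
pets <- tibble(
  name   = c("Rex", "Tom", "Bello", "Felix"),
  animal = c("dog", "cat", "dog", "cat")
)

# summarize(count = n()) returns one row with a column called "count"
via_summarize <- pets %>% summarize(count = n())

# count() without arguments returns one row with a column called "n"
via_count <- pets %>% count()

via_summarize
via_count
```

For the task itself, you would first need to restrict the data to the rows you are interested in.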

Task 8

Break down the number of human characters in the Star Wars dataset by gender.

Task 9

What is the most prevalent eye_color among Star Wars characters? (Hint: use arrange())

Task 10

What is the average mass of Star Wars characters that are not human and have yellow eyes? (Hint: remove all NAs)
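
The hint about NAs can be illustrated with a short sketch on made-up numbers (the `measurements` tibble is hypothetical). It shows one common way to ignore missing values, the `na.rm = TRUE` argument; `median()` and `sd()` accept the same argument.

```r
library(dplyr)

# Hypothetical toy data with one missing value
measurements <- tibble(value = c(10, NA, 30))

result <- measurements %>%
  summarize(
    mean_with_na    = mean(value),               # NA, because one value is missing
    mean_without_na = mean(value, na.rm = TRUE)  # missing value is dropped first
  )

result
```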

Task 11

Compare the mean, median, and standard deviation of mass between human and droid characters. (Hint: remove all NAs)

When you’re ready to look at the solutions, you can find them here: Solutions for Exercise 3.