Exercise 3: dplyr

After working through Exercise 3, you’ll…

have assessed how well you know dplyr
know which dplyr functions and concepts you might want to review
have managed to apply the dplyr concepts to data

Task 1

Below you will see multiple-choice questions. Please try to identify the correct answers. Each question can have one, two, three, or four correct answers.

1. What are the main characteristics of tidy data?

Every cell contains values.
Every cell contains a variable.
Every observation is a column.
Every observation is a row.

2. Which of the following are dplyr functions?

summary()
describe()
mutate()
manage()

3. How can you sort the eye_color of Star Wars characters from Z to A?

starwars_data %>% arrange(desc(eye_color))
starwars_data %>% arrange(eye_color)
starwars_data %>% select(arrange(eye_color))
starwars_data %>% select(eye_color) %>% arrange(desc(eye_color))

4. Imagine you want to recode the height of these characters into three categories: small, medium, and tall. What is a valid approach?

starwars_data %>% mutate(height = case_when(height<=150~"small",height<=190~"medium",height>190~"tall"))
starwars_data %>% mutate(height = case_when(height<=150~small,height<=190~medium,height>190~tall))
starwars_data %>% recode(height = case_when(height<=150~"small",height<=190~"medium",height>190~"tall"))
starwars_data %>% recode(height = case_when(height<=150~small,height<=190~medium,height>190~tall))

5. Imagine you want to provide a systematic overview of all hair colors and of which species frequently have them (not accounting for the skewed sampling of species). What is a valid approach?

starwars_data %>% group_by(hair_color) %>% group_by(species) %>% summarize(count = n()) %>% arrange(hair_color)
starwars_data %>% group_by(hair_color, species) %>% summarize(count = n()) %>% arrange(hair_color)
starwars_data %>% group_by(hair_color & species) %>% summarize(count = n()) %>% arrange(hair_color)
starwars_data %>% group_by(hair_color + species) %>% summarize(count = n()) %>% arrange(hair_color)

Task 2

It’s your turn now. Load the starwars data like this:

library(dplyr) # load the dplyr package
starwars_data <- starwars # copy the starwars data set that ships with dplyr into an object in our environment
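
Before starting on the tasks, it can help to check what the loaded object looks like. The following is only a minimal sketch of one way to do that, using glimpse() (which dplyr re-exports) and base R's names(); it does not solve any of the tasks.

```r
library(dplyr)

starwars_data <- starwars

# Compact overview: one line per column with its type and first values
glimpse(starwars_data)

# Column names only, handy for checking spelling before filtering
names(starwars_data)
```

Knowing the exact column names and types up front makes it easier to spot typos in later filter() or group_by() calls.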

How many humans are contained in the starwars dataset overall? First, solve this task using filter() only. Next, solve it by combining filter() with summarize(count = n()). Finally, try to solve it with count().

Task 3

Use mutate() and case_when() to create a new column height_group with:
- “short” if height < 140
- “medium” if height < 180
- “tall” otherwise.
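
The conditions in this task overlap on purpose: case_when() checks them in order and uses the first one that matches. The sketch below illustrates that behaviour on hypothetical toy data (the `scores` tibble and its `points` column are made up for illustration and are not part of starwars). It also shows that a catch-all `TRUE ~ ...` branch picks up missing values, which may or may not be what you want.

```r
library(dplyr)

# Hypothetical toy data, not taken from starwars
scores <- tibble(points = c(35, 60, 85, NA))

graded <- scores %>%
  mutate(grade = case_when(
    points < 50 ~ "low",     # checked first
    points < 80 ~ "middle",  # only reached if the first condition did not match
    TRUE        ~ "high"     # catch-all: everything else, including NA
  ))

graded
```

If missing values should stay missing, one option is to put an explicit `is.na(points) ~ NA_character_` branch before the other conditions.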

Task 4

How many humans are contained in starwars by height_group? (Hint: You’ll need to chain multiple functions. Remember that summarize() has a best friend that will help you solve this task. :))

Task 5

What is the most common height_group among Star Wars characters? (Hint: You’ll need to chain multiple functions, and one of them should be arrange().)

Task 6

What is the average mass of Star Wars characters that are not human and have yellow eyes? (Hint: You’ll need to calculate descriptive statistics and remove all NAs.)

Task 7

Compare the mean, median, and standard deviation of mass for all humans and droids. (Hint: You’ll need to calculate descriptive statistics and remove all NAs.)
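
The hints in Tasks 6 and 7 mention removing NAs. As a rough illustration of why that matters, the sketch below uses hypothetical toy data (the `toy` tibble with columns `group` and `value` is invented here and is not part of starwars) to show how `na.rm = TRUE` changes the result of summary functions.

```r
library(dplyr)

# Hypothetical toy data, not taken from starwars
toy <- tibble(
  group = c("a", "a", "b", "b"),
  value = c(10, NA, 30, 50)
)

# Without na.rm = TRUE, a single NA makes the whole group's mean NA
toy %>%
  group_by(group) %>%
  summarize(mean_value = mean(value))

# With na.rm = TRUE, missing values are dropped before computing
toy_summary <- toy %>%
  group_by(group) %>%
  summarize(
    mean_value   = mean(value, na.rm = TRUE),
    median_value = median(value, na.rm = TRUE),
    sd_value     = sd(value, na.rm = TRUE)
  )

toy_summary
```

Note that sd() still returns NA for group "a", because only one non-missing value remains there.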

Task 8

Create a new variable in which you store the mass in grams (gr_mass). Add it to the dataframe. Test whether your solution works by printing your data to the console, but only show the name, species, mass, and your new variable gr_mass.

When you’re ready to look at the solutions, you can find them here: Solutions for Exercise 3.