Exercise 3: dplyr

After working through Exercise 3, you’ll…

have assessed how well you know dplyr
know which dplyr functions and concepts you might want to review
have managed to apply the dplyr concepts to data

Task 1

Below are multiple-choice questions. Try to identify the correct answers. Each question can have one, two, three, or four correct answers.

1. What are the main characteristics of tidy data?

Every cell contains values.
Every cell contains a variable.
Every observation is a column.
Every observation is a row.

2. Which of the following are dplyr functions?

summary()
describe()
mutate()
manage()

3. How can you sort the eye_color of Star Wars characters from Z to A?

starwars_data %>% arrange(desc(eye_color))
starwars_data %>% arrange(eye_color)
starwars_data %>% select(arrange(eye_color))
starwars_data %>% select(eye_color) %>% arrange(desc(eye_color))

4. Imagine you want to recode the height of these characters into three categories: small, medium, and tall. What is a valid approach?

starwars_data %>% mutate(height = case_when(height<=150~"small",height<=190~"medium",height>190~"tall"))
starwars_data %>% mutate(height = case_when(height<=150~small,height<=190~medium,height>190~tall))
starwars_data %>% recode(height = case_when(height<=150~"small",height<=190~"medium",height>190~"tall"))
starwars_data %>% recode(height = case_when(height<=150~small,height<=190~medium,height>190~tall))

5. Imagine you want to provide a systematic overview of all hair colors and which species frequently have them (not accounting for the skewed sampling of species). What is a valid approach?

starwars_data %>% group_by(hair_color) %>% group_by(species) %>% summarize(count = n()) %>% arrange(hair_color)
starwars_data %>% group_by(hair_color, species) %>% summarize(count = n()) %>% arrange(hair_color)
starwars_data %>% group_by(hair_color & species) %>% summarize(count = n()) %>% arrange(hair_color)
starwars_data %>% group_by(hair_color + species) %>% summarize(count = n()) %>% arrange(hair_color)

Task 2

It’s your turn now. Load the starwars data like this:

library(dplyr) # load the dplyr package
starwars_data <- starwars # copy the starwars data set that ships with dplyr into an object in our environment

Filter the dataset so that only characters with a mass greater than 100 kg are printed to the console. Do not save the filtered data back into starwars_data.
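The difference between printing and saving can be sketched on a small made-up tibble. The toy object and its columns are hypothetical and not part of the exercise data. A pipeline without an assignment only prints its result, so the source object stays unchanged.

```r
library(dplyr)

# Hypothetical toy data, not part of the starwars data set
toy <- tibble(item = c("a", "b", "c"), price = c(5, 20, 12))

# No assignment: the filtered rows are printed, toy itself is not modified
toy %>% filter(price > 10)

# toy still contains all three rows
nrow(toy)
```

Writing `toy <- toy %>% filter(price > 10)` instead would overwrite the object, which is what this task asks you to avoid.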

Task 3

Arrange the characters by their mass in ascending order.

Task 4

Select only the name, species, mass, and homeworld columns from the dataset.

Task 5

Create a new variable named weight_gram in the dataset that converts the mass of each character from kilograms to grams.

Task 6

Calculate the average mass of the characters in the dataset.

Task 7

Let’s move on to more difficult tasks.

Determine the total number of human characters in the Star Wars dataset. (Hint: use summarize(count = n()) or count())
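The two functions named in the hint can be compared on a made-up tibble (the toy object and the fruit column are assumptions for illustration only). As a rough sketch, both return a one-row tibble with the number of rows; they differ mainly in the name of the result column.

```r
library(dplyr)

# Hypothetical toy data, not part of the starwars data set
toy <- tibble(fruit = c("apple", "pear", "apple", "plum", "apple"))

# summarize() with n(): the result column is named by you (here: count)
via_summarize <- toy %>% summarize(count = n())

# count(): shorthand that names the result column n by default
via_count <- toy %>% count()

via_summarize
via_count
```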

Task 8

Break down the number of human characters in the Star Wars dataset by gender.

Task 9

What is the most prevalent eye_color among Star Wars characters? (Hint: use arrange())

Task 10

What is the average mass of Star Wars characters that are not human and have yellow eyes? (Hint: remove all NAs)
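The hint about missing values matters because mean() returns NA as soon as a single value is missing. Two common ways to handle this are sketched below on a made-up tibble (the toy object and the value column are hypothetical); either one may fit, depending on how you structure your pipeline.

```r
library(dplyr)

# Hypothetical toy data with one missing value
toy <- tibble(value = c(2, NA, 4, 6))

# Without NA handling, the result is NA
no_handling <- toy %>% summarize(avg = mean(value))

# Option 1: let mean() skip missing values
with_na_rm <- toy %>% summarize(avg = mean(value, na.rm = TRUE))

# Option 2: drop rows with missing values before summarizing
with_filter <- toy %>% filter(!is.na(value)) %>% summarize(avg = mean(value))
```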

Task 11

Compare the mean, median, and standard deviation of mass between human and droid characters. (Hint: remove all NAs)

When you’re ready to look at the solutions, you can find them here: Solutions for Exercise 3.