Exercise 3: dplyr
After working through Exercise 3, you’ll…
- have assessed how well you know dplyr
- know which dplyr functions and concepts you might want to review
- have managed to apply the dplyr concepts to data
Task 1
Below you will see multiple-choice questions. Please try to identify the correct answers. Each question may have 1, 2, 3, or 4 correct answers.
1. What are the main characteristics of tidy data?
- Every cell contains values.
- Every cell contains a variable.
- Every observation is a column.
- Every observation is a row.
2. What are dplyr functions?
- summary()
- describe()
- mutate()
- manage()
3. How can you sort the eye_color of Star Wars characters from Z to A?
- starwars_data %>% arrange(desc(eye_color))
- starwars_data %>% arrange(eye_color)
- starwars_data %>% select(arrange(eye_color))
- starwars_data %>% select(eye_color) %>% arrange(desc(eye_color))
4. Imagine you want to recode the height of these characters into three categories: small, medium, and tall. What is a valid approach?
- starwars_data %>% mutate(height = case_when(height<=150~"small",height<=190~"medium",height>190~"tall"))
- starwars_data %>% mutate(height = case_when(height<=150~small,height<=190~medium,height>190~tall))
- starwars_data %>% recode(height = case_when(height<=150~"small",height<=190~"medium",height>190~"tall"))
- starwars_data %>% recode(height = case_when(height<=150~small,height<=190~medium,height>190~tall))
5. Imagine you want to provide a systematic overview of all hair colors and of which species frequently have them (not accounting for the skewed sampling of species). What is a valid approach?
- starwars_data %>% group_by(hair_color) %>% group_by(species) %>% summarize(count = n()) %>% arrange(hair_color)
- starwars_data %>% group_by(hair_color, species) %>% summarize(count = n()) %>% arrange(hair_color)
- starwars_data %>% group_by(hair_color & species) %>% summarize(count = n()) %>% arrange(hair_color)
- starwars_data %>% group_by(hair_color + species) %>% summarize(count = n()) %>% arrange(hair_color)
Task 2
It’s your turn now. Load the starwars data like this:
library(dplyr) # to activate the dplyr package
starwars_data <- starwars # to assign the pre-installed starwars data set (dplyr) into a source object in our environment
How many humans are contained in the starwars dataset overall?
First, solve this task using filter() only.
Next, solve it by combining filter() with summarize(count = n()).
Finally, try to solve it with count().
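If you are unsure how the three approaches differ, the following sketch may help. It uses a made-up data frame called pets (a hypothetical example, not part of the exercise data), so you still need to transfer the pattern to starwars_data yourself.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
pets <- data.frame(
  name = c("Rex", "Tom", "Bello", "Kitty", "Nemo"),
  kind = c("dog", "cat", "dog", "cat", "fish")
)

# 1) filter() only: keep the matching rows, then check how many rows are left
dogs <- pets %>% filter(kind == "dog")
nrow(dogs)

# 2) filter() + summarize(): collapse the matching rows into a single count
n_summarize <- pets %>%
  filter(kind == "dog") %>%
  summarize(count = n())

# 3) count(): one row per value of kind, with the count stored in column n
n_count <- pets %>% count(kind)
```

Note that count() returns the frequencies of all categories at once, while the two filter() variants only tell you about the category you filtered for.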
Task 3
Use mutate() and case_when() to create a new column height_group with:
- “short” if height < 140
- “medium” if height < 180
- “tall” otherwise.
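As a reminder of how mutate() and case_when() work together, here is a minimal sketch on made-up temperature data. The data frame weather and its columns are hypothetical and only serve as an illustration. Conditions are checked from top to bottom, and the first one that is TRUE determines the value.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
weather <- data.frame(
  day    = c("Mon", "Tue", "Wed", "Thu"),
  temp_c = c(-3, 12, 25, NA)
)

weather <- weather %>%
  mutate(temp_group = case_when(
    is.na(temp_c) ~ NA_character_, # keep missing values missing
    temp_c < 0    ~ "freezing",
    temp_c < 20   ~ "mild",
    TRUE          ~ "warm"         # catch-all for every remaining row
  ))
```

Be aware that a catch-all condition such as TRUE also matches rows with missing values. Without the is.na() line, the row with NA would end up in the "warm" group, so decide deliberately how you want to treat NAs.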
Task 4
How many humans are contained in starwars by height_group? (Hint: You’ll need to chain multiple functions. Remember that summarize() has a best friend that will help you solve this task. :))
Task 5
What is the most common height_group among Star Wars characters? (Hint: You’ll need to chain multiple functions, but one of them should be arrange())
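The general pattern of counting groups and then sorting the counts could look roughly like this. Again, pets is a hypothetical toy data frame, not the exercise data.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
pets <- data.frame(kind = c("dog", "cat", "dog", "fish", "dog", "cat"))

kind_counts <- pets %>%
  group_by(kind) %>%            # one group per kind
  summarize(count = n()) %>%    # number of rows in each group
  arrange(desc(count))          # largest group first

kind_counts
```

After sorting in descending order, the most common category is in the first row of the result.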
Task 6
What is the average mass of Star Wars characters that are not human and
have yellow eyes? (Hint: You’ll need to calculate descriptive statistics and remove all NAs.)
Task 7
Compare the mean, median, and standard deviation of mass for all humans
and droids. (Hint: You’ll need to calculate descriptive statistics and remove all NAs.)
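Descriptive statistics return NA as soon as a single value is missing, which is why the hints mention removing NAs. One way to handle this is the na.rm argument, sketched here on a hypothetical data frame parcels (names and values are invented for illustration).

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
parcels <- data.frame(
  carrier   = c("A", "A", "A", "B", "B", "C"),
  weight_kg = c(1, 3, NA, 2, 6, 10)
)

stats <- parcels %>%
  filter(carrier %in% c("A", "B")) %>%   # keep only the groups to compare
  group_by(carrier) %>%
  summarize(
    mean_w   = mean(weight_kg, na.rm = TRUE),
    median_w = median(weight_kg, na.rm = TRUE),
    sd_w     = sd(weight_kg, na.rm = TRUE)
  )

stats
```

With na.rm = TRUE, each statistic is computed from the non-missing values only. Alternatively, you could drop the rows with missing values beforehand using filter(!is.na(weight_kg)).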
Task 8
Create a new variable in which you store the mass in grams (gr_mass). Add it to the dataframe. Test whether your solution works by printing your data to the console, but only show the name, species, mass, and your new variable gr_mass.
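If you need a reminder of how to derive a column and then print only a few columns, here is a sketch with a hypothetical data frame trails, converting kilometers to meters (all names and values are invented for illustration).

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
trails <- data.frame(
  trail      = c("Ridge", "Lake"),
  region     = c("north", "south"),
  length_km  = c(4.2, NA),
  difficulty = c("hard", "easy")
)

# Add the new column and overwrite the object so the change is kept
trails <- trails %>%
  mutate(length_m = length_km * 1000)

# Print only the columns of interest; the object itself keeps all columns
shown <- trails %>%
  select(trail, region, length_km, length_m)
shown
```

Remember that mutate() only changes the data permanently if you assign the result back to an object. Missing values stay missing in the derived column.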
When you’re ready to look at the solutions, you can find them here: Solutions for Exercise 3.