10 Tutorial: Good coding style & project management

After working through Tutorial 10, you’ll…

  • have received some tips about how to organize your R projects
  • know how to write good code according to the tidyverse style guide

Writing clean, understandable code is fundamental in programming. It’s not just for the machine to execute, but also for humans to read and comprehend. The Tidyverse style guide by Hadley Wickham offers a set of conventions that can help you achieve this. In this tutorial, we’ll go over some of these guidelines, and we’ll also discuss some packages that can enforce good coding style in R.

10.1 Naming conventions for variables

  1. Variable names: Use lowercase and separate words with underscores. Avoid dots in variable names, because R uses dots for S3 method dispatch (e.g., print.data.frame).
# Good
variable_name <- "example"

# Bad
VariableName <- "example"
variable.name <- "example"
  2. Long variable names: It can be tempting to use short, cryptic variable names, but longer, more descriptive names make your code easier to understand and maintain. Long names cost little effort: you can copy and paste them, and RStudio autocompletes a name once you type its first letters. Here’s a very simple example:
# Good
average_work_experience <- 10.3

# Bad
work_exp <- 10.3
  3. Function names: Use lowercase and separate words with underscores. Use verbs because functions are doing something!
# Good
calculate_mean <- function(x) {
  mean(x, na.rm = TRUE)
}

# Bad
calculateMean <- function(x) {
  mean(x, na.rm = TRUE)
}

# Bad
mean_calculation <- function(x) {
  mean(x, na.rm = TRUE)
}

10.2 Spacing and operators

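  1. Insert spaces around most binary operators (=, +, -, <-, <, >, etc.). The tidyverse style guide makes exceptions for a few high-precedence operators such as ::, $, and ^, which are written without surrounding spaces.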
# Good
average <- sum(x) / length(x)

# Bad
average <- sum(x)/length(x)
  2. Never place a space before a comma, but always after a comma.
# Good
average <- mean(x, na.rm = TRUE)

# Bad
average <- mean(x ,na.rm = TRUE)

10.3 Commenting your code

Good commenting is a crucial aspect of writing clean, understandable code. Comments should provide clarity and context to your code. Here are some guidelines:

  1. Purpose of the code: Every nontrivial function should have a comment explaining what the function does, not how it does it.

  2. Assumptions and constraints: If your code makes certain assumptions (e.g., that numbers in the input data must be passed as strings rather than as integers) or has certain constraints, document them in comments.

  3. Do not over-comment: Avoid commenting on obvious things. Good code is self-documenting to some extent. If you’re finding that you need to add a lot of comments to explain what your code is doing, it may be a sign that you need to refactor your code to make it clearer. If you follow all of the above guidelines on how to write clean code, then you won’t need as many comments!

  4. Update comments as you update code: There’s nothing more confusing than comments that don’t match the code. If you change your code, make sure to update the relevant comments as well.
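
As a minimal sketch of guidelines 1 and 2, the function below states its purpose and its input assumption in comments and checks that assumption explicitly. The name parse_numeric_strings() and the example values are made up for illustration and do not come from any package.

```r
# Convert numbers stored as strings (e.g., from a survey export) to numeric.
# Assumes `x` is a character vector; entries that cannot be parsed become NA.
parse_numeric_strings <- function(x) {
  stopifnot(is.character(x))
  # Suppress the coercion warning because NA is the documented result
  suppressWarnings(as.numeric(x))
}

parse_numeric_strings(c("10.3", "7", "unknown"))
```

Note that the comments say nothing about how as.numeric() works internally; they only describe what a caller needs to know.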

10.4 Don’t Repeat Yourself: DRY code

The DRY principle is a fundamental concept in software development. You should strive to avoid duplicating code, as it inflates your code and makes it harder to read. If you find yourself writing the same or very similar code in multiple places, consider creating a function, which can then be called from those places.

# Bad
x <- x - mean(x)
y <- y - mean(y)
z <- z - mean(z)

# Good
center <- function(x) {
  x - mean(x)
}

x <- center(x)
y <- center(y)
z <- center(z)
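
Once the logic lives in a function, the remaining repetition (one line per variable) can often be removed as well. The sketch below assumes the variables are columns of a data frame; the data frame scores and its values are hypothetical.

```r
# Hypothetical example data: three numeric columns
scores <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30), z = c(5, 5, 8))

center <- function(x) {
  x - mean(x)
}

# Apply center() to every column instead of writing one line per column;
# `scores[] <-` keeps the result a data frame rather than a plain list
scores[] <- lapply(scores, center)

# After centering, every column mean is (numerically) zero
colMeans(scores)
```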

10.5 Using tidyverse functions

  1. The pipe operator %>% is a valuable tool for chaining multiple operations. Each step in the chain should be on a new line.

  2. Use mutate to add new variables that are functions of existing variables.

# Good
WoJ <- WoJ %>%
  mutate(work_experience_in_days = work_experience * 365)

# Bad
WoJ$work_experience_in_days <- WoJ$work_experience * 365
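
To illustrate a longer chain with one step per line, here is a small sketch using dplyr. The data frame journalists is a hypothetical stand-in with made-up values, so the example runs without the tutorial's WoJ data.

```r
library(dplyr)

# Hypothetical stand-in data (not the WoJ data set)
journalists <- data.frame(
  country = c("Germany", "Germany", "Austria", "Austria"),
  work_experience = c(10, 20, 5, NA)
)

# One operation per line: drop missing values, derive a variable, summarise
experience_by_country <- journalists %>%
  filter(!is.na(work_experience)) %>%
  mutate(work_experience_in_days = work_experience * 365) %>%
  group_by(country) %>%
  summarise(mean_experience_in_days = mean(work_experience_in_days))

experience_by_country
```

Because each step sits on its own line, you can read the chain top to bottom and comment out a single step while debugging.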

10.6 File and script organization

The organization of files and directories in your project, as well as the naming and organization of scripts, is a critical aspect of good coding practices. Here are some tips to get you started:

  1. Consistent project structure: Make sure your project has a consistent structure.
project/
│
├── R/
│   ├── functions.R
│   └── main_script.R
│
├── data/
│   ├── raw/
│   └── processed/
│
├── figures/
│   └── plot.png
│
├── doc/
│   ├── report.Rmd
│   └── report.html
│
└── project.Rproj
  2. Separate raw and processed data: Keep raw and processed data in separate folders.

  3. Use meaningful names: Use clear and descriptive names for your files and folders.

  4. Enumerate scripts: It’s often a good idea to separate data preparation (loading, cleaning, transforming) and data analysis (statistical analysis, plotting) into separate scripts. If there’s a specific order in which scripts should be run, consider enumerating them in your R/ folder. For example:

01_load_data.R
02_clean_data.R
03_analyze_data.R
04_plot_data.R
  5. Function scripts: If you have functions that are used in multiple scripts, consider putting them in their own script, like utilities.R or functions.R, and source this script when you need those functions, e.g., source("R/functions.R") (the path is relative to the project root).

For an example of this file organization, see this OSF repository by Julian Unkel & Anna Kümpel: Link. For another example, look up this OSF repository by Niels G. Mede: Link.
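
One benefit of enumerated scripts is that the whole analysis can be re-run in order with a few lines of code. The following is only a sketch under stated assumptions: it builds a throwaway project in a temporary directory, and the contents of the three scripts are invented placeholders rather than real analysis steps.

```r
# Build a throwaway project in a temporary directory so the sketch is
# self-contained; in a real project these files would already exist.
project_dir <- file.path(tempdir(), "demo_project")
dir.create(file.path(project_dir, "R"), recursive = TRUE, showWarnings = FALSE)

writeLines("survey <- data.frame(age = c(25, NA, 40))",
           file.path(project_dir, "R", "01_load_data.R"))
writeLines("survey <- survey[!is.na(survey$age), , drop = FALSE]",
           file.path(project_dir, "R", "02_clean_data.R"))
writeLines("mean_age <- mean(survey$age)",
           file.path(project_dir, "R", "03_analyze_data.R"))

# Collect the numbered scripts; sorting by name gives the run order
scripts <- sort(list.files(file.path(project_dir, "R"),
                           pattern = "^[0-9]{2}_.*\\.R$", full.names = TRUE))

# Run every script in one shared environment so later steps see earlier results
pipeline_env <- new.env()
for (script in scripts) {
  source(script, local = pipeline_env)
}

pipeline_env$mean_age
```

The two-digit prefix matters: it keeps 10_... after 09_... when file names are sorted alphabetically.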

10.7 Using R Projects

R Projects, via RStudio, offer a straightforward way to organize your work. An R Project is simply a working directory designated with a .Rproj file. When you open an R Project, RStudio sets the working directory to the project root directory, which is very helpful for file referencing and reproducibility.

To create an R Project in RStudio, go to File -> New Project. You can then choose to create a new directory for your project or to associate the project with an existing directory. After you’ve created your R Project, RStudio will create a .Rproj file in the project directory. This file contains various settings and preferences, which RStudio will restore each time you open the project.

Using R Projects brings several benefits:

  1. Reproducibility: Since the working directory is set to the project root, you can use relative file paths in your code, which makes it more likely to run on other machines.

  2. Organization: Each project is self-contained, making it easier to manage files, variables, packages, and more.

  3. Integration with version control systems: RStudio’s Projects integrate well with version control systems like Git, making it easier to track changes, collaborate with others, and sync your work across different machines.

  4. Multiple sessions: You can work on multiple projects in separate sessions, each with its own working directory and workspace.

  5. Restoring previous work: When you reopen a project, RStudio restores the working directory, the command history, and the previously opened scripts, helping you pick up right where you left off.
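
The reproducibility benefit is easiest to see in code. The sketch below mimics a project root with a temporary directory and setwd(); with a real R Project, RStudio sets the working directory for you and the setwd() calls are unnecessary. The folder my_project and the file survey.csv are hypothetical.

```r
# Stand-in for a project root; with an R Project, RStudio sets this for you
project_root <- file.path(tempdir(), "my_project")
dir.create(file.path(project_root, "data", "raw"),
           recursive = TRUE, showWarnings = FALSE)

old_wd <- setwd(project_root)  # mimic opening the project

# A relative path built with file.path() contains no machine-specific parts
raw_path <- file.path("data", "raw", "survey.csv")
write.csv(data.frame(id = 1:3, age = c(25, 31, 40)), raw_path,
          row.names = FALSE)

survey <- read.csv(raw_path)
nrow(survey)

setwd(old_wd)  # restore the previous working directory
```

An absolute path such as one starting with a user's home folder would break as soon as the project is moved or shared, whereas the relative path keeps working.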

10.8 Function documentation with Roxygen2

When you’re writing functions, especially if you’re planning to share your code or use it in the future, it’s important to document what the function does, what inputs it expects, and what it returns. The roxygen2 package in R makes this easy.

# Install the roxygen2 package
install.packages("roxygen2")

roxygen2 uses specially formatted comments that start with #' to generate documentation in a standard format that other R programmers will recognize. Here’s an example of how to use roxygen2 to document a function:

#' Calculate the mean of a vector
#'
#' This function calculates the mean of a vector, excluding NA values.
#'
#' @param x A numeric vector.
#' @return The mean of the vector, excluding NA values.
#' @examples
#' calculate_mean(c(1, 2, 3, NA))
calculate_mean <- function(x) {
  mean(x, na.rm = TRUE)
}

10.9 Automated code styling

While adhering to coding style guidelines is crucial, it can sometimes be challenging to manage manually. Thankfully, R has several packages to help with this:

10.9.1 styler

The styler package can automatically format your R code to adhere to the tidyverse style guide without changing the code’s behavior.

# Install the styler package
install.packages("styler")

# Load the styler package
library(styler)

# Style a single line of code
style_text("a=1+2")

# Style an entire script
style_file("path/to/your/script.R")

# Style an entire package
style_pkg("path/to/your/package")
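
style_text() also accepts several lines of code as a character vector, which is handy for trying styler out before running it on files. The snippet below is a small sketch assuming styler's default tidyverse style; the messy input is made up for illustration.

```r
library(styler)

# Deliberately badly formatted input, one element per line of code
messy_code <- c(
  "calculate_mean=function(x){",
  "mean(x,na.rm=TRUE)",
  "}"
)

# style_text() returns the restyled code; printing it shows the result
styled_code <- style_text(messy_code)
styled_code
```

With the default style, the assignment operator, spacing, and indentation should be adjusted while the function itself stays the same.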

10.9.2 lintr

The lintr package provides static code analysis for R, enforcing good coding style. It can be integrated with various text editors, including RStudio.

# Install the lintr package
install.packages("lintr")

# Load the lintr package
library(lintr)

# Lint a single file
lint("path/to/your/script.R")

# Lint an entire package
lint_package("path/to/your/package")

The lint() function returns a list of issues with your code, including stylistic issues, syntax errors, and potential bugs.
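
To see what such issues look like without creating a file, you can lint a snippet directly. This sketch assumes lintr version 3.0.0 or later, where lint() has a text argument and linters are created by calling functions such as assignment_linter(); the snippet being linted is made up.

```r
library(lintr)

# Lint a code snippet directly and restrict the check to two linters:
# one for the assignment operator and one for spacing around operators
lints <- lint(
  text = "average = sum(x)/length(x)",
  linters = list(assignment_linter(), infix_spaces_linter())
)

lints
```

Each reported issue names the line, the column, and the linter that flagged it, so you can fix the code by hand or let styler reformat it.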