How to improve your R code?

Tip 1. Split your code into different files

The first step toward clean code is to avoid files with thousands and thousands of lines. The easy fix is to split your code according to the following rule: "One task, one file". For example: one file for loading the packages, one file for importing the data, one file for cleaning the data, one file for the descriptive statistics, etc.

Some basic rules for organizing these files:

  1. File names should be in lower case, with words separated by underscores;

  2. Each file should have a readable name explaining its purpose, such as 'import_dataset' or 'clean_dataset';

  3. File names should be preceded by numbers giving their running order. For example, if we have two files '01_import_dataset' and '02_clean_dataset', then 01 should be run before 02 during the analysis.
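
One benefit of numbered file names is that a short driver script can run the whole analysis in order. The sketch below is only an illustration: the temporary folder, the two tiny scripts, and the object names raw_data and clean_data are hypothetical stand-ins for a real project.

```r
# Hypothetical project folder, created in a temporary directory for illustration
project_dir <- file.path(tempdir(), "analysis_demo")
dir.create(project_dir, showWarnings = FALSE)

# Two tiny stand-in scripts following the "number + readable name" rule
writeLines("raw_data <- mtcars",
           file.path(project_dir, "01_import_dataset.R"))
writeLines("clean_data <- raw_data[raw_data$mpg > 20, ]",
           file.path(project_dir, "02_clean_dataset.R"))

# List the numbered scripts; the zero-padded prefix makes sort() return 01 before 02
scripts <- sort(list.files(project_dir,
                           pattern = "^[0-9]{2}_.*\\.R$",
                           full.names = TRUE))

# Run each script in its running order
for (script in scripts) {
  source(script)
}
```

Zero-padding the numbers (01, 02, ..., 10) matters here: without it, a file starting with 10 would sort before one starting with 2.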

Tip 2. Use the tidyverse

Instead of base R, use the tidyverse. The core tidyverse includes the packages that you’re likely to use in everyday data analyses, and it improves the readability of your code. More information on the tidyverse website:

https://www.tidyverse.org/packages/

Cheatsheets are available at: https://www.rstudio.com/resources/cheatsheets/
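
To give a feel for the readability argument, here is a small comparison of the same summary written in base R and with dplyr (one of the core tidyverse packages). The choice of dataset (mtcars) and of the filter on hp is arbitrary and only for illustration.

```r
library(dplyr)

# Base R: mean mpg per number of cylinders, for cars with more than 100 hp
base_result <- aggregate(mpg ~ cyl,
                         data = mtcars[mtcars$hp > 100, ],
                         FUN = mean)

# dplyr: the same computation, one verb per step
tidy_result <- mtcars %>%
  filter(hp > 100) %>%
  group_by(cyl) %>%
  summarise(mpg = mean(mpg))
```

Both versions return one row per value of cyl; the dplyr version names each step (filter, group, summarise) explicitly.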

Interesting links to learn more about data manipulation with dplyr:

Tip 3. Use the pipe

Using the pipe is a great way to improve the readability of your code, because it turns a series of steps into a visible 'flow'. To learn more, you can read this tutorial:

https://www.datacamp.com/community/tutorials/pipe-r-tutorial
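
As a quick illustration of that 'flow', the sketch below writes the same computation twice: once with nested calls and once with the pipe. The computation itself (square root, mean, rounding) is arbitrary and chosen only to have three steps.

```r
library(magrittr)  # provides %>%; the operator is also available after library(tidyverse)

# Nested calls: must be read from the inside out
nested <- round(mean(sqrt(mtcars$mpg)), 2)

# Piped calls: read from top to bottom, one step per line
piped <- mtcars$mpg %>%
  sqrt() %>%
  mean() %>%
  round(2)
```

Both objects hold the same value; only the reading order of the code changes.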

Tip 4. Break lines after parentheses, commas, and pipes

This one is pretty straightforward. For example, when a function call has several arguments, start a new line after each comma:

Don't do this:
mtcars %>%
.$mpg %>%
mean(trim = 0,na.rm = FALSE)

Do this instead:
mtcars %>%
  .$mpg %>%
  mean(trim = 0,
       na.rm = FALSE)

Tip 5. Write functions

The programmer's philosophy is to be lazy, and you should adopt this ethos when coding: DO NOT REPEAT THE SAME PIECE OF CODE TWICE, use functions instead.

A complete introductory tutorial can be found here:

https://www.datacamp.com/community/tutorials/functions-in-r-a-tutorial
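
As a minimal sketch of the idea, the example below rescales two columns to the 0-1 range, first by copying the formula and then with a function. The function name rescale_01 and the choice of columns are hypothetical, picked only for illustration.

```r
# Repetitive version: the same formula is typed once per column,
# so every copy is a chance to make a typo
mpg_scaled <- (mtcars$mpg - min(mtcars$mpg)) / (max(mtcars$mpg) - min(mtcars$mpg))
hp_scaled  <- (mtcars$hp - min(mtcars$hp)) / (max(mtcars$hp) - min(mtcars$hp))

# Function version: the formula is written once and reused
rescale_01 <- function(x) {
  rng <- range(x, na.rm = TRUE)  # minimum and maximum, ignoring missing values
  (x - rng[1]) / (rng[2] - rng[1])
}

mpg_scaled_fn <- rescale_01(mtcars$mpg)
hp_scaled_fn  <- rescale_01(mtcars$hp)
```

If the formula ever needs to change, the function version requires a single edit instead of one per column.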

Tip 6. Use standardized variable names

One easy way to save time when coding is to use a standardized approach for naming your variables. I personally always name my variables in lower case with underscores: "income_households", "study_level". In any case, avoid spaces and accents as much as possible (you can format the names at the export stage).

I also make sure to convert the original variable names to this format when importing a dataset. Assuming you use the tidyverse to handle your datasets, one way to do this is with the following code:

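library(tidyverse)
library(snakecase)

# Convert names to snake_case, then transliterate accented characters to ASCII
snake_n_unfrench <- function(x) {
  x %>%
    to_snake_case() %>%
    iconv(from = "UTF-8", to = "ASCII//TRANSLIT")
}

# Apply the function to every column name of the dataset
Indometh %>%
  rename_all(snake_n_unfrench)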

The snake_n_unfrench function automatically converts upper case to lower case, replaces dots, dashes, apostrophes, etc. with '_', and transliterates accented characters to plain ASCII.