10 Cheat sheets

10.1 Scoped verbs vs. purrr

It can be easy to get confused between purrr and scoped verbs. The following diagram illustrates which to use for different combinations of inputs and outputs. For example, use a scoped verb if you want to start and end with a tibble, but purrr if you want to start with a tibble and end up with a vector.
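As a minimal sketch of that last distinction, the code below uses a small made-up tibble (`df` and its columns are illustrative and not from the text above) and assumes a dplyr version in which the scoped verbs are available. The scoped verb returns a tibble, while the purrr call returns a vector.

```r
library(dplyr)
library(purrr)

# Hypothetical example data
df <- tibble(x = c(1, 2, 3), y = c(4, 5, 6))

# Scoped verb: start with a tibble, end with a tibble
means_tbl <- df %>% summarize_all(mean)

# purrr: start with a tibble, end with a (named) double vector
means_vec <- df %>% map_dbl(mean)
```

Both results hold the same two means. Only the container differs, which is usually what decides between the two approaches.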

10.2 Suffixes

| Suffix | Use when |
|--------|----------|
| _all | you want to apply the verb to all columns |
| _at | you want to apply the verb to specified columns |
| _if | you want to apply the verb to all the columns with some property |
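The three suffixes can be compared side by side on one verb. This sketch uses mutate with a made-up tibble (the data and the result names `all_chr`, `at_chr`, and `if_chr` are illustrative assumptions).

```r
library(dplyr)

# Hypothetical data: one double, one integer, one character column
df <- tibble(x = c(1.5, 2.5), y = c(10L, 20L), g = c("a", "b"))

# _all: apply as.character() to every column
all_chr <- df %>% mutate_all(as.character)

# _at: apply as.character() only to the columns named in vars()
at_chr <- df %>% mutate_at(vars(x), as.character)

# _if: apply as.character() only to columns for which the predicate is TRUE
if_chr <- df %>% mutate_if(is.numeric, as.character)
```

Here `at_chr` leaves `y` as an integer column because only `x` was named, while `if_chr` converts both `x` and `y` because both are numeric.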

10.3 Examples

10.3.1 mutate(), summarize(), select(), and rename()

10.3.1.1 Named functions

| Verb | Example | Explanation |
|------|---------|-------------|
| summarize_all | summarize_all(mean) | finds the mean of all variables |
| summarize_at | summarize_at(vars(x, y), mean) | finds the mean of variables x and y |
| summarize_if | summarize_if(is.double, mean) | finds the mean of all double variables |
| mutate_all | mutate_all(as.character) | converts all variables to characters |
| mutate_at | mutate_at(vars(x, y), as.character) | converts variables x and y to characters |
| mutate_if | mutate_if(is.factor, as.character) | converts all factor variables to characters |
| rename_all | rename_all(str_to_lower) | changes all column names to lowercase |
| rename_at | rename_at(vars(X, Y), str_to_lower) | changes the names of columns X and Y to x and y |
| rename_if | rename_if(is.double, str_to_lower) | changes the names of double columns to lowercase |
| select_all | select_all(str_to_lower) | selects all columns and changes their names to lowercase (better to use rename_all()) |
| select_at | select_at(vars(X, Y), str_to_lower) | selects just columns X and Y and changes their names to x and y |
| select_if | select_if(is.double, str_to_lower) | selects just double columns and changes their names to lowercase |
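The difference between the rename and select variants is easiest to see on the same data. The following sketch assumes a hypothetical tibble with uppercase column names. The data and result names are illustrative, and `str_to_lower()` comes from stringr.

```r
library(dplyr)
library(stringr)

# Hypothetical data with capitalized column names
df <- tibble(X = c(1.5, 2.5), Y = c(3.5, 4.5), Label = c("a", "b"))

# rename_at: keeps every column, renames only X and Y
renamed <- df %>% rename_at(vars(X, Y), str_to_lower)

# select_if: keeps only the double columns and renames them
selected <- df %>% select_if(is.double, str_to_lower)

names(renamed)   # "x" "y" "Label"
names(selected)  # "x" "y"
```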

10.3.1.2 Extra arguments

| Verb | Example | Explanation |
|------|---------|-------------|
| summarize_if | summarize_if(is.double, mean, na.rm = TRUE) | finds the mean, excluding NAs, of all double variables |
| summarize_all | summarize_all(mean, trim = 0.1, na.rm = TRUE) | finds the mean of all variables, excluding NAs, after removing the bottom and top 10% of values of each variable |
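Extra arguments are passed along to the function for each selected column. A small sketch with made-up data (the tibble and the name `means` are illustrative):

```r
library(dplyr)

# Hypothetical data with missing values
df <- tibble(x = c(1, NA, 3), y = c(2, 4, NA), g = c("a", "b", "c"))

# na.rm = TRUE is forwarded to mean() for each double column
means <- df %>% summarize_if(is.double, mean, na.rm = TRUE)

# Each result matches calling mean() directly with the same argument
mean(df$x, na.rm = TRUE)
```

Without `na.rm = TRUE`, both means would be NA, since each double column contains a missing value.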

10.3.1.3 Anonymous functions

| Verb | Example | Explanation |
|------|---------|-------------|
| summarize_all | summarize_all(~ sum(is.na(.))) | determines the number of NAs in each column |
| select_if | select_if(~ n_distinct(.) > 1) | selects only the columns with more than one distinct value |
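In these formulas, `.` stands for the column currently being processed. The sketch below runs both examples on a made-up tibble (data and result names are illustrative assumptions).

```r
library(dplyr)

# Hypothetical data: k is constant, x and y contain NAs
df <- tibble(x = c(1, NA, 3), y = c(NA, NA, 6), k = c("a", "a", "a"))

# Count the NAs in each column; `.` is the current column
na_counts <- df %>% summarize_all(~ sum(is.na(.)))

# Keep only columns with more than one distinct value
# (n_distinct() counts NA as a value by default)
varying <- df %>% select_if(~ n_distinct(.) > 1)

names(varying)  # "x" "y"
```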

10.3.2 filter()

| Verb | Example | Explanation |
|------|---------|-------------|
| filter_all | filter_all(all_vars(!is.na(.))) | finds rows without any NAs |
| filter_all | filter_all(any_vars(!is.na(.))) | finds rows with at least one non-NA value |
| filter_at | filter_at(vars(x, y), all_vars(!is.na(.))) | finds rows where both x and y are non-NA |
| filter_at | filter_at(vars(x, y), any_vars(!is.na(.))) | finds rows where at least one of x and y is non-NA |
| filter_if | filter_if(is.double, all_vars(!is.na(.))) | finds rows where all double variables are non-NA |
| filter_if | filter_if(is.double, any_vars(!is.na(.))) | finds rows where at least one double variable is non-NA |
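The contrast between `all_vars()` and `any_vars()` shows up clearly on a tibble where rows differ in how many values are missing. The data and result names in this sketch are made up for illustration.

```r
library(dplyr)

# Hypothetical data: row 1 is complete, row 2 lacks x, row 3 lacks x and y
df <- tibble(x = c(1, NA, NA), y = c(2, 5, NA), g = c("a", "b", "c"))

# all_vars(): the condition must hold for every selected column
both_present <- df %>% filter_at(vars(x, y), all_vars(!is.na(.)))

# any_vars(): the condition must hold for at least one selected column
some_present <- df %>% filter_at(vars(x, y), any_vars(!is.na(.)))

# filter_if(): the predicate picks the columns, here the double ones
some_double <- df %>% filter_if(is.double, any_vars(!is.na(.)))
```

Here `both_present` keeps only the first row, while `some_present` and `some_double` keep the first two rows.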