We wish to pass the names of the columns to be selected dynamically.
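We will cover the following scenarios: column names stored in a variable in the environment, column names passed as an argument to a function, and flexible matching, where strings that match no column are ignored.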
We wish to specify the names of the columns to be selected as a variable in the environment.
In this example, we store the names of the columns we wish to select in a variable cols_to_select and then use that variable for column selection.
library(dplyr)
cols_to_select = c('col_1','col_2','col_3')
df_2 = df %>% select(all_of(cols_to_select))
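For readers who want to run the snippet above end to end, here is a minimal sketch. The toy data frame and its values are assumptions made only for illustration; the column names follow the example. It also shows that the selected columns come back in the order given in the vector.

```r
library(dplyr)

# Hypothetical toy data frame with one more column than we want to keep.
df <- data.frame(
  col_1 = 1:3,
  col_2 = c("a", "b", "c"),
  col_3 = c(TRUE, FALSE, TRUE),
  col_4 = c(0.1, 0.2, 0.3)
)

# Column names held in a variable in the environment.
cols_to_select <- c("col_1", "col_2", "col_3")

# all_of() tells select() to treat the variable as a vector of column names.
df_2 <- df %>% select(all_of(cols_to_select))
names(df_2)
# "col_1" "col_2" "col_3"

# The order of the vector determines the order of the output columns.
df_reordered <- df %>% select(all_of(c("col_3", "col_1")))
names(df_reordered)
# "col_3" "col_1"
```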
Here is how this works:
- We store the names of the columns to select as a vector of strings, cols_to_select.
- To pass cols_to_select to select(), we wrap it in the all_of() function.
- all_of() selects the columns of the data frame df whose names match the elements of the cols_to_select vector.
- all_of() is strict, i.e. every string in cols_to_select must match a column name; otherwise an error is thrown. To ignore elements that do not match any of the data frame's column names, see “Flexible Matching” below.
We wish to pass the names of the columns to be selected as an argument to a function. The actual column selection happens inside the function.
In this example, column selection happens inside the function pipeline(), which takes the names of the columns to be selected as an argument cols_to_select.
pipeline <- function(df, cols_to_select) {
  df %>%
    select(all_of(cols_to_select))
}
df_2 = df %>%
pipeline(c('col_1','col_2','col_3'))
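A self-contained sketch of the function-argument pattern above follows. The toy data frame is a hypothetical stand-in for df; pipeline() is defined as in the example. It illustrates that the same function can be reused with different sets of column names.

```r
library(dplyr)

# Hypothetical toy data frame used only for illustration.
df <- data.frame(
  col_1 = 1:3,
  col_2 = c("a", "b", "c"),
  col_3 = c(TRUE, FALSE, TRUE),
  col_4 = c(0.1, 0.2, 0.3)
)

# Column selection happens inside the function; the names arrive as an argument.
pipeline <- function(df, cols_to_select) {
  df %>%
    select(all_of(cols_to_select))
}

# The pipe supplies df as the first argument; the vector fills cols_to_select.
df_2 <- df %>% pipeline(c("col_1", "col_2", "col_3"))
names(df_2)
# "col_1" "col_2" "col_3"

# Reusing the function with a different set of names.
df_3 <- df %>% pipeline(c("col_4", "col_1"))
names(df_3)
# "col_4" "col_1"
```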
Here is how this works:
- pipeline() has two arguments: the data frame df and the names of the columns to be selected, cols_to_select.
- We pass the data frame df to pipeline() via the pipe %>%.
- We pass the column names as a vector of strings to the cols_to_select argument.
- Inside the function, we use all_of() with select() as described in the environment variable scenario above.
- all_of() is strict, i.e. every string in cols_to_select must match a column name; otherwise an error is thrown. To ignore elements that do not match any of the data frame's column names, see “Flexible Matching” below.
We wish to select any columns whose names are in a vector of strings, where the vector may contain strings that do not match any column names. We wish to select the columns with matching names and ignore the non-matching strings (without throwing an error).
cols_to_select = c('col_1','col_2','col_3')
df_2 = df %>% select(any_of(cols_to_select))
Here is how this works:
- We use any_of() instead of all_of().
- any_of() is forgiving in that it ignores any strings in the vector passed to it (here cols_to_select) that do not match any column names.
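To make the difference between the strict and the forgiving helper concrete, here is a small sketch. The toy data frame is an assumption chosen so that 'col_3' is deliberately missing; tryCatch() is used only to capture the error from all_of() so the script can continue.

```r
library(dplyr)

# Hypothetical toy data frame that lacks col_3.
df <- data.frame(
  col_1 = 1:3,
  col_2 = c("a", "b", "c")
)

cols_to_select <- c("col_1", "col_2", "col_3")

# all_of() is strict: the missing col_3 raises an error, which we capture here.
strict_result <- tryCatch(
  df %>% select(all_of(cols_to_select)),
  error = function(e) "error"
)
strict_result
# "error"

# any_of() is forgiving: col_3 is silently ignored.
df_2 <- df %>% select(any_of(cols_to_select))
names(df_2)
# "col_1" "col_2"
```

Whether to prefer all_of() or any_of() depends on whether a missing column should stop the pipeline or be tolerated.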