Dynamic Column Selection

In some scenarios we need to specify the columns to select, or the column selection function, dynamically, i.e. through a variable or a function argument. Two common cases are building a reusable data manipulation function and separating parameter specifications (e.g. the column names to select) from the data manipulation logic.

One of the most powerful features of the tidyverse is “data variables”, which let us refer to a data frame’s column names as if they were variables in the environment, i.e. select(col_1) instead of select(df['col_1']). This power comes at a cost: it is harder to refer to column names indirectly, e.g. via character vectors or function arguments.
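To make the contrast concrete, here is a minimal sketch, assuming dplyr 1.0 or later is installed. The data frame df and the variable cols are hypothetical names used only for illustration. It shows a direct reference to a data variable next to an indirect reference through a character vector, using the tidyselect helper all_of().

```r
library(dplyr)

# Hypothetical example data
df <- tibble(col_1 = 1:3, col_2 = c("a", "b", "c"))

# Direct reference: col_1 is looked up as a column of df (a data variable)
direct <- select(df, col_1)

# Indirect reference: the column name is stored in a character vector,
# so we wrap it in all_of() to tell select() to use its contents
cols <- c("col_1")
indirect <- select(df, all_of(cols))
```

Both calls should return the same one-column data frame. all_of() raises an error if a requested column is missing, which is usually the desired behavior when names come from configuration.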

In this section we cover the following dynamic column selection scenarios:

  • Column Specification, where we specify column names dynamically through a variable in the environment or a function argument.
  • Function Specification, where we specify column selection predicate functions dynamically through a variable in the environment or a function argument, or where the function name is passed as a string.
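The two scenarios listed above can be sketched roughly as follows. This is an illustrative outline rather than a definitive recipe: it assumes dplyr 1.0 or later, and the data frame df and the helpers select_cols() and select_where() are hypothetical names introduced here. Resolving a function name with base R's match.fun() is one possible approach among several.

```r
library(dplyr)

# Hypothetical example data
df <- tibble(id = 1:3, name = c("a", "b", "c"), score = c(9.5, 7.2, 8.8))

# Column Specification -------------------------------------------------

# Column names held in a variable in the environment
cols_to_keep <- c("id", "score")
by_variable <- select(df, all_of(cols_to_keep))

# Column names passed as a function argument (hypothetical helper)
select_cols <- function(data, cols) {
  select(data, all_of(cols))
}
by_argument <- select_cols(df, c("id", "score"))

# Function Specification -----------------------------------------------

# Predicate function held in a variable in the environment
predicate <- is.numeric
by_predicate <- select(df, where(predicate))

# Predicate function passed as a function argument (hypothetical helper)
select_where <- function(data, fn) {
  select(data, where(fn))
}
by_fn_argument <- select_where(df, is.character)

# Predicate given by name as a string, resolved with match.fun()
fn_name <- "is.numeric"
by_fn_name <- select(df, where(match.fun(fn_name)))
```

Here the column specification cases keep id and score, while the predicate cases keep whichever columns satisfy the supplied function.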