In essence, filtering involves applying one or more logical conditions to the columns of a data frame and, if there is more than one condition, combining the results from each condition via some form of boolean logic.
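As a minimal sketch of what "combining conditions via boolean logic" looks like in practice, the example below uses dplyr's filter() on a small made-up data frame. The data frame df and its columns col_1 and col_2 are illustrative assumptions, not taken from any real dataset.

```r
library(dplyr)

# Hypothetical example data.
df <- tibble(col_1 = c(-1, 2, 3), col_2 = c(5, -4, 6))

# Both conditions must hold (logical AND).
both <- filter(df, col_1 > 0 & col_2 > 0)

# Conditions separated by a comma are also combined with AND.
both_comma <- filter(df, col_1 > 0, col_2 > 0)

# At least one condition must hold (logical OR).
either <- filter(df, col_1 > 0 | col_2 > 0)
```

Here every column and condition is hard-coded, which is the static case that the rest of this section generalizes.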
In some situations we may need to specify the columns or the logical conditions dynamically, i.e. through environment variables or function arguments. Two common scenarios are when we wish to build a reusable data manipulation function and when we wish to separate parameter specifications (e.g. conditions to use for filtering) from the data manipulation logic while structuring a script.
One of the most powerful features of the tidyverse is having “data variables”, which give us the ability to refer to a data frame's column names as if they were variables in the environment, i.e. select(col_1) instead of select(df['col_1']). This power comes at the cost of making it more challenging to refer to column names indirectly, e.g. via string vectors or as function arguments.
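The contrast between direct and indirect column references can be sketched as follows. This is an illustration under assumptions: the data frame df, the variable col_name, and the helper keep_positive() are hypothetical names introduced here, and the approach shown (all_of(), the .data pronoun, and the {{ }} operator) is one common way to do this in recent dplyr versions rather than the only one.

```r
library(dplyr)

# Hypothetical example data.
df <- tibble(col_1 = c(-1, 2, 3), col_2 = c("a", "b", "c"))

# Direct reference: col_1 is looked up inside df (a "data variable").
direct <- select(df, col_1)

# Indirect reference via a string held in an environment variable.
col_name <- "col_1"
by_string <- select(df, all_of(col_name))          # tidy-select helper for character vectors
positive  <- filter(df, .data[[col_name]] > 0)     # .data pronoun for data-masking verbs

# Indirect reference via a function argument, forwarded with {{ }}.
keep_positive <- function(data, col) {
  filter(data, {{ col }} > 0)
}
via_argument <- keep_positive(df, col_1)
```

Without all_of(), .data or {{ }}, dplyr would look for a column literally called col_name or col, which is why the indirect cases need the extra machinery.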
This section is organized as follows:
… tidyverse chain to a function. (2) Passing column names as string vectors.
… TRUE or FALSE) that would be applied to the specified columns. We will look at how to dynamically specify a named function, a formula (anonymous function), and multiple functions.
… tidyverse chain to a function and (2) Specifying a condition as a string, e.g. 'col_1 > 0', to filter the rows of a data frame.

If we are dealing with a scenario where both columns and functions need to be specified dynamically, which is often the case in real life, we would need to combine the solutions in these three sections.
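To make the idea of combining these pieces concrete, here is a rough sketch in which the columns, the function applied to them, and a string condition are all supplied from outside the filtering logic. The names df, filter_cols, is_positive, filter_dynamic() and cond_string are hypothetical and introduced only for illustration. The sketch assumes a dplyr version that provides if_all() and if_any() (1.0.4 or later) and uses rlang::parse_expr() to turn a string into an expression.

```r
library(dplyr)

# Hypothetical example data.
df <- tibble(
  col_1 = c(-1, 2, 3),
  col_2 = c(5, -4, 6),
  label = c("a", "b", "c")
)

# Parameters kept separate from the data manipulation logic.
filter_cols <- c("col_1", "col_2")
is_positive <- function(x) x > 0
cond_string <- "col_1 > 0"

# Keep rows where fn returns TRUE for all (or any) of the columns in cols.
filter_dynamic <- function(data, cols, fn, combine = c("all", "any")) {
  combine <- match.arg(combine)
  if (combine == "all") {
    filter(data, if_all(all_of(cols), fn))
  } else {
    filter(data, if_any(all_of(cols), fn))
  }
}

# Named function, combined with AND across the selected columns.
all_positive <- filter_dynamic(df, filter_cols, is_positive)

# Formula (anonymous function), combined with OR across the selected columns.
any_positive <- filter_dynamic(df, filter_cols, ~ .x > 0, "any")

# Condition supplied as a string, parsed and injected with !!.
from_string <- filter(df, !!rlang::parse_expr(cond_string))
```

Parsing strings into code is convenient for configuration-driven scripts, but it should only be used with strings from a trusted source, since the parsed expression is evaluated as R code.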
In addition to the above, the following sections complete the story for dynamic filtering: