Dynamic Filtering

In essence, filtering involves applying one or more logical conditions to the columns of a data frame and, if there is more than one condition, combining the per-condition results with boolean logic (AND, OR, NOT).
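As a minimal illustration with static (non-dynamic) conditions, the sketch below assumes dplyr is installed and uses a small made-up data frame. It shows two ways of combining conditions in filter(): separate arguments, which are combined with AND, and explicit boolean operators.

```r
library(dplyr)

# Toy data frame (illustrative only)
df <- tibble(col_1 = c(-1, 2, 3, 4), col_2 = c("a", "b", "a", "b"))

# Conditions given as separate arguments are combined with AND (&)
res_and <- df %>% filter(col_1 > 0, col_2 == "a")

# Explicit boolean operators allow OR (|) and NOT (!) combinations
res_or <- df %>% filter(col_1 > 2 | col_2 == "a")
```

Here res_and keeps only the row where col_1 is 3, while res_or keeps three of the four rows.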

In some situations we may need to specify the columns or the logical conditions dynamically, e.g. through environment variables or function arguments. Two common scenarios are (1) building a reusable data manipulation function and (2) separating parameter specifications (e.g. the conditions to use for filtering) from the data manipulation logic when structuring a script.

One of the most powerful features of the tidyverse is “data variables”, which let us refer to a data frame’s column names as if they were variables in the environment, e.g. select(col_1) instead of select(df['col_1']). This power comes at a cost: it becomes more challenging to refer to column names indirectly, e.g. via string vectors or function arguments.
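To make the contrast concrete, here is a minimal sketch assuming dplyr and a made-up data frame. It compares a direct data-variable reference with an indirect reference through a string held in an ordinary variable, using the .data pronoun provided by dplyr/rlang.

```r
library(dplyr)

# Toy data frame (illustrative only)
df <- tibble(col_1 = c(-1, 2, 3), col_2 = c("a", "b", "c"))

# Data variable: col_1 is looked up as a column of df
direct <- df %>% filter(col_1 > 0)

# Indirect reference: the column name is stored in a string variable
col <- "col_1"
indirect <- df %>% filter(.data[[col]] > 0)

# Pitfall: filter(col > 0) would compare the string "col_1" itself with 0,
# not the column it names, so an explicit pronoun such as .data[[ ]] is needed
```

Both direct and indirect return the same two rows.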

This section is organized as follows:

  • Column Specification, where we cover how to dynamically specify the columns to which the filtering logic will be applied. We will cover two important scenarios: (1) passing column names as data variables from a tidyverse chain to a function and (2) passing column names as string vectors.
  • Function Specification, where we cover how to dynamically specify the filtering predicate function(s) (functions that return TRUE or FALSE) to apply to the specified columns. We will look at how to dynamically specify a named function, a formula (anonymous function), and multiple functions.
  • Condition Specification, where we cover how to dynamically specify the entire logical condition used to filter the rows of a data frame (as opposed to specifying the columns and functions separately). We will cover two scenarios: (1) passing a condition expressed in terms of data variables from a tidyverse chain to a function and (2) specifying a condition as a string, e.g. 'col_1 > 0', to filter the rows of a data frame.
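As a rough preview of the column and condition scenarios listed above, the sketch below assumes a recent version of dplyr, which provides the {{ }} embrace operator and the .data pronoun, and rlang's parse_expr() for turning a string into an expression. The function names filter_positive(), filter_positive_str(), filter_by() and filter_by_str(), as well as the toy data frame, are hypothetical.

```r
library(dplyr)

# Toy data frame (illustrative only)
df <- tibble(col_1 = c(-1, 2, 3), col_2 = c(10, 20, 30))

# Column passed as a data variable from a pipeline: embrace it with {{ }}
filter_positive <- function(data, col) {
  data %>% filter({{ col }} > 0)
}

# Column passed as a string: look it up with the .data pronoun
filter_positive_str <- function(data, col_name) {
  data %>% filter(.data[[col_name]] > 0)
}

# Whole condition written with data variables: forward it through ...
filter_by <- function(data, ...) {
  data %>% filter(...)
}

# Whole condition given as a string: parse it and inject it with !!
filter_by_str <- function(data, condition) {
  data %>% filter(!!rlang::parse_expr(condition))
}

res_1 <- df %>% filter_positive(col_1)
res_2 <- df %>% filter_positive_str("col_1")
res_3 <- df %>% filter_by(col_1 > 0, col_2 < 30)
res_4 <- df %>% filter_by_str("col_1 > 0 & col_2 < 30")
```

The first two calls keep the rows where col_1 is positive; the last two keep only the row where col_1 is 2. Parsing strings is convenient for configuration-driven scripts, but it evaluates arbitrary code, so it is best reserved for trusted input.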

If both the columns and the functions need to be specified dynamically, which is often the case in real life, we need to combine the solutions from these sections.
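One way this combination might look is sketched below. It assumes a version of dplyr that provides if_all() (1.0.4 or later) and uses rlang::as_function() so that predicates can be given as named functions or as formulas. The helper filter_all_cols() and the toy data are hypothetical. Columns arrive as a string vector and are selected with all_of(). The predicates, whether a single function, a single formula, or a list of either, are combined with AND.

```r
library(dplyr)

# Toy data frame (illustrative only); note the NA in col_1
df <- tibble(
  col_1 = c(-1, 2, 3, NA),
  col_2 = c(5, -2, 7, 1)
)

# Hypothetical helper: keep rows where every predicate in `fns` is TRUE
# for every column named in the character vector `cols`
filter_all_cols <- function(data, cols, fns) {
  # Wrap a single function/formula in a list so both cases are handled alike
  if (!is.list(fns)) fns <- list(fns)
  # Convert formulas such as ~ .x > 0 into ordinary functions
  fns <- lapply(fns, rlang::as_function)
  # Combine all predicates with AND into one vectorised predicate
  combined <- function(x) Reduce(`&`, lapply(fns, function(f) f(x)))
  # if_all() requires the predicate to hold for every selected column;
  # rows where the result is NA are dropped by filter()
  data %>% filter(if_all(all_of(cols), combined))
}

res_1 <- filter_all_cols(df, c("col_1", "col_2"), ~ .x > 0)
res_2 <- filter_all_cols(df, "col_2", list(~ .x > 0, function(x) x < 6))
```

res_1 keeps only the row with col_1 equal to 3, because the NA row yields NA rather than TRUE. res_2 keeps the two rows where col_2 lies strictly between 0 and 6. Swapping if_all() for if_any() would keep rows where the combined predicate holds for at least one of the selected columns.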

In addition to the above, the following sections complete the story for dynamic filtering:

  • Dynamic Transformation covers performing data manipulation operations dynamically in more depth. The scenarios covered there also apply to filtering.
  • Dynamic Grouped Transformation covers performing grouped operations dynamically. The scenarios covered there also apply to dynamic filtering in a grouped context.