Dynamic Condition Specification

We wish to specify, dynamically, the entire logical condition used to filter the rows of a data frame (as opposed to specifying the columns and functions separately).

We will cover the following scenarios:

  1. Data Expression: We wish to pass a logical data expression, i.e. an expression defined in the context of a data frame in a dplyr chain, e.g. col_1 > 0, as an argument to a function where the filtering will be carried out.
  2. String Expression: We wish to specify the logical condition(s) to use for filtering the rows of a data frame as a string variable e.g. ‘col_1 > 0’.

Data Expression

We wish to pass a logical data expression, i.e. an expression defined in the context of a data frame in a dplyr chain, e.g. col_1 > 0, as an argument to a function where the filtering will be carried out.

One Condition to One Argument

library(dplyr)

m_filter <- function(.df, cond) {
  .df %>%
    filter({{ cond }})
}

df_2 = df %>% 
    m_filter(col_1 > 0)

Here is how this works:

  • We have a custom function m_filter() to which we pass the data frame to be filtered df via the pipe operator %>% and the logical expression to use for filtering as a data expression cond.
  • We use the embrace {{ }} operator to "tunnel" the data expression col_1 > 0 through the function argument cond.
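Here is a minimal sketch on a small hypothetical data frame that shows the mechanics. It assumes only that dplyr and rlang are installed. It compares the embrace version with an equivalent long form, m_filter_enquo(), a name made up for this example. The long form captures the argument with rlang::enquo() and injects it with !!. Both functions should return the same rows.

```r
library(dplyr)

# Hypothetical toy data frame, for illustration only
df <- data.frame(col_1 = c(-1, 2, 3), col_2 = c(5, -4, 0))

# Embrace version: {{ }} captures cond and injects it into filter()
m_filter <- function(.df, cond) {
  .df %>% filter({{ cond }})
}

# Long form: capture the argument as a quosure, then inject it with !!
m_filter_enquo <- function(.df, cond) {
  cond <- rlang::enquo(cond)
  .df %>% filter(!!cond)
}

df_2 <- df %>% m_filter(col_1 > 0)
df_3 <- df %>% m_filter_enquo(col_1 > 0)
```

The embrace operator is essentially shorthand for this enquo() plus !! pattern.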

N Conditions to N Arguments

m_filter <- function(.df, cond_1, cond_2) {  
  .df %>%    
    filter({{ cond_1 }}, {{ cond_2 }})
}

df_2 = df %>% 
    m_filter(col_1 > 0, col_2 < 0)

Here is how this works:

  • We have a custom function m_filter() to which we pass the data frame to be filtered df via the pipe operator %>% and two logical expressions to use for filtering as data expressions cond_1 and cond_2.
  • We use the embrace {{ }} operator to "tunnel" the data expressions col_1 > 0 and col_2 < 0 through the function arguments cond_1 and cond_2.
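A quick way to see what filter() does with several conditions is to compare the two-argument version against a single condition joined with &. The sketch below uses a hypothetical toy data frame. filter() combines separate conditions with a logical AND, so both calls should keep the same rows.

```r
library(dplyr)

# Hypothetical toy data frame, for illustration only
df <- data.frame(col_1 = c(-1, 2, 3, 4), col_2 = c(-5, -4, 1, -2))

m_filter <- function(.df, cond_1, cond_2) {
  .df %>% filter({{ cond_1 }}, {{ cond_2 }})
}

# Two separate conditions are combined with a logical AND by filter()
df_2 <- df %>% m_filter(col_1 > 0, col_2 < 0)

# Same rows as one condition joined explicitly with &
df_3 <- df %>% filter(col_1 > 0 & col_2 < 0)
```

For OR logic, we can pass a compound expression such as col_1 > 0 | col_2 < 0 as a single argument. The embrace operator tunnels it through unchanged.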

N Conditions to One Argument

m_filter <- function(.df, ...) {  
  .df %>%    
    filter(...)
}

df_2 = df %>% 
  m_filter(col_1 == 0, col_2 > 0, col_3 < 0)

Here is how this works:

  • We have a custom function m_filter() to which we pass the data frame to be filtered df via the pipe operator %>% and any number of logical conditions via the ellipsis argument (...).
  • We use the ellipsis ... to forward the data expressions passed to m_filter() unchanged to filter(), which carries out the actual filtering.
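The sketch below uses a hypothetical toy data frame. It shows the same m_filter() accepting one condition or three, with no change to its definition.

```r
library(dplyr)

# Hypothetical toy data frame, for illustration only
df <- data.frame(col_1 = c(0, 0, 1), col_2 = c(1, 2, 3), col_3 = c(-1, 5, -2))

# Forward any number of conditions unchanged to filter()
m_filter <- function(.df, ...) {
  .df %>% filter(...)
}

one   <- df %>% m_filter(col_1 == 0)                       # 2 rows
three <- df %>% m_filter(col_1 == 0, col_2 > 0, col_3 < 0) # 1 row
```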

String Expression

We wish to specify the logical condition(s) to use for filtering the rows of a data frame as a string variable, e.g. 'col_1 > 0'.

One Condition

condition = "col_1 > 0"

df_2 = df %>% 
  filter(!!(rlang::parse_expr(condition)))

Here is how this works:

  • We use parse_expr() from the package rlang to transform a string into an R expression.
  • We use the injection operator !! (bang-bang) to inject the resulting expression into the call to filter(), which then evaluates it in the context of the data frame.
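In practice the string is often assembled at run time, for example from a column name and a threshold held in variables. The following sketch assumes such inputs. col_name and threshold are hypothetical names chosen for illustration, and the data frame is a toy example.

```r
library(dplyr)

# Hypothetical toy data frame, for illustration only
df <- data.frame(col_1 = c(-2, 1, 5), col_2 = c(3, 0, 9))

# Hypothetical inputs, e.g. read from a config file or a user interface
col_name  <- "col_1"
threshold <- 0

# Assemble the condition as a string, then parse it into an expression
condition <- sprintf("%s > %s", col_name, threshold)  # "col_1 > 0"
cond_expr <- rlang::parse_expr(condition)

# Inject the parsed expression into filter()
df_2 <- df %>% filter(!!cond_expr)
```

parse_expr() turns arbitrary text into code that filter() then evaluates. This approach is therefore only safe with strings from trusted sources.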

Multiple Conditions

filters = c("col_1 > 0", 
            "col_2 < 3", 
            "col_3 > 6 | col_4 == 'Yes'")

df_2 = df %>% 
  filter(!!!(rlang::parse_exprs(filters)))

Here is how this works:

  • We use parse_exprs() from the package rlang to transform a string vector into a list of R expressions.
  • We use the splice operator !!! to inject each expression in the list as a separate argument to filter(), which combines them with a logical AND.
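The sketch below uses a hypothetical toy data frame. It checks that splicing the parsed expressions with !!! gives the same result as joining the strings into one condition with &. Each string is wrapped in parentheses before joining because & binds more tightly than |. Without the parentheses, the | in the third condition would be regrouped.

```r
library(dplyr)

# Hypothetical toy data frame, for illustration only
df <- data.frame(
  col_1 = c(1, 2, -1, 4),
  col_2 = c(0, 5, 1, 2),
  col_3 = c(7, 1, 8, 2),
  col_4 = c("No", "Yes", "Yes", "Yes")
)

filters <- c("col_1 > 0",
             "col_2 < 3",
             "col_3 > 6 | col_4 == 'Yes'")

# parse_exprs() returns a list with one expression per string
exprs <- rlang::parse_exprs(filters)

# !!! splices each expression in as its own argument to filter()
df_2 <- df %>% filter(!!!exprs)

# Equivalent single condition: parenthesize each part, then join with &
combined <- paste0("(", filters, ")", collapse = " & ")
df_3 <- df %>% filter(!!rlang::parse_expr(combined))
```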

Alternatively:

We can pass the components of a condition separately as opposed to passing a condition as a data expression or a string expression.

Pass Condition Components

vars = c("col_1", "col_2", "col_3")
comps = c(0, 3, 6)
funs = list(`<`, `>`, `==`)

df_2 = df %>%
  filter(
    funs[[1]](.data[[vars[[1]]]], comps[[1]]),
    funs[[2]](.data[[vars[[2]]]], comps[[2]]),
    funs[[3]](.data[[vars[[3]]]], comps[[3]])
  )

Here is how this works:

  • This solution works for simple logical conditions, each comparing one column against one value.
  • In this example, we wish to apply the conditions col_1 < 0, col_2 > 3, and col_3 == 6. To do so we specify three variables:
    • vars holding the names of the columns as a vector of strings.
    • comps holding the thresholds we wish to compare against as a numeric vector.
    • funs holding the comparison operators we wish to apply as a list of functions.
  • Inside filter(), we construct the filtering conditions from their components:
    • In funs[[1]] we refer to the first function (the less-than operator <) in the list of functions funs and then pass to it the two inputs that we wish to compare (see Dynamic Function Specification).
    • In .data[[vars[[1]]]] we obtain the column of the data frame being processed whose name is the first string (which is col_1) in the vector of column names vars (see Dynamic Column Specification).
    • In comps[[1]] we refer to the first value in comps, which is the threshold we wish to compare the values of col_1 against.
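Writing one line per condition does not scale when the number of conditions is itself dynamic. One possible generalization swaps in rlang call construction for the function-vector approach above. It assumes the operators are stored as strings (the name ops is hypothetical). Each condition is built as a call with rlang::call2() and rlang::sym(), and the resulting list is spliced into filter() with !!!, as in the String Expression section. The data frame is a toy example.

```r
library(dplyr)

# Hypothetical toy data frame, for illustration only
df <- data.frame(col_1 = c(-1, -2, 3), col_2 = c(4, 5, 9), col_3 = c(6, 0, 6))

vars  <- c("col_1", "col_2", "col_3")
comps <- c(0, 3, 6)
ops   <- c("<", ">", "==")   # hypothetical: operator names as strings

# Build one unevaluated call per (operator, column, threshold) triple,
# e.g. col_1 < 0; lapply over indices keeps the list unnamed
conds <- lapply(seq_along(vars), function(i) {
  rlang::call2(ops[[i]], rlang::sym(vars[[i]]), comps[[i]])
})

# Splice all conditions into filter(), which ANDs them together
df_2 <- df %>% filter(!!!conds)
```

The list is left unnamed on purpose. filter() rejects named arguments, so splicing a named list would raise an error.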

Pass Anonymous Functions

library(purrr)

vars = c("col_1", "col_2", "col_3", "col_4")
funs = list(~ .x > 0, ~ .x < 3, ~ .x > 6 | .y == 'Yes')

df_2 = df %>%
  filter(map_lgl(.data[[vars[[1]]]], funs[[1]]),
         map_lgl(.data[[vars[[2]]]], funs[[2]]),
         map2_lgl(.data[[vars[[3]]]], .data[[vars[[4]]]], funs[[3]]))

Here is how this works:

  • Another alternative is to express the filtering conditions as formulas and then use purrr map functions (in this case map_lgl() and map2_lgl()) to evaluate those formulas as anonymous functions on the columns of interest.
  • This solution is quite flexible: any logical expression involving one or two columns can be written as a formula and evaluated with map_lgl() or map2_lgl().
  • See Dynamic Function Specification for a description of evaluating formulas dynamically to filter the rows of a data frame.
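map_lgl() and map2_lgl() call the formula once per row, so a vectorized variant may be preferable on larger data. The sketch below is one way to do this, assuming rlang is available. It converts each formula into an ordinary function with rlang::as_function() (the intermediate name fns is hypothetical) and calls it on whole columns. The toy data frame is made up for illustration.

```r
library(dplyr)

# Hypothetical toy data frame, for illustration only
df <- data.frame(
  col_1 = c(1, 2, -1, 4),
  col_2 = c(0, 5, 1, 2),
  col_3 = c(7, 1, 8, 2),
  col_4 = c("No", "Yes", "Yes", "Yes")
)

vars <- c("col_1", "col_2", "col_3", "col_4")
funs <- list(~ .x > 0, ~ .x < 3, ~ .x > 6 | .y == 'Yes')

# Convert each formula into a regular function (.x is the 1st argument, .y the 2nd)
fns <- lapply(funs, rlang::as_function)

# Call each function once on whole columns instead of element by element
df_2 <- df %>%
  filter(fns[[1]](.data[[vars[[1]]]]),
         fns[[2]](.data[[vars[[2]]]]),
         fns[[3]](.data[[vars[[3]]]], .data[[vars[[4]]]]))
```

This relies on the formula bodies being vectorized, which holds for comparisons and | as written here.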