We wish to dynamically specify the entire logical condition used to filter the rows of a data frame (as opposed to specifying the columns and functions separately).
We will cover the following scenarios:
- Passing a logical data expression, e.g. col_1 > 0, as an argument to a function where the filtering will be carried out.
- Specifying the logical condition(s) as a string, e.g. 'col_1 > 0'.

We wish to pass a logical data expression, i.e. an expression defined in the context of a data frame in a dplyr chain, e.g. col_1 > 0, as an argument to a function where the filtering will be carried out.

One Condition to One Argument
library(dplyr)  # provides %>% and filter()

m_filter <- function(.df, cond) {
  .df %>%
    filter({{ cond }})
}

df_2 = df %>%
  m_filter(col_1 > 0)
Here is how this works:
- We define a function m_filter() to which we pass the data frame to be filtered df via the pipe operator %>% and the logical expression to use for filtering as a data expression cond.
- We use the {{ }} operator to "tunnel" the data expression col_1 > 0 through the function argument cond.
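To see the tunneling in action, the following is a minimal, self-contained sketch. The sample data frame is an assumption made for illustration, as is m_filter_naive(), a hypothetical variant without {{ }} that is included only to show what goes wrong when the expression is not tunneled.

```r
library(dplyr)

# Hypothetical sample data; only the column name follows the text.
df = tibble(col_1 = c(-1, 0, 2, 5))

# Same helper as in the text: {{ cond }} forwards the caller's
# expression to filter(), where it is evaluated against .df.
m_filter <- function(.df, cond) {
  .df %>%
    filter({{ cond }})
}

df_2 = df %>%
  m_filter(col_1 > 0)

# Hypothetical variant without {{ }}: `cond` is evaluated as an ordinary
# argument, so R looks for an object named col_1 outside the data frame.
m_filter_naive <- function(.df, cond) {
  .df %>%
    filter(cond)
}

# Captures the resulting error instead of stopping the script.
res = try(m_filter_naive(df, col_1 > 0), silent = TRUE)
```

With this data, df_2 keeps only the rows where col_1 is positive, while the naive variant fails because col_1 is not defined outside the data frame.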
N Conditions to N Arguments
m_filter <- function(.df, cond_1, cond_2) {
  .df %>%
    filter({{ cond_1 }}, {{ cond_2 }})
}

df_2 = df %>%
  m_filter(col_1 > 0, col_2 < 0)
Here is how this works:
- We define a function m_filter() to which we pass the data frame to be filtered df via the pipe operator %>% and two logical expressions to use for filtering as data expressions cond_1 and cond_2.
- We use the {{ }} operator to "tunnel" the data expressions col_1 > 0 and col_2 < 0 through the function arguments cond_1 and cond_2.
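As a quick check on hypothetical sample data, the sketch below illustrates that conditions passed as separate arguments are combined with a logical AND, just as they are when passed directly to filter(). The data values are made up for illustration.

```r
library(dplyr)

# Hypothetical sample data.
df = tibble(col_1 = c(1, 2, -3, 4),
            col_2 = c(-1, 5, -2, -7))

m_filter <- function(.df, cond_1, cond_2) {
  .df %>%
    filter({{ cond_1 }}, {{ cond_2 }})
}

# Two tunneled conditions ...
df_2 = df %>%
  m_filter(col_1 > 0, col_2 < 0)

# ... keep the same rows as a single condition joined with &.
df_3 = df %>%
  filter(col_1 > 0 & col_2 < 0)
```

Both df_2 and df_3 keep only the rows in which col_1 is positive and col_2 is negative.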
N Conditions to One Argument
m_filter <- function(.df, ...) {
  .df %>%
    filter(...)
}

df_2 = df %>%
  m_filter(col_1 == 0, col_2 > 0, col_3 < 0)
Here is how this works:
- We define a function m_filter() to which we pass the data frame to be filtered df via the pipe operator %>% and any number of logical conditions via the ellipsis ... argument.
- We use ... again to pass the data expressions passed to m_filter() through to filter(), which carries out the actual filtering.

We wish to specify the logical condition(s) to use for filtering the rows of a data frame as a string variable.

One Condition
# Named cond rather than filter to avoid confusion with dplyr::filter()
cond = "col_1 > 0"

df_2 = df %>%
  filter(!!(rlang::parse_expr(cond)))
Here is how this works:
- We use parse_expr() from the package rlang to transform a string into an R expression.
- We use !! to inject that single expression into the filter() call, where it is evaluated against the data frame.

Multiple Conditions
filters = c("col_1 > 0",
            "col_2 < 3",
            "col_3 > 6 | col_4 == 'Yes'")

df_2 = df %>%
  filter(!!!(rlang::parse_exprs(filters)))
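For reuse, the same parsing and splicing step can be wrapped in a small helper. The following is only a sketch: m_filter_str() is a hypothetical name and the sample data is made up for illustration. Because the strings are parsed into R code, this pattern is best reserved for trusted input.

```r
library(dplyr)
library(rlang)

# Hypothetical sample data using the column names from the text.
df = tibble(col_1 = c(1, -1, 2, 3),
            col_2 = c(0, 1, 5, 2),
            col_3 = c(7, 9, 8, 1),
            col_4 = c("No", "Yes", "No", "Yes"))

# Hypothetical helper: parse each string into an expression, then
# splice the resulting list into filter() as separate conditions.
m_filter_str <- function(.df, conds) {
  exprs = parse_exprs(conds)
  .df %>%
    filter(!!!exprs)
}

filters = c("col_1 > 0",
            "col_2 < 3",
            "col_3 > 6 | col_4 == 'Yes'")

df_2 = df %>%
  m_filter_str(filters)
```

With this data, only the first and last rows satisfy all three conditions.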
Here is how this works:
- We use parse_exprs() from the package rlang to transform a string vector into a list of R expressions.
- We use !!! to splice that list into the filter() call, injecting each expression as a separate condition.

Alternatively, we can pass the components of a condition separately as opposed to passing a condition as a data expression or a string expression.

Pass Condition Components
vars = c("col_1", "col_2", "col_3")
comps = c(0, 3, 6)
funs = c(`<`, `>`, `==`)

df_2 = df %>%
  filter(
    funs[[1]](.data[[vars[[1]]]], comps[[1]]),
    funs[[2]](.data[[vars[[2]]]], comps[[2]]),
    funs[[3]](.data[[vars[[3]]]], comps[[3]])  # third operator, ==
  )
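When the number of conditions is not known in advance, the hard-coded indices can be replaced by iterating over the three vectors in parallel. The following is a sketch under assumptions: the sample data is made up, and using purrr's pmap() and reduce() is one possible approach rather than something the text prescribes.

```r
library(dplyr)
library(purrr)

# Hypothetical sample data.
df = tibble(col_1 = c(-1, -2, 1, -3),
            col_2 = c(4, 5, 6, 1),
            col_3 = c(6, 6, 6, 6))

vars = c("col_1", "col_2", "col_3")
comps = c(0, 3, 6)
funs = c(`<`, `>`, `==`)

# Build one logical vector per (column, threshold, operator) triple,
# then combine them with & so a row must satisfy every condition.
keep = pmap(list(vars, comps, funs),
            function(v, comp, f) f(df[[v]], comp)) %>%
  reduce(`&`)

df_2 = df %>%
  filter(keep)
```

Adding a fourth condition then only requires extending vars, comps, and funs.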
Here is how this works:
- We wish to filter rows that satisfy three conditions: col_1 < 0, col_2 > 3, and col_3 == 6. To do so we specify three variables:
  - vars holding the names of the columns as a vector of strings.
  - comps holding the thresholds we wish to compare to as a vector of numbers.
  - funs holding the comparison operators we wish to apply as a list of functions.
- In filter(), we construct the filtering conditions from their components:
  - With funs[[1]] we refer to the first function (which is the less-than operator <) in the list of functions funs and then pass to it the two inputs that we wish to compare (see Dynamic Function Specification).
  - With .data[[vars[[1]]]] we obtain the column of the data frame being processed whose name is the first string (which is col_1) in the vector of column names vars (see Dynamic Column Specification).
  - With comps[[1]] we refer to the first number in the vector comps, which is the value we wish to compare the values of col_1 against.

Pass Anonymous Functions
library(purrr)  # provides map_lgl() and map2_lgl()

vars = c("col_1", "col_2", "col_3", "col_4")
funs = c(~ .x > 0, ~ .x < 3, ~ .x > 6 | .y == 'Yes')

df_2 = df %>%
  filter(map_lgl(.data[[vars[[1]]]], funs[[1]]),
         map_lgl(.data[[vars[[2]]]], funs[[2]]),
         map2_lgl(.data[[vars[[3]]]], .data[[vars[[4]]]], funs[[3]]))
Here is how this works:
- We define each condition as a one-sided formula (e.g. ~ .x > 0) and use purrr map functions (in this case map_lgl() and map2_lgl()) to evaluate those formulas as anonymous functions on the columns of interest.
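map_lgl() and map2_lgl() call the anonymous function once per row. Since the comparisons used here are already vectorized, an alternative sketch is to convert each formula into a regular function with rlang::as_function() and call it on whole columns. The sample data below is hypothetical, and this variant is offered as one option rather than as the approach the text prescribes.

```r
library(dplyr)
library(rlang)

# Hypothetical sample data using the column names from the text.
df = tibble(col_1 = c(1, -1, 2, 3),
            col_2 = c(0, 1, 5, 2),
            col_3 = c(7, 9, 8, 1),
            col_4 = c("No", "Yes", "No", "Yes"))

vars = c("col_1", "col_2", "col_3", "col_4")
funs = c(~ .x > 0, ~ .x < 3, ~ .x > 6 | .y == 'Yes')

# Convert each formula into a function whose arguments are .x (and .y).
fns = lapply(funs, as_function)

# Each function receives entire columns and returns a logical vector.
df_2 = df %>%
  filter(fns[[1]](.data[[vars[[1]]]]),
         fns[[2]](.data[[vars[[2]]]]),
         fns[[3]](.data[[vars[[3]]]], .data[[vars[[4]]]]))
```

This only works when the body of each formula is vectorized; for functions that handle one value at a time, the map-based version remains the appropriate choice.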