We wish to dynamically specify the predicate function(s) that will be applied to filter the rows of a data frame.
A predicate function is one that returns a logical value: TRUE or FALSE.
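We will cover how to dynamically specify functions in each of the following scenarios:
Named Function, e.g. is.na().
Comparison Operator, e.g. <, >, or ==.
Anonymous Function in formula notation, e.g. ~ .x > 5.
Multiple Functions, e.g. c(is.na, is.infinite).
Named Function
We wish to dynamically specify a named function to be used to filter the rows of a data frame.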
As Function Variable
We wish to use a function referred to via an environment variable to filter the rows of a data frame.
In this example, we wish to apply a function referred to via an environment variable fun to the column col_1 to filter the rows of the data frame df.
fun = is.na
df_2 = df %>% filter(fun(col_1))
Here is how this works:
The variable fun holds a reference to the function is.na().
We then call fun just like we would call is.na(). In other words, fun(col_1) is equivalent to is.na(col_1) where fun = is.na.
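As a minimal, self-contained sketch of the idea above, the same pipeline can be reused with different predicates simply by reassigning fun. The sample data frame and its values are hypothetical, and dplyr is assumed to be installed.

```r
library(dplyr)

# Hypothetical sample data: col_1 holds one missing and one infinite value.
df <- data.frame(id = 1:4, col_1 = c(1.5, NA, Inf, -2))

# Choose the predicate at run time by assigning a function to `fun`.
fun <- is.na
df_na <- df %>% filter(fun(col_1))    # keeps the row where col_1 is NA

# Reassigning `fun` changes the filter without touching the pipeline.
fun <- is.infinite
df_inf <- df %>% filter(fun(col_1))   # keeps the row where col_1 is Inf
```

Note that filter() looks for a column called fun before looking in the calling environment, so this pattern assumes the data frame has no column with that name.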
As Function Argument
We wish to pass a function as an argument to another function, which then uses the passed function to filter the rows of a data frame.
In this example, we wish to filter the rows of the data frame df via a custom function that takes a predicate function as input and applies it to the column col_1.
m_filter <- function(.df, fun) {
  .df %>%
    filter(fun(col_1))
}
df_2 = df %>% m_filter(is.na)
Here is how this works:
We define a function m_filter() which takes a data frame .df and a function fun as inputs and uses the function fun to filter the rows of the data frame .df.
m_filter() filters the rows of the data frame df by applying the function passed in fun to the values of the column col_1.
In df %>% m_filter(is.na), we pass the function is.na() to the function m_filter().
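The following sketch shows the wrapper being reused with two different predicates. The sample data and the helper is_positive() are hypothetical and introduced only for illustration.

```r
library(dplyr)

# Hypothetical sample data.
df <- data.frame(id = 1:4, col_1 = c(1.5, NA, Inf, -2))

m_filter <- function(.df, fun) {
  .df %>%
    filter(fun(col_1))
}

# Hypothetical user-defined predicate: TRUE for finite, strictly positive values.
is_positive <- function(x) is.finite(x) & x > 0

df_na  <- df %>% m_filter(is.na)        # built-in predicate
df_pos <- df %>% m_filter(is_positive)  # user-defined predicate
```

Any function that takes a vector and returns a logical vector of the same length can be passed in this way.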
As String Variable
We wish to specify the function to use for row filtering via an environment variable that holds the name of the function as a string.
In this example, we wish to apply a function whose name is available as a string variable fun to the column col_1 to filter the rows of the data frame df.
fun = 'is.na'
df_2 = df %>%
  filter(get(fun)(col_1))
Here is how this works:
The function get() from base R returns the value of a named object.
In get(fun), we use get() to obtain a reference to the function whose name is in the string variable fun which, in this case, is the function is.na().
We then use get(fun) just like we would use is.na(). In other words, get(fun)(col_1) is equivalent to is.na(col_1) where fun = 'is.na'.
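When the function name arrives as a string, it may be worth checking that it really names a function before filtering. The sketch below is one possible way to do this with base R's exists() and the mode argument of get(). The sample data and the variable name pred are hypothetical.

```r
library(dplyr)

# Hypothetical sample data.
df <- data.frame(id = 1:4, col_1 = c(1.5, NA, Inf, -2))

fun <- "is.na"

# Stop early if the string does not name a function.
stopifnot(exists(fun, mode = "function"))

# mode = "function" makes get() ignore non-function objects with the same name.
pred <- get(fun, mode = "function")
df_2 <- df %>% filter(pred(col_1))
```

Resolving the name once, outside filter(), also keeps the lookup separate from the data frame's columns.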
Comparison Operator
We wish to dynamically specify a comparison operator, e.g. >, to be used to filter the rows of a data frame.
As Function Variable
We wish to use a comparison operator referred to via an environment variable to filter the rows of a data frame.
In this example, we wish to use a comparison operator referred to via an environment variable fun to filter the rows of the data frame df by comparing the value of the column col_1 with the integer 0.
fun = `>`
df_2 = df %>%
  filter(fun(col_1, 0))
Here is how this works:
To assign a comparison operator, e.g. >, to an environment variable, we need to quote it with backticks, like so: `>`.
The operator > is a function with two arguments. In fun(col_1, 0), we pass col_1 to the first argument and 0 to the second argument. In other words, fun(col_1, 0) is equivalent to col_1 > 0 where fun = `>`.
As Function Argument
We wish to pass an operator as an argument to another function to be used to filter the rows of a data frame.
In this example, we wish to filter the rows of the data frame df via a custom function that takes a comparison operator as input and uses it to compare the value of the column col_1 with the integer 0.
m_filter <- function(.df, fun) {
  .df %>%
    filter(fun(col_1, 0))
}
df_2 = df %>% m_filter(`>`)
Here is how this works:
We define a function m_filter() which takes a data frame .df and a function fun as inputs and uses the function fun to filter the rows of the data frame .df.
In m_filter(`>`), we pass the greater-than operator >, surrounded by backticks, to the function m_filter().
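To make the "operators are functions" point concrete, the sketch below calls an operator in prefix form and then passes several operators to the same wrapper. The sample data and the result names are hypothetical.

```r
library(dplyr)

# Hypothetical sample data.
df <- data.frame(id = 1:4, col_1 = c(3, 0, -1, 5))

m_filter <- function(.df, fun) {
  .df %>%
    filter(fun(col_1, 0))
}

# An operator quoted with backticks can be called like any two-argument function.
three_gt_zero <- `>`(3, 0)       # same as 3 > 0

df_gt <- df %>% m_filter(`>`)    # col_1 > 0
df_le <- df %>% m_filter(`<=`)   # col_1 <= 0
df_eq <- df %>% m_filter(`==`)   # col_1 == 0
```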
As String Variable
We wish to use an operator that is available as a string variable to filter the rows of a data frame.
In this example, we wish to use a comparison operator whose name is held as a string in the variable fun to filter the rows of the data frame df by comparing the value of the column col_1 with the integer 0.
fun = ">"
df_2 = df %>%
  filter(get(fun)(col_1, 0))
Here is how this works:
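A reference to the operator > can be obtained via get() by giving it the string ">".
We then call the function returned by get() with col_1 and 0 as arguments, so get(fun)(col_1, 0) is equivalent to col_1 > 0.
Anonymous Function
We wish to dynamically specify an anonymous function in formula notation, e.g. ~ .x > 5, to be used to filter the rows of a data frame.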
As Function Variable
fun = ~ .x > 0
df_2 = df %>%
  filter(map_lgl(col_1, fun))
Here is how this works:
In fun = ~ .x > 0, we express the condition that we wish to apply as a one-sided formula and assign it to the environment variable fun.
We use the purrr function map_lgl() which accepts a one-sided formula as a representation for an anonymous function. See List Operations.
In map_lgl(col_1, fun), we apply the formula to the values of the column col_1 and return the output as a vector of logical TRUE or FALSE values.
As Function Argument
m_filter <- function(.df, fun) {
  .df %>%
    filter(map_lgl(col_1, fun))
}
df_2 = df %>% m_filter(~ .x > 5)
Here is how this works:
We define a function m_filter() which takes a data frame .df and an anonymous function (one-sided formula) fun as inputs and uses fun to filter the rows of the data frame .df.
In m_filter(~ .x > 5), we pass the anonymous function (one-sided formula) ~ .x > 5 to the function m_filter().
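The sketch below runs the formula approach on hypothetical sample data and, as an assumed alternative not taken from the text above, converts the formula to an ordinary function once with purrr's as_mapper(). The converted function can then be called on the whole column, which works here because > is vectorised.

```r
library(dplyr)
library(purrr)

# Hypothetical sample data.
df <- data.frame(id = 1:4, col_1 = c(3, 7, 10, 5))

fun <- ~ .x > 5

# Element-wise application via map_lgl().
df_a <- df %>% filter(map_lgl(col_1, fun))

# Assumed alternative: turn the formula into a function and call it directly.
pred <- as_mapper(fun)
df_b <- df %>% filter(pred(col_1))
```

Both versions keep the same rows in this example. The map_lgl() form remains the safer choice when the expression inside the formula is not vectorised.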
As String Variable
fun = "~ .x > 5"
df_2 = df %>%
  filter(map_lgl(col_1, formula(fun)))
Here is how this works:
In fun = "~ .x > 5", we create an environment variable holding a one-sided formula as a string.
In formula(fun), we create a formula from a string using formula() from base R.
Multiple Functions
As Function Variable: Explicit Application
funs = c(is.na, is.infinite)
df_2 = df %>%
  filter(funs[[1]](col_1),
         funs[[2]](col_2))
Here is how this works:
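We create a list of functions funs.
We extract each function from the list via the double bracket operator [[ ]]. The single bracket operator [ ] would return a list of length 1.
As Function Variable: Implicit Application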
funs = c(is.na, is.infinite, ~ .x > 0)
df_2 = df %>%
  filter(if_any(col_1, funs))
Here is how this works:
We pass the list of functions funs to if_any().
In if_any(col_1, funs), if_any() applies each function in funs to the column col_1 then returns TRUE if any of the functions return TRUE. See Implicit Filtering for a detailed description.
As Function Argument: Explicit Application
m_filter <- function(.df, funs) {
  .df %>%
    filter(funs[[1]](col_1),
           funs[[2]](col_2))
}
df_2 = df %>%
  m_filter(c(is.na, is.infinite))
Here is how this works:
See “As Function Argument” under “Named Function” above.
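As a runnable sketch of the explicit version, the example below also checks the difference between the two bracket operators. The sample data and the names first_is_fun and single_is_fun are hypothetical.

```r
library(dplyr)

# Hypothetical sample data.
df <- data.frame(
  id    = 1:4,
  col_1 = c(NA, NA, 1, 2),
  col_2 = c(Inf, 3, Inf, 4)
)

m_filter <- function(.df, funs) {
  .df %>%
    filter(funs[[1]](col_1),
           funs[[2]](col_2))
}

funs <- c(is.na, is.infinite)            # c() on functions builds a list
first_is_fun  <- is.function(funs[[1]])  # `[[` extracts the function itself
single_is_fun <- is.function(funs[1])    # `[` returns a list of length 1

# Conditions separated by commas in filter() must all hold for a row to be kept.
df_2 <- df %>% m_filter(funs)
```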
As Function Argument: Implicit Application
m_filter <- function(.df, funs) {
  .df %>%
    filter(if_any(col_1, funs))
}
df_2 = df %>%
  m_filter(c(is.na, is.infinite))
Here is how this works:
See “As Function Variable: Implicit Application” above.
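The choice between if_any() and if_all() matters for a list of predicates. The sketch below, on hypothetical sample data, contrasts the two for is.na and is.infinite, which can never both be TRUE for the same value.

```r
library(dplyr)

# Hypothetical sample data.
df <- data.frame(id = 1:4, col_1 = c(NA, Inf, 1, -2))

funs <- c(is.na, is.infinite)

# if_any(): keep rows where at least one predicate is TRUE.
df_any <- df %>% filter(if_any(col_1, funs))

# if_all(): keep rows where every predicate is TRUE.
df_all <- df %>% filter(if_all(col_1, funs))
```

Here df_any keeps the rows holding NA and Inf, while df_all is empty because no value is both missing and infinite.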
As String Variable: Explicit Application
funs = c('is.na', 'is.infinite')
df_2 = df %>%
  filter(get(funs[[1]])(col_6),
         get(funs[[2]])(col_5))
Here is how this works:
See “As String Variable” under “Named Function” above.
As String Variable: Implicit Application
funs = c('is.na', 'is.infinite')
df_2 = df %>%
  filter(if_any(col_1, map(funs, get)))
Here is how this works:
In map(funs, get), we use map() to iterate over the vector of string function names and apply the function get() to return the corresponding functions as a list.
In if_any(col_1, map(funs, get)), we apply each of the functions in the list returned by map(funs, get) to the column col_1 and return TRUE if any of these functions returns TRUE.
filter() returns the rows for which if_any() returns TRUE.
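A possible variation is to resolve the function names once, before filtering, so that a misspelled name fails before the pipeline runs. The sketch below assumes base R's match.fun() as a substitute for get(). The sample data and the name fun_list are hypothetical.

```r
library(dplyr)
library(purrr)

# Hypothetical sample data.
df <- data.frame(id = 1:4, col_1 = c(NA, Inf, 1, -2))

funs <- c("is.na", "is.infinite")

# Resolve the names up front. match.fun() is an assumed substitute for get();
# it raises an error if a name does not refer to a function.
fun_list <- map(funs, match.fun)

df_2 <- df %>% filter(if_any(col_1, fun_list))
```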