Dynamic Function Specification

We wish to dynamically specify the predicate function(s) that will be applied to filter the rows of a data frame.

A predicate function is one that returns a logical value: TRUE or FALSE.

We will cover how to dynamically specify functions in each of the following scenarios:

  • Named Function such as is.na().
  • Operation such as <, >, or ==.
  • Anonymous Function expressed in formula notation such as ~ .x > 5.
  • Multiple Functions specified as a vector or list such as c(is.na, is.infinite).

Named Function

We wish to dynamically specify a named function to be used to filter the rows of a data frame.

As Function Variable

We wish to use a function referred to via an environment variable to filter the rows of a data frame.

In this example, we wish to apply a function referred to via an environment variable fun to the column col_1 to filter the rows of the data frame df.

fun = is.na

df_2 = df %>% filter(fun(col_1))

Here is how this works:

  • The environment variable fun holds a reference to the function is.na().
  • We can use fun just like we would use is.na(). In other words, fun(col_1) is equivalent to is.na(col_1) where fun = is.na.
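Here is a minimal runnable sketch of the above. It assumes dplyr is installed, and the data frame df and its values are hypothetical toy data, not part of the original example.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy data frame; col_1 has one missing value
df = data.frame(col_1 = c(3, NA, -1, 8))

# fun refers to the very same function object as is.na
fun = is.na

# Equivalent to filter(is.na(col_1))
df_2 = df %>% filter(fun(col_1))

# Both filters keep exactly the same rows
identical(df_2, df %>% filter(is.na(col_1)))
```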

As Function Argument

We wish to pass a function as an argument to another function, which then uses it to filter the rows of a data frame.

In this example, we wish to filter the rows of the data frame df via a custom function that takes a predicate function as input and applies it to the column col_1.

m_filter <- function(.df, fun) { 
 .df %>%  
  filter(fun(col_1))
}

df_2 = df %>% m_filter(is.na)

Here is how this works:

  • We have a custom function m_filter() which takes a data frame .df and a function fun as inputs and uses the function fun to filter the rows of the data frame .df.
  • m_filter() filters the rows of the data frame df by applying the function passed in fun to the values of the column col_1.
  • We pass the function is.na() to the function m_filter().
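The benefit of m_filter() is that the same code can be reused with any predicate. The sketch below passes three different base R predicates. It makes the same assumptions as before: dplyr is installed and the data are hypothetical.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy data frame with a missing and an infinite value
df = data.frame(col_1 = c(3, NA, Inf, 8))

m_filter <- function(.df, fun) {
  .df %>%
    filter(fun(col_1))  # fun is whatever predicate the caller passed in
}

df_na  = df %>% m_filter(is.na)        # keeps the NA row
df_inf = df %>% m_filter(is.infinite)  # keeps the Inf row
df_fin = df %>% m_filter(is.finite)    # keeps 3 and 8
```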

As String Variable

We wish to specify the function to use for row filtering via an environment variable that holds the name of the function as a string.

In this example, we wish to apply a function whose name is available as a string variable fun to the column col_1 to filter the rows of the data frame df.

fun = 'is.na'

df_2 = df %>%
 filter(get(fun)(col_1))

Here is how this works:

  • The function get() from base R returns the value of a named object.
  • In get(fun), we use get() to obtain a reference to the function whose name is in the string variable fun, which, in this case, returns the function is.na().
  • We can use get(fun) just like we would use is.na(). In other words, get(fun)(col_1) is equivalent to is.na(col_1) where fun = 'is.na'.
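The following sketch shows the string approach with hypothetical toy data and assumes dplyr is installed. It also uses base R's match.fun(), which turns a string into a function as well but, unlike get(), only ever returns a function. That substitution is ours and is not part of the original recipe.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy data frame with two missing values
df = data.frame(col_1 = c(3, NA, -1, NA))

fun = 'is.na'

# get() returns the object whose name is stored in fun
df_2 = df %>% filter(get(fun)(col_1))

# match.fun() performs the same lookup but only accepts functions
df_3 = df %>% filter(match.fun(fun)(col_1))
```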

Operation

We wish to dynamically specify a comparison operator, e.g. >, to be used to filter the rows of a data frame.

As Function Variable

We wish to use a comparison operator referred to via an environment variable to filter the rows of a data frame.

In this example, we wish to use a comparison operator referred to via an environment variable fun to filter the rows of the data frame df by comparing the value of the column col_1 with the integer 0.

fun = `>`

df_2 = df %>% 
    filter(fun(col_1, 0))

Here is how this works:

  • To assign an operator, such as >, to an environment variable, we need to quote it with backticks, like so: `>`.
  • The comparison operator > is a function of two arguments. In fun(col_1, 0), we pass col_1 to the first argument and 0 to the second argument. In other words, fun(col_1, 0) is equivalent to col_1 > 0 where fun = `>`.
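Because operators are ordinary functions in R, they can be called in prefix form with backticks and stored in variables. The sketch below assumes dplyr is installed and uses hypothetical toy data.

```r
suppressPackageStartupMessages(library(dplyr))

# Prefix form of an operator: a plain two-argument function call
`>`(5, 0)  # TRUE, same as 5 > 0

# Hypothetical toy data frame
df = data.frame(col_1 = c(-2, 0, 4, 9))

fun = `>`
df_2 = df %>% filter(fun(col_1, 0))  # same as filter(col_1 > 0)

fun = `==`
df_3 = df %>% filter(fun(col_1, 0))  # same as filter(col_1 == 0)
```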

As Function Argument

We wish to pass an operator as an argument to another function to be used to filter the rows of a data frame.

In this example, we wish to filter the rows of the data frame df via a custom function that takes a comparison operator as input and applies it to filter the rows of the data frame df by comparing the value of the column col_1 with the integer 0.

m_filter <- function(.df, fun) { 
 .df %>%  
  filter(fun(col_1, 0))
}

df_2 = df %>% m_filter(`>`)

Here is how this works:

  • We have a custom function m_filter() which takes a data frame .df and a function fun as inputs and uses the function fun to filter the rows of the data frame .df.
  • In m_filter(`>`), we pass the greater-than operator >, surrounded by backticks, to the function m_filter().
  • The rest of the code works as described under the “As Function Variable” scenario right above.
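The sketch below reuses m_filter() with several comparison operators. It again assumes dplyr and hypothetical toy data.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy data frame
df = data.frame(col_1 = c(-2, 0, 4, 9))

m_filter <- function(.df, fun) {
  .df %>%
    filter(fun(col_1, 0))  # compare col_1 against 0 with the supplied operator
}

df_gt = df %>% m_filter(`>`)   # keeps 4 and 9
df_lt = df %>% m_filter(`<`)   # keeps -2
df_ge = df %>% m_filter(`>=`)  # keeps 0, 4 and 9
```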

As String Variable

We wish to use an operator that is available as a string variable to filter the rows of a data frame.

In this example, we wish to use a comparison operator given its string name, the variable fun, to filter the rows of the data frame df by comparing the value of the column col_1 with the integer 0.

fun = ">"

df_2 = df %>%
 filter(get(fun)(col_1, 0))

Here is how this works:

  • A reference to the function form of the greater-than operator > can be obtained by passing the string ">" to get().
  • See “As Function Variable” for more on using the functional form of comparison operators.
  • See “As String Variable” under “Named Function” above for more on get().
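Holding operator names as strings makes it easy to loop over them. The sketch below assumes dplyr and purrr are installed and uses hypothetical toy data. The variables ops and results are illustrative names.

```r
suppressPackageStartupMessages(library(dplyr))
library(purrr)

# Hypothetical toy data frame
df = data.frame(col_1 = c(-2, 0, 4, 9))

# Operator names held as strings
ops = c(">", "<", "==")

# For each name, look up the operator with get() and use it to compare col_1 with 0
results = map(ops, function(op) df %>% filter(get(op)(col_1, 0)))
names(results) = ops
```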

Anonymous Function

We wish to dynamically specify an anonymous function in formula notation, e.g. ~ .x > 5, to be used to filter the rows of a data frame.

As Function Variable

fun = ~ .x > 0

df_2 = df %>%
 filter(map_lgl(col_1, fun))

Here is how this works:

  • In fun = ~ .x > 0, we express the condition that we wish to apply as a one-sided formula and assign it to the environment variable fun.
  • In order to execute the one-sided formula, we use the purrr function map_lgl() which accepts a one-sided formula as a representation for an anonymous function. See List Operations.
  • In map_lgl(col_1, fun) we apply the formula to the values of the column col_1 and return the output as a vector of logical TRUE or FALSE values.
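Here is a runnable sketch that assumes dplyr and purrr are installed. For comparison, it also uses purrr's as_mapper(), which converts the formula into an ordinary function. Because > is vectorized, that function can be applied to the whole column at once rather than element by element. This alternative is our addition and is not part of the original recipe.

```r
suppressPackageStartupMessages(library(dplyr))
library(purrr)

# Hypothetical toy data frame
df = data.frame(col_1 = c(-3, 2, 0, 5))

fun = ~ .x > 0

# map_lgl() calls the lambda once per element and returns a logical vector
df_2 = df %>% filter(map_lgl(col_1, fun))

# as_mapper() turns the formula into a regular function; since `>` is
# vectorized, the function can be applied to the whole column in one call
pred = as_mapper(fun)
df_3 = df %>% filter(pred(col_1))
```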

As Function Argument

m_filter <- function(.df, fun) { 
 .df %>%  
  filter(map_lgl(col_1, fun))
}

df_2 = df %>% m_filter(~ .x > 5)

Here is how this works:

  • We have a custom function m_filter() which takes a data frame .df and an anonymous function (one-sided formula) fun as inputs and uses fun to filter the rows of the data frame .df.
  • In m_filter(~ .x > 5), we pass the anonymous function (one-sided formula) ~ .x > 5 to the function m_filter().
  • The rest of the code works as described under the “As Function Variable” scenario right above.
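The sketch below passes several different lambdas to the same m_filter(). Because map_lgl() also accepts regular functions, an ordinary function(x) works as well. The data are hypothetical, and the sketch assumes dplyr and purrr are installed.

```r
suppressPackageStartupMessages(library(dplyr))
library(purrr)

m_filter <- function(.df, fun) {
  .df %>%
    filter(map_lgl(col_1, fun))  # fun may be a formula lambda or a function
}

# Hypothetical toy data frame
df = data.frame(col_1 = c(1, 6, 9, 20))

df_a = df %>% m_filter(~ .x > 5)           # keeps 6, 9 and 20
df_b = df %>% m_filter(~ .x %% 2 == 0)     # keeps 6 and 20
df_c = df %>% m_filter(function(x) x > 5)  # same rows as df_a
```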

As String Variable

fun = "~ .x > 5"

df_2 = df %>%
 filter(map_lgl(col_1, formula(fun)))

Here is how this works:

  • In fun = "~ .x > 5", we create an environment variable holding a one-sided formula as a string.
  • In formula(fun), we create a formula from a string using formula() from base R.
  • The rest of the code works as described under the “As Function Variable” scenario above.
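The sketch below shows the string-to-formula conversion. It assumes dplyr and purrr are installed and uses hypothetical toy data. Base R's as.formula() appears as an equivalent alternative.

```r
suppressPackageStartupMessages(library(dplyr))
library(purrr)

fun = "~ .x > 5"

# formula() applied to a character string parses it into a formula object
f = formula(fun)
class(f)  # "formula"

# Hypothetical toy data frame
df = data.frame(col_1 = c(2, 7, 5, 11))

df_2 = df %>% filter(map_lgl(col_1, formula(fun)))

# as.formula() is an equivalent way to parse the string
df_3 = df %>% filter(map_lgl(col_1, as.formula(fun)))
```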

Multiple Functions

As Function Variable: Explicit Application

funs = c(is.na, is.infinite)

df_2 = df %>% 
 filter(funs[[1]](col_1), 
     funs[[2]](col_2))

Here is how this works:

  • We combine the functions into funs with c(). Because functions are not atomic values, c() returns a list rather than an atomic vector.
  • To extract a function from the vector we use the double bracket operator [[ ]]. The single bracket operator [ ] would return a list of length 1.
  • The rest of the code works as described in “As Function Variable” under “Named Function” above.
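The sketch below confirms that c() produces a list here and shows the difference between [[ ]] and [ ]. It uses hypothetical toy data and assumes dplyr is installed.

```r
suppressPackageStartupMessages(library(dplyr))

funs = c(is.na, is.infinite)

is.list(funs)           # TRUE: c() falls back to a list for functions
is.function(funs[[1]])  # TRUE: [[ ]] extracts the function itself
is.function(funs[1])    # FALSE: [ ] returns a list of length 1

# Hypothetical toy data frame
df = data.frame(col_1 = c(NA, NA, 3), col_2 = c(Inf, 1, Inf))

# Keep rows where col_1 is NA AND col_2 is infinite
df_2 = df %>%
  filter(funs[[1]](col_1),
         funs[[2]](col_2))
```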

As Function Variable: Implicit Application

funs = c(is.na, is.infinite, ~ .x > 0)

df_2 = df %>% 
 filter(if_any(col_1, funs))

Here is how this works:

  • When we wish to apply multiple filtering expressions to the same column(s), a succinct approach may be to use if_any().
  • In if_any(col_1, funs), if_any() applies each function in funs to the column col_1 and returns TRUE for a row if any of the functions returns TRUE for that row. See Implicit Filtering for a detailed description.
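The sketch below shows the implicit approach with hypothetical toy data and assumes dplyr is installed. Here funs is built with list(), which makes it explicit that funs holds a mix of functions and a formula lambda.

```r
suppressPackageStartupMessages(library(dplyr))

funs = list(is.na, is.infinite, ~ .x > 0)

# Hypothetical toy data frame
df = data.frame(col_1 = c(NA, -Inf, -3, 4, 0))

# A row is kept when at least one of the three predicates is TRUE for col_1:
# NA (is.na), -Inf (is.infinite) and 4 (> 0) survive; -3 and 0 do not
df_2 = df %>% filter(if_any(col_1, funs))
```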

As Function Argument: Explicit Application

m_filter <- function(.df, funs) { 
 .df %>%  
  filter(funs[[1]](col_1), 
      funs[[2]](col_2))
}

df_2 = df %>% 
 m_filter(c(is.na, is.infinite))

Here is how this works:

See “As Function Argument” under “Named Function” above.

As Function Argument: Implicit Application

m_filter <- function(.df, funs) { 
 .df %>%  
  filter(if_any(col_1, funs))
}

df_2 = df %>% 
 m_filter(c(is.na, is.infinite))

Here is how this works:

See “As Function Variable: Implicit Application” above.
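The choice between if_any() and if_all() matters here. No value can be both NA and infinite, so combining is.na and is.infinite with if_all() never keeps a row, while if_any() keeps the rows that match either predicate. The sketch uses hypothetical toy data and assumes dplyr is installed. The names m_filter_any() and m_filter_all() are illustrative.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical helpers differing only in how the predicates are combined
m_filter_any <- function(.df, funs) {
  .df %>% filter(if_any(col_1, funs))  # TRUE if any predicate is TRUE
}
m_filter_all <- function(.df, funs) {
  .df %>% filter(if_all(col_1, funs))  # TRUE only if every predicate is TRUE
}

# Hypothetical toy data frame
df = data.frame(col_1 = c(NA, Inf, 2))

df_any = df %>% m_filter_any(c(is.na, is.infinite))  # keeps the NA and Inf rows
df_all = df %>% m_filter_all(c(is.na, is.infinite))  # keeps no rows
```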

As String Variable: Explicit Application

funs = c('is.na', 'is.infinite')

df_2 = df %>% 
 filter(get(funs[[1]])(col_1),
     get(funs[[2]])(col_2))

Here is how this works:

See “As String Variable” under “Named Function” above.

As String Variable: Implicit Application

funs = c('is.na', 'is.infinite')

df_2 = df %>% 
 filter(if_any(col_1, map(funs, get)))

Here is how this works:

  • In map(funs, get), we use map() to iterate over the vector of string function names and apply the function get() to return the corresponding functions as a list.
  • In if_any(col_1, map(funs, get)), we apply each of the functions in the list returned by map(funs, get) to the column col_1 and return TRUE if any of these functions returns TRUE.
  • Finally, filter() returns the rows for which if_any() returns TRUE.
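Here is a runnable sketch of the steps above. It assumes dplyr and purrr are installed and uses hypothetical toy data.

```r
suppressPackageStartupMessages(library(dplyr))
library(purrr)

funs = c('is.na', 'is.infinite')

# map() applies get() to each name and returns a list of two functions
fun_list = map(funs, get)

# Hypothetical toy data frame
df = data.frame(col_1 = c(NA, 5, -Inf, 0))

# Keep rows where col_1 is either missing or infinite
df_2 = df %>% filter(if_any(col_1, map(funs, get)))
```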