Dynamic Function Specification

We wish to specify dynamically which function is used to select columns.

In Implicit Selection, we covered how to use a function that evaluates a column and returns TRUE or FALSE to indicate whether the column should be selected (a predicate function). In this section we cover how to pass such a function dynamically.

We will cover three scenarios as follows:

  1. As Function Argument where selection happens inside a function A to which we pass a column selection function B as an argument.
  2. As Reference Variable where the function to use for column selection is specified via a variable in the environment that holds a reference to the function.
  3. As String Variable where the name of the function to use for column selection is specified as a string (stored in a variable in the environment).

In this section we use an example column selection function that returns TRUE if the proportion of missing values in a column is less than 10% and FALSE otherwise.
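To make the 10% rule concrete, here is a minimal sketch that applies the predicate to each column of a small data frame. The data frame and its column names are hypothetical and exist only for illustration; the predicate is written without the pipe but computes the same thing as the version used in the rest of this section.

```r
# Hypothetical toy data: 20 rows, columns with different shares of NA
df <- data.frame(
  a = 1:20,                    # 0% missing
  b = c(NA, 2:20),             # 5% missing
  c = c(rep(NA, 5), 6:20)      # 25% missing
)

# TRUE if fewer than 10% of the values in the column are missing
col_select_fun <- function(col) {
  mean(is.na(col)) < 0.1
}

# Apply the predicate to every column
keep <- sapply(df, col_select_fun)
keep
#     a     b     c
#  TRUE  TRUE FALSE
```

Columns a and b fall below the threshold and would be kept; column c would be dropped.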

As Function Argument

We wish to pass the function to use to select columns as an argument to another function where the actual column selection takes place.

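library(dplyr)  # makes %>%, select() and where() available

# Predicate: TRUE if fewer than 10% of the values in col are missing
col_select_fun <- function(col) {
  col %>% is.na() %>% mean() < 0.1
}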

pipeline <- function(df, fun) {
  df %>%
    select(where(fun))
}

df_2 = df %>%
  pipeline(col_select_fun)

Here is how this works:

  • Column selection happens inside the custom function pipeline(). In real scenarios this would usually be a more elaborate chain of data transformations.
  • The function pipeline() has two arguments: the data frame df and the column selection function fun.
  • We pass the data frame to the first argument of the function pipeline() via the pipe %>%.
  • We pass the column selection function col_select_fun() as the second argument.
  • Inside pipeline(), we simply pass the argument fun to where() inside of select() just like we would pass the name of the function to carry out Implicit Selection.
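Because pipeline() only sees its argument fun, the same function can be reused with different predicates. The following is a minimal sketch assuming dplyr 1.0.0 or later (for where()); the toy data frame and the name low_na are hypothetical.

```r
library(dplyr)

# Hypothetical toy data with 20 rows
df <- data.frame(
  a = 1:20,                    # integer, no missing values
  b = c(NA, 2:20),             # integer, 5% missing
  c = letters[1:20],           # character, no missing values
  d = c(rep(NA, 5), 6:20)      # integer, 25% missing
)

# Same rule as col_select_fun(), under a hypothetical name
low_na <- function(col) mean(is.na(col)) < 0.1

pipeline <- function(df, fun) {
  df %>%
    select(where(fun))
}

# A user-defined predicate
names(pipeline(df, low_na))                       # "a" "b" "c"

# A built-in predicate
names(pipeline(df, is.numeric))                   # "a" "b" "d"

# An anonymous function defined at the call site
names(pipeline(df, function(col) anyNA(col)))     # "b" "d"
```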

As Reference Variable

We wish to specify the function to use for column selection via a variable in the environment that holds a reference to the function.

col_select_fun <- function(col) {
  col %>% is.na() %>% mean() < 0.1
}

fun = col_select_fun

df_2 = df %>%
  select(where(fun))

Here is how this works:

  • The variable fun holds a reference to the function col_select_fun().
  • We simply pass the variable fun to where() inside of select() just like we would pass the name of the function to carry out Implicit Selection.
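One reason to hold the function in a variable is that the choice can be made at run time. The sketch below picks between two predicates based on a flag; the flag strict, the name low_na, and the toy data are assumptions made for illustration only.

```r
library(dplyr)

# Hypothetical toy data with 20 rows
df <- data.frame(
  a = 1:20,                    # integer, no missing values
  b = c(NA, 2:20),             # integer, 5% missing
  c = letters[1:20],           # character, no missing values
  d = c(rep(NA, 5), 6:20)      # integer, 25% missing
)

low_na <- function(col) mean(is.na(col)) < 0.1

# Hypothetical switch deciding which predicate to use
strict <- TRUE

# fun ends up holding a reference to one of the two functions
fun <- if (strict) low_na else is.numeric

df_2 <- df %>%
  select(where(fun))

names(df_2)   # "a" "b" "c"
```

Setting strict to FALSE would select the numeric columns a, b and d instead, without any change to the select() call.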

As String Variable

We wish to specify the function to use for column selection via a variable in the environment that holds the name of the function as a string.

col_select_fun <- function(col) {
  col %>% is.na() %>% mean() < 0.1
}

fun = 'col_select_fun'

df_2 = df %>%
  select(where(get(fun)))

Here is how this works:

  • The variable fun holds the name of the function col_select_fun() as a string 'col_select_fun'.
  • We use R's get() function, which looks up the name in the calling environment and its enclosing environments and returns the object bound to it, here the function col_select_fun().
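By default get() returns whatever object is bound to the name, whether or not it is a function. When the string comes from outside the script (a config file, say), it may be worth restricting the lookup to functions and failing early if none is found. The following is a minimal sketch of that idea using the mode argument of base R's exists() and get(); the toy data and the variable name resolved are hypothetical.

```r
library(dplyr)

# Hypothetical toy data with 20 rows
df <- data.frame(
  a = 1:20,                    # no missing values
  b = c(NA, 2:20),             # 5% missing
  d = c(rep(NA, 5), 6:20)      # 25% missing
)

col_select_fun <- function(col) mean(is.na(col)) < 0.1

fun <- "col_select_fun"

# mode = "function" ignores non-function objects that share the name
if (!exists(fun, mode = "function")) {
  stop("No function named '", fun, "' was found")
}
resolved <- get(fun, mode = "function")

df_2 <- df %>%
  select(where(resolved))

names(df_2)   # "a" "b"
```

Resolving the name once, before the call to select(), also keeps the lookup separate from the selection itself.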