We wish to specify the function to use to select columns dynamically.
In Implicit Selection, we covered how to use a function that evaluates a column and returns a boolean TRUE or FALSE value indicating whether the column should be selected (aka a predicate function). In this section, we cover how to pass such a function dynamically.
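We will cover the following three scenarios: passing the function as an argument to another function, specifying it via a variable that holds a reference to the function, and specifying it via a variable that holds the name of the function as a string.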
In this section, we use an example column selection function that returns TRUE if the proportion of missing values in a column is less than 10% and FALSE otherwise.
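To make the 10% threshold concrete, here is a minimal sketch that applies the predicate to each column of a small data frame. The data frame and its column names (id, score, note) are hypothetical and exist only for illustration.

```r
library(dplyr)

# Hypothetical toy data: 20 rows with different shares of missing values.
df <- data.frame(
  id    = 1:20,                          # 0 of 20 missing  (0%)
  score = c(NA, 2:20),                   # 1 of 20 missing  (5%)
  note  = c(rep(NA, 5), letters[1:15])   # 5 of 20 missing  (25%)
)

# The example predicate: TRUE when fewer than 10% of values are missing.
col_select_fun <- function(col) {
  col %>% is.na() %>% mean() < 0.1
}

# Apply the predicate to every column to see which ones would be kept.
keep <- sapply(df, col_select_fun)
keep  # id and score pass the test; note does not
```

The same logical result is what where() relies on when it decides which columns select() should keep.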
We wish to pass the function to use to select columns as an argument to another function where the actual column selection takes place.
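library(dplyr)  # provides %>%, select(), and where()

col_select_fun <- function(col) {
  col %>% is.na() %>% mean() < 0.1  # TRUE if under 10% of values are missing
}

pipeline <- function(df, fun) {
  df %>%
    select(where(fun))  # fun is the predicate received as an argument
}

df_2 = df %>%
  pipeline(col_select_fun)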
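For a concrete run of this pattern, the following sketch calls pipeline() on a small data frame with two different predicates. The toy data frame is an assumption made for illustration; is.numeric() is a base R function used here only as a second example of a predicate.

```r
library(dplyr)

# Hypothetical toy data (not from the original text).
df <- data.frame(
  id    = 1:20,                  # integer, no missing values
  score = c(rep(NA, 5), 6:20),   # integer, 25% missing
  note  = letters[1:20]          # character, no missing values
)

col_select_fun <- function(col) {
  col %>% is.na() %>% mean() < 0.1
}

pipeline <- function(df, fun) {
  df %>%
    select(where(fun))
}

# Same pipeline, different selection rule on each call.
low_missing <- df %>% pipeline(col_select_fun)
numeric_only <- df %>% pipeline(is.numeric)

names(low_missing)   # "id" "note"
names(numeric_only)  # "id" "score"
```

Because the predicate is an ordinary argument, the body of pipeline() does not need to change when the selection rule does.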
Here is how this works:
- We define a function pipeline(). In real scenarios this would usually be a more elaborate chain of data transformations.
- pipeline() has two arguments: the data frame df and the column selection function fun.
- We pass the data frame df to pipeline() via the pipe %>%.
- We pass the column selection function col_select_fun() as the second argument.
- Inside pipeline(), we simply pass the argument fun to where() inside of select(), just like we would pass the name of the function to carry out Implicit Selection.

We wish to specify the function to use for column selection via a variable in the environment that holds a reference to the function.
col_select_fun <- function(col) {
  col %>% is.na() %>% mean() < 0.1
}

fun = col_select_fun  # fun now refers to the function itself

df_2 = df %>%
  select(where(fun))
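Since fun is an ordinary variable, the function it refers to can be decided at run time. The sketch below illustrates this with a hypothetical flag, use_missing_rule, and a hypothetical toy data frame; neither comes from the original text.

```r
library(dplyr)

# Hypothetical toy data (not from the original text).
df <- data.frame(
  id    = 1:20,                  # integer, no missing values
  score = c(rep(NA, 5), 6:20),   # integer, 25% missing
  note  = letters[1:20]          # character, no missing values
)

col_select_fun <- function(col) {
  col %>% is.na() %>% mean() < 0.1
}

# Hypothetical switch that decides which predicate to use.
use_missing_rule <- TRUE

# fun holds a reference to whichever function was chosen.
fun <- if (use_missing_rule) col_select_fun else is.numeric

df_2 <- df %>%
  select(where(fun))

names(df_2)  # "id" "note" when use_missing_rule is TRUE
```

The select() call stays the same in both cases; only the value bound to fun differs.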
Here is how this works:
- The variable fun holds a reference to the function col_select_fun().
- We pass fun to where() inside of select(), just like we would pass the name of the function to carry out Implicit Selection.

We wish to specify the function to use for column selection via a variable in the environment that holds the name of the function as a string.
col_select_fun <- function(col) {
  col %>% is.na() %>% mean() < 0.1
}

fun = 'col_select_fun'  # the name of the function as a string

df_2 = df %>%
  select(where(get(fun)))  # get() looks the function up by its name
Here is how this works:
- The variable fun holds the name of the function col_select_fun() as a string 'col_select_fun'.
- We use the get() function, which looks up an object by its string name, starting in the current environment and continuing through its enclosing environments, and here returns the callable function.
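One caveat is that get() returns whatever object carries the given name, not only functions. A slightly more defensive variant, sketched below, restricts the lookup with mode = "function" and checks with exists() first. The toy data frame and the variable name resolved_fun are assumptions made for illustration.

```r
library(dplyr)

# Hypothetical toy data (not from the original text).
df <- data.frame(
  id    = 1:20,                  # integer, no missing values
  score = c(rep(NA, 5), 6:20),   # integer, 25% missing
  note  = letters[1:20]          # character, no missing values
)

col_select_fun <- function(col) {
  col %>% is.na() %>% mean() < 0.1
}

fun <- "col_select_fun"

# Fail early with a clear message if the name does not refer to a function.
if (!exists(fun, mode = "function")) {
  stop("No function named '", fun, "' was found.")
}

# mode = "function" makes get() ignore non-function objects with this name.
resolved_fun <- get(fun, mode = "function")

df_2 <- df %>%
  select(where(resolved_fun))

names(df_2)  # "id" "note"
```

Resolving the name once, before the call to select(), also keeps the lookup in the environment where the function was defined.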