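We wish to specify one or more logical expressions or predicate functions (functions that return TRUE or FALSE) to apply to each of the selected columns in an implicit filtering context.
In this section, we cover the following function specification scenarios:
- a named predicate function applied to each selected column
- an anonymous predicate function applied to each selected column
- multiple predicate functions applied to each selected column
- a non-vectorized function applied row-wise to the selected columns
This section is complemented by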

We wish to filter rows of a data frame by applying a named predicate function to each of a selected set of columns and then taking a logical combination of the results.
In this example, we wish to filter the rows of the data frame df for which the value of any column whose name contains the string ‘cvr’ is missing (NA).
df_2 = df %>%
  filter(if_any(contains('cvr'), is.na))
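To try the snippet above end to end, here is a minimal sketch using a small made-up data frame. The columns id, cvr_web, cvr_app and visits are hypothetical and exist only for illustration. For comparison, it also applies if_all(), which keeps a row only when every selected column satisfies the predicate.

```r
library(dplyr)

# Hypothetical toy data: two columns whose names contain 'cvr' plus two others
df = tibble(
  id      = 1:4,
  cvr_web = c(0.05, NA,   0.20, 0.30),
  cvr_app = c(0.10, 0.15, NA,   0.25),
  visits  = c(NA,   10,   20,   30)   # not selected by contains('cvr')
)

# OR across columns: keep rows where at least one 'cvr' column is NA
df_any = df %>%
  filter(if_any(contains('cvr'), is.na))

# AND across columns: keep rows where every 'cvr' column is NA
df_all = df %>%
  filter(if_all(contains('cvr'), is.na))

df_any$id      # 2 3 (row 1 has NA only in visits, which is not selected)
nrow(df_all)   # 0
```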
Here is how this works:
- We pass the data frame df to the function filter().
- Inside filter(), we use if_any() to specify our implicit filtering logic as follows:
  - The first argument to if_any() is a selection of columns. In this example, we use contains('cvr') to select any column whose name contains the substring ‘cvr’.
  - The second argument to if_any() is the predicate that we wish to apply to every column selected by the first argument. In this case, it is the named function is.na(), which returns TRUE if a value is NA.
- is.na (the second argument to if_any()) is applied to each column selected via contains('cvr') (the first argument to if_any()), and the results are combined via an OR operation, i.e. a row is retained (included in the output) if its value is NA for any of the columns.
- The counterpart of if_any() is if_all(), which requires that the logical expression evaluates to TRUE for all columns. See Relationship Selection.

We wish to filter rows of a data frame by applying an anonymous predicate function to each of a selected set of columns and then taking a logical combination of the results.
In this example, we wish to filter the rows of the data frame df for which the value of any column whose name contains the string ‘cvr’ is less than 0.1.
df_2 = df %>%
  filter(if_any(contains('cvr'), ~ .x < 0.1))
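Here is a runnable sketch of the snippet above on a hypothetical toy data frame (the column names are made up). It also shows that an ordinary base R anonymous function, function(x) x < 0.1, gives the same result as the purrr-style lambda, and what happens to a row containing NA: NA < 0.1 is NA, NA OR FALSE is still NA, and filter() drops rows whose condition is NA.

```r
library(dplyr)

# Hypothetical toy data
df = tibble(
  id      = 1:4,
  cvr_web = c(0.05, 0.20, NA,   0.30),
  cvr_app = c(0.50, 0.08, 0.12, 0.40)
)

# purrr-style lambda: .x stands for the column currently being tested
df_2 = df %>%
  filter(if_any(contains('cvr'), ~ .x < 0.1))

# Equivalent base R anonymous function
df_3 = df %>%
  filter(if_any(contains('cvr'), function(x) x < 0.1))

# Row 3 is dropped: (NA < 0.1) | (0.12 < 0.1) evaluates to NA
df_2$id   # 1 2
```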
Here is how this works:
- We use if_any() inside filter() to carry out implicit filtering as described in the “Named Function” scenario above.
- We pass the anonymous function ~ .x < 0.1 as the second argument to if_any(). This is a purrr-style lambda, in which .x stands for the column the function is currently being applied to.
- ~ .x < 0.1 (the second argument to if_any()) is applied to each column selected via contains('cvr') (the first argument to if_any()), and the results are combined via an OR operation, i.e. a row is retained (included in the output) if its value is less than 0.1 for any of the columns.

We wish to filter rows of a data frame by applying multiple predicate functions to each of a selected set of columns and then taking a logical combination of the results.
In this example, we wish to filter the rows of the data frame df for which the value of any column whose name contains the string ‘cvr’ is missing (NA) or infinite (Inf or -Inf).
df_2 = df %>%
  filter(if_any(contains('cvr'),
                list(is.na, is.infinite)))
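A minimal sketch, again on hypothetical toy data, that runs the snippet above and checks it against the same condition written out by hand. It illustrates that every function in the list is applied to every selected column and that all the results are OR-ed together.

```r
library(dplyr)

# Hypothetical toy data
df = tibble(
  id      = 1:4,
  cvr_web = c(0.05, NA,   0.20, 0.30),
  cvr_app = c(0.10, 0.15, Inf,  0.25)
)

# Keep a row if any (function, column) combination returns TRUE
df_2 = df %>%
  filter(if_any(contains('cvr'), list(is.na, is.infinite)))

# The same logic written out explicitly
df_3 = df %>%
  filter(is.na(cvr_web) | is.infinite(cvr_web) |
         is.na(cvr_app) | is.infinite(cvr_app))

df_2$id   # 2 3
```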
Here is how this works:
- We use if_any() inside filter() to carry out implicit filtering as described in the “Named Function” scenario above.
- We pass a list of predicate functions as the second argument to if_any(), which in this case is list(is.na, is.infinite).
- Each function in list(is.na, is.infinite) (the second argument to if_any()) is applied to each column selected via contains('cvr') (the first argument to if_any()), and the results are combined via an OR operation, i.e. a row is retained (included in the output) if its value is either NA or infinite for any of the columns.

We wish to filter rows of a data frame by a logical expression that applies a non-vectorized function (i.e. a function, such as mean(), that we want to evaluate on the values of each row separately rather than on entire columns) to a set of selected columns.
In this example, we wish to filter the rows of the data frame df for which the mean of the values of the columns whose names contain the string ‘cvr’ is less than 0.1.
df_2 = df %>%
  rowwise() %>%
  filter(mean(c_across(contains('cvr'))) < 0.1)
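Below is a runnable sketch of the snippet above on a hypothetical toy data frame. One detail worth knowing: rowwise() returns a row-wise data frame, and it stays row-wise for any later steps until ungroup() is called, so the sketch ends the pipeline with ungroup().

```r
library(dplyr)

# Hypothetical toy data
df = tibble(
  id      = 1:3,
  cvr_web = c(0.05, 0.20, 0.02),
  cvr_app = c(0.10, 0.30, 0.04)
)

df_2 = df %>%
  rowwise() %>%                                   # evaluate per row
  filter(mean(c_across(contains('cvr'))) < 0.1) %>%
  ungroup()                                       # drop row-wise grouping

# Row means: 0.075, 0.25, 0.03, so rows 1 and 3 are kept
df_2$id   # 1 3
```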
Here is how this works:
- rowwise() switches the mode of execution of the operations that follow from column-wise to row-wise, which allows us to apply a non-vectorized function one row at a time.
- After rowwise(), the expression inside filter() is applied one row at a time (instead of the usual execution on entire columns).
- c_across() works with rowwise() to make it possible to select the columns on which to perform row-wise operations.
- Inside c_across(), we can select columns with the same selection helpers we used inside if_all() or if_any().
- In c_across(contains('cvr')), we use c_across() to select all columns whose name contains the string ‘cvr’.
- For each row, the values of the selected columns are passed to mean(). The output of mean() is compared to 0.1, and the row is retained (included in the output) if the mean value is less than 0.1.
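For this particular case (a row mean), a vectorized alternative exists. The sketch below is an assumption-labeled swap-in, not the approach described above: it uses base R rowMeans() on the columns returned by dplyr's pick() (which requires dplyr 1.1.0 or later), computing all row means in one call instead of row by row. It also shows missing-value handling: mean() returns NA when any selected value is NA, so such rows are dropped unless na.rm = TRUE is used. The data frame is hypothetical.

```r
library(dplyr)  # pick() requires dplyr >= 1.1.0

# Hypothetical toy data
df = tibble(
  id      = 1:4,
  cvr_web = c(0.05, 0.20, NA,   0.02),
  cvr_app = c(0.10, 0.30, 0.04, 0.04)
)

# Row-wise version from the text: row 3's mean is NA, so the row is dropped
df_rowwise = df %>%
  rowwise() %>%
  filter(mean(c_across(contains('cvr'))) < 0.1) %>%
  ungroup()

# Vectorized alternative: one rowMeans() call over the selected columns
df_vec = df %>%
  filter(rowMeans(pick(contains('cvr'))) < 0.1)

# Ignore missing values: row 3's mean becomes 0.04, so it is kept
df_vec_na = df %>%
  filter(rowMeans(pick(contains('cvr')), na.rm = TRUE) < 0.1)
```

The row-wise form works with any summary function; rowMeans() is only an option when the per-row computation happens to be a mean.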