We wish to identify the columns on each of which we will apply filtering logic.
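We will cover the following scenarios: applying the filtering logic to all columns, to a set of explicitly specified columns, to a set of implicitly specified columns, and to all but a set of excluded columns.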
This section is complemented by
We wish to apply a logical expression to every column and to return any row for which any column satisfies that logical expression.
In this example, we wish to return any row in the data frame df for which any column has a missing value NA.
df_2 = df %>%
  filter(if_any(everything(), is.na))
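To try the snippet above on concrete data, here is a minimal sketch. The toy data frame and the names any_na, manual_or and all_na are made up for illustration and are not part of the original example. It also spells out the same filter as an explicit OR over the columns, and contrasts it with if_all().

```r
library(dplyr)

# Hypothetical toy data: row 1 is complete, rows 2 and 3 each have one NA,
# and row 4 is NA in every column.
df = tibble(
  col_1 = c(1, NA, 3, NA),
  col_2 = c("a", "b", NA, NA),
  col_3 = c(TRUE, FALSE, TRUE, NA)
)

# Keep rows where at least one column is NA (rows 2, 3 and 4).
any_na = df %>%
  filter(if_any(everything(), is.na))

# The same filter written as an explicit OR over the three columns.
manual_or = df %>%
  filter(is.na(col_1) | is.na(col_2) | is.na(col_3))

# if_all() keeps only rows where every column is NA (row 4).
all_na = df %>%
  filter(if_all(everything(), is.na))
```

With only three columns the explicit OR is still readable, but it has to be edited whenever a column is added or removed, whereas the if_any() form adapts to whatever columns the selection matches.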
Here is how this works:
- We pass the data frame df to the function filter().
- Inside filter(), we use if_any() to specify our implicit filtering logic as follows:
  - The first argument to if_any() is a selection of columns. In this example, we use everything() to select all columns.
  - The second argument to if_any() is the logical expression that we wish to apply to each column selected in the first argument. In this case, the logical expression is the function is.na(), which returns TRUE if a value is NA (see Missing Values).
- is.na (the second argument to if_any()) is applied to each column selected via everything() (the first argument to if_any()).
- The resulting TRUE or FALSE values (one for each column) are combined via an OR operation (because we used if_any()), i.e. a row is retained (included in the output) if its value is NA for any of the columns.
- The counterpart of if_any() is if_all(), which requires that the logical expression evaluates to TRUE for all columns. See Relationship Selection.

We wish to apply a logical expression to a set of explicitly specified columns and to return any row for which any of those columns satisfies the logical expression.
In this example, we wish to return any row in the data frame df for which any of the columns col_1, col_2 or col_4 has a missing value NA.
df_2 = df %>%
  filter(if_any(c(col_1, col_2, col_4), is.na))
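As a quick check that only the listed columns are inspected, the following sketch runs the same call on a small made-up data frame. The values are assumptions chosen so that col_3, which is not selected, has a missing value of its own.

```r
library(dplyr)

# Hypothetical toy data. col_3 is not among the selected columns,
# so its NA in row 2 does not cause that row to be returned.
df = tibble(
  col_1 = c(1, 2, NA, 4),
  col_2 = c("a", "b", "c", "d"),
  col_3 = c(10, NA, 30, 40),
  col_4 = c(TRUE, TRUE, FALSE, NA)
)

# Rows 3 and 4 are returned: row 3 has NA in col_1, row 4 has NA in col_4.
df_2 = df %>%
  filter(if_any(c(col_1, col_2, col_4), is.na))
```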
Here is how this works:
- We use if_any() inside filter() to carry out implicit filtering as described in the “All Columns” scenario above.
- Via c(col_1, col_2, col_4), we identify the columns we wish to select by name. See Basic Selection for detailed coverage of explicit column selection scenarios.

We wish to apply a logical expression to a set of implicitly specified columns and to return any row for which any of those columns satisfies that logical expression. Implicit column selection is when we do not spell out the column names or positions explicitly but rather identify the columns via a property of their name or their data.
In this example, we wish to return any row in the data frame df for which any column whose name starts with the substring ‘cvr_’ has a missing value NA.
df_2 = df %>% filter(if_any(starts_with('cvr_'), is.na))
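With implicit selection it can be useful to confirm which columns the selection actually matches before filtering on them. The sketch below does this with select() on a made-up data frame. The column names id, cvr_a, cvr_b and other, and the variable checked_cols, are assumptions for illustration.

```r
library(dplyr)

# Hypothetical toy data: two columns start with 'cvr_', two do not.
df = tibble(
  id = c(1, 2, 3),
  cvr_a = c(0.1, NA, 0.3),
  cvr_b = c(0.5, 0.6, 0.7),
  other = c(NA, "x", "y")
)

# Inspect which columns the selection helper matches.
checked_cols = df %>%
  select(starts_with('cvr_')) %>%
  names()

# Only row 2 is returned. The NA in 'other' (row 1) is ignored
# because that column is not part of the selection.
df_2 = df %>% filter(if_any(starts_with('cvr_'), is.na))
```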
Here is how this works:
- We use if_any() inside filter() to carry out implicit filtering as described in the “All Columns” scenario above.
- We use starts_with('cvr_') to select all columns whose name starts with the substring ‘cvr_’. See Implicit Selection for coverage of the most common scenarios of implicit column selection, including by name pattern, data type, and criteria satisfied by the column’s data.

We wish to apply a logical expression to all but a set of columns and to return any row for which any of the remaining columns satisfies the logical expression.
In this example, we wish to return any row in the data frame df for which any column other than col_1 and col_2 has a missing value NA.
df_2 = df %>%
  filter(if_any(!c(col_1, col_2), is.na))
Here is how this works:
- We use if_any() inside filter() to carry out implicit filtering as described in the “All Columns” scenario above.
- Via !c(col_1, col_2), we identify the columns we wish to exclude by name. See Exclude Columns for coverage of column exclusion scenarios, all of which can be used for implicit filtering.
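
The exclusion can be read as "check every column except these". As a minimal sketch on a made-up four-column data frame (the values and the name df_3 are assumptions for illustration), the negated selection gives the same result as listing the remaining columns explicitly:

```r
library(dplyr)

# Hypothetical toy data: col_1 and col_2 contain NAs that should be ignored.
df = tibble(
  col_1 = c(NA, 2, 3),
  col_2 = c("a", NA, "c"),
  col_3 = c(1.5, 2.5, NA),
  col_4 = c(TRUE, FALSE, TRUE)
)

# Check every column except col_1 and col_2. Only row 3 is returned,
# because it is the only row with an NA in col_3 or col_4.
df_2 = df %>%
  filter(if_any(!c(col_1, col_2), is.na))

# For this data frame, the equivalent positive selection.
df_3 = df %>%
  filter(if_any(c(col_3, col_4), is.na))
```

The negated form is usually the more convenient one when the data frame has many columns and only a few need to be skipped.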