Relationship Specification

In an implicit filtering scenario, we apply one or more logical expressions to each of a set of columns and must specify whether to combine the resulting logical values with AND or with OR.

This section is complemented by:

  • Column Selection, where we cover how to select the columns to which the filtering logic is applied.
  • Function Specification, where we cover how to specify one or more logical expressions or predicate functions (functions that return TRUE or FALSE) to apply to the selected set of columns.

AND

We wish to filter rows for which a logical expression is TRUE for all of a selected set of columns.

In this example, we wish to filter the rows of the data frame df for which the value of every column whose name contains the string ‘cvr’ is less than 0.1.

df_2 = df %>% 
    filter(if_all(contains('cvr'), ~ .x < 0.1))

Here is how this works:

  • We use if_all() to specify that we wish to retain rows for which the given logical expression is TRUE for each of the selected columns.
  • if_all() accepts two inputs as follows:
    • The first argument to if_all() is a selection of columns. In this example, we use contains('cvr') to select any column whose name contains the substring ‘cvr’.
    • The second argument to if_all() is the logical expression that we wish to apply to every column selected in the first argument. In this case the logical expression we wish to apply is the anonymous function ~ .x < 0.1.
  • The anonymous function ~ .x < 0.1 (the second argument to if_all()) is applied to each column selected via contains('cvr') (the first argument to if_all()), and the per-column results are combined via an AND operation; i.e., a row is retained (included in the output) only if its value is < 0.1 in all of the selected columns.
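
To make the AND behaviour concrete, here is a minimal sketch on a small made-up data frame. The column names (campaign, cvr_web, cvr_app, spend) and their values are assumptions chosen purely for illustration; they do not come from any real dataset. The second pipeline spells out the same condition by hand so the two results can be compared.

```r
library(dplyr)

# Hypothetical toy data: two columns contain 'cvr', two do not.
df = tibble(
    campaign = c('a', 'b', 'c', 'd'),
    cvr_web  = c(0.05, 0.20, 0.08, 0.30),
    cvr_app  = c(0.02, 0.03, 0.15, 0.40),
    spend    = c(100, 200, 300, 400)
)

# Keep rows where EVERY 'cvr' column is below 0.1.
df_2 = df %>%
    filter(if_all(contains('cvr'), ~ .x < 0.1))

# The same condition written out explicitly with &.
df_3 = df %>%
    filter(cvr_web < 0.1 & cvr_app < 0.1)

print(df_2)  # only campaign 'a' satisfies both conditions
```

On this toy data, both pipelines keep only campaign 'a', since it is the only row whose cvr_web and cvr_app are both below 0.1. The if_all() form has the advantage of not needing to change when more 'cvr' columns are added.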

OR

We wish to filter rows for which a logical expression is TRUE for any of a selected set of columns.

In this example, we wish to filter the rows of the data frame df for which the value of any column whose name contains the string ‘cvr’ is less than 0.1.

df_2 = df %>% 
    filter(if_any(contains('cvr'), ~ .x < 0.1))

Here is how this works:

  • We use if_any() to specify that we wish to retain rows for which the given logical expression is TRUE for any of the selected columns.
  • if_any() takes the same two inputs as if_all(): a selection of columns, followed by the logical expression to apply to each selected column.
  • The anonymous function ~ .x < 0.1 (the second argument to if_any()) is applied to each column selected via contains('cvr') (the first argument to if_any()), and the per-column results are combined via an OR operation; i.e., a row is retained (included in the output) if its value is < 0.1 in at least one of the selected columns.
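
The OR behaviour can be sketched the same way. As before, the data frame and its column names (campaign, cvr_web, cvr_app, spend) are hypothetical and exist only for illustration. The second pipeline writes the condition out by hand with | for comparison.

```r
library(dplyr)

# Hypothetical toy data: two columns contain 'cvr', two do not.
df = tibble(
    campaign = c('a', 'b', 'c', 'd'),
    cvr_web  = c(0.05, 0.20, 0.08, 0.30),
    cvr_app  = c(0.02, 0.03, 0.15, 0.40),
    spend    = c(100, 200, 300, 400)
)

# Keep rows where AT LEAST ONE 'cvr' column is below 0.1.
df_2 = df %>%
    filter(if_any(contains('cvr'), ~ .x < 0.1))

# The same condition written out explicitly with |.
df_3 = df %>%
    filter(cvr_web < 0.1 | cvr_app < 0.1)

print(df_2)  # campaigns 'a', 'b' and 'c'; 'd' fails in both columns
```

On this toy data, campaigns 'a', 'b' and 'c' are retained because each has at least one 'cvr' column below 0.1, while 'd' is dropped because neither of its values is.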