Common Filtering Scenarios

In this section we cover the basic yet most common scenarios of filtering data frame rows by column values.

Logical Column

We wish to filter by a logical column to return the rows where the logical column is TRUE.

In this example, we wish to filter the rows of the data frame df where the logical column col_3 has a value of TRUE.

df_2 = df %>%
    filter(col_3)

Here is how this works:

  • We pass the data frame df to the filtering function filter() via the pipe %>%.
  • A column of a logical data type (comprised of TRUE and / or FALSE values) can be passed to filter() which would return the rows where the passed column (in this example: col_3) takes a value of TRUE.

Compare to Scaler

We wish to obtain (filter) the rows of a data frame where the value of a particular column meets a condition that involves comparison to a scaler value.

In this example, we wish to obtain the rows of the data frame df where the numerical column col_1 takes a value greater than 0.

df_2 = df %>%
    filter(col_1 > 0)

Here is how this works:

  • We pass the data frame df to the filtering function filter() via the pipe %>%.
  • We pass to filter the logical expression we wish to carry out which here is comparing the value of a column col_1 to a scaler value 0 via col_1 > 0.
  • filter() executes the logical expression then returns the rows for which the expression evaluates to TRUE which in this example those rows where col_1 takes a value greater than zero.
  • In this example we used a greater than comparison >. In general, we can use any comparison operation on numerical columns including smaller than <, equal ==, greater than or equal >=, or smaller than or equal <=. See Numerical Operations for more details.
  • In addition to applying comparison operations to numerical columns, we can apply the appropriate comparison operations to non-numeric columns i.e. String, Logical, Factor, and Date-Time. We cover those in their respective operations section.

Compare Columns

We wish to obtain rows where a comparison involving two of the data frame’s column evaluates to TRUE.

In this example, we wish to return rows where the value of one column col_1 is larger than the value of another column col_2.

df_2 = df %>%
    filter(col_1 > col_2)

Here is how this works:

  • We pass the data frame df to the filtering function filter() via the pipe %>%.
  • We pass to filter the logical expression we wish to carry out which here is comparing the value of a column col_1 to another column col_2 via col_1 > col_2.
  • filter() executes the logical expression then returns the rows for which the expression evaluates to TRUE which in this example those rows where col_1 takes a value greater than that of col_2.

Complement

We wish to obtain rows where a condition is not satisfied.

In this example, we wish to obtain rows where the value of a string column col_2 is neither ‘a’ nor ‘c’.

df_2 = df %>%
  filter(!(col_2 %in% c('a', 'c')))

Here is how this works:

  • We pass the data frame df to the filtering function filter() via the pipe %>%.
  • We use the %in% operator to check whether each value of col_2 is either ‘a’ or ‘c’. We cover checking for membership in a list in detail in List Operations.
  • The expression col_2 %in% c('a', 'c') returns TRUE for rows where the value of col_2 is either ‘a’ or ‘c’.
  • We are looking for rows where column col_2 is neither ‘a’ nor ‘c’. In other words, we are looking for the complement of the condition col_2 %in% c('a', 'c'). In R we obtain that by applying the complement operator ! which we add right before the condition inside filter().
R
I/O