Filtered Rows

In a data inspection context, we typically filter rows by column values to get a sense of the data, e.g. to check for surprising values. We cover the basics of filtering here and go into detail in Filtering.

Filter

We wish to extract rows that meet a certain condition on column values. In this example, we keep only the rows where the column col_1 has a value of 0.

df %>% filter(col_1 == 0)

Here is how this works:

  • We pass the data frame df to the function filter() using the pipe operator %>%.
  • We use filter() for filtering rows by a boolean condition on column values.
  • In this case, we compare the value of col_1 with the number 0 and return only the rows where the condition evaluates to TRUE.
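To make the steps above concrete, here is a minimal sketch that runs the filter on a small made-up data frame. The data values are hypothetical and chosen only for illustration; the column names follow the ones used in this section. It assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical example data (not from any real dataset).
df <- data.frame(
  col_1 = c(0, 1, NA, 0, 2),
  col_2 = c("a", "b", "c", "d", "e"),
  col_3 = c(10, 20, 30, 40, 50)
)

# Keep only the rows where col_1 equals 0.
zeros <- df %>% filter(col_1 == 0)

# Inspect the matching rows and count them.
print(zeros)
nrow(zeros)
```

Note that the row where col_1 is NA is not returned: filter() keeps only rows where the condition is TRUE, so rows where it evaluates to NA are dropped along with those where it is FALSE.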

Selected Columns

We wish to extract rows that meet a certain condition on column values and return only a specified set of columns (rather than all columns). This is useful while developing a filter, because we can look at just the relevant columns to verify that the filtering works as expected.

In this example, we wish to return rows where the value of col_1 is greater than 0 and to return columns col_1 and col_3.

df %>% filter(col_1 > 0) %>% select(col_1, col_3)

Here is how this works:

  • We use select() to specify the names of the columns of the data frame df that we wish to include in the output. In this example, the column names are col_1 and col_3. For detailed coverage, see Selecting by Name.
  • We run filter() before select() so that filter() has access to all of the data frame’s columns while filtering, not just the selected columns.
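The ordering point can be illustrated with a short sketch in which the filter condition uses a column that is not part of the output. The data frame below is hypothetical, and the condition on col_2 is an assumption made only to show why the order matters. It assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical example data (not from any real dataset).
df <- data.frame(
  col_1 = c(0, 1, 3, 2),
  col_2 = c("a", "b", "a", "d"),
  col_3 = c(10, 20, 30, 40)
)

# Filter on col_2 first, then keep only col_1 and col_3.
result <- df %>%
  filter(col_2 == "a") %>%
  select(col_1, col_3)

print(result)

# Reversing the order would normally raise an error, because col_2
# is no longer in the data frame once select() has run:
# df %>% select(col_1, col_3) %>% filter(col_2 == "a")
```

Because filter() runs first, it can refer to col_2 even though col_2 does not appear in the final output.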