In a data inspection context, we typically filter rows by column values to get a sense of the data, e.g. to check for surprising values. We cover the basics of filtering here and cover it in detail in Filtering.
We wish to extract rows that meet a certain condition on column values. In this example, we keep only the rows where the column col_1 has a value of 0.
df %>% filter(col_1 == 0)
Here is how this works:
We pipe the data frame df to the function filter().
We use filter() for filtering rows by a boolean condition on column values.
We compare the values of the column col_1 with the integer 0 and return only the rows where the condition evaluates to TRUE.
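To see the row-wise comparison in action, here is a minimal sketch on a small made-up data frame. The column names follow the text, but the values and the name zero_rows are assumptions chosen only for illustration.

```r
library(dplyr)

# Hypothetical toy data; only the column names come from the text.
df <- tibble(
  col_1 = c(0, 2, 0, 5, -1),
  col_2 = c("a", "b", "c", "d", "e"),
  col_3 = c(10, 20, 30, 40, 50)
)

# The condition is evaluated once per row and yields a logical vector
# (TRUE for the first and third rows of this toy data).
df$col_1 == 0

# filter() keeps only the rows where that logical vector is TRUE.
zero_rows <- df %>% filter(col_1 == 0)
zero_rows
```

Printing the logical vector first is a quick way to check that a condition matches the rows we expect before relying on the filtered result.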
We wish to extract rows that meet a certain condition on column values and to return only a specified set of columns (rather than all columns). This is useful while developing a filter, because we can look at just the relevant columns to verify that the filtering works as expected.
In this example, we wish to return rows where the value of col_1 is greater than 0 and to return only the columns col_1 and col_3.
df %>% filter(col_1 > 0) %>% select(col_1, col_3)
Here is how this works:
We use select() to specify the names of the columns of the data frame df that we wish to include in the output. In this example, the column names are col_1 and col_3. For detailed coverage, see Selecting by Name.
We apply filter() before select() so that filter() has access to all of the data frame's columns while filtering, and not just the selected columns.
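The point about ordering can be illustrated with a short sketch. It assumes a made-up data frame with the same column names as the text. The values and the names positive_rows, b_rows and reversed are hypothetical.

```r
library(dplyr)

# Hypothetical toy data; only the column names come from the text.
df <- tibble(
  col_1 = c(0, 2, 0, 5, -1),
  col_2 = c("a", "b", "c", "d", "e"),
  col_3 = c(10, 20, 30, 40, 50)
)

# Filter first, then keep only the columns we want to inspect.
positive_rows <- df %>% filter(col_1 > 0) %>% select(col_1, col_3)
positive_rows

# Filtering on a column that is not part of the output still works,
# because filter() runs while col_2 is still present.
b_rows <- df %>% filter(col_2 == "b") %>% select(col_1, col_3)

# Reversing the order fails: after select(), col_2 no longer exists,
# so filter() raises an error. We capture its message instead of stopping.
reversed <- tryCatch(
  df %>% select(col_1, col_3) %>% filter(col_2 == "b"),
  error = function(e) conditionMessage(e)
)
```

Here reversed holds an error message rather than a data frame, which is why we apply filter() before select() whenever the condition refers to a column that is not in the final output.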