In this section we cover the basic yet most common scenarios of filtering data frame rows by column values.
We wish to filter by a logical column to return the rows where the logical column is TRUE
.
In this example, we wish to filter the rows of the data frame df where the logical column col_3
has a value of TRUE
.
df_2 = df %>%
filter(col_3)
Here is how this works:
df
to the filtering function filter()
via the pipe %>%
.TRUE
and / or FALSE
values) can be passed to filter()
which would return the rows where the passed column (in this example: col_3
) takes a value of TRUE
.We wish to obtain (filter) the rows of a data frame where the value of a particular column meets a condition that involves comparison to a scaler value.
In this example, we wish to obtain the rows of the data frame df
where the numerical column col_1
takes a value greater than 0.
df_2 = df %>%
filter(col_1 > 0)
Here is how this works:
df
to the filtering function filter()
via the pipe %>%
.col_1
to a scaler value 0
via col_1 > 0
.filter()
executes the logical expression then returns the rows for which the expression evaluates to TRUE
which in this example those rows where col_1
takes a value greater than zero.>
. In general, we can use any comparison operation on numerical columns including smaller than <
, equal ==
, greater than or equal >=
, or smaller than or equal <=
. See Numerical Operations for more details.We wish to obtain rows where a comparison involving two of the data frame’s column evaluates to TRUE
.
In this example, we wish to return rows where the value of one column col_1
is larger than the value of another column col_2
.
df_2 = df %>%
filter(col_1 > col_2)
Here is how this works:
df
to the filtering function filter()
via the pipe %>%
.col_1
to another column col_2
via col_1 > col_2
.filter()
executes the logical expression then returns the rows for which the expression evaluates to TRUE
which in this example those rows where col_1
takes a value greater than that of col_2
.We wish to obtain rows where a condition is not satisfied.
In this example, we wish to obtain rows where the value of a string column col_2
is neither ‘a’
nor ‘c’
.
df_2 = df %>%
filter(!(col_2 %in% c('a', 'c')))
Here is how this works:
df
to the filtering function filter()
via the pipe %>%
.%in%
operator to check whether each value of col_2
is either ‘a’
or ‘c’
. We cover checking for membership in a list in detail in List Operations.col_2 %in% c('a', 'c')
returns TRUE
for rows where the value of col_2
is either ‘a’
or ‘c’
.col_2
is neither ‘a’
nor ‘c’
. In other words, we are looking for the complement of the condition col_2 %in% c('a', 'c')
. In R we obtain that by applying the complement operator !
which we add right before the condition inside filter()
.