Filter a grouped data frame to keep only the groups that meet certain criteria. Note that in this scenario, we are interested in filtering entire groups and not individual rows.
In this example, we wish to keep only the groups that have more than one row and whose col_2 values sum to more than 0.
df_2 = df %>%
  group_by(col_1) %>%
  filter(n() > 1, sum(col_2) > 0)
Here is how this works:
When filter() is preceded by group_by(), the filter is applied to each group individually.
n() > 1 is TRUE if the current row belongs to a group with more than 1 row. When called following group_by(), n() returns the number of rows in the current group.
sum(col_2) > 0 is TRUE if the current row belongs to a group where the sum of the values of col_2 is greater than 0. When an aggregation function such as sum() is called following group_by(), it is computed for each group.
Rows for which both conditions are TRUE, i.e. rows that belong to groups with more than 1 row and where the sum of the values of col_2 is greater than 0, are retained (included in the output).
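
To make the behaviour concrete, here is a minimal sketch on a small made-up data frame. The values in df are assumptions chosen for illustration and are not from the original example; the trailing ungroup() is an optional addition that drops the grouping once the filtering is done.

```r
library(dplyr)

# Hypothetical input: three groups in col_1
#   "a": 2 rows, sum(col_2) =  3  -> passes both conditions
#   "b": 1 row,  sum(col_2) =  5  -> fails n() > 1
#   "c": 2 rows, sum(col_2) = -2  -> fails sum(col_2) > 0
df = data.frame(
  col_1 = c("a", "a", "b", "c", "c"),
  col_2 = c(1, 2, 5, -3, 1)
)

df_2 = df %>%
  group_by(col_1) %>%
  # Both conditions are evaluated once per group, so each group's rows
  # are kept or dropped together
  filter(n() > 1, sum(col_2) > 0) %>%
  # Optional: remove the grouping so later steps operate on the whole table
  ungroup()

print(df_2)
```

Only the two rows of group "a" remain. Group "b" is dropped because it has a single row, and group "c" is dropped because its col_2 values sum to a negative number.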