Filter Groups

Filter a grouped data frame to keep only the groups that meet certain criteria. Note that in this scenario, we are interested in filtering entire groups and not individual rows.

In this example, we wish to keep only the groups that have more than one row and whose values of col_2 sum to more than 0.

library(dplyr)

# Keep only groups of col_1 with more than one row and a positive sum of col_2
df_2 = df %>%
  group_by(col_1) %>%
  filter(n() > 1, sum(col_2) > 0)
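To see which groups survive, here is a minimal sketch that runs the same pipeline on a small, made-up data frame. The values in df are hypothetical and chosen so that one group fails the row-count condition and another fails the sum condition.

```r
library(dplyr)

# Hypothetical toy data: four groups in col_1 with numeric values in col_2
df = data.frame(
  col_1 = c("a", "a", "b", "c", "c", "c", "d", "d"),
  col_2 = c(1, 2, 5, -3, 1, 1, 0, 4)
)

df_2 = df %>%
  group_by(col_1) %>%
  filter(n() > 1, sum(col_2) > 0)

# Groups "a" (2 rows, sum 3) and "d" (2 rows, sum 4) are kept;
# "b" is dropped (only 1 row) and "c" is dropped (sum is -1)
print(df_2)
```

All rows of the kept groups are returned in their original order, and df_2 is still grouped by col_1.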

Here is how this works:

  • When filter() is preceded by group_by(), the filter is applied to each group individually.
  • The first condition, n() > 1, is TRUE if the current row belongs to a group with more than one row. When called after group_by(), n() returns the number of rows in the current group.
  • The second condition, sum(col_2) > 0, is TRUE if the current row belongs to a group whose values of col_2 sum to more than 0. After group_by(), an aggregation function such as sum() is computed separately for each group.
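  • Rows where both conditions are TRUE, i.e. rows that belong to groups with more than one row and whose values of col_2 sum to more than 0, are retained (included in the output). Because both conditions evaluate to the same value for every row of a group, a group is either kept in full or dropped in full.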
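One way to check that the conditions are evaluated per group is to compute them as columns with mutate() instead of filtering. This is a minimal sketch on hypothetical data; the column names group_size, group_sum and keep are illustrative and not part of the original example.

```r
library(dplyr)

# Hypothetical toy data: col_1 defines the groups, col_2 holds the values
df = data.frame(
  col_1 = c("a", "a", "b", "c", "c", "c"),
  col_2 = c(1, 2, 5, -3, 1, 1)
)

# Store the two filter conditions as columns instead of filtering, to show
# that n() and sum(col_2) take the same value on every row of a group
checks = df %>%
  group_by(col_1) %>%
  mutate(
    group_size = n(),
    group_sum  = sum(col_2),
    keep       = group_size > 1 & group_sum > 0
  ) %>%
  ungroup()

print(checks)
```

Every row of group "a" gets keep = TRUE, while every row of "b" (one row) and "c" (sum of -1) gets keep = FALSE, which is why filter() keeps or drops whole groups.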