Grouped Sorting

We wish to sort in a grouping context. There are two common scenarios which we cover here:

  1. Sorting Groups: we sort the rows of a data frame by a property of the group they belong to, e.g. group size, where the groups are defined by another column of the data frame.
  2. Sorting Within Groups: we sort a grouped data frame such that sorting happens within each group.

Sorting Groups

We wish to sort the rows of a data frame in order of a property of the group they belong to where the groups are defined by another column of the data frame.

In this example, we wish to sort the rows of a data frame df in descending order of the size of the group they belong to where the groups are defined by the column col_1.

df %>% 
  group_by(col_1) %>% 
  mutate(group_size = n()) %>% 
  arrange(desc(group_size))

Here is how this works:

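  • We use group_by() to group the data frame df by the values of the column col_1.
  • We use mutate() to add a new column called group_size to the data frame df. Because the data frame is grouped, n() is evaluated per group and returns the number of rows in the current group, so every row receives the size of its own group.
  • We can sort by other group attributes in the same way, e.g. the sum of a particular column. See Grouped Transformation.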
  • Finally we use arrange() to sort the data frame df in descending order (hence desc()) of the value of group_size.
  • The resulting data frame will be sorted in descending order of the size of the group each row belongs to.
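As an illustration of sorting by a group attribute other than size, the following is a minimal sketch that sorts rows by the per-group sum of a column. The toy data, the numeric column col_2, and the name group_total are assumptions made for this example only. The ungroup() call is optional here and is included so that the result is no longer grouped.

```r
library(dplyr)

# Hypothetical toy data; col_2 is an assumed numeric column.
df <- tibble(
  col_1 = c("a", "b", "b", "c", "c", "c"),
  col_2 = c(10, 1, 2, 3, 4, 5)
)

df_sorted <- df %>%
  group_by(col_1) %>%
  # sum() is evaluated per group, so each row gets its group's total.
  mutate(group_total = sum(col_2)) %>%
  ungroup() %>%
  # Rows of the group with the largest total come first.
  arrange(desc(group_total))
```

In this toy data the totals are 12 for group c, 10 for group a, and 3 for group b, so the rows of group c come first, followed by those of a and then b.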

Sorting Within Groups

We wish to sort a grouped data frame such that sorting happens within groups.

By default, when we sort a grouped data frame via arrange(), the grouping is ignored and the data frame is sorted as if it were not grouped. In this section, we cover how to sort within groups.

In this example, we wish to group the data frame df by the column col_1 and then sort each group by the values of the column col_2 in ascending order.

df %>% 
  group_by(col_1) %>% 
  arrange(col_2, .by_group = TRUE)

Here is how this works:

  • We use group_by() to group the data frame df by the values of the column col_1.
  • The arrange() function has a parameter .by_group which, when sorting grouped data frames, determines whether sorting happens within groups or ignores grouping and sorts the data frame as a whole. By default .by_group=FALSE.
  • We set .by_group=TRUE to sort within groups. Essentially, arrange() first sorts by the grouping column(s) and then by the sorting column(s) passed to arrange().
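To make the effect of .by_group concrete, here is a small sketch that compares the default behavior with .by_group = TRUE on toy data. The data values and the variable names grouped, whole, within, and explicit are assumptions made for this example only.

```r
library(dplyr)

# Hypothetical toy data for illustration.
df <- tibble(
  col_1 = c("b", "a", "b", "a"),
  col_2 = c(2, 4, 1, 3)
)

grouped <- df %>% group_by(col_1)

# Default (.by_group = FALSE): grouping is ignored, rows are ordered by col_2 alone.
whole <- grouped %>% arrange(col_2)

# .by_group = TRUE: rows are ordered by col_1 first, then by col_2 within each group.
within <- grouped %>% arrange(col_2, .by_group = TRUE)

# The same row order obtained by listing the grouping column explicitly.
explicit <- df %>% arrange(col_1, col_2)
```

Here whole has col_2 in the order 1, 2, 3, 4 with the groups interleaved as b, b, a, a, while within lists group a (3, 4) and then group b (1, 2). The explicit version produces the same row order as within, but returns an ungrouped data frame.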