We wish to sort in a grouping context. There are two common scenarios which we cover here:
We wish to sort the rows of a data frame in order of a property of the group they belong to, where the groups are defined by another column of the data frame.
In this example, we wish to sort the rows of a data frame df in descending order of the size of the group they belong to, where the groups are defined by the column col_1.
df %>%
  group_by(col_1) %>%
  mutate(group_size = n()) %>%
  arrange(desc(group_size))
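To make the pipeline above concrete, here is a minimal sketch on a toy data frame. The values are made up for illustration, the names by_size, by_total and group_total are hypothetical, and the ungroup() step is an optional addition so that the result is not left grouped. The second pipeline shows the same pattern with a different group property, the group total of col_2.

```r
library(dplyr)

# Toy data; the column names mirror the text, the values are made up.
df <- tibble(
  col_1 = c("a", "b", "b", "c", "c", "c"),
  col_2 = c(10, 40, 20, 1, 2, 3)
)

# Sort rows by the size of the group they belong to, largest group first.
by_size <- df %>%
  group_by(col_1) %>%
  mutate(group_size = n()) %>%
  ungroup() %>%                      # optional: drop the grouping afterwards
  arrange(desc(group_size))

# Same pattern with another group property: the group total of col_2.
by_total <- df %>%
  group_by(col_1) %>%
  mutate(group_total = sum(col_2)) %>%
  ungroup() %>%
  arrange(desc(group_total))

print(by_size)
print(by_total)
```

In by_size the rows of group "c" (three rows) come first, followed by "b" and then "a". In by_total the rows of group "b" come first because its col_2 values add up to the largest total.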
Here is how this works:
After grouping by col_1, we use mutate() to add a new column called group_size to the data frame df. Its value is computed with n(), which returns the number of rows in the current group. We can sort by other group attributes, e.g. the sum of a particular column. See Grouped Transformation.
We use arrange() to sort the data frame df in descending order (hence desc()) of the value of group_size.
We wish to sort a grouped data frame such that sorting happens within groups.
By default, when sorting a grouped data frame via arrange(), the grouping is ignored and the data frame is sorted as if it were not grouped. In this section, we cover how to sort within groups.
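The default behavior can be checked with a small sketch. The data values and the names grouped, sorted_default and sorted_plain are made up for illustration.

```r
library(dplyr)

# Toy data; values are made up.
df <- tibble(
  col_1 = c("b", "a", "b", "a"),
  col_2 = c(3, 4, 1, 2)
)

grouped <- df %>% group_by(col_1)

# Default (.by_group = FALSE): rows are ordered by col_2 alone.
sorted_default <- grouped %>% arrange(col_2)

# Sorting the ungrouped data frame gives the same row order.
sorted_plain <- df %>% arrange(col_2)

print(sorted_default)
```

The row order is the same in both results, although sorted_default is still a grouped data frame.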
In this example, we wish to group the data frame df by the column col_1 and then sort each group by the values of the column col_2 in ascending order.
df %>%
  group_by(col_1) %>%
  arrange(col_2, .by_group = TRUE)
Here is how this works:
We use group_by() to group the data frame df by the values of the column col_1.
The arrange() function has a parameter .by_group which, when sorting grouped data frames, determines whether sorting happens within groups or ignores grouping and sorts the data frame as a whole. By default, .by_group = FALSE.
We set .by_group = TRUE to sort within groups. Essentially, arrange() first sorts by the grouping column and then sorts by the one or more sorting columns passed to arrange().
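As a quick check of that last point, the following sketch compares sorting within groups against an ungrouped sort that lists the grouping column first. The data values and the names within_groups and explicit are made up for illustration.

```r
library(dplyr)

# Toy data; values are made up.
df <- tibble(
  col_1 = c("b", "a", "b", "a"),
  col_2 = c(3, 4, 1, 2)
)

# Sort within groups: rows are ordered by col_1 first, then by col_2.
within_groups <- df %>%
  group_by(col_1) %>%
  arrange(col_2, .by_group = TRUE)

# The same row order without grouping: name the grouping column first.
explicit <- df %>% arrange(col_1, col_2)

print(within_groups)
```

Both approaches give the same row order. The grouped version is convenient when the data frame is already grouped for other steps in the pipeline.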