We wish to sort in a grouping context. There are two common scenarios which we cover here:
We wish to sort the rows of a data frame by a property of the group they belong to, where the groups are defined by another column of the data frame.
In this example, we wish to sort the rows of a data frame df in descending order of the size of the group they belong to, where the groups are defined by the column col_1.
# Select a single column so that transform('size') returns one Series of group sizes
(df
    .assign(group_size=df.groupby('col_1')['col_1'].transform('size'))
    .sort_values('group_size', ascending=False))
Here is how this works:
We use assign() to add a new column called group_size to the data frame df.
We use transform() to compute the 'size' of each group of the grouped data frame created by df.groupby('col_1'). In place of size, we can use any other group attribute, e.g. the sum of a particular column, as the sorting quantity. See Grouped Transformation for more details.
We use sort_values() to sort the data frame df in descending order (hence ascending=False) of the value of 'group_size'.
Note that the new column is added to a regular DataFrame (not a grouped data frame, i.e. a DataFrameGroupBy). We cover how to add a column to a grouped data frame in Grouped Transformation.
We wish to sort a grouped data frame such that sorting happens within groups. In pandas, a grouped data frame cannot be sorted directly with sort_values(), so a different approach is needed. In this section, we cover how to sort within groups.
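Before turning to within-group sorting, the group-property sort described above can be made concrete with a small sketch. The sample data below is hypothetical and only illustrates the idea: it sorts once by group size and once by the sum of col_2 within each group, which is one example of "any other group attribute" mentioned earlier.

```python
import pandas as pd

# Hypothetical sample data; the column names mirror those used in the text.
df = pd.DataFrame({
    'col_1': ['a', 'b', 'b', 'c', 'c', 'c'],
    'col_2': [10, 1, 2, 3, 4, 5],
})

# Sort by group size: rows of the largest group come first.
by_size = (df
    .assign(group_size=df.groupby('col_1')['col_1'].transform('size'))
    .sort_values('group_size', ascending=False))

# Sort by another group property: the sum of col_2 within each group.
by_sum = (df
    .assign(group_sum=df.groupby('col_1')['col_2'].transform('sum'))
    .sort_values('group_sum', ascending=False))

print(by_size)
print(by_sum)
```

In both cases transform() returns one value per row of df, which is why the result can be attached as a column and used as an ordinary sort key.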
In this example, we wish to group the data frame df by the column 'col_1' and then sort each group by the values of the column 'col_2' in ascending order.
(df
.groupby('col_1', group_keys=False)
.apply(lambda x: x.sort_values('col_2', ignore_index=True)))
Here is how this works:
We have relied on sort_values() for nearly all data frame sorting scenarios. However, if we try to apply sort_values() to a grouped data frame, we get an error: 'DataFrameGroupBy' object has no attribute 'sort_values'.
We therefore use apply() to sort the rows of each group individually. apply() passes each group as a data frame to the lambda function.
We apply sort_values() to each of those group data frames to sort it in ascending order of 'col_2'. We pass ignore_index=True so that the rows of each sorted group are relabeled 0, 1, …, n - 1 in their new order.
Alternatively:
(df
.sort_values('col_2', ignore_index=True)
.groupby('col_3')
.apply(print))
Here is how this works:
groupby() preserves the order of the rows within each group. Therefore, we first sort the whole data frame with sort_values('col_2') and then group it with groupby('col_3').
A DataFrameGroupBy object can't be viewed like a regular DataFrame object, so we use apply(print) to view the groups (and verify that they are indeed sorted).
This approach avoids having to use apply() to sort individual groups as we did above.
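As a minimal sketch of the sort-then-group approach, the following uses a small hypothetical data frame (the data and the names direct_sort_failed and col_2_by_group are illustrative assumptions, and the grouping column here is 'col_1'). It checks that calling sort_values() on the grouped object fails, and that sorting first leaves each group's rows in 'col_2' order. Instead of apply(print), it iterates over the groups so the order can be inspected programmatically.

```python
import pandas as pd

# Hypothetical sample data; column names follow the first example in the text.
df = pd.DataFrame({
    'col_1': ['b', 'a', 'b', 'a', 'b'],
    'col_2': [3, 2, 1, 5, 4],
})

# A grouped object has no sort_values(), so sorting it directly fails.
try:
    df.groupby('col_1').sort_values('col_2')
    direct_sort_failed = False
except AttributeError:
    direct_sort_failed = True

# Sort first, then group: each group receives its rows in col_2 order.
sorted_then_grouped = (df
    .sort_values('col_2', ignore_index=True)
    .groupby('col_1'))

# Collect col_2 per group to confirm that the sorted order was preserved.
col_2_by_group = {
    key: group['col_2'].tolist() for key, group in sorted_then_grouped
}

print(direct_sort_failed)
print(col_2_by_group)
```

Iterating over the grouped object yields one (key, data frame) pair per group, which is a convenient way to inspect groups when printing alone is not enough.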