Grouped Data Frame Summary

We wish to generate common summary statistics for the columns of a data frame for each group, where the groups are defined by one or more grouping columns. Examples of summary statistics are quantiles for numeric columns and the unique value count for non-numeric columns. While we could compute each of these statistics for each column and each group individually, during data inspection it is more efficient to use a single function that, given a grouped data frame, computes the common statistics appropriate for each column’s data type.

df.groupby('col_1').describe()

Here is how this works:

  • groupby() groups the data frame df by the column col_1. We cover grouping in detail in Aggregating.
  • describe() is then applied to the grouped data frame.
  • describe() returns a data frame where the rows are the groups, here the values of col_1, and each column holds one statistic for one of the non-grouping variables.
  • In particular, the column Index is a MultiIndex where the first level holds the column names and the second level holds the statistics computed for each column, e.g. count or mean.
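
To make the shape of the result concrete, here is a minimal sketch on a small made-up data frame. The data values and the column names col_2 and col_3 are assumptions for illustration only; they are not part of the original example. It shows that the rows of the output are the groups and how the two-level column index can be used to pull out the statistics of one column, or one statistic across columns.

```python
import pandas as pd

# Hypothetical example data; col_1 is the grouping column.
df = pd.DataFrame({
    'col_1': ['a', 'a', 'b', 'b', 'b'],
    'col_2': [1.0, 3.0, 2.0, 4.0, 6.0],
    'col_3': [10, 20, 30, 40, 50],
})

summary = df.groupby('col_1').describe()

# Rows are the groups (the values of col_1).
groups = summary.index.tolist()

# Columns form a two-level MultiIndex: (column name, statistic).
n_levels = summary.columns.nlevels

# All statistics for one column: select by the first-level label.
col_2_stats = summary['col_2']

# One statistic for one column and one group: use a tuple key.
mean_b = summary.loc['b', ('col_2', 'mean')]

# One statistic across all columns: take a cross-section on the second level.
means = summary.xs('mean', axis=1, level=1)

print(summary)
```

Selecting with summary['col_2'] drops the first level and leaves a plain data frame with one column per statistic, which is often easier to read than the full wide output.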