We wish to get a summary of a numeric column (e.g. the mean and standard deviation) for each group where the groups are defined by the values of a categorical column.

While we can explicitly compute all the common summary statistics for a numeric column over groups (see below), it would be efficient during data inspection to use a single function that given a numeric column and one or more grouping columns, computes the common summary statistics over groups.

In this example, we group by `col_1`

then compute the common summary statistic of a numeric column named `col_2`

.

```
df.groupby('col_1')['col_2'].describe()
```

Here is how this works:

- We apply
`groupby()`

to`df`

to create a grouped data frame where the grouping variable is`col_1`

. - We then select the numeric column of interest
`col_2`

via the bracket operator`[’col_2’]`

. - We then apply
`describe()`

to the grouped`Series`

object returned by the bracket operator. `describe()`

returns the common summary statistics of numeric variables which are count mean std and the common percentiles.

We wish to compute the mean of a numerical column over groups defined by one categorical column.

In this example, we wish to compute the mean of the numeric column `col_2`

for each group where the groups are defined by the values of the categorical column `col_1`

.

```
df.groupby('col_1')['col_2'].mean()
```

Here is how this works:

- We apply
`groupby()`

to`df`

to create a grouped data frame where the grouping variable is`col_1`

. - We then select the numeric column of interest
`col_2`

via the bracket operator`[’col_2’]`

. - We then apply
`mean()`

to the grouped`Series`

object returned by the bracket operator to return the mean of the numeric variable`col_2`

for each group defined by the variable`col_1`

. - To round the output to say two digits we can add
`.round(2)`

. - See Summary Statistics for how to compute all the common summary statistics.

We wish to compute the sum of values of a numerical column over groups defined by one categorical column.

In this example, we wish to compute the sum of the values of the numeric column `col_2`

for each group where the groups are defined by the values of the categorical column `col_1`

.

```
df.groupby('col_1')['col_2'].sum()
```

Here is how this works:

This works similarly as above but we use `sum()`

instead of `mean()`

.

We wish to obtain the ratio between the sum of values of a numeric variable for each group to the total sum of values of the numeric variable where the groups are defined by a grouping variable.

In this example, we compute the ratio of the sum of values of a numeric column `col_2`

for each group defined by `col_1`

to the total sum of values of `col_2`

.

```
df.groupby('col_1')['col_2'].sum() / df['col_2'].sum()
```

Here is how this works:

- We compute the sum of values of
`col_2`

for each group where the groups are defined by`col_1`

as we did above. - We then divide by the sum of values of col_2 computed via
`df['col_2'].sum()`

.

PYTHON