We wish to get a summary of a numeric column (e.g. the mean and standard deviation) for each group where the groups are defined by the combination of values of two categorical columns.

While we can explicitly compute all the common summary statistics for a numeric column over groups (see below), it would be efficient during data inspection to use a single function that given a numeric column and one or more grouping columns, computes the common summary statistics over groups.

```
df.groupby(['col_1', 'col_2'])['col_3'].describe()
```

Here is how this works:

- We apply
`groupby()`

to the data frame`df`

to create a grouped data frame where the grouping variables are`[col_1, col_2]`

. - We then select the numeric column of interest
`col_3`

via the bracket operator`[’col_3’]`

. - We then apply
`describe()`

to the grouped`Series`

object returned by the bracket operator. `describe()`

returns the common summary statistics of numeric variables which are count mean std and the common percentiles.

We wish to compute the mean of a numerical column over groups defined by two categorical columns.

In this example, we wish to compute the mean of the numeric column `col_3`

for each group where the groups are defined by the values of the two categorical columns `col_1`

and `col_2.`

```
pd.crosstab(df['col_1'], df['col_2'],
values = df['col_3'],
aggfunc = 'mean')
```

Here is how this works:

- We recommend the use of
`crosstab()`

to summarize multiple variables in data inspection contexts. `crosstab()`

is capable of aggregating a numeric variable over grouping established by two (or more) categorical variables.- The arguments of
`crosstab()`

are as follows (in order):- The grouping variable to use for the rows of cross table; here
`df['col_1']`

. - The grouping variable to use for the columns of the cross table; here
`df['col_2']`

. - The numeric column to summarize in the cells of the cross table; here
`values = df['col_3']`

. - The aggregation function to use; here
`aggfun = “mean”`

.

- The grouping variable to use for the rows of cross table; here
- Any aggregation function can be passed to the
`aggfunc`

argument of`crosstab()`

. See Summary Statistics for the common summary statistics functions. - Here are two useful extensions to make this easier to inspect visually:
- To round the output to say two digits we can add
`.round(2)`

. - Sometimes having many
`nan`

values can make inspecting the output a little hard on the eye. It often helps to add`.fillna(’-’)`

to replace nans with a dash making the actual numbers visually standout.

- To round the output to say two digits we can add

We wish to compute the sum of a numerical column over groups defined by two categorical columns.

In this example, we wish to compute the sum of the numeric column `col_3`

for each group where the groups are defined by the values of the two categorical columns `col_1`

and `col_2.`

```
pd.crosstab(df['col_1'], df['col_2'],
values = df['col_3'],
aggfunc = "sum")\
.fillna(0)
```

Here is how this works:

- This works similarly as above except that the aggregation function
`aggfunc`

argument of`crosstab()`

is set to`"sum"`

. - We apply the function
`fillna(0)`

so that combinations of values of`col_1`

and`col_2`

that do not exist in the data would get a sum of`0`

and not be missing`NaN`

.

We wish to obtain the ratio between the sum of values of a numeric variable for each group to the total sum of values of the numeric variable where the groups are defined by two categorical columns.

In this example, we compute the ratio of the sum of values of a numeric column `col_3`

for each group defined by `col_1`

and `col_2`

to the total sum of values of `col_3`

.

```
pd.crosstab(df['col_1'], df['col_2'],
values = df['col_3'],
aggfunc = 'sum',
normalize='all')
```

Here is how this works:

- We use the
`crosstab()`

function to compute the sum of the numerical`col_3`

against the two grouping columns`col_1`

and`col_2`

. - We set the
`normalize`

argument of the`crosstab()`

function to`‘all’`

so it normalizes the sums computed earlier over the total sum of`col_3`

. - To normalize across rows we set
`normalize=’index’`

and to normalize across columns we set`normalize=’columns’`

.

We wish to compute multiple aggregation functions on a numeric column for each group where the groups are defined by two columns.

In this example, we compute sum and mean for a numeric column `col_3`

for each group where the groups are defined by `col_1`

and `col_2`

.

```
pd.crosstab(df['col_1'], df['col_2'],
values = df['col_3'],
aggfunc = ['sum', 'mean'])
```

Here is how this works:

- We use
`crosstab()`

as described above with the exception that we pass a list of two functions to the argument`aggfunc`

like so`aggfunc = ['sum', 'mean']`

. - The output has a
`MultiIndex`

for the columns where the top level is for the functions, which are here`sum`

and`mean`

, and the second level is for the values of the column(s) passed to the second argument of`crosstab(),`

which is here`col_2`

.

PYTHON