We wish to look at the variation in one categorical variable against another categorical variable. This is often referred to as *cross tabulation*.

We wish to know the distinct combinations of values that two categorical columns take in a data frame.

In this example, we wish to get the unique combinations of the values of `col_1` and `col_2`.

```
df[['col_1', 'col_2']].drop_duplicates()
```

Here is how this works:

- We select the columns of interest via the bracket operator: `df[['col_1', 'col_2']]`.
- We then apply the function `drop_duplicates()`, which retains one instance of each combination of values of `col_1` and `col_2`. The output of `drop_duplicates()` is therefore the unique combinations of the values of `col_1` and `col_2`.
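The steps above can be sketched on a small, made-up data frame. The values `a`, `c`, `b`, and `d` and the variable name `unique_combos` are assumptions chosen for illustration only.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "col_1": ["a", "a", "a", "a", "c", "c", "c", "c"],
    "col_2": ["b", "b", "b", "d", "b", "b", "d", "d"],
})

# Keep one row per distinct (col_1, col_2) pair.
unique_combos = df[["col_1", "col_2"]].drop_duplicates()
print(unique_combos)
```

Of the eight rows, four distinct pairs remain. The result keeps the original row labels; chaining `.reset_index(drop=True)` renumbers them if that is preferred.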

**Cross Table**

We wish to know the number of times each combination of values of categorical columns occurs in a data frame.

In this example, we wish to get the number of times each unique combination of the values of `col_1` and `col_2` occurs in a data frame.

```
pd.crosstab(df['col_1'], df['col_2'])
```

Here is how this works:

- We use the Pandas function `crosstab()`, which is quite versatile. Note that it is a top-level function called as `pd.crosstab()`, not a method of a data frame. In its simplest form, it takes two columns and cross tabulates them against each other; i.e. it returns a table where the rows are the values of the first variable (here `col_1`), the columns are the values of the second variable (here `col_2`), and each cell contains the number of rows in which the corresponding row and column values occur together.
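As a minimal sketch of what the resulting table looks like, assuming a small made-up data frame (the values and the name `counts` are illustrative, not from a real dataset):

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "col_1": ["a", "a", "a", "a", "c", "c", "c", "c"],
    "col_2": ["b", "b", "b", "d", "b", "b", "d", "d"],
})

# Rows are the values of col_1, columns are the values of col_2,
# and each cell counts how often that pair occurs.
counts = pd.crosstab(df["col_1"], df["col_2"])
print(counts)

# A single cell can be read off by label.
print(counts.loc["a", "b"])
```

The cell counts add up to the number of rows in `df`, since every row falls into exactly one cell.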

**Add Totals**

Adding to the previous section, we wish to include the totals for each row and each column of the cross table (often referred to as marginal totals).

```
pd.crosstab(df['col_1'], df['col_2'],
            margins=True,
            margins_name="Total")
```

Here is how this works:

- We instruct `crosstab()` to add the row and column totals by passing the argument `margins=True`.
- The default name of the margin row and column is `"All"`, which may not be terribly intuitive. We can use the `margins_name` argument of `crosstab()` to specify the name we wish to use, which in this case we set via `margins_name="Total"`.
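To make the effect of the two arguments concrete, here is a small sketch on made-up data (the values and the names `with_totals` and `default_name` are assumptions for illustration):

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "col_1": ["a", "a", "a", "a", "c", "c", "c", "c"],
    "col_2": ["b", "b", "b", "d", "b", "b", "d", "d"],
})

# Cross table with an extra "Total" row and "Total" column.
with_totals = pd.crosstab(df["col_1"], df["col_2"],
                          margins=True,
                          margins_name="Total")
print(with_totals)

# Without margins_name, the margin row and column are labelled "All".
default_name = pd.crosstab(df["col_1"], df["col_2"], margins=True)
print(default_name.index.tolist())
```

The bottom-right cell of `with_totals` holds the grand total, which equals the number of rows in `df`.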

We wish to know the proportion (percentage or density) of the total number of rows (observations) that take each possible combination of values of two columns (variables).

In order to compute a proportion, we need to designate what it is that we are comparing, i.e. what the numerator and denominator are. In this situation, the numerator is the frequency of each combination of values of the two categorical variables. The denominator, however, can take one of three forms:

- *on Rows*: We divide by the sum of values for the row. In other words, we wish to know: of the rows where `col_1 == a`, what proportion (percent) have `col_2 == b` (essentially the conditional probability of `col_2 == b` given that `col_1 == a`).
- *on Columns*: We divide by the sum of values for the column. In other words, we wish to know: of the rows where `col_2 == b`, what proportion (percent) have `col_1 == a` (essentially the conditional probability of `col_1 == a` given that `col_2 == b`).
- *on Table*: We divide by the sum of values for the entire table. In other words, we wish to know: of the total number of rows, what proportion (percent) have `col_1 == a` and `col_2 == b`.
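The three denominators can be worked through by hand for a single cell. The sketch below uses made-up data and plain boolean masks rather than a cross table; the values and variable names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "col_1": ["a", "a", "a", "a", "c", "c", "c", "c"],
    "col_2": ["b", "b", "b", "d", "b", "b", "d", "d"],
})

is_a = df["col_1"] == "a"
is_b = df["col_2"] == "b"

# Numerator: rows where col_1 == "a" and col_2 == "b".
numerator = (is_a & is_b).sum()

# The same numerator divided by three different denominators.
on_rows = numerator / is_a.sum()     # share of col_1 == "a" rows with col_2 == "b"
on_columns = numerator / is_b.sum()  # share of col_2 == "b" rows with col_1 == "a"
on_table = numerator / len(df)       # share of all rows with both values

print(on_rows, on_columns, on_table)
```

With this data the three proportions differ (3 of 4, 3 of 5, and 3 of 8), which is why the choice of denominator needs to be stated.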

**on Rows**

We wish to get the proportion of each combination of values of two columns relative to the first column (represented by the rows of the cross-table).

In this example, we compute a cross table between `col_1` and `col_2` and obtain the proportions of combinations relative to `col_1`.

```
pd.crosstab(df['col_1'], df['col_2'],
            normalize='index')
```

Here is how this works:

- Cross tabulation via `crosstab()` works as described above.
- To convert the counts to proportions, we use the `normalize` argument.
- To normalize across the rows of the cross table, so that each row sums to 1, we set `normalize='index'`.
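As a rough check of what `normalize='index'` does, the sketch below compares it with dividing each row of the count table by its row total. The data and the names `row_props` and `manual` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "col_1": ["a", "a", "a", "a", "c", "c", "c", "c"],
    "col_2": ["b", "b", "b", "d", "b", "b", "d", "d"],
})

# Proportions relative to each value of col_1 (each row sums to 1).
row_props = pd.crosstab(df["col_1"], df["col_2"], normalize="index")
print(row_props)

# Equivalent manual computation: divide each row of counts by its row total.
counts = pd.crosstab(df["col_1"], df["col_2"])
manual = counts.div(counts.sum(axis=1), axis=0)
print(manual)
```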

**on Columns**

We wish to get the proportion of each combination of values of two columns relative to the second column (represented by the columns of the cross-table).

In this example, we compute a cross table between `col_1` and `col_2` and obtain the proportions of combinations relative to `col_2`.

```
pd.crosstab(df['col_1'], df['col_2'],
            normalize='columns')
```

Here is how this works:

- Cross tabulation via `crosstab()` works as described above.
- To convert the counts to proportions, we use the `normalize` argument.
- To normalize down the columns of the cross table, so that each column sums to 1, we set `normalize='columns'`.
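A small sketch on made-up data shows that with `normalize='columns'` it is the columns, not the rows, that sum to 1. The values and the name `col_props` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "col_1": ["a", "a", "a", "a", "c", "c", "c", "c"],
    "col_2": ["b", "b", "b", "d", "b", "b", "d", "d"],
})

# Proportions relative to each value of col_2.
col_props = pd.crosstab(df["col_1"], df["col_2"], normalize="columns")
print(col_props)

# Each column sums to 1; the rows generally do not.
print(col_props.sum(axis=0))
print(col_props.sum(axis=1))
```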

**on Table**

We wish to get the proportion of each combination of values of two columns relative to the total number of rows in the data frame.

In this example, we compute a cross table between `col_1` and `col_2` and obtain the proportions of combinations relative to the number of rows in the data frame `df`.

```
pd.crosstab(df['col_1'], df['col_2'],
            normalize='all')
```

Here is how this works:

- Cross tabulation via `crosstab()` works as described above.
- To convert the counts to proportions, we use the `normalize` argument.
- To normalize across the entire cross table (i.e. the denominator is the total number of rows in the original data frame `df`), we set `normalize='all'`.
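To illustrate the table-wide denominator, the sketch below checks on made-up data that `normalize='all'` matches dividing the counts by the number of rows. The values and the name `table_props` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "col_1": ["a", "a", "a", "a", "c", "c", "c", "c"],
    "col_2": ["b", "b", "b", "d", "b", "b", "d", "d"],
})

# Proportions relative to the total number of rows.
table_props = pd.crosstab(df["col_1"], df["col_2"], normalize="all")
print(table_props)

# Equivalent manual computation: counts divided by the number of rows.
counts = pd.crosstab(df["col_1"], df["col_2"])
print(counts / len(df))
```

Here all cells together sum to 1, rather than each row or each column.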

**Rounding**

We wish to set a level of precision for the percentages computed.

In this example, we set the level of precision to `2` decimal places, i.e. `0.xx`.

```
pd.crosstab(df['col_1'], df['col_2'],
            normalize='all').round(2)
```

Here is how this works:

- We use `crosstab()`, setting `normalize` to `'index'`, `'columns'`, or `'all'` as needed, as described above.
- We then apply `round()`, passing `2` as the `decimals` argument (positionally, without naming it) to obtain a precision of 2 decimal places.
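Rounding is easiest to see when the proportions do not terminate. The sketch below uses made-up data and `normalize='columns'` (chosen here only because it produces thirds); the names `props` and `rounded` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "col_1": ["a", "a", "a", "a", "c", "c", "c", "c"],
    "col_2": ["b", "b", "b", "d", "b", "b", "d", "d"],
})

# Unrounded column proportions: the "d" column contains 1/3 and 2/3.
props = pd.crosstab(df["col_1"], df["col_2"], normalize="columns")
print(props)

# Rounded to 2 decimal places.
rounded = props.round(2)
print(rounded)
```

Because each cell is rounded independently, the rounded rows or columns may no longer sum to exactly 1.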
