We wish to look at the variation in one categorical variable against another categorical variable. This is often referred to as *cross tabulation*.

We wish to know the distinct combinations of values that two categorical columns take in a data frame.

In this example, we wish to get the unique combinations of the values of `col_1` and `col_2`.

```
library(dplyr)
df %>% distinct(col_1, col_2)
```

Here is how this works:

- We pass the data frame `df` to the function `distinct()`.
- We pass to `distinct()` the names of the columns whose value combinations we are interested in. In this case, those column names are `col_1` and `col_2`.
- The output of `distinct()` is a data frame that has the input columns (in this case `col_1` and `col_2`) and one row for each unique combination of the values of the input columns.
- Note: We can pass any number of columns to `distinct()` as the situation requires.
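As a minimal sketch, here is `distinct()` applied to a small, hypothetical data frame (the values are invented purely for illustration). The combination `a`/`y` appears three times in the input but only once in the output.

```
library(dplyr)

# Hypothetical toy data: 6 rows, two categorical columns
df <- data.frame(
  col_1 = c("a", "a", "a", "a", "b", "b"),
  col_2 = c("x", "y", "y", "y", "x", "y")
)

# One row per unique (col_1, col_2) pair, kept in order of first appearance
combos <- df %>% distinct(col_1, col_2)
combos
```

This returns four rows: `a`/`x`, `a`/`y`, `b`/`x` and `b`/`y`.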

**Cross Table**

We wish to know the number of times each combination of values of categorical columns occurs in a data frame.

In this example, we wish to get the number of times each unique combination of the values of `col_1` and `col_2` occurs in a data frame.

```
library(janitor)
df %>% tabyl(col_1, col_2)
```

Here is how this works:

- We pass a data frame `df` to the function `tabyl()`.
- We pass to `tabyl()` the names of the columns whose combinations we wish to count. In this case, those column names are `col_1` and `col_2`.
- The output of `tabyl()` is a kind of data frame (a `tabyl` object) where the values of the first column (in this example `col_1`) are represented by the rows and the values of the second column (in this example `col_2`) are represented by the columns. Each cell holds the number of rows of the original data frame (in this example `df`) where the value of `col_1` designated by that row occurs together with the value of `col_2` designated by that column.
- We recommend using `tabyl()` from the `janitor` package instead of base R's `table()` because it returns a clean data frame rather than a `table` object, reports percentages automatically when tabulating a single variable, and comes with a family of `adorn_` functions for enhanced tabulation, which we will make use of in the remainder of this page.
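Here is a minimal sketch of `tabyl()` on the same hypothetical toy data frame (values invented for illustration). We load `dplyr` here only for the `%>%` pipe.

```
library(dplyr)    # provides the %>% pipe
library(janitor)  # provides tabyl()

# Hypothetical toy data: 6 rows, two categorical columns
df <- data.frame(
  col_1 = c("a", "a", "a", "a", "b", "b"),
  col_2 = c("x", "y", "y", "y", "x", "y")
)

# Rows = values of col_1, columns = values of col_2, cells = counts
counts <- df %>% tabyl(col_1, col_2)
counts
```

The `a` row holds 1 under `x` and 3 under `y`; the `b` row holds 1 under each.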

**Add Totals**

Building on the previous section, we wish to include the totals for each row and each column of the cross table (often referred to as marginal totals).

```
library(janitor)
df %>%
  tabyl(col_1, col_2) %>%
  adorn_totals(c("row", "col"))
```

Here is how this works:

- We use `tabyl()` to generate the cross table for `col_1` and `col_2` as described in the previous section.
- We pass the output of `tabyl()` to `adorn_totals()` to add the totals to the output summary data frame.
- We pass the argument `c("row", "col")` to `adorn_totals()` to instruct it to add totals for both the rows and the columns. Alternatively, we can pass just `"row"` (adds a totals row at the bottom; this is the default) or just `"col"` (adds a totals column on the right).
- The pattern we have seen here, where we pass the output of `tabyl()` to one of many `adorn_` functions, is how the `tabyl` set of functions is meant to be used.
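A minimal sketch of `adorn_totals()` on the hypothetical toy data frame (values invented for illustration). The bottom-right cell is the grand total, i.e. the number of rows in `df`.

```
library(dplyr)    # provides the %>% pipe
library(janitor)  # provides tabyl() and adorn_totals()

# Hypothetical toy data: 6 rows, two categorical columns
df <- data.frame(
  col_1 = c("a", "a", "a", "a", "b", "b"),
  col_2 = c("x", "y", "y", "y", "x", "y")
)

# Cross table plus a "Total" row and a "Total" column
totals <- df %>%
  tabyl(col_1, col_2) %>%
  adorn_totals(c("row", "col"))
totals
```

The `Total` column reads 4, 2 and 6, and the `Total` row reads 2 under `x` and 4 under `y`.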

We wish to know the proportion (percentage or density) of the total number of rows (observations) that take each possible combination of values of two columns (variables).

In order to compute a proportion, we need to designate what it is that we are comparing, i.e. what the numerator and denominator are. In this situation, the numerator is the frequency of each combination of values of the two categorical variables. The denominator, however, can take one of three forms:

- *on Rows*: We divide by the sum of values for the row. In other words, we wish to know: of the rows where `col_1 == a`, what proportion (percent) of those rows have `col_2 == b`? This is essentially the conditional probability of `col_2 == b` given that `col_1 == a`.
- *on Columns*: We divide by the sum of values for the column. In other words, we wish to know: of the rows where `col_2 == b`, what proportion (percent) of those rows have `col_1 == a`? This is essentially the conditional probability of `col_1 == a` given that `col_2 == b`.
- *on Table*: We divide by the sum of values for the entire table. In other words, we wish to know: of the total number of rows, what proportion (percent) have `col_1 == a` and `col_2 == b`?
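To make the three denominators concrete, here is a small arithmetic sketch. It deliberately uses base R's `table()` and `prop.table()` rather than `janitor`, purely to show the division; the toy data is hypothetical. The cell `a`/`x` has a count of 1, its row (`a`) sums to 4, its column (`x`) sums to 2, and the whole table sums to 6, so the three proportions are 1/4, 1/2 and 1/6.

```
# Hypothetical toy data: 6 rows, two categorical columns
df <- data.frame(
  col_1 = c("a", "a", "a", "a", "b", "b"),
  col_2 = c("x", "y", "y", "y", "x", "y")
)

counts <- table(df$col_1, df$col_2)  # 2 x 2 contingency table of counts

on_rows    <- prop.table(counts, margin = 1)  # divide each cell by its row sum
on_columns <- prop.table(counts, margin = 2)  # divide each cell by its column sum
on_table   <- prop.table(counts)              # divide each cell by the grand total

# The same cell (col_1 == "a", col_2 == "x") under each denominator
c(rows    = on_rows["a", "x"],
  columns = on_columns["a", "x"],
  table   = on_table["a", "x"])
```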

**on Rows**

We wish to get the proportion of each combination of values of two columns relative to the first column (represented by the rows of the cross-table).

In this example, we compute a cross table between `col_1` and `col_2` and obtain the proportions of combinations relative to `col_1`.

```
df %>%
  tabyl(col_1, col_2) %>%
  adorn_percentages('row')
```

Here is how this works:

- We use `tabyl()` to obtain the cross table between `col_1` and `col_2` of the data frame `df` as described above.
- We pass the output of `tabyl()` to the function `adorn_percentages()` and set its `denominator` argument (passed positionally, without naming it) to `'row'` to compute proportions over rows.
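A minimal sketch of row proportions on the hypothetical toy data frame (values invented for illustration). Each row of the result sums to 1.

```
library(dplyr)    # provides the %>% pipe
library(janitor)  # provides tabyl() and adorn_percentages()

# Hypothetical toy data: 6 rows, two categorical columns
df <- data.frame(
  col_1 = c("a", "a", "a", "a", "b", "b"),
  col_2 = c("x", "y", "y", "y", "x", "y")
)

# Each count divided by the total of its row (i.e. of its col_1 value)
row_pct <- df %>%
  tabyl(col_1, col_2) %>%
  adorn_percentages('row')
row_pct
```

The `a` row becomes 0.25 and 0.75 (1 and 3 out of 4), and the `b` row becomes 0.5 and 0.5 (1 and 1 out of 2).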

**on Columns**

We wish to get the proportion of each combination of values of two columns relative to the second column (represented by the columns of the cross-table).

In this example, we compute a cross table between `col_1` and `col_2` and obtain the proportions of combinations relative to `col_2`.

```
df %>%
  tabyl(col_1, col_2) %>%
  adorn_percentages('col')
```

Here is how this works:

- We use `tabyl()` to obtain the cross table between `col_1` and `col_2` of the data frame `df` as described above.
- We pass the output of `tabyl()` to the function `adorn_percentages()` and set its `denominator` argument (passed positionally, without naming it) to `'col'` to compute proportions over columns.
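A minimal sketch of column proportions on the same hypothetical toy data frame (values invented for illustration). This time each numeric column of the result sums to 1.

```
library(dplyr)    # provides the %>% pipe
library(janitor)  # provides tabyl() and adorn_percentages()

# Hypothetical toy data: 6 rows, two categorical columns
df <- data.frame(
  col_1 = c("a", "a", "a", "a", "b", "b"),
  col_2 = c("x", "y", "y", "y", "x", "y")
)

# Each count divided by the total of its column (i.e. of its col_2 value)
col_pct <- df %>%
  tabyl(col_1, col_2) %>%
  adorn_percentages('col')
col_pct
```

The `x` column becomes 0.5 and 0.5 (1 and 1 out of 2), and the `y` column becomes 0.75 and 0.25 (3 and 1 out of 4).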

**on Table**

We wish to get the proportion of each combination of values of two columns relative to the total number of rows in the data frame.

In this example, we compute a cross table between `col_1` and `col_2` and obtain the proportions of combinations relative to the number of rows in the data frame `df`.

```
df %>%
  tabyl(col_1, col_2) %>%
  adorn_percentages('all')
```

Here is how this works:

- We use `tabyl()` to obtain the cross table between `col_1` and `col_2` of the data frame `df` as described above.
- We pass the output of `tabyl()` to the function `adorn_percentages()` and set its `denominator` argument (passed positionally, without naming it) to `'all'` to compute proportions over the whole table (the total number of rows in the original data frame `df`).
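A minimal sketch of table-wide proportions on the hypothetical toy data frame (values invented for illustration). Every cell is divided by 6, the number of rows in `df`, so all the cells together sum to 1.

```
library(dplyr)    # provides the %>% pipe
library(janitor)  # provides tabyl() and adorn_percentages()

# Hypothetical toy data: 6 rows, two categorical columns
df <- data.frame(
  col_1 = c("a", "a", "a", "a", "b", "b"),
  col_2 = c("x", "y", "y", "y", "x", "y")
)

# Each count divided by the grand total (nrow(df) == 6)
all_pct <- df %>%
  tabyl(col_1, col_2) %>%
  adorn_percentages('all')
all_pct
```

The cell `a`/`y` becomes 0.5 (3 out of 6), and each of the other three cells becomes 1/6.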

**Rounding**

We wish to set a level of precision for the percentages computed.

In this example, we set the level of precision to `2` decimal places, i.e. `0.xx`.

```
df %>%
  tabyl(col_1, col_2) %>%
  adorn_percentages('all') %>%
  adorn_rounding(2)
```

Here is how this works:

- We use `tabyl()` and `adorn_percentages()` as described above.
- We pass the output of `adorn_percentages()` to `adorn_rounding()`, setting its `digits` argument (passed positionally, without naming it) to `2` to obtain a precision of 2 decimal places.
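A minimal sketch of rounding the table-wide proportions from the hypothetical toy data frame (values invented for illustration). Each 1/6 (about 0.1667) is rounded to 0.17, while 0.5 is left unchanged.

```
library(dplyr)    # provides the %>% pipe
library(janitor)  # provides tabyl(), adorn_percentages() and adorn_rounding()

# Hypothetical toy data: 6 rows, two categorical columns
df <- data.frame(
  col_1 = c("a", "a", "a", "a", "b", "b"),
  col_2 = c("x", "y", "y", "y", "x", "y")
)

# Table-wide proportions rounded to 2 decimal places
rounded <- df %>%
  tabyl(col_1, col_2) %>%
  adorn_percentages('all') %>%
  adorn_rounding(2)
rounded
```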
