As we have seen in this section so far, a typical cross table involves two categorical variables; one represented by the rows of the cross table and the other by the columns. Occasionally, we need to work with more than two categorical variables and wish to group by more than one categorical variable across the rows or the columns of the cross table or both . We will cover how to do that in this section.
We have three grouping columns. We wish to represent two over the rows and the third over the columns of the cross table.
df %>%
group_by(col_1, col_2, col_3) %>%
count() %>%
pivot_wider(names_from = col_3, values_from = n)
Here is how this works:
group_by()
and count()
to generate a summary table containing the number of rows in each group where the groups are defined by the values of col_1
, col_2
, and col_3
.pivot_wider()
to pivot (move) the third column to the columns of the cross table. We pass the argument names_from = col_3
so that the names of the new columns would be the values of col_3
.values_from = n
so that the cells of the resulting cross table would have the values that were previously in the column n
created by count()
.We have three grouping columns. We wish to represent two over the columns and the third over the rows of the cross table.
df %>%
group_by(col_1, col_2, col_3) %>%
count() %>%
unite(key, col_2, col_3) %>%
pivot_wider(names_from = key, values_from = n)
Here is how this works:
group_by()
and count()
as described above.unite()
to merge together col_2
and col_3
in a unique identifier under a new column called key
. While we can have multiple grouping columns represented as rows, we can only have one represented as columns (R has no equivalent of the multi-index in Pandas).pivot_wider()
to pivot (move) the third column (the new compound column named key
) to the columns of the cross table. We pass the argument names_from = col_3
so that the names of the new columns would be the values of col_3
.values_from = n
so that the cells of the resulting cross table would have the values that were previously in the column n
created by count()
.We have four grouping columns. We wish to represent two over the rows and the other two over the columns of the cross table.
df %>%
group_by(col_1, col_2, col_3, col_4) %>%
count() %>%
unite(key, col_3, col_4) %>%
pivot_wider(names_from = key, values_from = n)
Here is how this works:
This works exactly as the “Columns” scenario described above except that when we pivot, we leave behind two columns col_1
and col_2
to be represented by the rows of the cross table.