Multivariate Data Summary

A common data inspection activity is to examine how different columns vary together. This is usually referred to as multivariate data summary or, when the columns are categorical, cross tabulation.

In this chapter we will cover the following common multivariate data summary scenarios:

  1. Table Summary: Group the Table by one or more columns, then obtain a consolidated summary of each column for each group.
  2. Numeric by Factor: Obtain a summary of a given numerical variable for each group, where groups are defined by a given categorical column.
  3. Factor by Factor: Obtain a summary of a given categorical column against another categorical column, i.e. how often each combination of values from the two categorical columns co-occurs.
  4. Numeric by Two Factors: Obtain a summary of a given numerical column for each group, where groups are defined by two categorical columns.
  5. Multiple Facets: Generalize the scenarios above to more than two categorical columns.
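
To make these scenarios concrete, here is a minimal sketch in plain Python using only the standard library. It is an illustration under stated assumptions, not the API this chapter goes on to describe: the toy data and the helper names `numeric_by_factors` and `cross_tabulate` are hypothetical, and the mean is used as a stand-in for whatever summary statistic is of interest. Grouping by a list of columns covers Numeric by Factor, Numeric by Two Factors and Multiple Facets, while the cross tabulation covers Factor by Factor.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical toy data: each row is (region, product, units_sold).
rows = [
    ("north", "tea", 10),
    ("north", "coffee", 20),
    ("south", "tea", 30),
    ("south", "tea", 50),
    ("south", "coffee", 40),
]


def numeric_by_factors(rows, key_indices, value_index):
    """Mean of one numeric column per group.

    Groups are defined by the categorical columns at key_indices, so one
    index gives "Numeric by Factor", two give "Numeric by Two Factors",
    and more give the "Multiple Facets" generalization.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[i] for i in key_indices)
        groups[key].append(row[value_index])
    return {key: mean(values) for key, values in groups.items()}


def cross_tabulate(rows, index_a, index_b):
    """Relative frequency of each pair of values in two categorical columns."""
    counts = Counter((row[index_a], row[index_b]) for row in rows)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


by_region = numeric_by_factors(rows, [0], 2)             # Numeric by Factor
by_region_product = numeric_by_factors(rows, [0, 1], 2)  # Numeric by Two Factors
co_occurrence = cross_tabulate(rows, 0, 1)               # Factor by Factor
```

Keys are tuples in every case, so adding a third or fourth categorical column only means passing a longer `key_indices` list.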