Data Frame Summary

We wish to generate common summary statistics for all columns in a data frame, such as quantiles for numeric columns and unique value counts for non-numeric columns. While we could compute each of these statistics for each column individually, it is more efficient during data inspection to use a function that, given a data frame, computes the common statistics appropriate for each column’s data type.
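For comparison, the column-by-column approach mentioned above could be sketched in base R roughly as follows. This is only an illustration: the data frame contents and the helper describe_column() are hypothetical names made up for this example, not part of any package.

```r
# Hypothetical example data; any data frame would do.
df <- data.frame(
  height = c(1.62, 1.75, NA, 1.80),
  city   = c("Oslo", "Lima", "Oslo", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: choose the statistics based on the column's type.
describe_column <- function(x) {
  if (is.numeric(x)) {
    # Quartiles for numeric columns, ignoring missing values.
    quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)
  } else {
    # Count of distinct non-missing values for everything else.
    c(n_unique = length(unique(x[!is.na(x)])))
  }
}

# Apply the helper to every column; the result is a named list.
manual_summary <- lapply(df, describe_column)
manual_summary
```

Every additional statistic or column type means more branches in a helper like this, which is why a single function that handles the type dispatch for us is convenient.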

library(skimr)

df %>% skim()

Here is how this works:

  • We pass the data frame df to the function skim().
  • skim(), from the skimr package, is an alternative to R’s built-in summary() function that reports more statistics per column and groups them by column type.
  • skim() separately describes numerical and non-numerical variables. In particular, it returns the following:
    1. Data Summary: observation count, column count.
    2. For each numerical column: missing count, completeness rate, mean, standard deviation, percentiles, histogram.
    3. For each non-numerical column: missing count, completeness rate, unique count, ..
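To make the output described above concrete, here is a minimal sketch that runs skim() on a small data frame. The data frame contents are made up for illustration, and the column names skim_variable and n_missing are those used by skimr version 2; they may differ in other versions.

```r
library(skimr)

# Hypothetical example data with one numeric and one character column.
df <- data.frame(
  height = c(1.62, 1.75, NA, 1.80),
  city   = c("Oslo", "Lima", "Oslo", NA),
  stringsAsFactors = FALSE
)

skimmed <- skim(df)

# Printing shows the data summary followed by one table per column type.
skimmed

# The result is also a data frame with one row per column of df,
# so individual statistics can be pulled out programmatically.
skimmed$skim_variable
skimmed$n_missing
```

Because the result behaves like a data frame, it can be filtered or passed on to further processing like any other table.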