Data Frame Summary

We wish to generate the common summary statistics for all columns in a data frame, such as quantiles for numeric columns and unique value counts for non-numeric columns. While we could compute each of those statistics for each column individually, during data inspection it is more efficient to use a single function that, given a data frame, computes the common statistics appropriate for each column’s data type.

df.describe()

Here is how this works:

  • The Pandas data frame has a describe() method that returns summary statistics for the data in the data frame as follows:
    • For numerical columns: observation count, mean, standard deviation, min, percentiles (25th, 50th, and 75th by default), max.
    • For non-numerical columns: observation count, unique value count, most common value, frequency of the most common value.
  • It’s important to note that if the data frame has both numerical and non-numerical columns, describe() by default restricts the summary to the numerical columns; it summarizes the non-numerical columns only if there are no numerical columns. To force describe() to return a summary of non-numerical columns while numerical columns exist, we can use df.describe(include=["object"]).
  • describe() doesn’t return stats on missing values. We cover that in Missing Values.
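The behavior described above can be seen in a minimal sketch. The data frame, its column names (price, qty, color), and its values are made up for illustration; the color column is created with an explicit object dtype so that include=["object"] selects it regardless of how a given Pandas version stores strings by default. The include="all" call is not covered in the text above; it is shown only as one way to get both kinds of summary in a single table.

```python
import pandas as pd

# Hypothetical example data: two numerical columns and one non-numerical column.
df = pd.DataFrame({
    "price": [10.0, 12.5, 9.0, 11.0],
    "qty": [1, 3, 2, 2],
    # Explicit object dtype so include=["object"] matches this column.
    "color": pd.Series(["red", "blue", "red", "green"], dtype="object"),
})

# Default call on a mixed data frame: only numerical columns are summarized.
numeric_summary = df.describe()

# Force a summary of the non-numerical (object) columns.
object_summary = df.describe(include=["object"])

# Summarize every column at once; statistics that do not apply to a
# column's type are filled with NaN.
full_summary = df.describe(include="all")

print(numeric_summary)
print(object_summary)
print(full_summary)
```

Here numeric_summary covers only price and qty, while object_summary reports the count, unique, top, and freq rows for color.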