Dataset Structure Summary

When getting to know a new dataset, we could run separate commands to get the data frame’s size, column names, and column data types. It is more efficient to obtain a consolidated summary of these common structural details with one command. Luckily, we can do just that.

df.info()

Here is how this works:

  • For a given data frame, here df, we call the info() method, which prints a helpful summary of the data frame. The method itself returns None; the summary goes to standard output.
  • Perhaps most useful, the summary shows the data frame’s row count, column count, column names, column types, and number of non-null values per column.
  • The summary also includes a breakdown (frequency distribution) of columns by data type.
  • Finally, the summary gives a rough estimate of the memory consumed by the data frame. See Memory Use for how to obtain a more accurate memory consumption figure.
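
The points above can be tried out on a small example. The following is a minimal sketch, assuming pandas is installed; the data frame contents and the variable names (result, buffer, summary) are made up for illustration. It shows that info() prints its summary rather than handing it back, and that the buf parameter can capture the text when we want to keep it.

```python
import io

import pandas as pd

# Hypothetical example data; any data frame works the same way.
df = pd.DataFrame({
    "city": ["Oslo", "Lima", None],
    "population": [709_000, 10_000_000, 500_000],
    "area_km2": [454.0, 2672.3, None],
})

# info() prints the summary to standard output and returns None.
result = df.info()

# To keep the summary as a string, pass a text buffer via `buf`.
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()

# memory_usage="deep" inspects the contents of object columns,
# which gives a more accurate (but slower) memory figure.
df.info(memory_usage="deep")
```

Capturing the summary in a buffer is handy when the text should go into a log file or a report instead of the console.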