Column Data Types

It is prudent to inspect column data types at the outset of a data analysis project, as well as before and after running data manipulation operations, to spot any columns whose data type is not suited to the operations to be carried out (e.g. categorical variables encoded as strings). This section covers inspecting column data types. For coverage of data types in Pandas and of how to set a column’s data type, please see Data Types.

All Columns

We wish to obtain the data type of each column in a data frame.

df.dtypes

Here is how this works:

  • Pandas data frames have a .dtypes attribute, which returns a Series holding the data type of each column. The Index of that Series is made up of the DataFrame’s column names.
  • Columns that contain strings typically take the data type 'object', although newer versions of Pandas may report a dedicated string data type instead.
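As a minimal sketch of the above, the snippet below builds a small data frame and inspects its column data types. The data and the column names (col_1, col_2, col_3) are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
df = pd.DataFrame({
    "col_1": [1, 2, 3],          # integers
    "col_2": [0.5, 1.5, 2.5],    # floats
    "col_3": ["a", "b", "c"],    # strings
})

# A Series indexed by column name, holding one data type per column.
col_types = df.dtypes
print(col_types)
```

The exact data type reported for the string column depends on the installed Pandas version, so it is worth checking the output rather than assuming 'object'.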

Particular Column

We wish to obtain the data type of a particular column.

df.dtypes['col_1']

Here is how this works:

  • As described above, df.dtypes returns a Series whose index is the DataFrame’s column names.
  • We can therefore look up the data type of a column by its name, i.e. df.dtypes['col_1'].
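The lookup can be sketched as follows, again with hypothetical data. For comparison, the snippet also reads the .dtype attribute of the column itself, which is an alternative not covered above and should give the same result.

```python
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
df = pd.DataFrame({"col_1": [1.0, 2.0], "col_2": [True, False]})

# Look up the column's data type by name in the Series returned by df.dtypes.
by_lookup = df.dtypes["col_1"]

# Alternative: select the column first, then read its .dtype attribute.
by_column = df["col_1"].dtype

print(by_lookup, by_column)
```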

Type Distribution

We wish to obtain the distribution of columns over data types, i.e. the number of columns of each data type in a data frame.

df.dtypes.value_counts()

Here is how this works:

  • As described above, df.dtypes returns a Series whose index is the DataFrame’s column names and whose values are the columns’ data types.
  • The value_counts() method, which is widely used in data inspection, counts the occurrences of each distinct value in that Series and so returns the number of columns of each data type.
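A short sketch of the type distribution, assuming a hypothetical data frame with two float columns and one boolean column:

```python
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
df = pd.DataFrame({
    "a": [1.0, 2.0],
    "b": [0.5, 0.25],
    "c": [True, False],
})

# Count how many columns share each data type.
type_counts = df.dtypes.value_counts()
print(type_counts)

# Optional: a plain dict keyed by the data type's name, handy for quick checks.
counts_by_name = {str(dtype): int(n) for dtype, n in type_counts.items()}
```

The result is itself a Series, indexed by data type, so it can be sorted or filtered like any other Series.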