Column Data Types

It is prudent to inspect column data types at the outset of a data analysis project, as well as before and after running data manipulation operations, to spot any columns whose data type is not suited to the operations to be carried out (e.g. factors encoded as strings). We cover inspecting column data types here. For coverage of data types in R and for how to set a column’s data type, please see Data Types.

Particular Column

We wish to obtain the data type of a particular column.

df %>% pull(col_1) %>% typeof()

Here is how this works:

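  • We use pull() from dplyr to extract a particular column from the Data Frame df as a vector.
  • We pass the column col_1 to the function typeof() which returns the column's internal storage type as a string (e.g. "double" or "character"). Because typeof() reports how values are stored, a factor column is reported as "integer"; class() reports the higher-level class instead.
  • Alternatively, we could use typeof(df$col_1). pull() is preferable to the $ operator because it fits nicely in a piped flow and, via its name argument, can return a named vector should we wish to.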
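As a minimal sketch of the above, the following builds a small data frame and inspects individual columns. The data frame and its column names are hypothetical, chosen only for illustration.

```r
library(dplyr)

# Hypothetical data frame; the column names and values are assumptions.
df <- data.frame(
  col_1 = c(1.5, 2.5, 3.5),
  col_2 = c("a", "b", "c"),
  col_3 = factor(c("x", "y", "x")),
  stringsAsFactors = FALSE
)

# Storage type of a numeric column.
type_1 <- df %>% pull(col_1) %>% typeof()

# A factor is stored as integer codes, so typeof() reports "integer",
# while class() reports "factor".
type_3 <- df %>% pull(col_3) %>% typeof()
class_3 <- df %>% pull(col_3) %>% class()

print(c(type_1 = type_1, type_3 = type_3, class_3 = class_3))
```

When the distinction between storage type and class matters (factors, dates), checking class() alongside typeof() is one way to avoid surprises.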

All Columns

We wish to obtain the data type of each column in a data frame.

df %>% map_chr(typeof)

Here is how this works:

  • map_chr() from purrr applies the function typeof() to each column in the Data Frame df one by one.
  • The output from map_chr() is a vector of strings each of which is the data type of the corresponding column in the Data Frame df.
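The following sketch shows what the result looks like on a small, hypothetical data frame (the column names are assumptions made for illustration). The output is a character vector named after the columns.

```r
library(dplyr)
library(purrr)

# Hypothetical data frame with one column of each common type.
df <- data.frame(
  id = 1:3,
  score = c(1.5, 2.5, 3.5),
  label = c("a", "b", "c"),
  flag = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Apply typeof() to every column; the result is named by column.
col_types <- df %>% map_chr(typeof)

print(col_types)
```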

Type Distribution

We wish to obtain the distribution of columns over data types, i.e. the number of columns of each data type in a Data Frame.

df %>% map_chr(typeof) %>% tabyl()

Here is how this works:

  • map_chr() from purrr applies the function typeof() to each column in the Data Frame df one by one.
  • The output from map_chr() is a vector of strings each of which is the data type of the corresponding column in the Data Frame df.
  • The vector of column data types is passed to tabyl() from the janitor package which returns a frequency table (a data frame) containing the number of columns of each data type.
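A minimal sketch of the type distribution on a hypothetical data frame follows; the column names are assumptions. The frequency table returned by tabyl() has one row per data type, with the count in column n and the share of columns in column percent.

```r
library(dplyr)
library(purrr)
library(janitor)

# Hypothetical data frame: two double columns and one each of
# integer, character, and logical.
df <- data.frame(
  id = 1:3,
  score = c(1.5, 2.5, 3.5),
  weight = c(0.1, 0.2, 0.3),
  label = c("a", "b", "c"),
  flag = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Count how many columns have each storage type.
type_counts <- df %>% map_chr(typeof) %>% tabyl()

print(type_counts)
```

The counts in n add up to the number of columns in df, which makes for a quick sanity check.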