It is prudent to inspect column data types at the outset of a data analysis project, as well as before and after running data manipulation operations, to spot any columns whose data type is not suited to the actions to be carried out (e.g. factors encoded as strings). We will cover inspecting column data types here. For coverage of data types in R and of how to set a column’s data type, please see Data Types.
We wish to obtain the data type of a particular column.
df %>% pull(col_1) %>% typeof()
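To make the one-liner concrete, here is a minimal sketch on a small data frame. The data frame and its values are hypothetical and were made up for illustration; only the column name col_1 and the pull() and typeof() calls come from the text.

```r
library(dplyr)

# Hypothetical data frame for illustration only.
df <- data.frame(
  col_1 = c(1.5, 2.5, 3.5),      # stored as double
  col_2 = c("a", "b", "c"),      # stored as character
  col_3 = c(TRUE, FALSE, TRUE),  # stored as logical
  stringsAsFactors = FALSE
)

# Piped form: extract the column as a vector, then ask for its type.
col_1_type <- df %>% pull(col_1) %>% typeof()

# Base R form using the $ operator.
col_1_type_base <- typeof(df$col_1)

print(col_1_type)
```

Both forms should return the same single string; the piped form is simply easier to append to a longer chain of operations.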
Here is how this works:
We use pull() from dplyr to extract a particular column from the Data Frame df.
We pipe the extracted column col_1 to the function typeof(), which returns the data type of the column as a string.
Alternatively, we could use typeof(df$col_1). pull() is preferable to the $ operator because it fits nicely in a piped flow and can optionally name the elements of the extracted vector through its name argument.

We wish to obtain the data type of each column in a data frame.
df %>% map_chr(typeof)
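As a rough illustration of what this call returns, the sketch below runs it on a small data frame. The data frame and its column names are hypothetical and exist only for this example.

```r
library(dplyr)
library(purrr)

# Hypothetical data frame with one column of each common type.
df <- data.frame(
  col_1 = c(1.5, 2.5, 3.5),      # double
  col_2 = c("a", "b", "c"),      # character
  col_3 = c(TRUE, FALSE, TRUE),  # logical
  col_4 = c(1L, 2L, 3L),         # integer
  stringsAsFactors = FALSE
)

# A data frame is a list of columns, so map_chr() visits each column
# and collects the results into a named character vector.
col_types <- df %>% map_chr(typeof)

print(col_types)
```

The result is named after the columns, so the type of a single column can be looked up with, for example, col_types["col_2"].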
Here is how this works:
map_chr() from purrr applies the function typeof() to each column in the Data Frame df one by one.
The output of map_chr() is a vector of strings, each of which is the data type of the corresponding column in the Data Frame df.

We wish to obtain a distribution of columns over data types, i.e. the number of columns of each data type in a Data Frame.
df %>% map_chr(typeof) %>% tabyl()
Here is how this works:
map_chr() applies the function typeof() to each column in the Data Frame df one by one.
The output of map_chr() is a vector of strings, each of which is the data type of the corresponding column in the Data Frame df.
We pass that vector to tabyl() from the janitor package, which returns a frequency table (a data frame) containing the number of columns of each data type.
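The sketch below puts the whole pipeline together on a small data frame. The data frame is hypothetical and made up for illustration, and the example assumes the dplyr, purrr and janitor packages are installed.

```r
library(dplyr)
library(purrr)
library(janitor)

# Hypothetical data frame: two double columns, one character, one logical.
df <- data.frame(
  col_1 = c(1.5, 2.5, 3.5),
  col_2 = c(10.0, 20.0, 30.0),
  col_3 = c("a", "b", "c"),
  col_4 = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Get the type of every column, then count how often each type occurs.
type_counts <- df %>% map_chr(typeof) %>% tabyl()

print(type_counts)
```

The first column of the result holds the data type and the n column holds the number of columns of that type, so the n values should add up to ncol(df).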