Often we wish to select columns of one or more data types, for instance to apply a data type specific operation to all of them, e.g. select columns with a numeric data type and then round each to 2 decimal places.
See General Operations for coverage of data types and data type transformation.
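As a rough illustration of the motivating case above (select the numeric columns, then round each to 2 decimal places), here is a minimal sketch. The data frame and its column names are hypothetical, and the rounding step uses mutate() with across(), which is one possible way to do it rather than the only one.

```r
library(dplyr)

# Hypothetical example data; the column names are illustrative only.
df = data.frame(
  item   = c("a", "b"),
  price  = c(9.987, 14.501),
  weight = c(1.23456, 2.5),
  stringsAsFactors = FALSE
)

# Keep only the numeric columns, then round every remaining column to 2 decimal places.
df_rounded = df %>%
  select(where(is.numeric)) %>%
  mutate(across(everything(), ~ round(.x, 2)))

print(df_rounded)
```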
We wish to select all columns of one particular data type.
In this example, we wish to select all columns of a numeric data type (i.e. float or integer).
df_2 = df %>% select(where(is.numeric))
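To make the selection concrete, the following sketch runs the same call on a small hypothetical data frame (the columns id, price, name, in_stock and grade are assumptions made for illustration) and also tries a few of the other base R type predicates.

```r
library(dplyr)

# Hypothetical example data with one column per common data type.
df = data.frame(
  id       = 1:3,                       # integer
  price    = c(9.99, 14.5, 3.25),       # double (floating point)
  name     = c("a", "b", "c"),          # character
  in_stock = c(TRUE, FALSE, TRUE),      # logical
  grade    = factor(c("x", "y", "x")),  # factor
  stringsAsFactors = FALSE
)

# Numeric covers both integer and double columns.
df_2 = df %>% select(where(is.numeric))
print(names(df_2))

# Narrower predicates pick out a single data type each.
df_int = df %>% select(where(is.integer))
df_dbl = df %>% select(where(is.double))
df_fct = df %>% select(where(is.factor))
df_chr = df %>% select(where(is.character))
```

Note that is.integer() returns FALSE for factor columns even though factors are stored as integers internally, so grade is not picked up by the integer selection.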
Here is how this works:
- A data type predicate function takes a column and returns TRUE if the column's data type matches the type the function is checking for and FALSE otherwise. For instance, is.numeric() returns TRUE if the column is of any numeric data type (integer or float) and FALSE otherwise.
- is.numeric() is used in conjunction with where(), which returns the column names for those columns for which is.numeric() returns TRUE.
- select() then extracts those columns from the data frame df.
- Other data type predicates can be used in the same way:
  - is.integer() for columns of an integer data type
  - is.double() for columns of a floating point data type
  - is.factor() for columns of a factor (categorical) data type
  - is.logical() for columns of a logical (boolean) data type
  - is.character() for columns of a character (string) data type

We wish to select all columns of two or more data types.
In this example, we wish to select all columns of integer or logical data types.
df_2 = df %>% select(where(~ is.integer(.) | is.logical(.)))
Here is how this works:
- We apply is.integer() and is.logical() to each column of the data frame df.
- Each of is.integer() and is.logical() returns TRUE for each column whose data type matches the type the function is checking for and FALSE otherwise.
- We take the | (logical or) of the output of the two functions, which evaluates to TRUE for any column of either an integer or a logical data type.
- We use the formula notation ~ to pass each column (via the dot .) to each of the two functions is.integer() and is.logical().
- where() maps the TRUE values to column names and select() extracts those columns from the data frame df.
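The combined predicate can be tried on a small hypothetical data frame, as in the sketch below (the columns id, score, passed and label are assumptions made for illustration). The second call writes the same check as an explicit anonymous function instead of the ~ shorthand, which some readers may find easier to follow.

```r
library(dplyr)

# Hypothetical example data; the column names are illustrative only.
df = data.frame(
  id     = 1:3,                   # integer
  score  = c(0.5, 1.5, 2.5),      # double
  passed = c(TRUE, FALSE, TRUE),  # logical
  label  = c("a", "b", "c"),      # character
  stringsAsFactors = FALSE
)

# Formula shorthand: the dot stands for the column being tested.
df_2 = df %>% select(where(~ is.integer(.) | is.logical(.)))
print(names(df_2))

# Same check written as an explicit anonymous function.
df_3 = df %>% select(where(function(col) is.integer(col) || is.logical(col)))
```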