Selecting by Data Type

Often we wish to select columns of one or more data types, for instance to apply a type-specific operation to all of them, e.g. select the columns with a numeric data type and then round each to 2 decimal places.

See General Operations for coverage of data types and data type transformation.

Single Data Type

We wish to select all columns of one particular data type.

In this example, we wish to select all columns of a numeric data type (i.e. float or integer).

df_2 = df %>% select(where(is.numeric))

Here is how this works:

  • There is a set of predicate functions (i.e. functions that return TRUE or FALSE) that check whether a column is of a certain data type. For instance, is.numeric() returns TRUE if the column is of any numeric data type (integer or double) and FALSE otherwise.
  • is.numeric() is passed (without parentheses) to where(), which applies it to each column and selects the columns for which it returns TRUE.
  • select() then extracts those columns from the data frame df.
  • The other commonly used data type predicate functions are:
    • is.integer() for columns of an integer data type
    • is.double() for columns of a floating point data type
    • is.factor() for columns of a factor (categorical) data type
    • is.logical() for columns of a logical (boolean) data type
    • is.character() for columns of a character (string) data type
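As a minimal sketch of the steps above, the following uses a small made-up data frame (the column names id, price, label and flag are hypothetical, not from the original) to show what the predicate returns per column, which columns where() keeps, and how the same selection could drive the rounding operation mentioned in the introduction. It assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical example data; column names are made up for illustration.
df = tibble(
  id    = 1:3,                      # integer
  price = c(1.234, 5.678, 9.1011),  # double
  label = c("a", "b", "c"),         # character
  flag  = c(TRUE, FALSE, TRUE)      # logical
)

# Inspect what the predicate returns for each column.
sapply(df, is.numeric)

# Keep only the numeric columns (here: id and price).
df_2 = df %>% select(where(is.numeric))
names(df_2)

# Round every numeric column to 2 decimal places; other columns are untouched.
df_3 = df %>% mutate(across(where(is.numeric), ~ round(.x, 2)))
```

Swapping is.numeric for another predicate from the list, e.g. is.character, selects the columns of that type instead.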

Multiple Data Types

We wish to select all columns of two or more data types.

In this example, we wish to select all columns of integer or logical data types.

df_2 = df %>% select(where(~ is.integer(.) | is.logical(.)))

Here is how this works:

  • We apply two functions, is.integer() and is.logical(), to each column of the data frame df.
  • Each function returns TRUE if the column’s data type matches the type it checks for and FALSE otherwise.
  • We take the | (logical or) of the two results, which evaluates to TRUE for any column of either an integer or a logical data type.
  • We use a formula (~) to define an anonymous function; the dot (.) stands for the column being checked and is passed to both is.integer() and is.logical().
  • where() selects the columns for which this function returns TRUE, and select() extracts those columns from the data frame df.
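The sketch below runs the formula form on a small made-up data frame (the column names are hypothetical) and, for comparison, shows an alternative spelling that combines two where() selections with |. Both forms select the same set of columns in this example, though the order of the selected columns may differ between the two forms on other data. It assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical example data; column names are made up for illustration.
df = tibble(
  id    = 1:3,                  # integer
  price = c(1.5, 2.5, 3.5),     # double
  label = c("a", "b", "c"),     # character
  flag  = c(TRUE, FALSE, TRUE)  # logical
)

# Formula form: one predicate that is TRUE for integer or logical columns.
df_2 = df %>% select(where(~ is.integer(.) | is.logical(.)))

# Alternative form: the union of two separate where() selections.
df_3 = df %>% select(where(is.integer) | where(is.logical))

names(df_2)
names(df_3)
```

The formula form keeps the whole condition in one predicate, which may be easier to extend with further checks; the union form avoids the anonymous function.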