Column Selection for Implicit Renaming

In Function Specification for Implicit Renaming, we cover how to specify the functions to be applied to the current column names to generate the desired names. In this section, we show how to select the column(s) to be renamed when we only wish to rename a subset of columns.

We will cover the following scenarios:

  • All Columns, where we cover how to apply a function to rename all columns.
  • Explicit Selection, where we cover how to apply a function to rename each of a set of explicitly selected columns of a data frame, e.g. selecting columns by spelling out their names.
  • Implicit Selection, where we cover how to apply a function to rename each of a set of implicitly selected columns of a data frame, e.g. selecting columns whose names contain a certain substring.
  • Exclude Columns, where we cover how to apply a function to rename each column of a data frame except for a set of excluded columns.

For more detailed coverage of column selection, see Selection.

All Columns

We wish to rename all columns of a data frame by applying a function to their current names to generate the desired names.

In this example, we wish to convert the names of all columns to lowercase.

df_2 = df %>% rename_with(str_to_lower)

Here is how this works:

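  • We use the function rename_with() from the dplyr package to apply a function to the current column names to generate the desired column names. Because we do not specify which columns to rename, the function is applied to all columns (the .cols argument of rename_with() defaults to everything()).
  • We pass to rename_with() the name of the function that we wish to apply to each column name, which here is str_to_lower() from the stringr package.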
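As a minimal, self-contained sketch of the above, the following applies str_to_lower() to every column name of a small data frame. The data frame and its column names are made up for illustration.

```r
library(dplyr)
library(stringr)

# Hypothetical data frame with mixed-case column names (illustration only)
df = tibble(ID = 1:3,
            First_Name = c('a', 'b', 'c'),
            AGE = c(30, 41, 25))

# No column selection is given, so every column name is passed to str_to_lower()
df_2 = df %>% rename_with(str_to_lower)

names(df_2)
# "id" "first_name" "age"
```

Only the names change; the data in each column is left as it was.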

Explicit Selection

We wish to rename some columns of a data frame by applying a function to their current names to generate the desired names.

In this example, we wish to rename columns 2 and 3 by lowering the case of their current names and replacing dash separators ‘-’ with underscores ‘_’.

df_2 = df %>% 
  rename_with(~ .x %>% 
                str_to_lower() %>% 
                str_replace_all('-', '_'), 
              c(2, 3))

Here is how this works:

  • We use the function rename_with() from dplyr to carry out implicit renaming, i.e. applying a function to rename some or all columns of a data frame.
  • We pass to rename_with():
    • .data: the data frame whose columns we wish to rename, which here is df (passed in via the pipe).
    • .fn: a function or an anonymous function (a one-sided formula) that will be applied to transform the current names to the desired names. See Function Specification for Implicit Renaming and String Operations.
    • .cols: column selection (Tidy Select) logic that specifies the columns that we wish to rename, which here is c(2, 3). See Basic Selection for coverage of explicit column selection scenarios, all of which can be used to select columns for implicit renaming.
  • The output, df_2, is a copy of the input data frame with the renaming logic applied only to the selected columns.
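To make the effect of the column selection concrete, here is a small sketch under assumed data: the data frame, its column names, and the helper clean_name() are hypothetical. It selects the same two columns first by position and then by name via all_of().

```r
library(dplyr)
library(stringr)

# Hypothetical data frame whose names use upper case and dash separators
df = tibble(`Order-ID` = 1:2,
            `Unit-Price` = c(9.5, 3),
            `Ship-Date` = c('2024-01-01', '2024-01-02'))

# Hypothetical helper: lower the case and replace dashes with underscores
clean_name = function(x) {
  x %>%
    str_to_lower() %>%
    str_replace_all('-', '_')
}

# Select the columns to rename by position ...
df_2 = df %>% rename_with(clean_name, c(2, 3))

# ... or by name; both select the same two columns here
df_3 = df %>% rename_with(clean_name, all_of(c('Unit-Price', 'Ship-Date')))

names(df_2)
# "Order-ID" "unit_price" "ship_date"
```

The first column keeps its original name because it is not part of the selection.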

Implicit Selection

We wish to rename a subset of the columns of the data frame. We wish to select that subset of columns implicitly; i.e. we do not spell out the column names or positions explicitly but rather identify the columns via a property of their name or their data.

In this example, we wish to add the suffix ‘_lgl’ to all columns whose data type is logical.

df_2 = df %>% 
  rename_with(~str_c(.x, '_lgl'), 
              where(is.logical))

Here is how this works:

  • We use rename_with() to carry out implicit column renaming as described in Explicit Selection above.
  • We pass the expression where(is.logical) to the .cols argument of rename_with() (the second argument after the piped data frame) to select all columns whose data type is logical. See Implicit Selection for coverage of common scenarios of implicit column selection including by name pattern, data type, and criteria satisfied by the column’s data.
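The sketch below illustrates implicit selection on a made-up data frame (the column names are assumptions for illustration). It selects once by data type with where() and once by name pattern with starts_with().

```r
library(dplyr)
library(stringr)

# Hypothetical data frame with two logical columns
df = tibble(id = 1:3,
            active = c(TRUE, FALSE, TRUE),
            verified = c(FALSE, FALSE, TRUE),
            score = c(0.5, 0.7, 0.9))

# Select by data type: only the logical columns get the suffix
df_2 = df %>%
  rename_with(~ str_c(.x, '_lgl'),
              where(is.logical))

names(df_2)
# "id" "active_lgl" "verified_lgl" "score"

# Select by name pattern: only columns whose name starts with 's'
df_3 = df %>%
  rename_with(str_to_upper,
              starts_with('s'))

names(df_3)
# "id" "active" "verified" "SCORE"
```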

Exclude Columns

We wish to apply a function to rename all but a set of columns.

In this example, we wish to add the prefix ‘attr_’ to all columns except columns whose current name includes the string ‘_id’.

df_2 = df %>% 
  rename_with(~str_glue('attr_{.x}'), 
              !contains('_id'))

Here is how this works:

  • We use rename_with() to carry out implicit column renaming as described in Explicit Selection above.
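  • We pass the expression !contains('_id') to the .cols argument of rename_with() (the second argument after the piped data frame). The selection helper contains('_id') matches the columns we wish to exclude by name, and the ! operator negates the selection so that all other columns are renamed. See Exclude Columns for coverage of column exclusion scenarios, all of which can be used for implicit renaming.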
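As a brief sketch of exclusion, the following runs the same renaming logic on a made-up data frame (the column names are assumptions for illustration). Columns whose name contains '_id' are left alone and all others receive the prefix.

```r
library(dplyr)
library(stringr)

# Hypothetical data frame with two identifier columns
df = tibble(customer_id = 1:2,
            name = c('a', 'b'),
            order_id = c(10L, 11L),
            city = c('x', 'y'))

# Negate the selection with ! to rename everything except the '_id' columns
df_2 = df %>%
  rename_with(~ str_glue('attr_{.x}'),
              !contains('_id'))

names(df_2)
# "customer_id" "attr_name" "order_id" "attr_city"

# Tidy Select also accepts a leading minus sign for the same exclusion
df_3 = df %>%
  rename_with(~ str_c('attr_', .x),
              -contains('_id'))
```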