Columns Selection

We wish to identify the columns to which we will apply data transformation logic.

We will cover the following scenarios:

  • All Columns, where we cover how to apply a data transformation operation to all columns of a data frame.
  • Explicit Selection, where we cover how to apply a data transformation operation to each of a set of explicitly selected columns of a data frame, e.g. by spelling out their names.
  • Implicit Selection, where we cover how to apply a data transformation operation to each of a set of implicitly selected columns of a data frame, e.g. by selecting columns whose names contain a certain substring.
  • Exclude Columns, where we cover how to apply a data transformation operation to every column of a data frame except a set of excluded columns.

This section is complemented by:

  • Function Specification, where we cover how to specify one or more data transformation operations to apply to the selected set of columns.
  • Output Naming, where we cover how to specify the name(s) of the output column(s) created by the implicit data transformation operations.

All Columns

We wish to apply a data transformation operation to each column of a data frame.

In this example, we have a data frame df of numeric columns, and we wish to round all columns.

df_2 = df %>%
  mutate(across(everything(), round))

Here is how this works:

  • We pass the data frame df to the function mutate().
  • Inside mutate() we use across() as follows:
    • The first argument to across() is a selection of columns. In this example, we use everything() to select all columns.
    • The second argument to across() is the data transformation expression that we wish to apply to each column selected in the first argument. In this case the data transformation we wish to apply is the function round().
  • The function round() is applied to each column of the data frame df as selected via everything().
  • The output columns overwrite the original columns. See Output Naming for how to append new columns instead.
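To make the steps above concrete, here is a minimal runnable sketch. The data frame, its column names, and its values are made up for illustration; the second call assumes we want one decimal place, which is not part of the original example.

```r
library(dplyr)

# Hypothetical data frame of numeric columns (names and values are invented).
df = tibble(
  col_1 = c(1.24, 2.51, 3.78),
  col_2 = c(10.49, 20.62, 30.51)
)

# Round every column to the nearest whole number.
# The rounded columns overwrite the originals.
df_2 = df %>%
  mutate(across(everything(), round))

# To pass extra arguments to the function, wrap it in a lambda;
# here we assume rounding to one decimal place is wanted.
df_3 = df %>%
  mutate(across(everything(), ~ round(.x, digits = 1)))
```

In this sketch df_2 keeps the same column names as df, since across() overwrites columns unless told otherwise.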

Explicit Selection

We wish to apply a data transformation operation to each of a set of explicitly selected columns.

In this example, we wish to apply the round() function to columns col_1, col_2, and col_4 of a data frame df.

df_2 = df %>%
  mutate(across(c(col_1, col_2, col_4), round))

Here is how this works:

  • We use across() inside mutate() to carry out implicit data transformation as described in the “All Columns” scenario above.
  • In c(col_1, col_2, col_4) we identify the columns we wish to select by name. See Basic Selection for coverage of explicit column selection scenarios, all of which can be used to select columns for implicit transformation.
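The following sketch shows explicit selection on a small data frame whose values are invented for illustration. It also shows one possible variation, assuming the column names are held in a character vector (the hypothetical cols), in which case all_of() can be used inside across().

```r
library(dplyr)

# Hypothetical data frame with four numeric columns.
df = tibble(
  col_1 = c(1.24, 2.61),
  col_2 = c(3.38, 4.72),
  col_3 = c(5.16, 6.94),
  col_4 = c(7.83, 8.27)
)

# Round only the columns named explicitly; col_3 is left untouched.
df_2 = df %>%
  mutate(across(c(col_1, col_2, col_4), round))

# Variation: the names are stored as strings in a vector (an assumption
# for this sketch), so we wrap the vector in all_of().
cols = c("col_1", "col_2", "col_4")
df_3 = df %>%
  mutate(across(all_of(cols), round))
```

Both calls should produce the same result; the second form may be handy when the column names are computed rather than typed.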

Implicit Selection

We wish to apply a data transformation operation to each of a set of implicitly selected columns. Implicit column selection is when we do not spell out the column names or positions explicitly but rather identify the columns via a property of their name or their data.

In this example, we wish to apply the round() function to each column of the data frame df whose data type is double.

df_2 = df %>%
  mutate(across(where(is.double), round))

Here is how this works:

  • We use across() inside mutate() to carry out implicit data transformation as described in the “All Columns” scenario above.
  • We use where(is.double) to select all columns whose data type is double. See Implicit Selection for coverage of the most common scenarios of implicit column selection, including by name pattern, data type, and criteria satisfied by the column’s data.
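As a rough illustration, the sketch below applies implicit selection to a data frame of mixed types. The column names and values are hypothetical. The second call shows selection by name pattern with contains(), assuming we only want columns whose names include the substring "price".

```r
library(dplyr)

# Hypothetical data frame mixing integer, character, and double columns.
df = tibble(
  id = c(1L, 2L),
  name = c("a", "b"),
  price_net = c(9.87, 19.14),
  price_gross = c(11.74, 22.78),
  weight = c(0.46, 1.53)
)

# Select by data type: only the double columns are rounded.
# The integer and character columns are not selected.
df_2 = df %>%
  mutate(across(where(is.double), round))

# Select by name pattern: only columns whose names contain "price"
# are rounded; weight is a double but is left as-is.
df_3 = df %>%
  mutate(across(contains("price"), round))
```

Selecting by type is one way to avoid applying round() to a character column, which would raise an error.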

Exclude Columns

We wish to apply a data transformation operation to all but a set of columns.

In this example, we wish to apply the round() function to each column of the data frame df except the columns col_1 and col_2.

df_2 = df %>%
  mutate(across(!c(col_1, col_2), round))

Here is how this works:

  • We use across() inside mutate() to carry out implicit transformation as described in the “All Columns” scenario above.
  • In !c(col_1, col_2) we identify the columns we wish to exclude by name. See Exclude Columns for coverage of column exclusion scenarios, all of which can be used for implicit transformation.
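Here is a small sketch of exclusion in practice, using an invented data frame in which col_1 is a character column (so rounding it would fail) and col_2 is a numeric column we assume should keep its precision. The second call is a possible variation that combines a type-based selection with an exclusion.

```r
library(dplyr)

# Hypothetical data frame; col_1 is character, the rest are doubles.
df = tibble(
  col_1 = c("a", "b"),
  col_2 = c(0.27, 0.64),
  col_3 = c(5.16, 6.94),
  col_4 = c(7.83, 8.27)
)

# Round every column except col_1 and col_2.
df_2 = df %>%
  mutate(across(!c(col_1, col_2), round))

# Variation: combine a selection with an exclusion using &,
# i.e. all double columns except col_2.
df_3 = df %>%
  mutate(across(where(is.double) & !col_2, round))
```

In this sketch both calls round only col_3 and col_4.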