Grouped Transformation

We wish to carry out data transformation implicitly on a grouped data frame. Implicit data transformation allows us to apply one or more data transformation expressions to one or more columns without having to spell out each transformation explicitly.

In this example, we wish to scale each numeric column in the data frame df by subtracting the group’s mean and dividing by the group’s standard deviation, where groups are defined by the values of the column col_1.

library(dplyr)

# Standardize every numeric column within each group defined by col_1
df_2 = df %>%
  group_by(col_1) %>%
  mutate(across(where(is.numeric),
                ~((. - mean(.))/(sd(.))),
                .names = "{.col}_s"))

Here is how this works:

  • To execute grouped transformations implicitly we simply call group_by() before applying mutate(across(..)).
  • Inside mutate() we use across() as follows:
    • The first argument to across() is a selection of columns. In this example, we use where(is.numeric) to select all columns of a numeric data type.
    • The second argument to across() is the data transformation expression to apply to each column selected by the first argument. Here it is the anonymous function ~((. - mean(.))/(sd(.))). Because the data frame is grouped, mean(.) and sd(.) are computed over the current group only, so each value has its group's mean for that column subtracted and is then divided by its group's standard deviation for that column.
    • The .names argument to across() is the template used to name the output columns. Here it is "{.col}_s", where {.col} stands for the input column name, so each scaled column keeps its original name with the suffix _s appended.
  • All the implicit data transformation scenarios covered in Column Selection, Function Specification, and Output Naming can be extended to a grouped scenario by adding group_by() before mutate().
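
To make the steps above concrete, here is a minimal sketch that runs the same pipeline on a small made-up data frame. The data values and the extra columns col_2 and col_3 are assumptions chosen for illustration only; they are not part of the original example. The final summary is one possible way to check that each scaled column has mean 0 and standard deviation 1 within every group.

```r
library(dplyr)

# Hypothetical toy data: two groups (col_1) and two numeric columns
df <- tibble(
  col_1 = c("a", "a", "a", "b", "b", "b"),
  col_2 = c(1, 2, 3, 10, 20, 30),
  col_3 = c(5, 7, 9, 2, 4, 6)
)

# Same grouped, implicit transformation as in the example above
df_2 <- df %>%
  group_by(col_1) %>%
  mutate(across(where(is.numeric),
                ~ (. - mean(.)) / sd(.),
                .names = "{.col}_s")) %>%
  ungroup()

# Within each group, col_2_s and col_3_s are -1, 0, 1
print(df_2)

# Sanity check: per-group mean and standard deviation of the scaled columns
check <- df_2 %>%
  group_by(col_1) %>%
  summarize(across(ends_with("_s"), list(mean = mean, sd = sd)))
print(check)
```

Although col_2 has very different magnitudes in the two groups, both groups end up with the same scaled values because each group is centered and scaled by its own statistics. Note that sd() returns NA for a single value, so a group containing only one row would yield NA in the scaled columns.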