We wish to carry out data transformation implicitly on a grouped data frame. Implicit data transformation allows us to apply one or more data transformation expressions to one or more columns without having to spell out each transformation explicitly.
In this example, we wish to scale each numeric column in the data frame df by subtracting the group’s mean and dividing by the group’s standard deviation, where groups are defined by the value of the column col_1.
library(dplyr)

# Standardize every numeric column within each group of col_1
df_2 = df %>%
  group_by(col_1) %>%
  mutate(across(where(is.numeric),
                ~((. - mean(.))/(sd(.))),
                .names = "{.col}_s"))
Here is how this works:
- We call group_by() before applying mutate(across(..)).
- Inside mutate(), we use across() as follows:
  - The first argument to across() is a selection of columns. In this example, we use where(is.numeric) to select all columns of a numeric data type.
  - The second argument to across() is the data transformation expression that we wish to apply to each column selected in the first argument. In this case, it is the anonymous function ~((. - mean(.))/(sd(.))), which subtracts the current group's mean of the current column from each value and divides the result by the current group's standard deviation of the same column.
  - The third argument we pass to across(), .names, is the template used to name the output columns. Here it is "{.col}_s", where {.col} is the input column name.
- The transformation is applied to each group separately because we call group_by() before mutate().
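To see the steps above on concrete values, here is a minimal sketch on a made-up data frame. The data and the column names col_2 and col_3 are assumptions for illustration only. The trailing ungroup() is also an addition, included so that later operations on df_2 are not silently grouped.

```r
library(dplyr)

# Hypothetical toy data: two groups in col_1 and two numeric columns.
df <- data.frame(
  col_1 = c("a", "a", "a", "b", "b", "b"),
  col_2 = c(1, 2, 3, 10, 20, 30),
  col_3 = c(5, 7, 9, 2, 4, 6)
)

df_2 <- df %>%
  group_by(col_1) %>%
  # Scale each numeric column using the mean and sd of its own group.
  mutate(across(where(is.numeric),
                ~((. - mean(.))/(sd(.))),
                .names = "{.col}_s")) %>%
  # Assumption: drop the grouping once the scaled columns are created.
  ungroup()

print(df_2)
# col_2_s and col_3_s are both -1, 0, 1 within each group
```

Because the scaling is done within groups, the values 1, 2, 3 in group "a" and 10, 20, 30 in group "b" both map to -1, 0, 1. Note that a group with a single row produces NA (sd() of one value is NA), and a group whose values are all identical produces NaN (zero divided by zero), so such groups may need separate handling.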