We often need to apply data transformation operations to each group (sub data frame) of a data frame individually. We call this Grouped Transformation. One common scenario is replacing missing values with the group’s mean or median. Another is scaling the data by subtracting the group’s mean and dividing by the group’s standard deviation.
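library(dplyr)

df_2 = df %>%
  group_by(col_1) %>%
  mutate(
    # standardize col_2 within each group
    col_4 = (col_2 - mean(col_2)) / sd(col_2),
    # group maximum of col_3, repeated on every row of the group
    col_5 = max(col_3),
    # deviation of col_2 from the group mean of col_3
    col_6 = col_2 - mean(col_3),
    # uses col_4 and col_5, created earlier in this mutate()
    col_7 = max(col_4) - max(col_5)
  )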
Here is how this works:
- We group the data frame by col_1 with group_by() prior to executing mutate().
- mutate(), when called after group_by(), is executed for each group individually.
- Inside mutate(), we can carry out all the typical data transformation scenarios, such as those we cover in Basic Transformation. In particular:
  - When an expression returns a single value per group, as in col_5 = max(col_3), the scalar value is replicated as many times as the number of rows in the group.
  - We can use columns created earlier in the same mutate() statement as inputs to data transformation expressions, as col_7 does with col_4 and col_5.
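To make the points above concrete, here is a minimal sketch on a small made-up data frame. The data and the column names col_2_filled, col_2_scaled, and col_2_max are hypothetical and chosen only for illustration. It fills a missing value with the group mean, scales the filled column within each group, and shows a group-level scalar being repeated on every row.

```r
library(dplyr)

# Hypothetical toy data: two groups, with one missing value in group "a"
df <- data.frame(
  col_1 = c("a", "a", "a", "b", "b", "b"),
  col_2 = c(1, NA, 3, 10, 20, 30)
)

df_3 <- df %>%
  group_by(col_1) %>%
  mutate(
    # replace missing values with the mean of the group's non-missing values
    col_2_filled = ifelse(is.na(col_2), mean(col_2, na.rm = TRUE), col_2),
    # scale within the group, using the column created one line above
    col_2_scaled = (col_2_filled - mean(col_2_filled)) / sd(col_2_filled),
    # max() returns one value per group; it is repeated on every row of the group
    col_2_max = max(col_2_filled)
  ) %>%
  ungroup()

print(df_3)
```

In group "a" the missing value becomes 2 (the mean of 1 and 3), so col_2_filled is 1, 2, 3, col_2_scaled is -1, 0, 1, and col_2_max is 3 on all three rows. The final ungroup() is optional here; it simply drops the grouping so that later operations are not applied per group by accident.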