Quite often we need to apply data transformation operations to groups of a table individually. We call this Grouped Transformation. A common grouped transformation scenario is replacing missing values with the mean or the median for the group. Another common grouped transformation scenario is scaling the data by subtracting the group’s mean and dividing by the group’s standard deviation.
WITH col_stats AS
(SELECT col_10,
AVG(col_1) AS col_1_avg,
STDDEV(col_1) AS col_1_std,
MAX(col_3) AS col_3_max,
AVG(col_3) AS col_3_avg
FROM refcon.dataset.table_1
GROUP BY col_10)
SELECT t.*,
SAFE_DIVIDE(col_1 - col_1_avg,
col_1_std) AS col_13,
col_3_max AS col_14,
col_6 - col_3_avg AS col_15
FROM refcon.dataset.table_1 t
LEFT JOIN col_stats s ON s.col_10 = t.col_10;
Here is how this works:
col_stats
we group by col_10
and calculate all the columns we will use in our
transformation.LEFt JOIN
on col_10
to combine col_stats
with table_1
.SELECT
statement, we can carry out all the typical data transformation scenarios,
such as those we cover
in Basic Transformation. In particular:MAX(col_3) AS col_3_max
, the scalar
value will be replicated as many times as the number of rows in the group.