We wish to aggregate data in a way that can only be produced by carrying out multiple aggregation operations in sequence.
An example is computing the monthly average of number of daily events. To do that we would need to carry out to aggregations in sequence:
We will cover two common scenarios of sequential aggregation:
We wish to aggregate data in a way that can only be produced by carrying out multiple aggregation operations in sequence.
In this example, for each value of the column col_1
in the table table_1
, we wish to obtain the
average number of rows and average sum of values of col_3
for groups defined by the combinations
of col_1
and col_2
.
WITH table_2 AS
(SELECT col_1,
col_2,
COUNT(1) AS row_count,
SUM(col_3) AS col_3_sum
FROM table_1
GROUP BY col_1, col_2)
SELECT col_1,
AVG(row_count) avg_count,
AVG(col_3_sum) avg_value
FROM table_2
GROUP BY col_1;
Here is how this works:
col_1
and col_2
table_1
by the values in the col_1
and col_2
columns using
the GROUP BY()
.col_3
column in table_2 CTE.col_1
and col_2
, and two columns row_count
and col_3_sum
.col_1
col_1
.col_1
and two columns avg_count
and avg_value
.We wish to compute a lower resolution aggregation of a higher resolution aggregation of data.
In this example, we wish to aggregate the table table_1
over a date time column col_1
such that
for each day, we obtain the average hourly number of rows and average hourly sum of values of the
column col_2
.
WITH table_2 AS
(SELECT DATETIME_TRUNC(col_1, HOUR) AS col_1_hour,
COUNT(1) hour_count,
SUM(col_2) AS hour_value,
FROM table_1
GROUP BY DATETIME_TRUNC(col_1, HOUR))
SELECT DATETIME_TRUNC(col_1_hour, DAY) AS col_1_day,
AVG(hour_count) AS avg_count,
AVG(hour_value) AS avg_count,
FROM table_2
GROUP BY DATETIME_TRUNC(col_1_hour, DAY);
Here is how this works:
table_1
by the values in the column col_1
, rounded to the
nearest hour using the DATETIME_TRUNC(datetime, HOUR)
function.GROUP BY
caluse as we cannot refer to newly created columns
within the same step.col_2
.col_1
column, and the values in the hour_count
and hour_value
columns are the count
and the sum of values of the column col_2
for each hour, respectively.avg_count
and avg_value
.