We wish to carry out data aggregation operations conditionally. In other words, the data aggregation operation we wish to carry out is not simply applied to all rows, rather rows or groups are treated differently based on one or more conditions.
In this section, we will cover the following conditional aggregation scenarios:
We wish to include only a subset of a column’s values in the aggregation operation
In this example, we wish to summarize the table table_1
grouped by the values of the
column col_1
by computing two data aggregations: (1) The sum of positive values of col_2
for
each group and (2) the sum of values of col_2
for the rows in each group where the values in
the col_3
column are equal to "b"
.
SELECT col_1,
SUM(CASE WHEN col_2 > 0 THEN col_2 END) col_2_s1,
SUM(CASE WHEN col_3 = 'b' THEN col_2 END) col_2_s2,
FROM table_1
GROUP BY col_1;
Here is how this works:
GROUP BY
accepts one or more columns and then assigns the rows of the table to a group
according to the values of the grouping column(s).
See Summary Table.CASE
expression is being used to determine which rows should be included in each sum.CASE
expression to subset the values of the column being aggregated i.e. to select
only the values that meet a particular condition:WHEN col_2 > 0
, we select only the positive values of column col_2
.WHEN col_3 = 'b'
, we select only the values of column col_2
that have corresponding
values of b
in the column col_3
.THEN
is used to return the value we want to aggregate on based on the condition.We wish carry out data aggregation operations on a subset of the rows of each group.
In this example, we wish to summarize the table table_1
grouped by the values of the
column col_1
by computing two data aggregations: (1) The sum of the values of the column col_3
for the rows where the column col_2
takes its three largest values and (2) the mean of the values
of the column col_4
also for the rows where the column col_2
takes its three largest values.
WITH table_2 AS (SELECT *
FROM table_1
QUALIFY ROW_NUMBER() OVER
(PARTITION BY col_1 ORDER BY col_2 DESC ) <= 3)
SELECT col_1, SUM(col_3) col_3_sum, AVG(col_4) col_3_avg
FROM table_2
GROUP BY col_1;
Here is how this works:
col_2
has its 3 largest
values using QUALIFY ROW_NUMBER()
.
See Filtering Groups by Rank.ROW_NUMBER()
is a window function and cannot be used with a GROUP BY
, we do the
operation over two steps.GROUP BY
.col_3_sum
are computed by using the SUM()
function to compute the sum of the
values in column col_3
for each group.col_4_mean
are computed by using the AVG()
function to compute the mean of
the values in column col_4
for each group.We wish to use a pick the aggregation function to use for a group based on a condition and return the outputs for the groups as a single summary column.
In this example, we wish to summarize the table table_1
grouped by the values of the
column col_1
by computing a data aggregation operation that depends on the size of the group. If
the group size is less than 3 we return 0.5 times the group size but if the group size is greater
than 3 we return the sum of 0.1 times the values of the column col_2
.
SELECT col_1,
CASE
WHEN COUNT(1) < 3 THEN 0.5 * COUNT(1)
ELSE SUM(0.1 * col_2) END AS col_2_s
FROM table_1
GROUP BY col_1;
Here is how this works:
GROUP BY
accepts one or more columns and then assigns the rows of the table to a group
according to the values of the grouping column(s).
See Summary Table.CASE
expression to decide on the appropriate aggregation function to execute for each
group according the value of the condition as follows:0.5 * COUNT(1)
,
where COUNT(1)
is the number of rows in the group.SUM(0.1 * col_2)
, which
is the sum of the values in the col_2
column multiplied by 0.1.CASE
expression and other constructs for conditional transformation
in Conditional Transformation.We wish to conditionally alter the data and then carry out an aggregation operation on the altered data.
In this example, we wish to summarize the table table_1
grouped by the values of the
column col_1
by computing the weighted sum of the values of the column col_2
where we wish to
have values smaller than 10 be given a weight of 2.
SELECT col_1,
SUM(CASE
WHEN col_2 < 10 THEN col_2 * 2
ELSE col_2 END) col_2_s
FROM table_1
GROUP BY col_1
Here is how this works:
GROUP BY
accepts one or more columns and then assigns the rows of the table to a group
according to the values of the grouping column(s).
See Summary Table.CASE
expressoin to check if each value is less than 10. If it is, the value is multiplied
by 2. Otherwise, the original value is used. We cover this in detail
in Conditional Transformation.SUM()
method is called to sum up all the values in the group (after they have been
multiplied by 2 or not), and this sum is returned as the value for that group.