We wish to get a summary of a numeric column (e.g. the mean and standard deviation) for each group where the groups are defined by the combination of values of two categorical columns.
We wish to compute the mean of a numerical column over groups defined by two categorical columns.
In this example, we wish to compute the mean of the numeric column col_1
for each group where the
groups are defined
by the values of the two categorical columns col_10
and col_12.
BEGIN
DECLARE v_pivot_column STRING;
SET v_pivot_column =
(SELECT STRING_AGG(DISTINCT CONCAT('"', CAST(col_12 AS STRING), '"')),
FROM refcon.dataset.table_1);
EXECUTE IMMEDIATE
FORMAT("""
SELECT *
FROM (SELECT col_10,
col_12,
AVG(col_1) AS col_1_avg
FROM refcon.dataset.table_1
GROUP BY col_10, col_12) PIVOT
(avg(col_1_avg) AS col_12 FOR CAST(col_12 AS STRING) IN (%s));
""", v_pivot_column);
END;
Here is how this works:
"
and then use STRING_AGG DISTINCT
to get the unique
values for
col_12 in the following format :
"0","3"`.v_pivot_column
to pivot col_12
values to columns and keep col_10
as rows.AVG(col_1)
to calculate the mean of col_1 per combination.We wish to compute the sum of a numerical column over groups defined by two categorical columns.
In this example, we wish to compute the sum of the numeric column col_1
for each group where the
groups are defined
by the values of the two categorical columns col_10
and col_12.
BEGIN
DECLARE v_pivot_column STRING;
SET v_pivot_column =
(SELECT STRING_AGG(DISTINCT CONCAT('"', CAST(col_12 AS STRING), '"')),
FROM refcon.dataset.table_1);
EXECUTE IMMEDIATE
FORMAT("""
SELECT *
FROM (SELECT col_10,
col_12,
SUM(col_1) AS col_1_sum
FROM refcon.dataset.table_1
GROUP BY col_10, col_12) PIVOT
(avg(col_1_sum) AS col_12 FOR CAST(col_12 AS STRING) IN (%s));
""", v_pivot_column);
END;
This works similarly to the "Mean" scenario above, except we use SUM
.
We wish to obtain the ratio between the sum of values of a numeric variable for each group to the total sum of values of the numeric variable where the groups are defined by two categorical columns.
In this example, we compute the ratio of the sum of values of a numeric column col_1
for each
group defined
by col_10
and col_12
to the total sum of values of col_1
.
BEGIN
DECLARE v_pivot_column STRING;
SET v_pivot_column =
(SELECT STRING_AGG(DISTINCT CONCAT('"', CAST(col_12 AS STRING), '"')),
FROM refcon.dataset.table_1);
EXECUTE IMMEDIATE
FORMAT("""
SELECT *
FROM (SELECT col_10,
col_12,
SUM(col_1) / (SUM(SUM(col_1)) OVER ()) AS pct
FROM refcon.dataset.table_1
GROUP BY col_10, col_12) PIVOT
(avg(pct) AS col_12 FOR CAST(col_12 AS STRING) IN (%s));
""", v_pivot_column);
END;
This works similarly to the "Mean" scenario above, except we
use SUM(col_1) / (SUM(SUM(col_1)) OVER ())
to calculate
the proportion.
We wish to compute multiple aggregation functions on a numeric column for each group where the groups are defined by two columns.
In this example, we compute sum and mean for a numeric column col_1
for each group where the
groups are defined
by col_10
and col_12
.
SELECT col_10,
col_12,
AVG(col_1) AS col_1_avg,
SUM(col_1) AS col_1_sum
FROM refcon.dataset.table_1
GROUP BY col_10, col_12;
Here is how this works:
GROUP BY
col_10
and col_12
.SUM()
and AVG()
.