Summary Table

We wish to carry out multiple data aggregations on an input table and return an output summary table. The output summary table would have one column for each data aggregation operation in addition to one column for each grouping variable (if any). As for rows, the output summary table will have one row for each combination of values of grouping variables or just one row if we are summarizing a table that is not grouped.

In this example, we wish to produce a grouped summary of a table table_1. The grouping variables are the columns col_1 and col_2.

SELECT col_1,
       col_2,
       MIN(col_3)                         col_3_min,
       MAX(col_3)                         col_3_max,
       SUM(col_4)                         col_4_sum,
       SUM(col_3 * col_4) / SUM(col_4) AS w_mean_3_4

FROM table_1
GROUP BY col_1, col_2;

Here is how this works:

  • GROUP BY accepts one or more columns and then assigns the rows of the table to a group according to the values of the grouping column(s).
  • GROUP BY col_1, col_2 groups the table table_1 by two columns col_1 and col_2.
    • In general, we could group by any number of columns. See Grouping.
  • The output summary table would have
    • the following columns:
      • The grouping columns col_1 and col_2
      • The aggregation outputs col_3_min, col_3_max, col_4_sum,and w_mean_3_4.
    • As for rows, the output summary table will have one row for each combination of values of grouping variables col_1 and col_2.
  • In this example, we carried out a few basic data aggregation operations such as MAX() and SUM(). See Common Aggregation Operations for a coverage of the most common data aggregation operations.
  • The data aggregation operations that we use can involve one or more columns. For instance, in this example, we calculate the weighted mean that takes two columns col_3 and col_4 as input.
SQL
I/O