Individual Aggregation

We wish to carry out an individual data aggregation operation e.g. we wish to obtain the sum of the values of a particular numeric column. One common situation where quick individual data aggregation is needed is during interactive inspection of data.

We will cover two scenarios:

  1. Performing a data aggregation operation on an entire table (not grouped) reducing one or more columns to a single scalar value.
  2. Performing a data aggregation operation on a grouped table reducing one or more columns into a set of values one for each group.

This section is complemented by:

Table

We wish to perform one data aggregation operation on a table that is not grouped.

One Column Function

We wish to perform one data aggregation operation on a column of a table (that is not grouped) reducing that one column to a single scalar value.

In this example we wish to compute the sum of the column col_2 of the table table_1.

SELECT  SUM(col_2) col_2_sum
FROM table_1;

Here is how this works:

  • We use the function SUM() to calculate the sum of the values in the col_2.
  • The output is a table with one value. The table has one column called col_2_sum and one row holding the sum of the values of the column col_2 of the input table table_1.
  • See Common Aggregation Operations for a coverage of the most common data aggregation operations.

Multiple Column Function

We wish to perform an aggregation operation on an entire table (that is not grouped) reducing multiple columns to a single scalar value.

In this example, we wish to compute the weighted mean of the column col_2 where the weights are given by the column col_3 of the table table_1.

SELECT SUM(col_2 * col_3) / SUM(col_3) AS w_mean_2_3
FROM table_1

Here is how this works:

  • First we calculate the sum of the product of col_2 and col_3 for each row in the table SUM(col_2 * col_3). This is the numerator of the weighted mean formula.
  • Then, we calculate the sum of the values in the col_3 column SUM(col_3), which is the denominator of the weighted mean formula.
  • Finally, it divides the numerator by the denominator to calculate the weighted mean. The result is returned as a new column named w_mean_2_3.
  • See Common Aggregation Operations for a coverage of the most common data aggregation operations.

Grouped Table

We wish to perform one data aggregation operation on a grouped table.

One Column Function

We wish to perform an aggregation on a grouped table. The aggregation operation we wish to perform is a function of one column.

In this example, we wish to obtain the sum of the value of the column col_2 for each group where the groups are defined by the column col_1.

SELECT col_1, SUM(col_2) col_2_sum
FROM table_1
GROUP BY col_1;

Here is how this works:

  • GROUP BY accepts one or more columns and then assigns the rows of the table to a group according to the values of the grouping column(s). See Grouping for a coverage of common grouping scenarios.
  • We use function SUM() to compute the sum of the values of the column col_2 (for each group). See Common Aggregation Operations for a coverage of the most common aggregation operations.
  • The output is a table with one column called col_2_sum and as many rows as there are groups i.e. as many rows as the number of unique values of the grouping column col_1. Each row of the output table holds the sum of the values of the column col_2 of the input table table_1 for the corresponding group.

Multiple Column Function

We wish to perform an aggregation on a grouped table. The aggregation operation we wish to perform is a function of multiple columns.

In this example, we wish to compute the weighted mean of the column col_2 where the weights are given by the column col_3. We wish to perform the aggregation for each group where the groups are defined by the column col_1.

SELECT col_1, SUM(col_2 * col_3) / SUM(col_3) AS w_mean_2_3
FROM table_1
GROUP BY col_1;

Here is how this works:

  • GROUP BY accepts one or more columns and then assigns the rows of the table to a group according to the values of the grouping column(s). See Grouping for a coverage of common grouping scenarios.
  • We calculate the sum of the product of col_2 and col_3 for each row in the table SUM(col_2 * col_3). This is the numerator of the weighted mean formula.
  • Then, we calculate the sum of the values in the col_3 column SUM(col_3), which is the denominator of the weighted mean formula.
  • Finally, it divides the numerator by the denominator to calculate the weighted mean. The result is returned as a new column named w_mean_2_3.
  • See Common Aggregation Operations for a coverage of the most common data aggregation operations.
SQL
I/O