Basic Aggregation

In its simplest form, a typical data aggregation operation involves three components: (1) One or more grouping columns. (2) One or more aggregated columns, and (3) One or more aggregating functions. Data aggregation in SQL is carried out by using GROUP BY like so:

SELECT col_1, MIN(col_2) col_2_min, SUM(col_3) col_3_sum
FROM table_1
GROUP BY col_1;

At a high level, there are two situations where we need to perform data aggregation:

  1. When we wish to perform an individual aggregation operation, often in an interactive inspection setting, e.g. we wish to obtain the sum of the values of one particular numeric column.
  2. When we wish to reduce an input table into a summary table i.e. a table where the columns correspond to summary operations performed on the input table.

Moreover, a data aggregation operation is either performed on:

  1. An entire table, typically generating a single scalar value.
  2. A grouped table, typically generating a vector containing one summary value per group.

Following the above distinctions, this section is organized as follows:

  • Individual Aggregation where we cover how to carry out one aggregation operation on an entire table or on a grouped table.
  • Summary Table where we cover how to carry out multiple data aggregation operations on a table, that is typically grouped by one or more variables, and return a summary table.
  • Common Operations where we cover some of the most common data aggregation operations for each data type e.g. sum and mean for numeric columns, and string concatenation for string columns.
SQL
I/O