In its simplest form, a typical data aggregation operation involves three components: (1) One or
more grouping columns. (2) One or more aggregated columns, and (3) One or more aggregating
functions. Data aggregation in SQL is carried out by using GROUP BY like so:
SELECT col_1, MIN(col_2) col_2_min, SUM(col_3) col_3_sum
FROM table_1
GROUP BY col_1;
At a high level, there are two situations where we need to perform data aggregation:
When we wish to perform an individual aggregation operation, often in an interactive
inspection setting, e.g. we wish to obtain the sum of the values of one particular numeric
column.
When we wish to reduce an input table into a summary table i.e. a table where the columns
correspond to summary operations performed on the input table.
Moreover, a data aggregation operation is either performed on:
An entire table, typically generating a single scalar value.
A grouped table, typically generating a vector containing one summary value per group.
Following the above distinctions, this section is organized as follows:
Summary Table where we cover how to
carry out multiple data aggregation operations on a table, that is typically grouped by one or
more variables, and return a summary table.
Common Operations where we
cover some of the most common data aggregation operations for each data type e.g. sum and mean for
numeric columns, and string concatenation for string columns.