Aggregating

One of the most common data manipulation tasks is data aggregation where, in its simplest form, we carry out operations that reduce multiple values of a column into a single value e.g. compute the sum of values of a numeric column. Typically, we perform data aggregation on mutually exclusive groups of rows then combine the results vertically into a new column or table.

In its simplest form, a typical data aggregation operation in SQL looks like so:

SELECT col_1, MIN(col_2) col_2_min, SUM(col_3) col_3_sum
FROM table_1
GROUP BY col_1;

where we use GROUP BY to create mutually exclusive groups of rows, and then we carry out data aggregation operations on each group.

This section is organized as follows:

  • Basic Aggregation where we cover how to perform basic data aggregation operations in two contexts: (1) Individual aggregation operations typically as part of interactive data inspection and (2) Generating a Summary Table by carrying out multiple data aggregation operations on a table.
  • Conditional Aggregation where we cover how to carry out data aggregation operations that are not simply applied to all rows, rather rows or groups are treated differently based on one or more conditions. For example, compute the sum of values of a column where another column has a particular value.
  • Sequential Aggregation where we cover how to aggregate data in a way that can only be produced by carrying out multiple aggregation operations in sequence. For example, compute the monthly average of number of daily events.
  • Grouping where we cover aspects of groupings and working with groups including grouping by a function of existing columns, order of groups, and iterating over groups.
  • Multi-Value Aggregation where we cover data aggregation operations that return multiple values (as opposed to the common scenario of a data aggregation operation returning a single scalar value).
  • Pivot Aggregation where we cover generating data summaries in the form of pivot tables. Oftentimes, a pivot table organization makes it easier to understand and draw insights from a data summary.
SQL
I/O