Implicit Aggregation

Sometimes we need to apply the same aggregation operation to several columns. Implicit aggregation is a data manipulation pattern that lets us succinctly apply one or more aggregation expressions to a selected set of columns without spelling out each operation explicitly.

In its simplest form, an implicit data aggregation expression looks like this:

df_2 = df.groupby('col_1')[['col_2', 'col_3']].agg('sum')

where we group the data frame as desired, select the columns we wish to aggregate, and then call agg() to run one or more data aggregation operations on each of the selected columns.
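To make the expression concrete, here is a minimal runnable sketch. The column names follow the expression above; the sample values are made up purely for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the text.
df = pd.DataFrame({
    'col_1': ['a', 'a', 'b', 'b'],
    'col_2': [1, 2, 3, 4],
    'col_3': [10.0, 20.0, 30.0, 40.0],
})

# Group by col_1, select the columns to aggregate, then sum each of them.
df_2 = df.groupby('col_1')[['col_2', 'col_3']].agg('sum')
print(df_2)
```

The result has one row per distinct value of col_1 and one summed column for each selected column.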

This section covers the aspects of implicit data aggregation in the following order:

  1. Column Selection, where we cover how to select the column(s) to which the data aggregation operations will be implicitly applied.
  2. Function Specification, where we cover how to specify the data aggregation expressions to apply to each of the selected columns.
  3. Output Naming, where we cover how to specify the name(s) of the output column(s) created by the implicit data aggregation operations.
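As a rough preview of how these three aspects can fit together, the sketch below selects two columns, passes a list of functions to agg(), and then flattens the resulting two-level column labels. The sample values are made up, and the flattening step is only one possible way to name the output columns; it is an assumption here, not necessarily the approach taken later in this section.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the text.
df = pd.DataFrame({
    'col_1': ['a', 'a', 'b', 'b'],
    'col_2': [1, 2, 3, 4],
    'col_3': [10.0, 20.0, 30.0, 40.0],
})

# 1. Column selection: choose the columns to aggregate after grouping.
selected = df.groupby('col_1')[['col_2', 'col_3']]

# 2. Function specification: a list applies every function to every column.
df_3 = selected.agg(['sum', 'mean'])

# 3. Output naming (assumed approach): join the two-level column labels,
#    e.g. ('col_2', 'sum') becomes 'col_2_sum'.
df_3.columns = ['_'.join(labels) for labels in df_3.columns]
print(df_3)
```

Passing a list of functions yields one output column per combination of selected column and function, which is why the labels come back in two levels before they are joined.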