Implicit Grouped Transformation

In this section, we cover implicit grouped transformation, i.e., applying one or more data transformation expressions to one or more columns of a grouped data frame succinctly, without spelling out each data transformation explicitly.

We will cover two scenarios:

Single Function: how to apply one function or lambda function to multiple columns of a grouped data frame (a DataFrameGroupBy object).
Multiple Functions: how to apply multiple functions or lambda functions to multiple columns of a grouped data frame (a DataFrameGroupBy object).

Single Function

df_n = (df
        .groupby('col_3')
        [df.columns[(df.dtypes == 'float64')]]
        .transform(lambda x: (x - x.mean()) / x.std())
        .add_suffix('_' + 'scaled'))
df_2 = pd.concat([df, df_n], axis=1)

Here is how this works:

We group by 'col_3' in groupby('col_3').
We then select all columns of the float64 (double precision) data type via [df.columns[(df.dtypes == 'float64')]]. All implicit column selection approaches described in Columns Selection for Implicit Transformation can be applied.
The selected columns are then each passed individually to transform() which applies the given function to each of them.
If we do not wish to append the new columns to the original data frame, we can skip the concat() step and possibly the add_suffix() step too.
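The steps above can be tried end to end on a small data frame. This is a minimal sketch: the sample values below are made up for illustration, and only the column names col_1, col_2 and col_3 follow the naming used in this section.

```python
import pandas as pd

# Hypothetical sample data; col_1 and col_2 are float64, col_3 is the grouping column.
df = pd.DataFrame({
    'col_1': [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    'col_2': [2.0, 4.0, 6.0, 5.0, 5.5, 6.0],
    'col_3': ['a', 'a', 'a', 'b', 'b', 'b'],
})

# Scale every float64 column within each group of col_3.
df_n = (df
        .groupby('col_3')
        [df.columns[(df.dtypes == 'float64')]]
        .transform(lambda x: (x - x.mean()) / x.std())
        .add_suffix('_' + 'scaled'))

# transform() keeps the index of df, so the new columns line up row by row.
df_2 = pd.concat([df, df_n], axis=1)
print(df_2)
```

Because each group is scaled separately, the scaled columns have a mean of zero within each group of col_3, not across the whole data frame.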

Multiple Functions

def m_transform(df_g, apply_fns):
    df_n = pd.concat([
        df_g
        .transform(fn)
        .add_suffix('_' + fn_name)
        for fn_name, fn in apply_fns.items()], axis=1)
    return df_n

df_n = (df.groupby('col_3')
        [df.columns[(df.dtypes == 'float64')]]
        .pipe(m_transform,
              {'max': 'max',  # string alias; avoids passing Python's built-in max
               'scaled': lambda x: (x - x.mean()) / x.std()}))
df_2 = pd.concat([df, df_n], axis=1)

Here is how this works:

The transform() method of DataFrameGroupBy objects only accepts one function. It doesn't accept a list of functions.
We define a custom function m_transform() that takes two arguments: a grouped data frame df_g, restricted to the columns we wish to transform, and a dictionary apply_fns whose values are the functions to apply and whose keys are the suffixes to append to the column names for the corresponding transformation.
The custom function m_transform() returns a data frame with the same number of rows as the input and one column for each column-function pair; e.g., with two selected columns and two transformations, we would expect the output to have four columns.
We pipe the grouped data frame and the function dictionary to m_transform() and store the output in a new data frame df_n.
We use the Pandas pd.concat() function to column bind (hence axis=1) the data frame of new columns df_n returned by m_transform() to the original data frame df.
m_transform() works as follows:
- A list comprehension iterates over the items of the input function dictionary and applies each function to the grouped data frame via df_g.transform(fn).
- The output is a list of data frames, one for each function in apply_fns.
- We use the add_suffix() function to rename the output columns by appending the function name to each column name, giving names of the form col_fn.
- We use the Pandas pd.concat() function to column bind the data frames resulting from all transformations.
It is worth noting that lambda functions have no meaningful name. For named functions, we could pass a simple list of functions instead of a dictionary and use fn.__name__ to obtain the suffix passed to add_suffix(). For lambda functions, we need to pass the names explicitly, which is why m_transform() takes a dictionary.
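The list-of-named-functions variant mentioned above might look as follows. This is a sketch only: m_transform_named(), scaled(), demeaned() and the sample data are hypothetical names and values introduced for illustration.

```python
import pandas as pd

def m_transform_named(df_g, fns):
    # Hypothetical variant of m_transform(): takes a list of named functions
    # and derives each column suffix from fn.__name__.
    return pd.concat(
        [df_g.transform(fn).add_suffix('_' + fn.__name__) for fn in fns],
        axis=1)

def scaled(x):
    return (x - x.mean()) / x.std()

def demeaned(x):
    return x - x.mean()

# Hypothetical sample data.
df = pd.DataFrame({
    'col_1': [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    'col_2': [2.0, 4.0, 6.0, 5.0, 5.5, 6.0],
    'col_3': ['a', 'a', 'a', 'b', 'b', 'b'],
})

df_n = (df.groupby('col_3')
        [['col_1', 'col_2']]
        .pipe(m_transform_named, [scaled, demeaned]))

# Two selected columns and two functions give four output columns.
print(df_n.columns.tolist())

# Every lambda reports the same placeholder name, so it cannot serve as a suffix.
print((lambda x: x).__name__)
```

The dictionary version remains the more general choice, since it works for lambda functions and lets us pick suffixes that differ from the function names.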

Alternatively:

df_g = df.groupby('col_3')
selected_columns = df.columns[(df.dtypes == 'float64')]
df_n_1 = (df_g[selected_columns]
          .transform('max')  # string alias; avoids passing Python's built-in max
          .add_suffix('_' + 'max'))
df_n_2 = (df_g[selected_columns]
          .transform(lambda x: (x - x.mean()) / x.std())
          .add_suffix('_' + 'scaled'))
df_2 = pd.concat([df, df_n_1, df_n_2], axis=1)

Here is how this works:

transform() when applied to a grouped data frame can only take a single function at a time. Therefore, we need to call transform() separately for each transformation we wish to carry out.
To avoid redundancy, we start by creating a DataFrameGroupBy object df_g and selecting the target columns selected_columns.
We use the Pandas pd.concat() function to column bind (hence axis=1) the data frames df_n_1 and df_n_2 to the original data frame df.
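To see that writing the calls out by hand and generating them from a dictionary give the same result, we can compare the two on a small example. This is a sketch under assumptions: the sample data and the helper functions group_max() and scaled() are hypothetical and introduced only for this check.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    'col_1': [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    'col_2': [2.0, 4.0, 6.0, 5.0, 5.5, 6.0],
    'col_3': ['a', 'a', 'a', 'b', 'b', 'b'],
})

def group_max(x):
    return x.max()

def scaled(x):
    return (x - x.mean()) / x.std()

df_g = df.groupby('col_3')
selected_columns = df.columns[(df.dtypes == 'float64')]

# One transform() call per function, written out by hand.
df_n_1 = df_g[selected_columns].transform(group_max).add_suffix('_max')
df_n_2 = df_g[selected_columns].transform(scaled).add_suffix('_scaled')
by_hand = pd.concat([df, df_n_1, df_n_2], axis=1)

# The same calls generated from a dictionary of suffixes and functions.
apply_fns = {'max': group_max, 'scaled': scaled}
generated = pd.concat(
    [df] + [df_g[selected_columns].transform(fn).add_suffix('_' + name)
            for name, fn in apply_fns.items()],
    axis=1)

# Compare column names and values of the two results.
print(by_hand.equals(generated))
```

The hand-written form is easier to read for two or three transformations; the dictionary form scales better as the number of functions grows.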
