Filter Groups By Column Value

We have a grouped data frame, and we wish to apply the row filtering logic to each group separately.

In this example, we have a data frame df that is grouped by the column col_1 and we wish to filter rows where the value of the column col_2 is greater than the value of the mean of col_2 for the group.

df_2 = (df
 .groupby('col_1')
 .apply(lambda x: x.loc[x['col_2'] > x['col_2'].mean()]))

Here is how this works:

loc[] can’t be applied to DataFrameGroupBy object. If we attempt to apply it, we get a no attribute error.
In order to apply a filter to each group of a grouped data frame (a DataFrameGroupBy object), the preferred approach is to use apply().
Inside apply() we use a lambda function to which each group is passed as a DataFrame (in this example referred to as x).
We can then apply our filtering logic to each group (a DataFrame) via loc[] like we would filter any regular DataFrame.
In this example, we compare the value of x['col_2'] to the mean of x['col_2'] for the group which is computed via x['col_2'].mean().
For each group, rows where the value of col_2 is larger than the mean value of col_2 yield True and are returned by loc[].
The rows returned from each group are then appended together in the same order as the groups and the resulting DataFrame is returned.

Alternatively:

We can perform grouped filtering by performing a grouped transformation inside loc[].

df_2 = df.loc[
    df['col_2'] >  df.groupby('col_1')['col_2'].transform('mean')]

Here is how this works:

We use groupby() and transform() to perform a grouped transformation where we compute the mean for each group.
transform() repeats the value produced for each group as many times as there are rows in each group. The result is a Series with the same number of rows as the original data frame (in this example df) where the value is the same for each group. See Grouped Transformation.
We compare that resulting Series of mean group values with col_2 and return True where col_2 takes a value greater than the mean value for the group. The corresponding columns are then returned by loc[].
This solution may be an appropriate approach when we wish to apply a filter in a grouped manner while we do not necessarily have to start from a data frame that is already grouped; i.e. we do not have to start with a DataFrameGroupBy object.

Optima.io Reference beta

Filter Groups By Column Value