Filter a grouped data frame to keep only the groups that meet certain criteria. Note that in this scenario, we are interested in filtering entire groups and not individual rows.

In this example, we wish to keep only groups that have more than one row and where the sum of values of the column `col_2`

is above 0 for the group.

```
df_2 = (df
.groupby('col_1')
.filter(lambda x: (len(x) > 1) & (sum(x['col_2']) > 0)))
```

Here is how this works:

- We use the method
`filter()`

of`DataFrameGroupBy`

objects to filter groups. `filter()`

expects a function that when applied to each group (a sub data frame), returns`True`

or`False`

.- Groups for which the function returns
`True`

are retained and groups for which the function returns`False`

are dropped. - In the code above we pass the filtering conditional expression as a
`lambda`

function to`filter()`

. - In
`len(x) > 1`

, we check if the current group has more than 1 row. - In
`sum(x['col_2']) > 0`

, we check if the sum of values of`col_2`

is greater than 0 for the current group. - Groups for which both conditions are
`True`

, i.e. that have more than 1 row and that have a sum of values for`col_2`

greater than 0, are retained (included in the output).

*Alternatively:*

We can perform grouped filtering by performing a grouped transformation inside `loc[]`

.

```
g_df = df.groupby('col_1')
df.loc[g_df.transform('size').gt(1) &
g_df['col_2'].transform('sum').gt(0)]
```

Here is how this works:

- We use
`groupby()`

and`transform()`

to perform a grouped transformations which we then apply logical conditions to inside`loc[]`

. `transform()`

repeats the value produced for each group as many times as there are rows in each group. The result is a`Series`

with the same number of rows as the original data frame (in this example`df`

) where the value is the same for each group.- In
`g_df = df.groupby('col_1')`

, we group the data frame`df`

by the values of the column`col_1`

. We store the output`DataFrameGroupBy`

object in the object`g_df`

to use later and spare ourselves the code repetition. - In
`g_df.transform('size').gt(1)`

we compute the size of each group of the grouped data frame`g_df`

and compare that with the value 1. This expression returns`True`

for rows belonging to groups where the group size is greater than 1. - In
`g_df['col_2'].transform('sum').gt(0)`

we compute the sum of values of the column`col_2`

for each group of the grouped data frame`g_df`

and compare that with the value 0. This expression returns`True`

for rows belonging to groups where the sum of values of the column`col_2`

is greater than 0. - Rows where both conditions are
`True`

, i.e. that belong to groups with a number of rows more than 1 and where the sum of values for`col_2`

is greater than 0, are retained (included in the output). - This may be an appropriate approach when we wish to apply a filter in a grouped manner while we do not necessarily have to start from a data frame that is already grouped; i.e. we do not have to start with a
`DataFrameGroupBy`

object.

PYTHON