For each group in a grouped data frame, we wish to filter rows by their position (row number) in the group.

This section is organized to cover the most common "filtering groups by row position" scenarios as follows:

**First**where we cover how to return the first row from each group.**Last**where we cover how to return the last row from each group.**nth**where we cover how to return the nth row from each group.**Head**where we cover how to return the top most rows, defined by a number of rows or a proportion of rows, from each group.**Tail**where we cover how to return the bottom most rows, defined by a number of rows or a proportion of rows, from each group.**Range**where we cover how to return a range of rows, defined by a start and an end position, from each group.**List**where we cover how to return specific rows from each group.**Random**where we cover how to return some random rows from each group.

We wish to return the first row from each group.

In this example, we wish to return the first row of each group where the groups are defined by the column `‘col_1'`

and
the rows are sorted by the column `‘col_2'`

.

```
df_2 = (df
.sort_values(by='col_2')
.groupby('col_1', as_index=False)
.first())
```

Here is how this works:

`first()`

, when called on the output of`groupby()`

(i.e. a`DataFrameGroupBy`

object), returns the first row from each group.- The notion of “first” is relative to whatever the current sorting of the rows is. Typically, we need to sort the rows
in a particular way before we pull the first row. In the code above, we do that via
`soft_values()`

which sorts the rows of each group in ascending order of the values of the column`‘col_2'`

(see Sorting). Should the existing sorting be suitable for the task at hand, we can omit the call to`sort_values()`

. `groupby()`

preserves the order of rows within each group so grouping after sorting maintains the sorting.- Specifying
`as_index=False`

in`groupby()`

keeps the original index, i.e. doesn't convert the grouping column to an index (see Aggregating). Please omit if that is not the desired behavior.

*Alternatively:*

```
df_2 = (df
.sort_values(by='col_2')
.groupby('col_1', as_index=False)
.apply(lambda x: x.iloc[0]))
```

Here is how this works:

`iloc[]`

is the go-to indexer in Pandas for filtering rows by their position (purely integer based indexing).`iloc[]`

can’t be applied directly to the output of`groupby()`

(i.e. can’t be applied to a`DataFrameGroupBy`

object) . To get around that we use`apply()`

to apply`iloc[]`

to each group (sub data frame).- In lambda
`x: x.iloc[0]`

we extract the row at position zero (i.e. the first row) from each group.

We wish to return the last row from each group.

In this example, we wish to return the last row of each group where the groups are defined by the column `‘col_1'`

and
the rows are sorted by the column `‘col_2'`

.

```
df_2 = (df
.sort_values(by='col_2')
.groupby('col_1', as_index=False)
.last())
```

Here is how this works:

The code works similarly to the code in “First” scenario above except that we replace `first()`

with `last()`

.

*Alternatively:*

```
df_2 = (df
.sort_values(by='col_2')
.groupby('col_1', as_index=False)
.apply(lambda x: x.iloc[-1]))
```

Here is how this works:

- This code works similarly to the alternative solution under “First” above.
- The ****
`iloc[]`

indexer accepts negative positions to refer to row positions relative to the end of the data frame.

We wish to return the nth row from each group.

**From Top**

In this example, we wish to return the second row of each group where the groups are defined by the column `‘col_1'`

and
the rows are sorted by the column `‘col_2'`

.

```
df_2 = (df
.sort_values(by='col_2')
.groupby('col_1', as_index=False)
.nth(1)
.reset_index(drop=True))
```

Here is how this works:

- This code works similarly to the code in the “First” scenario above except that we replace
`first()`

with`nth()`

. `nth()`

is a versatile function that allows us to refer to rows by their integer position in groups.- Regarding the row index:
- Specifying
`as_index=False`

in`groupby()`

keeps the original index, i.e. doesn't convert the grouping column to an index (see Aggregating). Please omit if that is not appropriate for your situation. - We get a patchy index i.e. rows have their original index value. To get a fresh index, we
add
`.reset_index(drop=True)`

to the chain. Please omit if that is not appropriate for your situation.

- Specifying

*Alternatively:*

```
df_2 = (df
.sort_values(by='col_2')
.groupby('col_1', as_index=False)
.apply(lambda x: x.iloc[1]))
```

Here is how this works:

This code works similarly to the alternative solution under “First” above except that we use `iloc[1]`

to extract the
second row from each group.

**From Bottom**

In this example, we wish to return the row before the last of each group where the groups are defined by the
column `‘col_1'`

and the rows are sorted by the column `‘col_2'`

.

```
df_2 = (df
.sort_values(by='col_2')
.groupby('col_1', as_index=False)
.nth(-2))
```

Here is how this works:

- This code works similarly to the code in the “From Top” solution above.
`nth()`

accepts negative positions to refer to row positions relative to the end of the data frame.

*Alternatively:*

```
df_2 = (df
.sort_values(by='col_2')
.groupby('col_1', as_index=False)
.apply(lambda x: x.iloc[-2]))
```

Here is how this works:

This code works similarly to the alternative solution under “First” above except that we use `iloc[-2]`

to extract the
second row from the end in each group.

We wish to return the top n rows from each group (which are often referred to as the *head*).

**Count**

We wish to return a specific number of rows from the top of each group.

In this example, we wish to return the top two rows from each group where the groups are defined by the column `‘col_1'`

and the rows are sorted by the column `‘col_2'`

.

```
df_2 = (df
.sort_values(by='col_2')
.groupby('col_1')
.head(n=2)
.reset_index(drop=True))
```

Here is how this works:

- We apply
`head()`

to the output of`groupby()`

(i.e. a`DataFrameGroupBy`

object) to return the top rows of each group. - Since we are looking to return the top two rows from each group we set the argument
`n`

of`head()`

to`n=2`

. - We get a patchy index i.e. rows have their original index value. To get a fresh index, we
add
`.reset_index(drop=True)`

to the chain. Please omit if that is not appropriate for your situation.

*Alternatively:*

In cases where we start out with a grouped data frame (a `DataFrameGroupBy`

object) and wish to sort rows by one or more
columns and then take the top n rows.

```
df_2 = (df
.groupby('col_1')
.apply(lambda x: (x
.sort_values('col_2', ignore_index=True)
.head(n=2)))
.reset_index(drop=True))
```

Here is how this works:

- Using
`apply()`

we apply a lambda function to each group. - In the lambda function, we first sort the rows of the group via
`sort_values('col_2')`

and then extract the top 2 rows of the group via`head(n=2)`

. - We get a patchy index i.e. rows have their original index value. To get a fresh index, we
add
`.reset_index(drop=True)`

to the chain. Please omit if that is not appropriate for your situation.

**Proportion**

We wish to return a proportion (percent) of the total number of rows from each group taken from the top of the group.

In this example, we wish to return the top 20% of the rows of each group where the groups are defined by the
column `‘col_1'`

and the rows are sorted by the column `‘col_2'`

.

```
df_2 = (df
.groupby('col_1')
.apply(lambda x: (x
.sort_values('col_2', ignore_index=True)
.head(n=int(len(x) * 0.2))))
.reset_index(drop=True))
```

Here is how this works:

- The
`head()`

function can only accept a number of rows and not a proportion. - To convert the proportion of rows we are after to a number of rows, we use
`int(len(x)*0.2)`

. - In order to access the number of rows in a group
`len(x)`

, we use a lambda function inside`apply()`

. - We get a patchy index i.e. rows have their original index value. To get a fresh index, we
add
`.reset_index(drop=True)`

to the chain. Please omit if that is not appropriate for your situation.

We wish to return the bottom n rows from each group (which are often referred to as the tail).

**Count**

We wish to return a specific number of rows from the bottom of each group.

In this example, we wish to return the bottom two rows from each group where the groups are defined by the
column `‘col_1'`

and the rows are sorted by the column `‘col_2'`

.

```
df_2 = (df
.sort_values(by='col_2')
.groupby('col_1')
.tail(n=2)
.reset_index(drop=True))
```

Here is how this works:

- The code works similarly to the code in “Head / Count” scenario above except that we replace
`head()`

with`tail()`

. - See the Alternative solution under “Head / Count” for an alternative that uses
`iloc[]`

(while replacing`head()`

with`tail()`

).

**Proportion**

We wish to return a proportion (percent) of the total number of rows from each group taken from the bottom of the group.

In this example, we wish to return the bottom 20% of the rows of each group where the groups are defined by the
column `‘col_1'`

and the rows are sorted by the column `‘col_2'`

.

```
df_2 = (df
.groupby('col_1')
.apply(lambda x: (x
.sort_values('col_2', ignore_index=True)
.tail(n=int(len(x) * 0.2))))
.reset_index(drop=True))
```

Here is how this works:

The code works similarly to the code in “Head / Proportion” scenario above except that we replace `head()`

with `tail()`

.

We wish to return a range of rows (also known as a slice), between a given start and end row positions, from each group.

In this example, we wish to return the second through to the second last rows (i.e. drop the first and the last) from
each group where the groups are defined by the column `‘col_1'`

and the rows are sorted by the column `‘col_2'`

.

```
df_2 = (df
.sort_values(by='col_2')
.groupby('col_1', as_index=False)
.nth(slice(1, -1))
.reset_index(drop=True))
```

Here is how this works:

- We use
`nth()`

to extract a range of rows from each group in a grouped data frame (a`DataFrameGroupBy`

object). - To define the range of interest, we use the
`slice()`

function whose syntax is`slice(start, end, step)`

to create a Python`slice`

object. A`slice`

object is used to specify how to slice a sequence. - In
`nth(slice(1,-1))`

we pass a`slice`

object to nth specifying that we wish to extract all rows starting at the second row and ending at the second last row (excluding the first and last rows). - Regarding the row index:
- Specifying
`as_index=False`

in`groupby()`

keeps the original index, i.e. doesn't convert the grouping column to an index (see Aggregating). Please omit if that is not appropriate for your situation. - We get a patchy index i.e. rows have their original index value. To get a fresh index, we
add
`.reset_index(drop=True)`

to the chain. Please omit if that is not appropriate for your situation.

- Specifying

*Alternatively:*

```
df_2 = (df
.groupby('col_1', as_index=False)
.apply(lambda x: (x
.sort_values('col_2', ignore_index=True)
.iloc[1:-1]))
.reset_index(drop=True))
```

Here is how this works:

This code works similarly to the alternative solution under “First” above except that we use `iloc[1:-1]`

to extract the
second through to the second last rows in each group.

We wish to obtain specific rows, given their row numbers, from each group.

In this example, we wish to return the first, second, second last, and last rows of each group of the data frame `df`

where the groups are defined by the column `‘col_1'`

and the rows are sorted by the column `‘col_2'`

.

```
df_2 = (df
.sort_values(by='col_2')
.groupby('col_1', as_index=False)
.nth([0, 1, -2, -1])
.reset_index(drop=True))
```

Here is how this works:

- We group the data frame by one or more columns via
`groupby()`

. - We use
`nth()`

to extract specific rows from each group given their row positions which we do here via`nth([0, 1, -2, -1])`

. - Often times when extracting rows by position, we need to first sort the rows in a particular way. We do so here
via
`sort_values(by='col_2'))`

. The`sort_values()`

function sorts a data frame in ascending order of the column(s) passed to it (See Sorting). - Regarding the row index:
- Specifying
`as_index=False`

in`groupby()`

keeps the original index, i.e. doesn't convert the grouping column to an index (see Aggregating). Please omit if that is not appropriate for your situation. - We get a patchy index i.e. rows have their original index value. To get a fresh index, we
add
`.reset_index(drop=True)`

to the chain. Please omit if that is not appropriate for your situation.

- Specifying
- Note that
`loc[]`

and`iloc[]`

do not work with grouped data frames. Also, regular Pandas data frames do not have a method`nth()`

(only grouped data frames do).

*Alternatively:*

```
df_2 = (df
.groupby('col_1', as_index=False)
.apply(lambda x: (x
.sort_values('col_2', ignore_index=True)
.iloc[0, 1, -2, -1]))
.reset_index(drop=True))
```

Here is how this works:

This code works similarly to the alternative solution under “First” above except that we use `iloc[0, 1, -2, -1]`

to
extract the specific rows of interest from each group.

We wish to return a set of rows taken at random from each group.

**Count**

We wish to return a specific number of rows taken at random positions from each group.

In this example, we wish to return 10 randomly selected rows from each group where the groups are defined by the
column `‘col_1'`

.

```
df_2 = (df
.groupby('col_1')
.sample(n=10, random_state=1234))
```

Here is how this works:

- We use the function
`sample()`

which returns a random set of rows from a data frame. `sample()`

, when called on a grouped data frame (i.e. a`DataFrameGroupBy`

object), operates on the individual groups to return a random set of rows from each group.`sample(n=10)`

returns a random sample of 10 rows since we set`n=10`

. If`n`

is not set,`sample()`

returns 1 row by default.- • It is good practice to always set a seed value when generating random numbers (or using operations that generate
random numbers) to ensure reproducibility of results (i.e. that we would get the same results when we run the same
code later). We do that by here by passing a seed of
`1234`

(but can be any integer) to the`random_state`

argument of`sample()`

.

**Proportion**

We wish to return a proportion (percent) of the total number of rows from each group taken at random positions.

In this example, we wish to return 20% of the rows of each group taken at random positions where the groups are defined
by the column `‘col_1'`

.

```
df_2 = (df
.groupby('col_1')
.sample(frac=0.2, random_state=1234))
```

Here is how this works:

- This works similarly to the “Count” scenario except that we use the
`frac`

argument of`sample()`

instead of the`n`

argument. - Setting
`frac=0.2`

returns a randomly selected set of 20% of the rows of each group.

PYTHON