For each group in a grouped data frame, we wish to filter rows by the rank of a value of a specific column.

This section is organized as follows:

**Largest**where we cover how to return the row (or rows) where a particular column has its largest value for each group.**Smallest**where we cover how to return the row (or rows) where a particular column has its smallest value for each group.**nth**where we cover how to return the row (or rows) where a particular column has its nth largest or smallest value for each group.**Largest Values**where we cover how to return the rows where a particular column has its largest values, defined by a count or a proportion, for each group.**Smallest Values**where we cover how to return the rows where a particular column has its smallest values, defined by a count or a proportion, for each group.**Range**where we cover how to return the rows where a particular column takes a value whose rank falls in a given range of value ranks for each group.**List**where we cover how to return the rows where a particular column takes a value whose rank falls in a given list of value ranks for each group.

We wish to return the row (or rows) where a particular column has its largest value for each group.

In this example, we wish to return the row (or rows) of each group where the column `col_2`

has its
max value for the group. The groups are defined by the column `col_1`

.

```
df_2 = df %>%
group_by(col_1) %>%
slice_max(col_2)
```

Here is how this works:

`slice_max()`

when called on a grouped data frame, returns the row(s) of each group where the given column takes its largest value.- Since we are interested in the rows where the column
`col_2`

takes its largest value, we use`slice_max(col_2)`

. - If more than one row in a group takes the largest value of
`col_2`

, we would get back more than one row for that group.

Alternatively,

```
df_2 = df %>%
group_by(col_1) %>%
filter(col_2 == max(col_2))
```

Here is how this works:

`filter()`

when called on a grouped data frame, is applied to each group separately.- The expression
`col_2 == max(col_2)`

is`TRUE`

for the rows of the current group where the value of`col_2`

is equal to the largest value that`col_2`

takes for the group. - Solutions based on
`filter()`

are often easier to remember and are typically more appropriate when we wish to combine multiple filtering conditions.

We wish to return the row (or rows) where a particular column has its smallest value for each group.

In this example, we wish to return the row (or rows) of each group where the column `col_2`

has its
smallest value for the group. The groups are defined by the column `col_1`

.

```
df_2 = df %>%
group_by(col_1) %>%
slice_min(col_2)
```

Here is how this works:

`slice_min()`

when called on a grouped data frame, returns the row(s) of each group where the given column takes its smallest value.- Since we are interested in the rows where the column
`col_2`

takes its smallest value, we use`slice_min(col_2)`

. - If more than one row in a group takes the smallest value of
`col_2`

, we would get back more than one row for that group.

*Alternatively:*

```
df_2 = df %>%
group_by(col_1) %>%
filter(col_2 == min(col_2))
```

Here is how this works:

- This works similarly to the alternative solution under the "Largest" scenario above.
- The expression
`col_2 == min(col_2)`

is`TRUE`

for the rows of the current group where the value of`col_2`

is equal to the smallest value that`col_2`

takes for the group.

We wish to return the row (or rows) where a particular column has its nth largest or smallest value for each group.

**Smallest**

In this example, we wish to return the row (or rows) of each group where the column `col_2`

has its
third-smallest value for the group. The groups are defined by the column `col_1`

.

```
df_2 = df %>%
group_by(col_1) %>%
filter(dense_rank(col_2) == 3)
```

Here is how this works:

`filter()`

when called on a grouped data frame, is applied to each group separately.`dense_rank()`

returns a ranking of the rows by values of a given column. It is different from regular ranking in that it leaves no gaps between ranks. See Ranking.- The expression
`dense_rank(col_2) == 3`

evaluates to`TRUE`

for rows where the rank of the value of`col_2`

is 3 i.e. at its third-smallest value.

**Largest**

In this example, we wish to return the row (or rows) of each group where the column `col_2`

has its
third-largest value for the group. The groups are defined by the column `col_1`

.

```
df_2 = df %>%
group_by(col_1) %>%
filter(dense_rank(desc(col_2)) == 3)
```

Here is how this works:

This works similarly to the “Smallest” scenario above except that we wrap the column to be ranked
in `desc()`

so that we get a ranking that starts at the largest value i.e. rank 1 is the largest not
the smallest value.

We wish to return the rows where a particular column has its largest values, defined by a count or a proportion, for each group.

**Count**

We have a data frame that is grouped by one or more columns. For each group, we wish to return the rows for which a particular column takes its n largest values.

In this example, we wish to return the rows from each group where the column `col_2`

takes its three
largest values. The groups are defined by the column `col_1`

.

```
df_2 = df %>%
group_by(col_1) %>%
slice_max(col_2, n = 3)
```

Here is how this works:

`slice_max()`

when called on a grouped data frame, returns the row(s) of each group where the given column takes its largest value(s).- In
`slice_max(col_2, n = 3)`

we specify that we wish to return rows where the column`col_2`

takes its three highest values (hence`n=3`

). - How about duplicates?
- If there are no duplicates in the values of the column of interest,
`slice_max()`

will return`n`

rows each with a unique value for the column of interest (`col_2`

in this example). - However, if there are duplicates,
`slice_max()`

will return a number of rows that is greater than`n`

and the number of unique values returned will likely be smaller than`n`

.

- If there are no duplicates in the values of the column of interest,
- What to do in the case of duplicates?
- If the desired behavior is to get the number of rows closest to
`n`

including the duplicates, and we do not care that we get`n`

unique values for the column of interest then the solution above with`slice_max()`

is the way to go. Example input: 2, 2, 2, 3, 4. Example Output: 2, 2, - For
`n = 2`

. - If we wish to drop duplicates and return the first row for each of the largest values of the
column of interest we can set the argument
`with_ties = FALSE`

when calling`slice_max()`

. We would get`n`

rows each with a unique value for the column of interest just like we would get if there were no duplicates. Example input: 2, 2, 2, 3, 4. Example Output: 2, 3. For`n = 2`

. - If we wish to get all the rows where the column of interest takes its
`n`

largest unique values then the alternative solution below is the way to go. Example input: 2, 2, 2, 3, 4. Example Output: 2, 2, 2, 3. For`n = 2`

.

- If the desired behavior is to get the number of rows closest to

*Alternatively:*

```
df_2 = df %>%
group_by(col_1) %>%
filter(dense_rank(desc(col_2)) <= 3))
```

Here is how this works:

`filter()`

when called on a grouped data frame, is applied to each group separately.`dense_rank()`

returns a ranking of the rows by values of a given column. It is different from regular ranking in that it leaves no gaps between ranks. See Ranking.- We wrap the column to be ranked in
`desc()`

so that we get a ranking that starts at the largest value i.e. rank 1 is the largest not the smallest value. - The expression
`dense_rank(col_2) <= 3`

evaluates to`TRUE`

for rows where the rank of the value of`col_2`

is any of its three largest values for the group.

**Proportion**

We have a data frame that is grouped by one or more columns. For each group, we wish to return a proportion of the rows for which a particular column takes its largest values.

In this example, we wish to return rows from each group where the column `col_2`

takes a value
within its 20% largest values. The groups are defined by the column `col_1`

.

```
df_2 = df %>%
group_by(col_1) %>%
slice_max(col_2, prop = 0.2)
```

Here is how this works:

- We use the
`prop`

argument of`slice_max()`

to specify the proportion of rows to return (as opposed to using the argument`n`

to specify the number of rows to return). - In
`slice_max(col_2, prop = 0.2)`

we specify that we wish to return the 20% of the rows where the column`col_2`

has its largest values. - The discussion on duplicates under “Count” applies here too.

*Alternatively:*

```
df_2 = df %>%
group_by(col_1) %>%
filter(
dense_rank(desc(col_2)) <= 0.2 * n_distinct(col_2))
```

Here is how this works:

- This works similarity to the alternative scenario under “Count” above.
`n_distinct(col_2)`

returns the number of unique values that the column`col_2`

takes. See Uniqueness.`0.2 * n_distinct(col_2)`

tell us how many values is 20% of the total number of possible values of the column`col_2`

.`dense_rank(desc(col_2)) <= 0.2 * n_distinct(col_2)`

evaluates to`TRUE`

for rows where the rank of the value of`col_2`

is within the top 20% ranked values.

We wish to return the rows where a particular column has its smallest values, defined by a count or a proportion, for each group.

In this example, we wish to return the rows from each group where the column `col_2`

takes its three
smallest values. The groups are defined by the column `col_1`

.

```
df_2 = df %>%
group_by(col_1) %>%
slice_min(col_2, n = 3)
```

Here is how this works:

- This works similarly to the “Largest Values” scenario above except that we replace
`slice_max()`

with`slice_min()`

. - We can return a proportion rather than a count by using the
`prop`

argument of`slice_min()`

instead of the`n`

argument. See “Proportion” under “Largest Values” above.

*Alternatively:*

```
df_2 = df %>%
group_by(col_1) %>%
filter(dense_rank(col_2) <= 3))
```

Here is how this works:

- This works similarly to the alternative solution for the “Count” scenario under “Largest Values”
above except that we use
`dense_rank(col_2)`

instead of`dense_rank(desc(col_2))`

because we are looking for the smallest values so ranking in ascending order is appropriate. - We can return a proportion rather than a count. See the alternative solution under “Proportion” under “Largest Values” above.

For each group, we wish to return the rows where a particular column takes a value whose rank falls in a given range of value ranks.

In this example, we wish to return rows that take the second through to the fifth ranked possible
values of the column `col_2`

. The groups are defined by the column `col_1.`

```
df_2 = df %>%
group_by(col_1) %>%
filter(between(dense_rank(desc(col_2)), 2, 5))
```

Here is how this works:

`filter()`

when called on a grouped data frame, is applied to each group separately.`dense_rank()`

returns a ranking of the rows by values of a given column. It is different from regular ranking in that it leaves no gaps between ranks. See Ranking.- The expression
`between(dense_rank(desc(col_2)), 2, 5)`

evaluates to`TRUE`

for rows where the rank of the value of`col_2`

is between 2 and 5 (inclusive) for the group. - We use
`desc(col_2)`

so that we get a ranking that starts at the largest value i.e. rank 1 is the largest not the smallest value.

For each group, we wish to return the rows where a particular column takes a value whose rank is one of a set of value ranks.

In this example, for each group, we wish to return rows where the column `col_2`

takes values that
are ranked first, third or fifth among the possible values of the column `col_2`

. The groups are
defined by the column `col_1.`

```
df_2 = df %>%
group_by(col_1) %>%
filter(dense_rank(desc(col_2)) %in% c(1, 3, 5))
```

Here is how this works:

- This works similarly to the “Range” scenario above.
- The expression
`dense_rank(desc(col_2)) %in% c(1, 3, 5)`

evaluates to`TRUE`

for rows where the rank of the value of`col_2`

is between 2 and 5 (inclusive) for the group.

R