Extreme Values

We wish to get rows of a data frame where a particular column takes its largest or smallest values.

Note that the n largest or smallest values do not necessarily correspond to n rows. If several rows share the same value, n values can correspond to more than n rows.
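The effect of ties can be seen in a minimal sketch. The data frame and its values are made up for illustration, and the with_ties argument of slice_max() is not used elsewhere in this section; it is assumed here to be left at its default of TRUE unless set otherwise.

```r
library(dplyr)

# Hypothetical data: the value 7 appears in three rows.
df <- tibble(id = 1:6, col_1 = c(9, 7, 7, 7, 3, 1))

# Default behaviour (with_ties = TRUE): every row tied with the
# 2nd-ranked value is kept, so more than 2 rows come back.
with_ties <- df %>% slice_max(col_1, n = 2)

# with_ties = FALSE: ties are cut off so that exactly n rows come back.
without_ties <- df %>% slice_max(col_1, n = 2, with_ties = FALSE)

nrow(with_ties)
nrow(without_ties)
```

If exactly n rows are required regardless of ties, setting with_ties = FALSE is one way to get them, at the cost of dropping some of the tied rows.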

Largest Values

We wish to get the rows with the largest values for a particular column. In this example we wish to get the rows where col_1 has its 5 highest values.

df %>% slice_max(col_1, n=5)

Here is how this works:

  • We use the slice_max() function from dplyr, which takes the data frame as its first argument (piped in here via %>%), the column whose largest values we are interested in as its second argument, and the number of rows to return as the n argument.
  • slice_max() in Tidy R also works for textual columns (sorting alphabetically), whereas nlargest() in Pandas only works for numerical columns.
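As a rough illustration of the points above, the following sketch applies slice_max() to a numerical and a textual column. The data frame, the column col_2, and all values are hypothetical and chosen only to make the result easy to check.

```r
library(dplyr)

# Hypothetical data with one numerical and one textual column.
df <- tibble(
  col_1 = c(4, 10, 8, 2, 6, 12, 1),
  col_2 = c("d", "g", "a", "f", "b", "e", "c")
)

# Rows where col_1 takes its 5 highest values.
top_numeric <- df %>% slice_max(col_1, n = 5)

# Rows where col_2 comes last alphabetically (textual column).
top_text <- df %>% slice_max(col_2, n = 2)

top_numeric
top_text
```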

Smallest Values

We wish to get the rows with the smallest values for a particular column. In this example we wish to get the rows where col_1 has its 5 smallest values.

df %>% slice_min(col_1, n=5)

Here is how this works:

This code works similarly to the code above except that we use slice_min() instead of slice_max().
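A minimal sketch of the same idea for the smallest values, again on a made-up data frame whose values are assumptions for illustration only:

```r
library(dplyr)

# Hypothetical data frame with a single numerical column.
df <- tibble(col_1 = c(4, 10, 8, 2, 6, 12, 1))

# Rows where col_1 takes its 5 smallest values.
bottom_5 <- df %>% slice_min(col_1, n = 5)

bottom_5
```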
