We wish to get rows of a data frame where a particular column takes its largest or smallest values.
Note that the n largest or smallest values do not necessarily correspond to n rows. If several rows share the same value, the n largest (or smallest) values correspond to more than n rows.
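To make the point about ties concrete, here is a minimal Pandas sketch. The data frame and its values are made up for illustration, and col_2 is a hypothetical extra column; the keep argument of nlargest() controls whether tied rows are retained.

```python
import pandas as pd

# Hypothetical data: the value 7 appears three times in col_1.
df = pd.DataFrame({"col_1": [9, 7, 7, 7, 3], "col_2": list("abcde")})

# Default (keep="first"): exactly n rows, ties resolved by order of appearance.
top_rows = df.nlargest(2, "col_1")

# keep="all": every row tied with the last selected value is retained,
# so asking for 2 values can return more than 2 rows.
top_rows_with_ties = df.nlargest(2, "col_1", keep="all")

print(len(top_rows))            # 2
print(len(top_rows_with_ties))  # 4
```

In dplyr, slice_max() and slice_min() keep ties by default (with_ties = TRUE), so the R calls in this section behave like the keep="all" variant.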
We wish to get the rows with the largest values for a particular column. In this example we wish to get the rows where col_1 has its 5 highest values.
df %>% slice_max(col_1, n=5)
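For readers working in Python, a Pandas sketch of the same selection might look like the following. Only the column name col_1 comes from the example; the data values and col_2 are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical data frame; only the column name col_1 comes from the example.
df = pd.DataFrame({
    "col_1": [4, 12, 8, 1, 15, 6, 10],
    "col_2": list("abcdefg"),
})

# Number of rows first, then the numerical column to rank by.
largest_rows = df.nlargest(5, "col_1")
print(largest_rows)
```

The result is ordered from the largest value of col_1 downwards.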
Here is how this works:
The Pandas counterpart of slice_max() is the nlargest() method of data frames, which takes the number of rows to return as its first argument and the numerical column whose largest values we are interested in as its second argument.
While slice_max() in Tidy R works for textual columns (sorting alphabetically), nlargest() in Pandas only works for numerical columns.
We wish to get the rows with the smallest values for a particular column. In this example we wish to get the rows where col_1 has its 5 smallest values.
df %>% slice_min(col_1, n=5)
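A comparable Pandas sketch uses nsmallest(), which mirrors nlargest(). As before, the data values and col_2 are hypothetical and serve only to make the snippet runnable.

```python
import pandas as pd

# Hypothetical data frame; only the column name col_1 comes from the example.
df = pd.DataFrame({
    "col_1": [4, 12, 8, 1, 15, 6, 10],
    "col_2": list("abcdefg"),
})

# Number of rows first, then the numerical column to rank by.
smallest_rows = df.nsmallest(5, "col_1")
print(smallest_rows)
```

The result is ordered from the smallest value of col_1 upwards.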
Here is how this works:
This code works similarly to the code above except that we use slice_min() instead of slice_max().
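As noted earlier, nlargest() in Pandas only works for numerical columns. One possible workaround for textual columns, sketched here under the assumption that plain alphabetical ordering is what is wanted, is to sort and take the first rows. This swaps in sort_values() and head(), a different technique from nlargest(), and the column names name and qty are hypothetical.

```python
import pandas as pd

# Hypothetical data with a textual column.
df = pd.DataFrame({
    "name": ["pear", "apple", "fig", "kiwi"],
    "qty": [3, 5, 2, 7],
})

# nlargest() rejects non-numerical columns with a TypeError.
try:
    df.nlargest(2, "name")
except TypeError as err:
    print(f"nlargest failed: {err}")

# Workaround: sort in descending alphabetical order and keep the first 2 rows.
last_alphabetically = df.sort_values("name", ascending=False).head(2)
print(last_alphabetically)
```

Unlike slice_max() with its default tie handling, this approach always returns exactly the requested number of rows.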