Sorting by Multiple Columns

We wish to sort the rows of a data frame by the values of multiple columns.

When sorting by multiple columns, sorting is carried out in sequence where rows that have the same value for a former column are sorted by the value of a later column. For instance, say we are sorting by two columns ‘col_1' and ‘col_2', rows that have the same value for ‘col_1' will be sorted by the value of ‘col_2'.

Ascending

We wish to sort the rows of a data frame in increasing order of the values of two or more columns.

In this example, we wish to sort the rows of a data frame df in ascending order of the values of a column ‘col_1' and to sort rows that have the same value for ‘col_1' in ascending order of the values of ‘col_2'.

df = df.sort_values(by=['col_1', 'col_2'])

Here is how this works:

  • We apply the function sort_values() to the data frame df.
  • We pass to sort_values() the names of the columns we want to sort by which here are ['col_1', 'col_2'].
  • By default, sort_values() sorts in ascending order.
  • The order in which we specify the sort columns matters. In this example, the rows of the data frame are first sorted in ascending order of the values of ‘col_1' and then rows that have the same value for col_1 are sorted in ascending order of the values of ‘col_2'.

Descending

We wish to sort the rows of a data frame in descending order of the values of two or more columns.

In this example, we wish to sort the rows of a data frame df in descending order of the values of a column ‘col_1' and to sort rows that have the same value for ‘col_1' in descending order of the values of ‘col_2'.

df = df.sort_values(by=['col_1', 'col_2'], ascending=False)

Here is how this works:

  • We apply the function sort_values() to the data frame df.
  • We pass to sort_values() the names of the columns we want to sort by which here are ['col_1', 'col_2'].
  • To sort in descending order of the values of the sorting column, we set ascending=False. The default, as we saw above, is to sort in ascending order; i.e.ascending=True.

Multiple Directions

We wish to sort the rows of a data frame in ascending order of the value of one or more columns and in descending order of the values of one or more columns.

In this example, we wish to sort the rows of a data frame df in ascending order of the values of a column ‘col_1' and to sort rows that have the same value for ‘col_1' in descending order of the values of ‘col_2'.

df = df.sort_values(by=['col_1', 'col_2'], ascending=(True, False))

Here is how this works:

  • We apply the method sort_values() to the data frame df and pass the names of the columns we wish to sort by ['col_1', 'col_2'] like we did above.
  • The parameter ascending of the sort_values() method takes a tuple of booleans corresponding to the column names defined in by.
  • We set ascending to True for the columns that we wish to sort in ascending order and to False for the columns that we wish to sort in descending order. In this example, we are sorting in ascending order of ‘col_1' and in descending order of ‘col_2'.

Ignore Index

Upon sorting a data frame, the rows keep their original row index. In some situations, we want to reset the row index so the rows of the sorted data frame would have an ordered index starting at 0 with the first row and ending at n-1 for the last row (where n is the number of rows in the data frame).

In this example, we wish to sort a data frame df in ascending order of the values of a column ‘col_1' and to sort rows that have the same value for ‘col_1' in ascending order of the values of ‘col_2'. In addition, we wish to have the row index of the resulting sorted data frame be ordered.

df = df.sort_values(by=['col_1', 'col_2'], ignore_index=True)

Here is how this works:

  • We use sort_values() as described above to sort the data frame df in ascending order of the values of the columns ['col_1', 'col_2'].
  • To get an ordered axis index, we set the ignore_index parameter to True. The default is ignore_index=False.
PYTHON
I/O