We wish to identify the columns to use for sorting, not by explicitly spelling out their names but by specifying criteria satisfied by the desired columns.
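In this example, we wish to sort the rows of the data frame df by all the columns whose names start with the prefix cvr_. Note: We are assuming that the order in which these columns appear in the data frame is the desired sorting priority, since sort_values() uses the first column as the primary sort key and the following columns to break ties.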
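As a minimal sketch of why the column order matters, consider a small hypothetical data frame (the name df_demo, the column names cvr_a and cvr_b, and the values are made up for illustration). Reversing the list passed to by changes which column is the primary sort key, and so can change the resulting row order.

```python
import pandas as pd

# Hypothetical data for illustration only.
df_demo = pd.DataFrame({
    'cvr_a': [2, 1, 1],
    'cvr_b': [1, 3, 2],
})

# Primary key cvr_a, ties broken by cvr_b.
by_a_then_b = df_demo.sort_values(by=['cvr_a', 'cvr_b'])

# Primary key cvr_b, ties broken by cvr_a.
by_b_then_a = df_demo.sort_values(by=['cvr_b', 'cvr_a'])

# The original row labels show that the two orderings differ.
print(by_a_then_b.index.to_list())
print(by_b_then_a.index.to_list())
```

The solution that follows relies on the columns already appearing in the desired priority order.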
def m_sort_values(df, select_fn):
    # select_fn returns an Index of column names; convert it to a list.
    selected_cols = select_fn(df).to_list()
    return df.sort_values(by=selected_cols)

df_2 = df \
    .pipe(m_sort_values,
          lambda x: x.columns[x.columns.str.startswith('cvr_')])
Here is how this works:
sort_values() can’t take a callable for the by argument. We can work around that by creating a custom function (which we call m_sort_values() here) that takes a data frame and a column selection function, then handles the column selection followed by the sorting.
We use pipe() to pass the data frame (df) and the column selection lambda function to our custom sorting function m_sort_values().
We use .str.startswith() to select all columns whose names start with the prefix 'cvr_'. See Implicit Selection for coverage of the most common scenarios of implicit column selection, including by name pattern, by data type, and by criteria satisfied by the column’s data.
We use to_list() to convert the Index returned by the columns attribute of a data frame to a list that can be passed to sort_values().
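The steps described here can be put together in a self-contained sketch. The sample data frame, its column names (id, cvr_1, cvr_2), and its values are assumptions made for illustration; only m_sort_values() and the selection lambda come from the example itself.

```python
import pandas as pd

def m_sort_values(df, select_fn):
    # select_fn returns an Index of column names; convert it to a list.
    selected_cols = select_fn(df).to_list()
    return df.sort_values(by=selected_cols)

# Hypothetical data for illustration only.
df = pd.DataFrame({
    'id': ['a', 'b', 'c', 'd'],
    'cvr_1': [0.2, 0.1, 0.2, 0.1],
    'cvr_2': [0.5, 0.9, 0.3, 0.4],
})

df_2 = df.pipe(
    m_sort_values,
    lambda x: x.columns[x.columns.str.startswith('cvr_')])

# Rows are ordered by cvr_1 first, then by cvr_2 within ties.
print(df_2)
```

The id column is not selected because its name does not start with cvr_, so it plays no part in the ordering.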
Alternatively,
selected_cols = df.columns[df.columns.str.startswith('cvr_')].to_list()
df.sort_values(by=selected_cols)
Here is how this works: