Implicit Relocating

We do not explicitly specify the columns we wish to relocate by name or position; rather, we refer to them implicitly, e.g. by name pattern or data type.

In this example, we wish to relocate columns whose names end with the suffix ‘_id’ to the end of the data frame.

s_cols = df.columns[df.columns.str.endswith('_id')]
# sort=False keeps the remaining columns in their original order
df_2 = df.loc[:, df.columns.difference(s_cols, sort=False).append(s_cols)]

Here is how this works:

  • In df.columns[df.columns.str.endswith('_id')], we select all columns whose name ends with ‘_id’. See Implicit Selection for coverage of common scenarios of implicit column selection, including by name pattern, data type, and criteria satisfied by the column’s data.
  • We then modify the column index to be in the desired order as follows:
    • We first remove the selected columns (those whose name ends in ‘_id’) via df.columns.difference().
    • We then append the selected columns at the end via append().
  • Finally, we use loc[] to extract the columns in the order specified by the modified column index. See Relative Relocating.
  • The output data frame df_2 will be a copy of the input data frame df but with the columns whose names end with ‘_id’ moved to the end of the data frame.
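
As a minimal sketch, the steps above can be tried on a small data frame. The column names and values below are invented for illustration. The sketch passes sort=False to difference() because Index.difference() tries to sort its result by default, which would reorder the columns that are not being relocated.

```python
import pandas as pd

# Hypothetical data: two '_id' columns mixed in with other columns.
df = pd.DataFrame({
    'order_id': [1, 2],
    'price': [9.5, 3.0],
    'customer_id': [10, 20],
    'city': ['Oslo', 'Rome'],
})

# Columns whose name ends with '_id', in their original order.
s_cols = df.columns[df.columns.str.endswith('_id')]

# Remaining columns; sort=False preserves their original order.
other_cols = df.columns.difference(s_cols, sort=False)

# Reorder: remaining columns first, '_id' columns last.
df_2 = df.loc[:, other_cols.append(s_cols)]
print(list(df_2.columns))
# ['price', 'city', 'order_id', 'customer_id']
```

Without sort=False, the remaining columns in this sketch would come out alphabetically ('city' before 'price') rather than in their original order.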

Extension: Relative to Implicitly Selected Group

We wish to relocate a set of implicitly selected columns relative to (i.e. before or after) another set of implicitly selected columns.

In this example, we wish to have character columns come before numeric columns (often a good practice when working with real datasets).

s_cols_1 = df.select_dtypes('object').columns
s_cols_2 = df.select_dtypes('number').columns
s_cols = s_cols_1.append(s_cols_2)
# sort=False keeps the remaining columns in their original order
df_2 = df.loc[:, s_cols.append(df.columns.difference(s_cols, sort=False))]

Here is how this works:

  • In df.select_dtypes('object').columns, we select columns whose data type is object (string) and then extract the names of those columns (as an index). We do the same for numeric columns. See Implicit Selection.
  • Like in the primary solution above, we modify the column index to be in the desired order as follows:
    • We append s_cols_2 to s_cols_1, so that the string columns are followed by the numeric columns.
    • We then extract any columns that are not in s_cols via df.columns.difference(s_cols) and append them to s_cols.
  • Finally, we use loc[] to extract the columns in the order specified by the modified column index. See Relative Relocating.
  • The output data frame df_2 will be a copy of the input data frame df but with character columns appearing before numeric columns and any other columns appearing after.
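
The following sketch walks through the same steps on made-up data. The column names are hypothetical, and the string columns are created with an explicit object dtype so that select_dtypes('object') picks them up; depending on the pandas version, strings may otherwise be inferred as a dedicated string dtype.

```python
import pandas as pd

# Hypothetical data: string, numeric, and boolean columns interleaved.
df = pd.DataFrame({
    'flag': [True, False],
    'amount': [1.5, 2.5],
    'name': pd.Series(['a', 'b'], dtype='object'),
    'count': [1, 2],
    'label': pd.Series(['x', 'y'], dtype='object'),
})

s_cols_1 = df.select_dtypes('object').columns   # string columns
s_cols_2 = df.select_dtypes('number').columns   # integer and float columns
s_cols = s_cols_1.append(s_cols_2)

# Columns of any other type (here the boolean column) go last;
# sort=False preserves their original order.
rest = df.columns.difference(s_cols, sort=False)

df_2 = df.loc[:, s_cols.append(rest)]
print(list(df_2.columns))
# ['name', 'label', 'amount', 'count', 'flag']
```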

Alternative: Append Columns

df_a = df.select_dtypes('object')
df_b = df.select_dtypes('number')
df_c = pd.concat([df_a, df_b], axis=1)
df_2 = pd.concat([df_c, df.drop(columns=df_c.columns)], axis=1)

Here is how this works:

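Since select_dtypes() returns a data frame, it may be more natural to work with data frames rather than column names. We select each group of columns as its own data frame, combine the groups column-wise with pd.concat() and axis=1, and then append the remaining columns, which we obtain by dropping the already selected columns from df. See Relative Relocating.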
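
A small sketch of this data-frame-based approach, again on invented data, is given below. It selects 'number' columns so that floats are included alongside integers, and it passes df_c.columns to drop() to make explicit that column labels are being dropped; both choices are assumptions of this sketch.

```python
import pandas as pd

# Hypothetical data with one string, two numeric, and one boolean column.
df = pd.DataFrame({
    'year': [1999, 2004],
    'title': pd.Series(['A', 'B'], dtype='object'),
    'rating': [7.1, 8.3],
    'seen': [True, False],
})

df_a = df.select_dtypes('object')            # string columns
df_b = df.select_dtypes('number')            # numeric columns
df_c = pd.concat([df_a, df_b], axis=1)       # strings first, then numbers

# Whatever was not selected keeps its original order and goes last.
df_rest = df.drop(columns=df_c.columns)
df_2 = pd.concat([df_c, df_rest], axis=1)

print(list(df_2.columns))
# ['title', 'year', 'rating', 'seen']
```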
