Relative Relocating

We wish to relocate one or more columns relative to the current column locations while keeping all other column locations relatively unchanged.

We will cover the following scenarios:

  • Relative to Beginning: Move one or more columns to the beginning (the left of the data frame) while leaving the rest of the columns where they are.
  • Relative to Column: Move one or more columns to positions specified relative to, i.e. before or after, a particular column.
  • Relative to Position: Move one or more columns to positions specified relative to, i.e. before or after, a particular column position.

Relative to Beginning

We wish to move one or more columns to the beginning (the left of the data frame) while leaving the rest of the columns where they are.

In this example, we wish to move columns col_3 and col_4 of the data frame df to be the leftmost columns.

def bring_to_front(p_df, p_cols):
    o_cols = p_cols + list(p_df.columns.difference(p_cols))
    return p_df.loc[:, o_cols]

df_2 = df.pipe(bring_to_front, ['col_3', 'col_4'])

Here is how this works:

  • Our approach is to construct a list of column names that follows the desired order and then use loc[] to select and return the columns in that order. See Basic Relocating.
  • To construct the list of appropriately ordered columns, which here is o_cols, we append together via the + operator two lists:
    • The list of one or more columns that we wish to move to the front, which here is held by the function argument p_cols to which we pass ['col_3', 'col_4'].
    • The rest of the columns which we obtain by taking the set difference and then converting it to a list in list(p_df.columns.difference(p_cols)). See Exclude Columns.
  • We then use the ordered list to extract the columns in the desired order in p_df.loc[:, o_cols].
  • To make the code easier to read and reuse, we encapsulate the functionality in a custom bring_to_front() function which we call via the pipe() method.
  • The output data frame df_2 will be a copy of the input data frame df with columns col_3 and col_4 moved to the first and second leftmost positions respectively.

Alternative: via Reindexing

s_cols = ['col_3', 'col_4']
o_cols = s_cols + list(df.columns.difference(s_cols))
df_2 = df.reindex(columns = o_cols)

Here is how this works:

This works similarly to the primary solution above except that we use the data frame method reindex() instead of loc[].

Alternative: Append Columns

s_cols = ['col_3', 'col_4']
df_2 = pd.concat([df.loc[:, s_cols], 
                  df.loc[:,df.columns.difference(s_cols)]], 
                 axis=1)

Here is how this works:

  • Our approach is to construct the desired data frame by appending together (a) the columns we wish to bring to the front and (b) the rest of the columns.
  • We use pd.concat() to column bind two data frames. See Appending.
  • The two data frames are:
    • df.loc[:, s_cols] which holds the columns we wish to move to the front
    • df.loc[:,df.columns.difference(s_cols)]] which holds the rest of the columns. We use the function difference() to identify columns that are not in s_cols.
  • The output data frame df_2 will be a copy of the input data frame df with columns col_3 and col_4 moved to the first and second leftmost positions respectively.

Alternative: Pop and Insert

s_cols = ['col_3', 'col_4']
df_2 = df.copy()
[df_2.insert(i, col, df_2.pop(col)) for i, col in enumerate(s_cols)]

Here is how this works:

  • Our approach is to extract each of the columns we wish to bring to the front via pop() and then insert it at the left of the data frame via insert().
  • The method pop() removes and returns one column of a data frame given its name.
  • The method insert() inserts a column into a specific location in a data frame and expects three arguments as follows:
    • loc is the position at which we wish to insert a column in the data frame
    • column is the name to give the new column
    • value are the values of the new column as an array or a Series
  • Both pop() and insert() act on one column at a time, therefore:
    • We use a list comprehension to iterate over the list of columns to be moved, which here is s_cols.
    • We use enumerate() so we may access the current column and the current column’s index; e.g. the first column in s_cols is ‘col_3’ and its index is 0.
    • We call insert() for each element in s_cols to extract the corresponding column via pop(col) and insert it in the corresponding location i.
  • If we only wish to relocate only one column to the beginning, say col_3, we can drop the list comprehension and simply use df_2.insert(0, 'col_3', df_2.pop('col_3')).
  • Both pop() and insert() modify the calling data frame in place which may be okay for many applications but makes them ill-suited for chaining. This is the reason we started out by copying the original data frame in df_2 = df.copy().
  • The output data frame df_2 will be a copy of the input data frame df with columns col_3 and col_4 moved to the first and second leftmost positions respectively.

Extension: Reverse Order

We wish to inverse the order of columns of a data frame; i.e. the last column becomes the first column and the second last column becomes the second column, etc…

df_2 = df.iloc[:, ::-1]

Here is how this works:

  • The indexer iloc[] refers to columns by their position. See Basic Selection.
  • One way to use iloc[] to extract columns is iloc[:, start:end:step], where:
    • start is the position of the first element in the desired range and is left empty here to refer to the first column
    • end is the position of the last element in the desired range and is left empty here to refer to the last column
    • step is how many columns to skip after each column is extracted and can be negative to imply stepping from the right and is set here to -1 to mean select every column starting at the end.
  • The output data frame df_2 will be a copy of the input data frame df with columns in the reverse order; i.e. the last column becomes the first column and the second last column becomes the second column, etc…

Relative to Position

We wish to move one or more columns to positions specified relative to, i.e. before or after, a particular column position.

In this example, we wish to move columns col_1 and col_4 of the data frame df to be before position 2 (third column) relative to the current column order of the data frame.

def relocate(p_df, p_cols, p_loc):
    l_cols = df.columns[:p_loc].difference(p_cols).tolist()
    r_cols = df.columns[p_loc:].difference(p_cols).tolist()
    return p_df.loc[:, l_cols + p_cols + r_cols]

df_2 = df.pipe(relocate, ['col_1', 'col_4'], 2)

Here is how this works:

  • The challenge with relocating arbitrary columns to arbitrary positions is:
    • If we remove columns, the insertion position that we started with may no longer be valid since it was relative to the original data frame before removing columns.
    • If we insert first, we would have duplicate column names and renaming may not be a safe strategy.
  • For easier readability and reuse, we isolated the logic in a custom function relocate() which accepts a data frame p_df, a list of the names of the columns to be moved p_cols, and the desired insertion position relative to the original ordering of the columns of the data frame df.
  • Our approach is to:
    • Split the list of column names at the insertion position, which here is 2, into two lists; l_cols and r_cols (short for left and right).
    • We then exclude the names of the columns that we wish to move, which here is the list p_cols, from both the left and right column name lists.
    • We append the three lists; l_cols, p_cols, and r_cols into a single list that holds column names in the desired order.
    • Finally, we use the ordered list to extract the columns in the desired order via loc[].
  • The output data frame df_2 will be a copy of the input data frame df with columns col_1 and col_4 moved to be before position 2 of the original order.

Extension: After

def relocate(p_df, p_cols, p_loc):
    p_loc = p_loc + 1
    l_cols = df.columns[:p_loc].difference(p_cols).tolist()
    r_cols = df.columns[p_loc:].difference(p_cols).tolist()
    return p_df.loc[:, l_cols + p_cols + r_cols]

df_2 = df.pipe(relocate, ['col_1', 'col_2'], 2)

Here is how this works:

  • This works similarly to the primary solution above except that we add 1 to the given insertion location so columns are inserted after rather than before the insertion position.
  • Note that in this solution we do not check if the given location is within the data frame’s bounds.

Extension: Relative to End

In this example, we wish to move columns col_1 and col_4 of the data frame df to be before the second last column from the end.

def relocate(p_df, p_cols, p_loc):
    l_cols = df.columns[:p_loc].difference(p_cols).tolist()
    r_cols = df.columns[p_loc:].difference(p_cols).tolist()
    return p_df.loc[:, l_cols + p_cols + r_cols]

df_2 = df.pipe(relocate, ['col_1', 'col_4'], -2)

Here is how this works:

  • To specify a location relative to the end of the data frame, we use a negative number which here is -2 to refer to the second last position.
  • No change to the code presented in the primary solution above is needed to move columns to a location specified relative to the end of the data frame because python indexing interprets negative numbers as relative to the end of the iterable being indexed.

Alternative: Append Columns

def relocate(p_df, p_cols, p_loc):
    l_df = p_df.loc[:, df.columns[:p_loc].difference(p_cols)]
    r_df = p_df.loc[:, df.columns[p_loc:].difference(p_cols)]
    return pd.concat([l_df, p_df.loc[:, p_cols], r_df], axis=1)

df_2 = df.pipe(relocate, ['col_1', 'col_4'], 2)

Here is how this works:

  • Our approach here is similar to the primary solution above except that we work with columns rather than column names.
  • For easier readability and reuse, we isolated the logic in a custom function relocate() which accepts a data frame p_df, a list of the names of the columns to be moved p_cols, and the desired insertion position relative to the original ordering of the columns of the data frame df.
  • We use pd.concat() to construct the desired data frame by binding together three sets of columns. See Appending.
  • The three sets of columns are:
    • l_df columns before the insertion position p_loc excluding any column in the set of columns to be moved p_cols
    • p_cols the set of columns to be moved
    • r_df columns after the insertion position p_loc excluding any column in the set of columns to be moved p_cols
  • The output data frame df_2 will be a copy of the input data frame df with columns col_1 and col_4 moved to be before position 2 of the original order.

Alternative: Pop and Insert

def relocate(p_df, p_cols, p_loc):
    u_loc = p_loc - pd.Series(p_cols).map(p_df.columns.get_loc).lt(p_loc).sum().tolist()
    o_df = p_df.copy().drop(p_cols, axis=1)
    [o_df.insert((u_loc + i), col, p_df.loc[:, col]) for i, col in enumerate(p_cols)]
    return o_df

df_2 = df.pipe(relocate, ['col_1', 'col_4'], 2)

Here is how this works:

  • This solution follows a different approach to the challenge of shifting column positions on removing the columns that we wish to move.
  • The key idea is this: We adjust the insertion position to account for any shifting. In other words, if any of the columns to be moved is originally located before the insertion point, the insertion point needs to be reduced by one to account for shifting due to removing that column.
  • The adjusted insertion position u_loc, is obtained by subtracting from the given insertion position p_loc the number of columns to be moved that were originally located to the left of p_loc:
    • We convert p_cols to a series, so we may use the convenient and chainable series functions.
    • We use the get_loc() method to get the integer position of a column in a data frame via its name. We use map() to iterate over each element in p_cols.
    • We then use lt(p_loc) to return a Series of logical values that is True for elements of p_cols that lie to the left of p_loc.
    • We then use sum() to get the number of columns that lie before p_loc
    • We use tolist() to convert the numpy int64 data type <class 'numpy.int64'> to the native Python int data type <class 'int'> to avoid the error TypeError: loc must be int returned by insert().
    • Finally, we subtract the output from p_loc to obtain the adjusted insertion position.
  • In p_df.copy().drop(p_cols, axis=1), we create a copy of the data frame df via copy() and then exclude the columns that we wish to move via drop(). See Exclude Columns.
  • We then use a list comprehension to insert() every element of the list of columns to be moved p_cols at the adjust position. See “Alternative: Pop and Insert” under Relative to Beginning above for more on how we are using insert() and pop() to relocate columns.

Relative to Column

We wish to move one or more columns to positions specified relative to, i.e. before or after, a particular column.

def relocate(p_df, p_cols, p_col):
    p_loc = df.columns.get_loc(p_col)
    l_cols = df.columns[:p_loc].difference(p_cols).tolist()
    r_cols = df.columns[p_loc:].difference(p_cols).tolist()
    return p_df.loc[:, l_cols + p_cols + r_cols]

df_2 = df.pipe(relocate, ['col_1', 'col_4'], 'col_3')

Here is how this works:

  • For easier readability and reuse, we isolated the logic in a custom function relocate() which accepts a data frame p_df, a list of the names of the columns to be moved p_cols, and the name of the column p_col before which we wish to insert the given columns p_cols.
  • We use get_loc() method to get the integer position of a column in a data frame via its name. This allows us to convert the given column name, which here is 'col_3' that is passed to the argument p_col, to a column position p_loc.
  • The rest of the code works similarly to the primary solution presented under Relative to Position above.
  • The alternative solutions presented under Relative to Position above may also be adapted for relocating relative to a column by converting the column name to a location via df.columns.get_loc() like we did here.
PYTHON
I/O