We wish to relocate one or more columns relative to the current column locations while keeping all other column locations relatively unchanged.
We will cover the following scenarios:
We wish to move one or more columns to the beginning (the left of the data frame) while leaving the rest of the columns where they are.
In this example, we wish to move columns col_3
and col_4
of the data frame df
to be the leftmost columns.
def bring_to_front(p_df, p_cols):
o_cols = p_cols + list(p_df.columns.difference(p_cols))
return p_df.loc[:, o_cols]
df_2 = df.pipe(bring_to_front, ['col_3', 'col_4'])
Here is how this works:
loc[]
to select and return the columns in that order. See Basic Relocating.o_cols
, we append together via the +
operator two lists:p_cols
to which we pass ['col_3', 'col_4']
.list(p_df.columns.difference(p_cols))
. See Exclude Columns.p_df.loc[:, o_cols]
.bring_to_front()
function which we call via the pipe()
method.df_2
will be a copy of the input data frame df
with columns col_3
and col_4
moved to the first and second leftmost positions respectively.Alternative: via Reindexing
s_cols = ['col_3', 'col_4']
o_cols = s_cols + list(df.columns.difference(s_cols))
df_2 = df.reindex(columns = o_cols)
Here is how this works:
This works similarly to the primary solution above except that we use the data frame method reindex()
instead of loc[]
.
Alternative: Append Columns
s_cols = ['col_3', 'col_4']
df_2 = pd.concat([df.loc[:, s_cols],
df.loc[:,df.columns.difference(s_cols)]],
axis=1)
Here is how this works:
pd.concat()
to column bind two data frames. See Appending.df.loc[:, s_cols]
which holds the columns we wish to move to the frontdf.loc[:,df.columns.difference(s_cols)]]
which holds the rest of the columns. We use the function difference()
to identify columns that are not in s_cols
.df_2
will be a copy of the input data frame df
with columns col_3
and col_4
moved to the first and second leftmost positions respectively.Alternative: Pop and Insert
s_cols = ['col_3', 'col_4']
df_2 = df.copy()
[df_2.insert(i, col, df_2.pop(col)) for i, col in enumerate(s_cols)]
Here is how this works:
pop()
and then insert it at the left of the data frame via insert()
.pop()
removes and returns one column of a data frame given its name.insert()
inserts a column into a specific location in a data frame and expects three arguments as follows:loc
is the position at which we wish to insert a column in the data framecolumn
is the name to give the new columnvalue
are the values of the new column as an array or a Seriespop()
and insert()
act on one column at a time, therefore:s_cols
.enumerate()
so we may access the current column and the current column’s index; e.g. the first column in s_cols
is ‘col_3’
and its index is 0
.insert()
for each element in s_cols
to extract the corresponding column via pop(col)
and insert it in the corresponding location i
.col_3
, we can drop the list comprehension and simply use df_2.insert(0, 'col_3', df_2.pop('col_3'))
.pop()
and insert()
modify the calling data frame in place which may be okay for many applications but makes them ill-suited for chaining. This is the reason we started out by copying the original data frame in df_2 = df.copy()
.df_2
will be a copy of the input data frame df
with columns col_3
and col_4
moved to the first and second leftmost positions respectively.Extension: Reverse Order
We wish to inverse the order of columns of a data frame; i.e. the last column becomes the first column and the second last column becomes the second column, etc…
df_2 = df.iloc[:, ::-1]
Here is how this works:
iloc[]
refers to columns by their position. See Basic Selection.iloc[]
to extract columns is iloc[:, start:end:step]
, where:start
is the position of the first element in the desired range and is left empty here to refer to the first columnend
is the position of the last element in the desired range and is left empty here to refer to the last columnstep
is how many columns to skip after each column is extracted and can be negative to imply stepping from the right and is set here to -1 to mean select every column starting at the end.df_2
will be a copy of the input data frame df
with columns in the reverse order; i.e. the last column becomes the first column and the second last column becomes the second column, etc…We wish to move one or more columns to positions specified relative to, i.e. before or after, a particular column position.
In this example, we wish to move columns col_1
and col_4
of the data frame df
to be before position 2 (third column) relative to the current column order of the data frame.
def relocate(p_df, p_cols, p_loc):
l_cols = df.columns[:p_loc].difference(p_cols).tolist()
r_cols = df.columns[p_loc:].difference(p_cols).tolist()
return p_df.loc[:, l_cols + p_cols + r_cols]
df_2 = df.pipe(relocate, ['col_1', 'col_4'], 2)
Here is how this works:
relocate()
which accepts a data frame p_df
, a list of the names of the columns to be moved p_cols
, and the desired insertion position relative to the original ordering of the columns of the data frame df
.l_cols
and r_cols
(short for left and right).p_cols
, from both the left and right column name lists.l_cols
, p_cols
, and r_cols
into a single list that holds column names in the desired order.loc[]
.df_2
will be a copy of the input data frame df
with columns col_1
and col_4
moved to be before position 2 of the original order.Extension: After
def relocate(p_df, p_cols, p_loc):
p_loc = p_loc + 1
l_cols = df.columns[:p_loc].difference(p_cols).tolist()
r_cols = df.columns[p_loc:].difference(p_cols).tolist()
return p_df.loc[:, l_cols + p_cols + r_cols]
df_2 = df.pipe(relocate, ['col_1', 'col_2'], 2)
Here is how this works:
Extension: Relative to End
In this example, we wish to move columns col_1
and col_4
of the data frame df
to be before the second last column from the end.
def relocate(p_df, p_cols, p_loc):
l_cols = df.columns[:p_loc].difference(p_cols).tolist()
r_cols = df.columns[p_loc:].difference(p_cols).tolist()
return p_df.loc[:, l_cols + p_cols + r_cols]
df_2 = df.pipe(relocate, ['col_1', 'col_4'], -2)
Here is how this works:
Alternative: Append Columns
def relocate(p_df, p_cols, p_loc):
l_df = p_df.loc[:, df.columns[:p_loc].difference(p_cols)]
r_df = p_df.loc[:, df.columns[p_loc:].difference(p_cols)]
return pd.concat([l_df, p_df.loc[:, p_cols], r_df], axis=1)
df_2 = df.pipe(relocate, ['col_1', 'col_4'], 2)
Here is how this works:
relocate()
which accepts a data frame p_df
, a list of the names of the columns to be moved p_cols
, and the desired insertion position relative to the original ordering of the columns of the data frame df
.pd.concat()
to construct the desired data frame by binding together three sets of columns. See Appending.l_df
columns before the insertion position p_loc
excluding any column in the set of columns to be moved p_cols
p_cols
the set of columns to be movedr_df
columns after the insertion position p_loc
excluding any column in the set of columns to be moved p_cols
df_2
will be a copy of the input data frame df
with columns col_1
and col_4
moved to be before position 2 of the original order.Alternative: Pop and Insert
def relocate(p_df, p_cols, p_loc):
u_loc = p_loc - pd.Series(p_cols).map(p_df.columns.get_loc).lt(p_loc).sum().tolist()
o_df = p_df.copy().drop(p_cols, axis=1)
[o_df.insert((u_loc + i), col, p_df.loc[:, col]) for i, col in enumerate(p_cols)]
return o_df
df_2 = df.pipe(relocate, ['col_1', 'col_4'], 2)
Here is how this works:
u_loc
, is obtained by subtracting from the given insertion position p_loc
the number of columns to be moved that were originally located to the left of p_loc
:p_cols
to a series, so we may use the convenient and chainable series functions.get_loc()
method to get the integer position of a column in a data frame via its name. We use map()
to iterate over each element in p_cols
.lt(p_loc)
to return a Series of logical values that is True
for elements of p_cols
that lie to the left of p_loc
.tolist()
to convert the numpy int64 data type <class 'numpy.int64'>
to the native Python int data type <class 'int'>
to avoid the error TypeError: loc must be int
returned by insert()
.p_df.copy().drop(p_cols, axis=1)
, we create a copy of the data frame df
via copy()
and then exclude the columns that we wish to move via drop()
. See Exclude Columns.insert()
every element of the list of columns to be moved p_cols
at the adjust position. See “Alternative: Pop and Insert” under Relative to Beginning above for more on how we are using insert()
and pop()
to relocate columns.We wish to move one or more columns to positions specified relative to, i.e. before or after, a particular column.
def relocate(p_df, p_cols, p_col):
p_loc = df.columns.get_loc(p_col)
l_cols = df.columns[:p_loc].difference(p_cols).tolist()
r_cols = df.columns[p_loc:].difference(p_cols).tolist()
return p_df.loc[:, l_cols + p_cols + r_cols]
df_2 = df.pipe(relocate, ['col_1', 'col_4'], 'col_3')
Here is how this works:
relocate()
which accepts a data frame p_df
, a list of the names of the columns to be moved p_cols
, and the name of the column p_col
before which we wish to insert the given columns p_cols
.get_loc()
method to get the integer position of a column in a data frame via its name. This allows us to convert the given column name, which here is 'col_3'
that is passed to the argument p_col
, to a column position p_loc
.df.columns.get_loc()
like we did here.