We wish to rename columns of a data frame by their position without providing the current name(s).
We should rename columns by mapping the current name to the desired name where possible. Sometimes though we need to identify the columns whose names we wish to change by their position instead of their current name; e.g. when the current names are not valid or too long.
Note that referring to columns by their position is rather fragile and should be handled with care else risk mislabeling columns if we accidentally get the positions of columns wrong.
This section is organized as follows:
MultiIndex
into flat column names (a single index). We do so by setting the columns index to a flattened version of the original MultiIndex
.We wish to rename all columns by providing a set of new column names in the right order.
In this example, we have a data frame of three columns, and we wish to set their names to col_1
, col_2
, and col_3
.
df_2 = df.set_axis(['col_1', 'col_2', 'col_3'], axis=1)
Here is how this works:
set_axis()
method of a Pandas data frame can set the indexes for columns i.e. column names (when axis=1
) or rows (when axis=0
) to a given list that has the same size as the index to be changed.ValueError: Length mismatch: Expected axis has n elements, new values have m elements
.set_axis()
is a data frame method, and it returns a copy of the data frame. It can therefore be chained.Alternative: Set columns
Attribute
df_2 = df.copy()
df_2.columns = ['col_1', 'col_2', 'col_3']
Here is how this works:
columns
attribute of a data frame.Extension: Level of Multi-Index
We wish to set the labels of a particular level of a MultiIndex
.
df_2 = (df
.loc[:, ['col_2', 'col_3']]
.apply([abs, round]))
df_2.columns = \
df_2.columns.set_levels(['mag', 'rnd'], level=1)
Here is how this works:
MultiIndex
following a data transformation operation where we use apply()
to apply multiple functions to selected columns of a data frame. See Implicit Transformation.apply()
, the output data frame has a MultiIndex
where level 0 is the original column names and level 1 is the function names.set_levels()
method of a MultiIndex
to specify the column names. Note that set_levels()
is a method of the Index and not of the data frame therefore we call it on df_2.columns
.MultiIndex
, we pass to set_levels():
['mag', 'rnd']
.level
index, which in this case is level=1
(second level).We wish to set the name for a specific column given its position.
In this example, we have a data frame of three columns, and we wish to change the name of the third column to col_c
.
df_2 = df\
.rename(columns = {df.columns[2]:'col_c'})
Here is how this works:
df.columns[2]='col_c’
. If we do, we get a TypeError: Index does not support mutable operations
. This is because the Pandas index can’t be altered.columns[2]
to retrieve the current name of the column whose name we wish to change given its position (in this example that position is 2).rename()
to map the current name to the desired name. See Map Names.inplace=False
which is the right setting when chaining. If chaining is not a concern, we could set inplace=True
in which case the original DataFrame is changed and the function would return None
.Extension: In a Chain after Column Creation
df_2 = df.pipe(
lambda x: x.rename(columns={x.columns[2]: 'col_c'}))
Here is how this works:
df.rename(columns = {df.columns[2]:'col_c'})
wouldn't work if we wish to refer to columns that were created earlier in the same chain or if columns of the original DataFrame
were removed or relocated earlier in the chain.pipe()
to pass the current version of the DataFrame
to a lambda
function that handles the renaming.We wish to set the name for some columns given their position.
In this example, we wish to change the name of the first and third columns to col_a
and col_c
respectively.
df_2 = df \
.rename(columns = {df.columns[0]: 'col_a', df.columns[2]: 'col_c'})
Here is how this works:
This works similarly to the One Column scenario above except that we pass multiple columns instead of just one.
Extension: In a Chain after Column Creation
df_2 = df.pipe(
lambda x: x.rename(columns = {x.columns[0]: 'col_a',
x.columns[2]:'col_c'}))
Here is how this works:
DataFrame
were removed or relocated earlier in the chain.pipe()
to pass the current version of the DataFrame
to a lambda
function that handles the renaming.We wish to flatten a MultiIndex into a single Index.
In this example, we have a data frame with a two-level column MultiIndex
and we wish to flatten that MultiIndex
so that the column names are made up of the index levels separated by underscores.
df_2 = (df
.loc[:, ['col_2', 'col_3']]
.apply([abs, round]))
df_2.columns = df_2.columns.map('_'.join)
Here is how this works:
df_2
is produced by applying an Implicit Transformation operation where we compute the absolute value, via the function abs()
, and round, via the function round()
, the columns col_2
and col_3
resulting in four columns with a two-level MultiIndex
. See Implicit Transformation.MultiIndex
, we:df_2.columns
to obtain an array-like structure of tuples each holding the values of the two levels for each columnmap()
to iterate over each tuple and apply…'_'.join
which appends the two values in the tuple separated by an underscore ‘_’
.map()
operation to df_2.columns
as the new Index.Alternative: via List Comprehension
df_2 = (df
.loc[:, ['col_2', 'col_3']]
.apply([abs, round]))
df_2.columns = [' '.join(col) for col in df_2.columns]
Here is how this works:
Instead of map()
, we can use a list comprehension to flatten the labels of the MultiIndex
into labels (names) of the form {level_0}_{level_1}
where in this example {level_o}
holds the original column names and {level_1}
holds the function names.