Set Names

We wish to rename columns of a data frame by their position without providing the current name(s).

We should rename columns by mapping the current name to the desired name where possible. Sometimes though we need to identify the columns whose names we wish to change by their position instead of their current name; e.g. when the current names are not valid or too long.

Note that referring to columns by their position is rather fragile and should be handled with care else risk mislabeling columns if we accidentally get the positions of columns wrong.

This section is organized as follows:

  • All Columns: Set the names of all columns given their positions and the desired names.
  • One Column: Set the name of one column given its position and the desired name.
  • Some Columns: Set the names of some columns given their positions and the desired names.
  • Flatten Index: Often times we wish to convert a column MultiIndex into flat column names (a single index). We do so by setting the columns index to a flattened version of the original MultiIndex.

All Columns

We wish to rename all columns by providing a set of new column names in the right order.

In this example, we have a data frame of three columns, and we wish to set their names to col_1, col_2, and col_3.

df_2 = df.set_axis(['col_1', 'col_2', 'col_3'], axis=1)

Here is how this works:

  • The set_axis() method of a Pandas data frame can set the indexes for columns i.e. column names (when axis=1) or rows (when axis=0) to a given list that has the same size as the index to be changed.
  • The list of column names must include a name for each column even for those columns whose names we do not wish to change. See Some Columns below for an alternative solution to renaming some columns without having to provide names for all columns.
  • The passed list of column names must have as many elements (column names) as there are columns in the data frame otherwise an error is raised. ValueError: Length mismatch: Expected axis has n elements, new values have m elements.
  • We need to be especially careful to get the order of columns right when setting all column names, so we do not accidentally mislabel columns.
  • set_axis() is a data frame method, and it returns a copy of the data frame. It can therefore be chained.

Alternative: Set columns Attribute

df_2 = df.copy()
df_2.columns = ['col_1', 'col_2', 'col_3']

Here is how this works:

  • We can assign a vector of column names to the columns attribute of a data frame.
  • The passed list of column names must have as many elements (column names) as there are columns in the data frame otherwise an error is raised.
  • This approach is not chaining-friendly (unless wrapped in a function).

Extension: Level of Multi-Index

We wish to set the labels of a particular level of a MultiIndex.

df_2 = (df
        .loc[:, ['col_2', 'col_3']]
        .apply([abs, round]))

df_2.columns = \
    df_2.columns.set_levels(['mag', 'rnd'], level=1)

Here is how this works:

  • In this example, we demonstrate setting the labels of a level of a MultiIndex following a data transformation operation where we use apply() to apply multiple functions to selected columns of a data frame. See Implicit Transformation.
  • When we pass multiple functions to apply(), the output data frame has a MultiIndex where level 0 is the original column names and level 1 is the function names.
  • We use the set_levels() method of a MultiIndex to specify the column names. Note that set_levels() is a method of the Index and not of the data frame therefore we call it on df_2.columns.
  • To set the labels of a particular level of a MultiIndex, we pass to set_levels():
    • A list of the desired level labels that must have as many elements as the labels in the level, which in this case is ['mag', 'rnd'].
    • The level index, which in this case is level=1 (second level).

One Column

We wish to set the name for a specific column given its position.

In this example, we have a data frame of three columns, and we wish to change the name of the third column to col_c.

df_2 = df\
    .rename(columns = {df.columns[2]:'col_c'})

Here is how this works:

  • We can’t simply set df.columns[2]='col_c’. If we do, we get a TypeError: Index does not support mutable operations. This is because the Pandas index can’t be altered.
  • We use columns[2] to retrieve the current name of the column whose name we wish to change given its position (in this example that position is 2).
  • We then use rename() to map the current name to the desired name. See Map Names.
  • The default is inplace=False which is the right setting when chaining. If chaining is not a concern, we could set inplace=True in which case the original DataFrame is changed and the function would return None.

Extension: In a Chain after Column Creation

df_2 = df.pipe(
    lambda x: x.rename(columns={x.columns[2]: 'col_c'}))

Here is how this works:

  • Using df.rename(columns = {df.columns[2]:'col_c'}) wouldn't work if we wish to refer to columns that were created earlier in the same chain or if columns of the original DataFrame were removed or relocated earlier in the chain.
  • Instead, we used pipe() to pass the current version of the DataFrame to a lambda function that handles the renaming.

Some Columns

We wish to set the name for some columns given their position.

In this example, we wish to change the name of the first and third columns to col_a and col_c respectively.

df_2 = df \
    .rename(columns = {df.columns[0]: 'col_a', df.columns[2]: 'col_c'})

Here is how this works:

This works similarly to the One Column scenario above except that we pass multiple columns instead of just one.

Extension: In a Chain after Column Creation

df_2 = df.pipe(
    lambda x: x.rename(columns = {x.columns[0]: 'col_a',
                                  x.columns[2]:'col_c'}))

Here is how this works:

  • The primary solution above wouldn't work if we wish to refer to columns that were created earlier in the same chain or if columns of the original DataFrame were removed or relocated earlier in the chain.
  • Instead, we used pipe() to pass the current version of the DataFrame to a lambda function that handles the renaming.

Flatten Multi-Index

We wish to flatten a MultiIndex into a single Index.

In this example, we have a data frame with a two-level column MultiIndex and we wish to flatten that MultiIndex so that the column names are made up of the index levels separated by underscores.

df_2 = (df
        .loc[:, ['col_2', 'col_3']]
        .apply([abs, round]))

df_2.columns = df_2.columns.map('_'.join)

Here is how this works:

  • In this example, the data frame with the MultiIndex df_2 is produced by applying an Implicit Transformation operation where we compute the absolute value, via the function abs(), and round, via the function round(), the columns col_2 and col_3 resulting in four columns with a two-level MultiIndex. See Implicit Transformation.
  • To flatten the MultiIndex, we:
    • Use df_2.columns to obtain an array-like structure of tuples each holding the values of the two levels for each column
    • We use map() to iterate over each tuple and apply…
    • the expression '_'.join which appends the two values in the tuple separated by an underscore ‘_’.
  • We assign the flat Index resulting from the map() operation to df_2.columns as the new Index.

Alternative: via List Comprehension

df_2 = (df
        .loc[:, ['col_2', 'col_3']]
        .apply([abs, round]))

df_2.columns = [' '.join(col) for col in df_2.columns]

Here is how this works:

Instead of map(), we can use a list comprehension to flatten the labels of the MultiIndex into labels (names) of the form {level_0}_{level_1} where in this example {level_o} holds the original column names and {level_1} holds the function names.

PYTHON
I/O