Function Specification for Implicit Renaming

We wish to specify the function to be applied to the current column names to generate the desired names.

In this section, we cover the following function specification scenarios:

  • Named Function where we cover how to apply a built-in function or a custom function to each column name. The column names are passed to the function one by one as strings, and the function is expected to return a single string each time.
  • Anonymous Function where we cover how to apply a lambda function to each column name. An anonymous (lambda) function works like a named function but is often more convenient, especially when parameters need to be passed.

The examples we show here rename all columns of a data frame. See Column Selection for Implicit Renaming for how to apply the renaming function to a selected set of columns only.

Named Function

We wish to rename columns by applying a named function to the current column names to generate the desired column names.

Built-In Function

In this example, we wish to convert the upper-case column names to lower-case.

df_2 = df.rename(columns=str.lower)

Here is how this works:

  • The columns argument of rename() accepts a function to which the existing names of all columns are passed one by one.
  • The function is expected to take a single string value as input (the existing column name) and to return a single string value as output (the new column name).
  • Note that since the function operates on one string at a time, we can’t use the Pandas vectorized string manipulation functions. Instead, we use the core Python string functions e.g. str.lower() in the example above. See String Operations.
  • In this solution, the renaming logic is applied to all columns. See Column Selection for Implicit Renaming for how to apply the renaming function to a selected set of columns only.
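
As a minimal sketch of the points above, the following applies str.lower to a small data frame. The data frame and its column names are made up for illustration only.

```python
import pandas as pd

# Hypothetical data frame with upper-case column names.
df = pd.DataFrame({'ID': [1, 2], 'NAME': ['a', 'b']})

# rename() calls str.lower once per column name and returns a new data frame.
df_2 = df.rename(columns=str.lower)

# Passing the function is equivalent to building the old-to-new mapping by hand.
mapping = {col: str.lower(col) for col in df.columns}
df_3 = df.rename(columns=mapping)

print(list(df.columns))    # the original data frame keeps its names
print(list(df_2.columns))
```

Since rename() returns a new data frame by default, df itself is left unchanged.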

Custom Function

In this example, we wish to have column names follow the template 'col_{col}_v2' where {col} is the current column name.

def format_name(p_col):
    return 'col_{col}_v2'.format(col=p_col)

df_2 = df.rename(columns=format_name)

Here is how this works:

  • This works similarly to the “Built-In Function” scenario described above except that we pass to the columns argument of rename() our custom function format_name().
  • We use the function format() to insert each current column name in the desired naming template. See String Interpolation.
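
The sketch below shows two possible variations on the custom function, on a made-up data frame. The first writes format_name() with an f-string instead of format(). The second uses add_affixes(), a hypothetical helper that is not part of the original solution, to show one way a named function with extra parameters could be passed by fixing those parameters with functools.partial().

```python
from functools import partial

import pandas as pd

# Hypothetical data frame used only for illustration.
df = pd.DataFrame({'a': [1], 'b': [2]})

def format_name(p_col):
    # f-string equivalent of 'col_{col}_v2'.format(col=p_col)
    return f'col_{p_col}_v2'

def add_affixes(p_col, p_prefix, p_suffix):
    # Hypothetical helper: a renaming function that takes extra parameters.
    return f'{p_prefix}{p_col}{p_suffix}'

df_2 = df.rename(columns=format_name)

# partial() fixes the extra parameters, leaving a callable that takes
# only the column name, which is what rename() expects.
df_3 = df.rename(columns=partial(add_affixes, p_prefix='col_', p_suffix='_v2'))

print(list(df_2.columns))
print(list(df_3.columns))
```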

Anonymous Function

We wish to rename columns by applying a lambda function to the current column names to generate the desired column names.

df_2 = df.rename(columns=lambda x: x.strip().lower().replace(' ', '_'))

Here is how this works:

  • The columns argument of rename() accepts a lambda function (an anonymous function) to which the existing names of all columns are passed one by one.
  • The lambda function is expected to take a single string value as input (the existing column name) and return a single string value as output (the new column name).
  • Note that since the lambda operates on one string at a time, we can’t use the Pandas vectorized string manipulation functions. Instead, we use the core Python string functions e.g. x.strip() in this example. See String Operations.
  • An alternative that uses the Pandas vectorized string manipulation functions is given below.
  • In this example, the lambda function comprises a chain of string manipulation functions each acting on the output of the one prior.
  • In this solution, the renaming logic is applied to all columns. See Column Selection for Implicit Renaming for how to apply the renaming function to a selected set of columns only.
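
To make the chaining concrete, here is a small sketch on a made-up data frame with untidy column names. It also shows why the order of the chained calls matters: if strip() is skipped, the leading and trailing spaces are turned into underscores.

```python
import pandas as pd

# Hypothetical data frame with untidy column names.
df = pd.DataFrame({' First Name ': ['Ann'], 'Last Name': ['Lee'], 'AGE': [30]})

# Each name is stripped, then lower-cased, then has spaces replaced.
df_2 = df.rename(columns=lambda x: x.strip().lower().replace(' ', '_'))

# Without strip(), the outer spaces also become underscores.
df_3 = df.rename(columns=lambda x: x.lower().replace(' ', '_'))

print(list(df_2.columns))
print(list(df_3.columns))
```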

Alternative: Using Vectorized String Manipulation

df_2 = df.set_axis(
    df.columns.str.strip().str.lower().str.replace(' ', '_'),
    axis=1)

Here is how this works:

  • The advantages of this alternative are that it is chainable and uses the Pandas vectorized string manipulation functions described in String Operations.
  • The function set_axis() expects a list of new column names with exactly as many elements as the data frame has columns. We set axis=1 to specify that we wish to change column names, not row indices. See Set Names.
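
The following sketch, on a made-up data frame, illustrates both points: set_axis() returns a new data frame with the cleaned names, and it raises a ValueError when the number of names does not match the number of columns. Passing regex=False to str.replace() is an addition here, made so that the space is treated as a literal string regardless of the Pandas version.

```python
import pandas as pd

# Hypothetical data frame with untidy column names.
df = pd.DataFrame({' First Name ': ['Ann'], 'Last Name': ['Lee'], 'AGE': [30]})

# Vectorized cleaning of all column names at once.
new_names = df.columns.str.strip().str.lower().str.replace(' ', '_', regex=False)
df_2 = df.set_axis(new_names, axis=1)
print(list(df_2.columns))

# A list of the wrong length is rejected.
try:
    df.set_axis(['only_one_name'], axis=1)
    length_error_raised = False
except ValueError:
    length_error_raised = True
print(length_error_raised)
```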

Alternative: Set columns Attribute

df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")

Here is how this works:

  • We can assign a sequence of column names directly to the columns attribute of a data frame. This modifies the data frame in place.
  • The assigned sequence must have as many elements (column names) as there are columns in the data frame, otherwise a ValueError is raised.
  • In this approach, we can use the Pandas vectorized string manipulation functions described in String Operations.
  • This approach is not chaining-friendly (unless wrapped in a function).
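
One possible way to wrap the assignment in a function so that it can be used in a chain is sketched below. The name clean_columns() and the sample data frame are hypothetical, and working on a copy is a design choice made here so that the caller's data frame is not modified.

```python
import pandas as pd

def clean_columns(p_df):
    # Hypothetical wrapper: work on a copy so the input is left untouched.
    p_df = p_df.copy()
    p_df.columns = (p_df.columns
                    .str.strip()
                    .str.lower()
                    .str.replace(' ', '_', regex=False))
    return p_df

# Hypothetical data frame with untidy column names.
df = pd.DataFrame({' First Name ': ['Ann'], 'AGE': [30]})

# pipe() lets the wrapper take part in a method chain.
df_2 = df.pipe(clean_columns)

print(list(df_2.columns))
```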