In Function Specification for Implicit Renaming, we cover how to specify the functions to be applied to the current column names to generate the desired names. In this section, we show how to select the column(s) to be renamed when we only wish to rename a subset of columns.
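We will cover the following scenarios:
- All Columns: renaming every column of a data frame.
- Explicit Selection: renaming columns selected by spelling out their names or positions.
- Implicit Selection: renaming columns selected via a property of their names or their data.
- Exclusion: renaming all but a set of columns.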
For more detailed coverage of column selection, see Selection.
All Columns
We wish to rename all columns of a data frame by applying a function to their current names to generate the desired names.
In this example, we wish to convert the names of all columns to lowercase.
df_2 = df.rename(columns=str.lower)
Here is how this works:
- The columns argument of rename() accepts a function, to which the existing names of all columns are passed one by one.
- We pass to rename() the function that we wish to apply to each column name, which here is the built-in string method str.lower(). Note that we pass the function itself (str.lower), without calling it.
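To see the mechanism on concrete data, here is a minimal sketch on a small, made-up data frame (the column names are hypothetical). It also shows that any callable mapping one name to one new name, such as a lambda, can be passed to the columns argument.

```python
import pandas as pd

# Hypothetical example data; any data frame works the same way
df = pd.DataFrame({'Col_A': [1, 2], 'COL-B': [3, 4]})

# Pass the function object itself (no parentheses);
# rename() calls it once for each existing column name
df_2 = df.rename(columns=str.lower)

# Any callable that maps a name to a new name works, e.g. a lambda
df_3 = df.rename(columns=lambda name: name.lower().replace('-', '_'))

print(list(df_2.columns))  # ['col_a', 'col-b']
print(list(df_3.columns))  # ['col_a', 'col_b']
```

rename() returns a new data frame, so df itself keeps its original column names.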
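Explicit Selection
We wish to rename some columns of a data frame by applying a function to their current names to generate the desired names. We wish to select the columns to be renamed explicitly, i.e. by spelling out their names or positions.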
In this example, we wish to rename the columns in positions 1 and 2 (i.e. the second and third columns, since positions are 0-based) by lowering the case of their current names and replacing dash separators ‘-’ with underscores ‘_’.
selected_cols = df.columns[[1, 2]]
renamed_cols = selected_cols.str.lower().str.replace('-', '_')
df_2 = df\
.rename(columns=dict(zip(selected_cols, renamed_cols)))
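To try the snippet above in isolation, here is a self-contained sketch on a hypothetical data frame whose column names are made up for illustration. It makes visible that only the columns at positions 1 and 2 change.

```python
import pandas as pd

# Hypothetical data frame; column names are made up for illustration
df = pd.DataFrame([[1, 2, 3, 4]],
                  columns=['id', 'Col-One', 'Col-Two', 'Col-Three'])

# Positions are 0-based, so [1, 2] picks the 2nd and 3rd columns
selected_cols = df.columns[[1, 2]]
renamed_cols = selected_cols.str.lower().str.replace('-', '_')

# Map each selected name to its new name; other columns keep their names
df_2 = df.rename(columns=dict(zip(selected_cols, renamed_cols)))

print(list(df_2.columns))  # ['id', 'col_one', 'col_two', 'Col-Three']
```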
Here is how this works:
- We select the columns to be renamed by position via df.columns[[1, 2]]. See Basic Selection.
- We generate the desired names by applying str.lower() and str.replace() to the selected names. See Function Specification for Implicit Renaming and String Operations.
- We use dict(zip(selected_cols, renamed_cols)) to combine the current names (step 1) and the modified names (step 2) into a dictionary of key-value pairs, where the keys are the current column names and the values are the desired column names. This is the format that rename() expects.
- We pass that dictionary to the columns argument of rename() to map from the original names to the modified names. See Map Names.
- df_2 is a copy of the input data frame with the renaming logic applied only to the selected columns; all other columns keep their names.
Extension: Custom Function
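def m_rename(df, select_fn, rename_fn):
    # Names of the columns to rename, as returned by select_fn
    selected_cols = select_fn(df)
    # Desired names, computed from the selected names
    renamed_cols = rename_fn(selected_cols)
    # Map old -> new names; all other columns are left untouched
    return df.rename(columns=dict(zip(selected_cols, renamed_cols)))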
df_2 = df \
.pipe(m_rename,
lambda x: x.columns[[1, 2]],
lambda x: x.str.lower().str.replace('-', '_'))
Here is how this works:
- We create a custom function m_rename() that can be called via the pipe() method to rename selected columns implicitly in a chained manner.
- m_rename() expects the following:
  - df: A data frame whose columns (or some of them) are to be renamed.
  - select_fn: A function or lambda function that can be applied to the data frame to obtain the names of the columns to be renamed (in the examples here, a pandas Index of strings).
  - rename_fn: A function or lambda function that can be applied to the column names returned by select_fn to obtain the desired column names.
- When called via pipe(), the data frame is passed as the first argument df, and the two lambda functions are passed as select_fn and rename_fn, in that order.
Implicit Selection
We wish to rename a subset of the columns of the data frame. We wish to select that subset of columns implicitly, i.e. we do not spell out the column names or positions explicitly but rather identify the columns via a property of their names or their data.
In this example, we wish to replace the suffix ‘_num’ with the suffix ‘_int’ for all columns whose data type is integer.
selected_cols = df.select_dtypes('integer').columns
renamed_cols = selected_cols.str.replace('_num', '_int')
df_2 = df.rename(columns=dict(zip(selected_cols, renamed_cols)))
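Here is how this works:
- Via df.select_dtypes('integer').columns, we obtain the names of the columns whose data type is integer. See Implicit Selection for coverage of the most common scenarios of implicit column selection, including by name pattern, data type, and criteria satisfied by the column’s data.
- The rest works as in Explicit Selection above: str.replace() generates the new names and rename() maps the current names to them. Note that str.replace('_num', '_int') replaces every occurrence of ‘_num’ in a name, not only a trailing one; this matches the intent only if ‘_num’ appears solely as a suffix.
Extension: Custom Function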
def m_rename(df, select_fn, rename_fn):
    selected_cols = select_fn(df)
    renamed_cols = rename_fn(selected_cols)
    return df.rename(columns=dict(zip(selected_cols, renamed_cols)))
df_2 = df \
.pipe(m_rename,
lambda x: x.select_dtypes('integer').columns,
lambda x: x.str.replace('_num', '_int'))
Here is how this works:
See “Extension: Custom Function” under Explicit Selection above.
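Because a plain str.replace() rewrites ‘_num’ wherever it occurs, a name such as ‘room_number_num’ (hypothetical) would be mangled. The sketch below, on made-up data, contrasts the plain replacement with a regular expression anchored at the end of the name via ‘$’, which replaces only a true suffix. The regex variant is a suggested alternative, not the approach used above.

```python
import pandas as pd

# Hypothetical data: two integer columns and one float column
df = pd.DataFrame({
    'age_num': [30, 41],
    'room_number_num': [101, 202],
    'price_num': [9.5, 12.0],
})

selected_cols = df.select_dtypes('integer').columns

# Plain replace rewrites every occurrence of '_num', not just the suffix
plain = selected_cols.str.replace('_num', '_int', regex=False)

# Anchoring the pattern with '$' replaces only a trailing '_num'
anchored = selected_cols.str.replace(r'_num$', '_int', regex=True)

df_2 = df.rename(columns=dict(zip(selected_cols, anchored)))

print(list(plain))         # ['age_int', 'room_intber_int']
print(list(df_2.columns))  # ['age_int', 'room_number_int', 'price_num']
```

The float column price_num keeps its name because select_dtypes('integer') does not select it.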
Exclusion
We wish to apply a function to rename all but a set of columns.
In this example, we wish to add the prefix ‘attr_’ to all columns except those whose current name includes the string ‘_id’.
selected_cols = df.columns[~df.columns.str.contains('_id')]
renamed_cols = selected_cols.map('attr_{}'.format)
df_2 = df.rename(columns=dict(zip(selected_cols, renamed_cols)))
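Here is how this works:
- Via df.columns[~df.columns.str.contains('_id')], we obtain the names of all columns except those whose name contains the substring ‘_id’. The ~ operator negates the boolean mask returned by str.contains(). See Exclude Columns for coverage of column exclusion scenarios, all of which can be used for implicit renaming.
- selected_cols.map('attr_{}'.format) applies the string’s format() method to each selected name, which prepends ‘attr_’.
Extension: Custom Function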
def m_rename(df, select_fn, rename_fn):
    selected_cols = select_fn(df)
    renamed_cols = rename_fn(selected_cols)
    return df.rename(columns=dict(zip(selected_cols, renamed_cols)))
df_2 = df \
.pipe(m_rename,
lambda x: x.columns[~x.columns.str.contains('_id')],
lambda x: x.map('attr_{}'.format))
Here is how this works:
See “Extension: Custom Function” under Explicit Selection above.
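One property of m_rename() worth knowing: zip() silently stops at the shorter of its inputs, so a rename_fn that returns the wrong number of names would go unnoticed. The sketch below is a hypothetical variant, m_rename_checked(), that raises an error in that case; it is exercised on the exclusion example with a made-up data frame.

```python
import pandas as pd

def m_rename_checked(df, select_fn, rename_fn):
    """Hypothetical variant of m_rename() that rejects mismatched name lists."""
    selected_cols = select_fn(df)
    renamed_cols = rename_fn(selected_cols)
    # zip() would otherwise truncate silently to the shorter input
    if len(selected_cols) != len(renamed_cols):
        raise ValueError(
            f'{len(selected_cols)} columns selected but '
            f'{len(renamed_cols)} new names returned'
        )
    return df.rename(columns=dict(zip(selected_cols, renamed_cols)))

# Hypothetical data frame for the exclusion example
df = pd.DataFrame(columns=['customer_id', 'name', 'order_id', 'city'])

df_2 = df.pipe(
    m_rename_checked,
    lambda x: x.columns[~x.columns.str.contains('_id')],
    lambda x: x.map('attr_{}'.format),
)

print(list(df_2.columns))  # ['customer_id', 'attr_name', 'order_id', 'attr_city']
```

The check costs one length comparison, and it turns a silent partial rename into an explicit error.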