Multiple Transformations

We wish to carry out multiple data transformation operations on the same data frame.

df_2 = df\
    .assign(
        col_4 = ~df['col_3'],
        col_5 = df['col_1'].abs(),
        col_6 = df['col_2'].round(2),
        col_7 = lambda x: x['col_5'] / x['col_6'])

Here is how this works:

  • We can create or modify multiple columns in a single call to assign(). To do so, we pass to assign() multiple data transformation expressions, such as those covered in Common Transformation Scenarios, separated by commas.
  • In this example, the data transformation expressions are:
    • col_4 = ~df['col_3'] where we use the logical complement operator ~ to create a new column col_4 that is the logical complement of column col_3, which is of a Boolean data type, i.e. it takes the values True or False.
    • col_5 = df['col_1'].abs() where we create a new column col_5 whose values are the absolute values of the corresponding values of the numeric column col_1.
    • col_6 = df['col_2'].round(2) where we create a new column col_6 whose values are the corresponding values of the numeric column col_2 rounded to 2 decimal places.
    • col_7 = lambda x: x['col_5'] / x['col_6'] where we create a new column col_7 whose values are the ratio of the two columns col_5 and col_6. See the next point.
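  • To refer to columns created earlier in the same assign() statement (more precisely, in the same data manipulation chain), we need to use a lambda function, as we do to create col_7 in this example. The lambda receives the data frame as it stands at that point, including the columns just created, whereas df still refers to the original data frame, which has no col_5 or col_6.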
  • The resulting columns are added at the right end of the original data frame, in the same order in which they are defined in the assign() call. If an existing column is overwritten, its position is not changed.
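
To make the points above concrete, here is a minimal, self-contained sketch of the same call. The sample values in df are made up for illustration, and the extra col_1 assignment is not part of the original example; it is added only to show that an overwritten column keeps its position.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the example above.
df = pd.DataFrame({
    'col_1': [-1.5, 2.0, -3.25],
    'col_2': [0.504, 1.996, 4.0],
    'col_3': [True, False, True],
})

df_2 = df.assign(
    col_4=~df['col_3'],
    col_5=df['col_1'].abs(),
    col_6=df['col_2'].round(2),
    # col_5 and col_6 do not exist in df, so we use a lambda here;
    # x is the data frame as built up so far within this assign() call.
    col_7=lambda x: x['col_5'] / x['col_6'],
    # Illustration only: overwriting an existing column keeps its position.
    col_1=lambda x: x['col_1'] * 10,
)

# New columns appear on the right in definition order; col_1 stays first.
print(list(df_2.columns))

# assign() returns a new data frame; df itself is left unchanged.
print(list(df.columns))
```

Replacing the lambda with df['col_5'] / df['col_6'] would raise a KeyError, because those columns exist only in the data frame being built, not in df.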