Transforming

In this section, we cover data transformation, which is the creation of new columns in a data frame or the modification of existing columns, typically as a function of one or more existing columns.

This section is organized as follows:

  • Basic Transformation where we cover the basic yet commonly used data transformation scenarios. For instance, we wish to create a new column that is the ratio of two existing columns.
  • Conditional Transformation where we cover how to carry out data transformation operations conditionally. In other words, we wish to modify the values of the column for some rows differently from others based on one or more conditions. For instance, we wish to cap the values of a column to a certain range by setting values outside the range to the limits of the range while not changing values that fall within the range.
  • Grouped Transformation where we cover how to carry out data transformation within groups. For instance, we wish to subtract from a column the mean value for each group, where the group is defined by another column.
  • Non Vectorized Transformation where we cover how to execute data transformation operations that cannot be applied in a vectorized manner and must instead be applied to each row individually. For instance, we wish to create a new column whose value for a row is the mean of two other columns.
  • Implicit Transformation where we cover how to succinctly apply one or more data transformation operations to one or more columns without repeating code. For instance, we wish to round the values of several columns to the nearest integer without spelling out each operation.
  • Dynamic Transformation where we cover how to specify aspects of data transformation dynamically, i.e. through environment variables or function arguments. For instance, we wish to pass to a function the names of columns to which a set of data transformations would be applied. This is especially useful when creating reusable functions or packages.
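
To make the six scenarios above concrete, here is a minimal sketch of each one using pandas. The data frame, its column names (group, x, y), the capping range, and the helper scale_columns are all hypothetical and chosen only for illustration; the dedicated sections may well use different functions for the same tasks.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "x": [1.0, 4.0, 9.0, 15.0],
    "y": [3.0, 2.5, 3.0, 7.0],
})

# Basic: a new column that is the ratio of two existing columns.
df["ratio"] = df["x"] / df["y"]

# Conditional: cap x to the (assumed) range [2, 10]; values that
# already fall inside the range are left unchanged.
df["x_capped"] = df["x"].clip(lower=2, upper=10)

# Grouped: subtract from x the mean of x within each group.
df["x_centered"] = df["x"] - df.groupby("group")["x"].transform("mean")

# Non-vectorized: call a Python function once per row.
df["row_mean"] = df.apply(lambda row: (row["x"] + row["y"]) / 2, axis=1)

# Implicit: round several columns to the nearest integer in one
# statement instead of spelling out each column separately.
cols_to_round = ["ratio", "row_mean"]
df[cols_to_round] = df[cols_to_round].round()


# Dynamic: the column names arrive as a function argument.
def scale_columns(data, columns, factor):
    """Return a copy of data with the named columns multiplied by factor."""
    out = data.copy()
    out[columns] = out[columns] * factor
    return out


scaled = scale_columns(df, ["x", "y"], 10)
```

Note that the row-wise mean could also be computed in a vectorized way; it is written with apply here only to show the mechanics of operating on one row at a time.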