Conditional Transformation

We wish to carry out data transformation operations conditionally. In other words, the data transformation operation we wish to carry out is not the same for all rows, rather it is different based on one or more conditions.

We will cover two scenarios:

  • One Condition where we cover conditional transformation scenarios involving one condition.
  • Multiple Conditions where we cover conditional transformation scenarios involving multiple conditions.

One Condition

We wish to perform a data transformation operation that involves one condition.

In this example, we wish to create a new numerical column col_3 whose value is a function of the value of the numerical column col_1. The value of col_3 should be 50% higher than the corresponding value of col_1 if the value of col_1 is greater than 10. Otherwise, the value of col_3 should be just 20% higher than the corresponding value of col_1.

df_2 = df %>% 
  mutate(
    col_3 = if_else(col_1 > 10, 
                   col_1 * 1.5, 
                   col_1 * 1.2))

Here is how this works:

  • In order to carry out a data transformation operation in a conditional manner, we can use if_else() inside mutate().
  • Most commonly, we pass to if_else() three arguments as follows:
    • A logical condition which in this example is col_1 > 10.
    • A data transformation expression that provides the value to assign to rows for which the condition evaluates to TRUE , which in this example is col_1 * 1.5.
    • A data transformation expression that provides the value to assign to rows for which the condition evaluates to FALSE , which in this example is col_1 * 1.2.
  • To recap, the output column col_3 will take the value produced by col_1 * 1.5 for rows where the condition col_1 > 10 is TRUE, and the values produced by col_1 * 1.2 elsewhere.
  • See General Operations for a coverage of if_else() and other conditional statements.

Multiple Conditions

We wish to perform a data transformation operation that involves multiple conditions.

In this example, we wish to create a new numerical column col_3 whose value is a function of the values of the columns col_1 and col_2 and where the function that maps col_1 and col_2 to col_3 is different for different ranges of values of col_1 and col_2.

df_2 = df %>% 
  mutate(
    col_3 = case_when(
      col_1 > 20 & col_2 > 2 ~ col_1 * 1.5,  
      col_1 > 15 & col_2 > 1 ~ col_1 * 1.2,  
      col_1 > 10 | col_2 > 0 ~ col_1 * 1.1,
      TRUE ~ col_1))

Here is how this works:

  • In order to carry out a data transformation operation in a conditional manner involving multiple conditions, we can use case_when() inside mutate().
  • While possible, it can get quite messy to use if_else() (see “One Condition” above) to perform a conditional data transformation operation that involves multiple conditions. The recommended approach is to use case_when().
  • case_when() accepts any number of two-sided formulas of the form < condition ~ expression > where
    • the left side is a logical condition that determines which rows match this case e.g. col_1 > 20 & col_2 > 2.
    • the right hand side is a data transformation expression that provides the value to assign e.g. col_1 * 1.5.
  • The last condition TRUE is a catch-all case that is applied to if all prior cases do not match.
  • See General Operations for a coverage of case_when() and other conditional statements.
R
I/O