We wish to carry out data transformation operations conditionally. In other words, the data transformation operation we wish to carry out is not the same for all rows, rather it is different based on one or more conditions.
We will cover two scenarios:
We wish to perform a data transformation operation that involves one condition.
In this example, we wish to create a new numerical column col_3
whose value is a function of the value of the numerical column col_1
. The value of col_3
should be 50% higher than the corresponding value of col_1
if the value of col_1
is greater than 10. Otherwise, the value of col_3
should be just 20% higher than the corresponding value of col_1
.
df_2 = df %>%
mutate(
col_3 = if_else(col_1 > 10,
col_1 * 1.5,
col_1 * 1.2))
Here is how this works:
if_else()
inside mutate()
.if_else()
three arguments as follows:col_1 > 10
.TRUE
, which in this example is col_1 * 1.5
.FALSE
, which in this example is col_1 * 1.2
.col_3
will take the value produced by col_1 * 1.5
for rows where the condition col_1 > 10
is TRUE
, and the values produced by col_1 * 1.2
elsewhere.if_else()
and other conditional statements.We wish to perform a data transformation operation that involves multiple conditions.
In this example, we wish to create a new numerical column col_3
whose value is a function of the values of the columns col_1
and col_2
and where the function that maps col_1
and col_2
to col_3
is different for different ranges of values of col_1
and col_2
.
df_2 = df %>%
mutate(
col_3 = case_when(
col_1 > 20 & col_2 > 2 ~ col_1 * 1.5,
col_1 > 15 & col_2 > 1 ~ col_1 * 1.2,
col_1 > 10 | col_2 > 0 ~ col_1 * 1.1,
TRUE ~ col_1))
Here is how this works:
case_when()
inside mutate()
.if_else()
(see “One Condition” above) to perform a conditional data transformation operation that involves multiple conditions. The recommended approach is to use case_when()
.case_when()
accepts any number of two-sided formulas of the form < condition ~ expression >
wherecol_1 > 20 & col_2 > 2
.col_1 * 1.5
.TRUE
is a catch-all case that is applied to if all prior cases do not match.case_when()
and other conditional statements.