Conditional Transformation

We wish to carry out data transformation operations conditionally. In other words, the data transformation operation we wish to carry out is not the same for all rows, rather it is different based on one or more conditions.

We will cover two scenarios:

  • One Condition where we cover conditional transformation scenarios involving one condition.
  • Multiple Conditions where we cover conditional transformation scenarios involving multiple conditions.

One Condition

We wish to perform a data transformation operation that involves one condition.

In this example, we wish to create a new numerical column col_13 whose value is a function of the value of the numerical column col_1. The value of col_13 should be 50% higher than the corresponding value of col_1 if the value of col_1 is greater than 1000. Otherwise, the value of col_13 should be just 20% higher than the corresponding value of col_1.

SELECT *,
       IF(col_1 > 1000, col_1 * 1.5, col_1 * 1.2) AS col_13
FROM refcon.dataset.table_1;

Here is how this works:

  • To carry out a data transformation operation in a conditional manner, we can use IF() conditional expression.
  • We pass to IF() three arguments as follows:
    • A logical condition which in this example is col_1 > 1000.
    • A data transformation expression that provides the value to assign to rows for which the condition evaluates to TRUE, which in this example is col_1 * 1.5.
    • A data transformation expression that provides the value to assign to rows for which the condition evaluates to FALSE, which in this example is col_1 * 1.2.
  • To recap, the output column col_13 will take the value produced by col_1 * 1.5 for rows where the condition col_1 > 10 is TRUE, and the values produced by col_1 * 1.2 elsewhere.
  • See General Operations: Conditional for a coverage of IF() and other conditional expressions.

Multiple Conditions

We wish to perform a data transformation operation that involves multiple conditions.

In this example, we wish to create a new numerical column col_13 whose value is a function of the values of the columns col_1 and col_11 and where the function that maps col_1 and col_11 to col_13 is different for different ranges of values of col_1 and col_11.

SELECT *,
       CASE
           WHEN col_1 > 1000 AND col_11 > 10 THEN col_1 * 1.5
           WHEN col_1 > 1500 AND col_11 > 20 THEN col_1 * 1.2
           WHEN col_1 > 2000 AND col_11 > 50 THEN col_1 * 1.1
           ELSE col_1 END AS col_13
FROM refcon.dataset.table_1;

Here is how this works:

  • To carry out a data transformation operation in a conditional manner involving multiple conditions, we can use CASE expression.
  • While possible, it can get quite messy to use IF() (see “One Condition” above) to perform a conditional data transformation operation that involves multiple conditions. The recommended approach is to use CASE expression.
  • CASE statement accepts any number of two-sided formulas of the form < WHEN condition THEN expression > where
    • the left side is a logical condition that determines which rows match this case e.g. col_1 > 1000 AND col_11 > 10.
    • the right-hand side is a data transformation expression that provides the value to assign e.g. col_1 * 1.5.
  • The last condition ELSE is a catch-all case that is applied if all prior cases do not match.
  • See General Operations: Conditional for coverage of CASE and other conditional expressions.
SQL
I/O