Common Transformation Scenarios

The most common data transformations involve applying a function to one or more columns of a table to output a new column.

We will cover three common scenarios for creating new columns:

  • Scalars where we create a new column entirely composed of the same value. This may be a starting point for a larger transformation or to identify rows of a table before appending rows from multiple tables.
  • Operations where we create a new column by applying arithmetic or logical operations on existing columns. For instance, creating a new column that is the ratio of two existing columns.
  • Functions where we create a new column by applying one or more functions to existing columns. For instance, creating a new column that is the log of an existing column.

Scalars

We wish to create a new column that is entirely composed of the same value.

In this example, we create a new column col_3 that has the value 2022 for all rows of the table table_1.

SELECT *, 2022 AS col_3
FROM table_1;

Here is how this works:

  • AS col_3 will add a new column col_3 to the table table_1 or will add it as col_3_1 column if a column named col_3 already exists.
  • Each value of the column col_3 will be set to the scaler value 2022.

Operations

We wish to create a new column by applying arithmetic or logical operations on existing columns.

In this example, we create a new column col_3 that is the ratio of two existing columns col_1 and col_2.

SELECT *, SAFE_DIVIDE(col_1, col_2) AS col_3
FROM table_1;

Here is how this works:

  • SAFE_DIVIDE(col_1, col_2): The data transformation to be carried out which is to divide the value of the column col_1 for each row by the value of the column col_2 of the same row.
  • SAFE_DIVDE is equivalent to the division operator ( X / Y ), but returns NULL if an error occurs, such as a division by zero error.
  • AS col_3 will add a new column col_3 to the table table_1 or will add it as col_3_1 column if a column named col_3 already exists.
  • While there are operations that work for different data types, most often it is arithmetic operations (such as +, -, *, and /) or comparison operations (such as >, <, and ==) applied to numeric data which we cover in Numeric Operations.

Functions

Built-In Functions

We wish to create a new column by applying one or more built-in functions on existing columns.

In this example, we create a new column col_3 that is the log of an existing column col_2.

SELECT *, LOG10(col_2) AS col_3
FROM table_1;

Here is how this works:

  • AS col_13 will add a new column col_3 to the table table_1 or will add it as col_3_1 column if a column named col_3 already exists.
  • LOG10(col_2) is the data transformation to be carried out which is to apply the LOG10 function to take the log base 10 of each value of the column col_2.
  • We cover functions for manipulating different data types in their respective operations sections; Numeric , String , Logical , Factor, and Datetime Operations.

User-Defined Functions

We wish to create a new column by applying one or more user-defined functions (UDF) to existing columns.

In this example, we create a new column col_13 via a UDF that accepts a string value and returns a string column that contains only digits.

CREATE FUNCTION IF NOT EXISTS
    refcon.dataset.extract_digits(v_col_value STRING) RETURNS string
AS (
        REGEXP_REPLACE(v_col_value, '[^0-9 ]', '')
        );

SELECT *, refcon.dataset.extract_digits(col_9) AS col_13
FROM refcon.dataset.table_1;

Here is how this works:

  • We create a user-defined function extract_digits() which accepts a string value as an argument.
  • In REGEXP_REPLACE(v_col_value, '[^0-9 ]', ''), inside the function, use regular expressions to extract digits from a string.
  • We can call the function as part of a data transformation expression inside SELECT just like we would a built-in function (see “Built-In Functions” above).
SQL
I/O