Process String Parts

In some situations, we don’t merely wish to add new columns corresponding to some or all parts resulting from the split of a string column. Rather, we wish to apply some processing on the parts to generate the desired output.

In this section we cover two common scenarios of processing parts produced by string splitting:

  1. Concatenate Parts: We cover how to create new columns by concatenating some of the parts resulting from splitting the values of a string column.
  2. Set Data Type: We cover how to set the data type of parts resulting from splitting the values of a string column.
  3. Custom Logic: We cover how to apply any arbitrary processing logic to the parts resulting from splitting the values of a string column.

This section is complemented by:

  • Split an input string around a delimiter which may be a substring or a regex pattern e.g. break a sentence into words.
  • Select the desired parts out of the parts returned by the prior splitting step e.g. pick the second part and drop the rest.

Concatenate Parts

We wish to create new columns by concatenating some of the parts resulting from splitting the values of a string column.

In this example, we will pick the third and first parts resulting from splitting a string column and concatenate them into a new column.

SELECT col_1,
       CONCAT(SPLIT(col_1, '_')[SAFE_OFFSET(2)],
              '-',
              SPLIT(col_1, '_')[SAFE_OFFSET(0)]) AS col_2
FROM table_1;

Here is how this works:

  • We use SPLIT() to return an array of parts for each element in col_1. See Split.
  • We select the array elements corresponding to the two parts (resulting from the split step) that we wish to concatenate, which are the third and the first in this example, via SAFE_OFFSET(2) and SAFE_OFFSET(0) respectively.
  • We then use the function CONCAT() to concatenate the selected vectors element-wise to produce a new column as output and store that in a new column col_2.

Set Data Type

We wish to set the data type of parts resulting from splitting the values of a string column.

SELECT col_1,
       SAFE_CAST(SPLIT(col_1, ' ')[SAFE_OFFSET(1)] AS INT64) AS col_2,
       SPLIT(col_1, ' ')[SAFE_OFFSET(2)]                     AS col_3
FROM table_1;

Here is how this works:

  • We can apply any of the data type conversion functions to a selected part to convert the resulting column to the desired data type. See Data Type Conversion.
  • We use the function SAFE_CAST() to covert second element of the array to a numeric data type.

Apply Custom Logic

We wish to apply some arbitrary processing logic to the parts resulting from splitting the values of a string column. In other words, we wish to apply some custom logic to determine how to use the output of a string split operation.

In this example, the elements of column col_1 are strings of the form '4,3'. We wish to multiply the numbers within the comma-separated segments and add up the results. So for this case, we would get 4 * 2 = 8.

SELECT col_1,
       SAFE_CAST(SPLIT(col_1, ',')[SAFE_OFFSET(0)] AS INT64) *
       SAFE_CAST(SPLIT(col_1, ',')[SAFE_OFFSET(1)] AS INT64) AS col_2
FROM table_1;

Here is how this works:

  • We use the function str_sSPLITplit() to break down each element of the column col_1 into parts around the delimiter ‘,‘. See Split.
  • We get the two parts we wish to multiply using SAFE_OFFSET() and we cast it to an integer. See Select and Set Data Type.
SQL
I/O