Split a String

Split a string around a given delimiter executing as many splits as there are instances of the delimiter. For instance, if we split ‘a, b, c, d’ around the ‘, ‘ as the delimiter, we would get four sub strings a, b, c and d.

We then cover four extensions that may be applied to the above scenario:

  • Split Words: Split a string into individual words.
  • Split Characters: Split a string into individual characters.
  • Multiple Delimiters: Split a string around any of a set of delimiters e.g. split a string around any hyphen ‘-’ character or underscore ‘_’ character.
  • Delimiter Column: Given a column of strings to split and a column of equal size of delimiters, we wish to split each element in the first column around the corresponding delimiter in the second column. This is often a case when we wish to split the values of a given column of a table using delimiters provided by another column.

This section is complemented by:

  • Select the desired parts out of the parts returned by the prior splitting step e.g. pick the second part and drop the rest.
  • Process: An optional step where we process the selected parts resulting from the prior selection process e.g. concatenate the first and third parts with an underscore in between.

Split

We wish to split a string around a given delimiter executing as many splits as there are instances of the delimiter.

In this example, we wish to split each element in the column col_1 around hyphen ‘-’ characters and add a new column to the table df for each part where the nth part goes into the nth new column.

SELECT col_1,
       SPLIT(col_1,'-') AS col_2
FROM table_1;

Here is how this works:

  • We use the function SPLIT() to split each element in a string column, which here is the column col_1, into parts around a given delimiter, which here is the hyphen character ‘-’.
  • Note: The delimiter pattern can be a string (character sequence).
  • The output table has the same number of rows and columns as the input table table_1 with an additional column added holding an array of the parts of the elements of the column col_1.

Split Words

We wish to split a string into individual words.

SELECT col_1,
       REGEXP_EXTRACT_ALL(col_1, r'\b\w+\b') AS col_2
FROM table_1;

Here is how this works:

This code works similarly to the Free Split scenario above except that we use the regular expression '\W+' as the delimiter which denotes possible separators between words.

Split Characters

We wish to split a string into individual characters.

SELECT col_1,
       SPLIT(col_1,'') AS col_2
FROM table_1;

Here is how this works:

This code works similarly to the Split scenario above except that we use '' as the delimiter which denotes splitting around individual characters.

Multiple Delimiters

We wish to split a string around any of a set of delimiters

In this example, we wish to split each element of the column col_1 into its parts which we wish to return as new columns. The delimiter can be a hyphen ‘-’ or an underscore ‘_’.

SELECT col_1,
       SPLIT(REGEXP_REPLACE(col_1, '-|_', '-'), '-') as col_2
FROM table_1;

Here is how this works:

  • We use the function REGEXP_REPLACE() to replace multiple delimiters with one delimiter. In this example we use '-|_' regular expression to replace '-' and '_' with a '-'. See Replacing.
  • We use the function SPLIT() to split each element in a string column, which here is the column col_1, into parts around a given delimiter. See Split above.
  • Alternatively, if our delimiters are single characters, we can use the bracket operator like so [-_].

Delimiter Column

Given a column of strings to split and a column of equal size of delimiters, we wish to split each element in the first column around the corresponding delimiter in the second column. This is often a case when we wish to split the values of a given column of a table using delimiters provided by another column.

SELECT col_1,
       col_2,
       SPLIT(col_1,col_2) AS col_3
FROM table_1;

Here is how this works:

  • We use the function SPLIT() to split each element in a string column into parts around a given delimiter. See Split above.
  • The function SPLIT() takes a column as an input, which here are the values of the column col_1, and the delimiter pattern, which here are the values of the column col_2. Therefore, the function str_split() will use the corresponding value of the column col_2 as the delimiter when splitting a value in the column col_1.
SQL
I/O