Counting

We wish to count the number of occurrences of a given pattern in a target string.

We will cover the following common pattern occurrence counting scenarios:

  • Substring: how to obtain the number of occurrences of a given substring in a given string.
  • Regular Expression: how to obtain the number of occurrences of a match of a given regular expression in a given string.
  • Word: the special case of how to count the number of words in a given string.

The scenarios above can be extended in multiple ways, the most common of which are:

  • Ignore Case: Ignoring case (of both pattern and string) while matching.
  • Pattern Column: Matching a column of patterns against a column of strings of the same size.
  • Multiple Patterns: Checking for multiple patterns at a time.

Substring

We wish to count the number of occurrences of a given substring in a given string.

In this example, we count the number of occurrences of the string ‘XY’ in each value of the column col_1.

SELECT col_1,
       ARRAY_LENGTH(REGEXP_EXTRACT_ALL(col_1, 'XY')) AS col_2
FROM table_1

Here is how this works:

  • We use REGEXP_EXTRACT_ALL() from to extract all occurrences of the substring ‘XY’ in each value of the column col_1.
  • Then we use ARRAY_LENGTH() to count the number of elements in the occurrences array.
  • The REGEXP_EXTRACT_ALL() function takes the following arguments:
    • The column whose values we wish to check against; which in this case is col_1.
    • The regular expression whose occurrences we wish to count; in this case ‘XY’. In this case ‘XY’ is treated as a substring since it doesn't contain any special characters. We need to escape any special characters if they are part of our substring.
  • The output table will have the same number of rows as the input table table_1 with an added column col_2 holding the number of occurrences of the string ‘XY’ in the corresponding value of the column col_1.

Regular Expression

We wish to count the number of occurrences of a given regular expression in a given string.

In this example, we count the number of vowels (while ignoring case) in each value of the column col_1.

SELECT col_1,
       ARRAY_LENGTH(REGEXP_EXTRACT_ALL(LOWER(col_1), '[aeiou]')) AS col_2
FROM table_1;

Here is how this works:

  • This works similarly to the substring scenario described above except that we pass a regular expression to REGEXP_EXTRACT_ALL() instead of a substring.
  • The regular expression '[aeiou]' matches any of the characters in the square brackets (which are the five vowels in the English language). It can be thought of as an or operation.
  • We convert col_1 to lower case using LOWER() function.
  • The output table will have the same number of rows as the input table table_1 with an added column col_2 holding the number of vowels in the corresponding value of the column col_1.

Word

We wish to count the number of words in a string.

In this example, we wish to count the number of words in each value of the column col_1 and to return that as a new integer column col_2.

SELECT col_1,
       ARRAY_LENGTH(REGEXP_EXTRACT_ALL(col_1, '\\w')) AS col_2
FROM table_1

Here is how this works:

  • We use the regular expression ‘\\w+’ where:
    • \w represents any ‘word’ character; i.e. letters, digits or underscore.
    • + specifies that we are looking for one or more ‘word’ characters.
  • The output is the same as the Regular Expression solution above.

Ignore Case

Substring

We wish to count the number of occurrences of a given substring in a given string while ignoring case.

In this example, we count the number of occurrences of the string ‘xy’ in each value of the column col_1 while ignoring case.

SELECT col_1,
       ARRAY_LENGTH(REGEXP_EXTRACT_ALL(LOWER(col_1), 'xy')) AS col_2
FROM table_1

Here is how this works:

  • This code is similar to the code under Substring above except that we pass col_1 to LOWER() function.

Pattern Column

We wish to count the number of occurrences of a value of one column in the value of another column for each row.

In this example, we have a table table_1 with two columns col_1 and col_2, we wish to count the the number of occurrences of the value of col_2 in col_1.

SELECT col_1,
       col_2,
       ARRAY_LENGTH(REGEXP_EXTRACT_ALL(col_1, col_2)) AS col_3
FROM table_1

Here is how this works:

  • This works similarly to Substring solution above, except we pass col_2 instead of a fixed pattern.

Multiple Patterns

Count All

We wish to return the total number of occurrences of all patterns as a single integer value. In other words, we wish to return the sum of occurrences of a given set of patterns in a given string.

SELECT col_1,
       ARRAY_LENGTH(REGEXP_EXTRACT_ALL(col_1, 'XY|YX')) AS col_2
FROM table_1

Here is how this works:

  • We use the or operator | of regular expressions to build a regular expression that captures all the patterns we wish to look for or’ed together. In this case that regular expression is 'XY|YX'.
  • For each value of col_1, ARRAY_LENGTH(REGEXP_EXTRACT_ALL()) will return the total number of occurrences of all patterns.
  • The output table will have the same number of rows as the input table table_1 with an added column col_2 holding the sum of occurrences of the specified patterns in the corresponding value of the column col_1.
  • See Regular Expression above for a more detailed description.
SQL
I/O