We wish to count the number of occurrences of a given pattern in a target string.
We will cover the following common pattern occurrence counting scenarios:
The scenarios above can be extended in multiple ways, the most common of which are:
We wish to count the number of occurrences of a given substring in a given string.
In this example, we count the number of occurrences of the string ‘XY’
in each value of the
column col_1
.
SELECT col_1,
ARRAY_LENGTH(REGEXP_EXTRACT_ALL(col_1, 'XY')) AS col_2
FROM table_1
Here is how this works:
REGEXP_EXTRACT_ALL()
from to extract all occurrences of the substring ‘XY’
in each
value of the column col_1
.ARRAY_LENGTH()
to count the number of elements in the occurrences array.REGEXP_EXTRACT_ALL()
function takes the following arguments:col_1
.‘XY’
. In this case
‘XY’
is treated as a substring since it doesn't contain any special characters. We need to
escape any special characters if they are part of our substring.table_1
with an added
column col_2
holding the number of occurrences of the string ‘XY’
in the corresponding value
of the column col_1
.We wish to count the number of occurrences of a given regular expression in a given string.
In this example, we count the number of vowels (while ignoring case) in each value of the
column col_1
.
SELECT col_1,
ARRAY_LENGTH(REGEXP_EXTRACT_ALL(LOWER(col_1), '[aeiou]')) AS col_2
FROM table_1;
Here is how this works:
REGEXP_EXTRACT_ALL()
instead of a substring.'[aeiou]'
matches any of the characters in the square brackets (which are
the five vowels in the English language). It can be thought of as an or
operation.col_1
to lower case using LOWER()
function.table_1
with an added
column col_2
holding the number of vowels in the corresponding value of the column col_1
.We wish to count the number of words in a string.
In this example, we wish to count the number of words in each value of the column col_1
and to
return that as a new integer column col_2
.
SELECT col_1,
ARRAY_LENGTH(REGEXP_EXTRACT_ALL(col_1, '\\w')) AS col_2
FROM table_1
Here is how this works:
‘\\w+’
where:\w
represents any ‘word’ character; i.e. letters, digits or underscore.+
specifies that we are looking for one or more ‘word’ characters.Substring
We wish to count the number of occurrences of a given substring in a given string while ignoring case.
In this example, we count the number of occurrences of the string ‘xy’
in each value of the
column col_1
while ignoring case.
SELECT col_1,
ARRAY_LENGTH(REGEXP_EXTRACT_ALL(LOWER(col_1), 'xy')) AS col_2
FROM table_1
Here is how this works:
col_1
to LOWER()
function.We wish to count the number of occurrences of a value of one column in the value of another column for each row.
In this example, we have a table table_1
with two columns col_1
and col_2
, we wish to count the
the number of occurrences of the value of col_2
in col_1
.
SELECT col_1,
col_2,
ARRAY_LENGTH(REGEXP_EXTRACT_ALL(col_1, col_2)) AS col_3
FROM table_1
Here is how this works:
col_2
instead of
a fixed pattern.Count All
We wish to return the total number of occurrences of all patterns as a single integer value. In other words, we wish to return the sum of occurrences of a given set of patterns in a given string.
SELECT col_1,
ARRAY_LENGTH(REGEXP_EXTRACT_ALL(col_1, 'XY|YX')) AS col_2
FROM table_1
Here is how this works:
|
of regular expressions to build a regular expression that captures all
the patterns we wish to look for or’ed together. In this case that regular expression
is 'XY|YX'
.col_1
, ARRAY_LENGTH(REGEXP_EXTRACT_ALL())
will return the total number of
occurrences of all patterns.table_1
with an added
column col_2
holding the sum of occurrences of the specified patterns in the corresponding value
of the column col_1
.