Oftentimes, we wish to select columns whose names fit a particular pattern. This is helpful in situations where we would otherwise need to type in many similar column names explicitly or in situations where we don’t know the exact column names in advance (but we know the pattern they would follow).
Column names are strings. We can therefore apply string matching operations to them. We will cover the string matching operations most commonly used in a column selection context on this page. Please see String Operations for detailed coverage of string matching.
We wish to select columns whose names start with a given prefix.
In this example, we select all columns whose name starts with the prefix ‘col_1’
.
BEGIN
DECLARE v_table_name STRING;
DECLARE v_prefix STRING;
DECLARE v_columns STRING;
SET v_table_name = 'table_1';
SET v_prefix = 'col_1';
SET v_columns = (WITH table_columns AS
(SELECT column_name
FROM refcon.dataset.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = v_table_name
AND STARTS_WITH(column_name, v_prefix))
SELECT STRING_AGG(column_name) AS columns
FROM table_columns);
EXECUTE IMMEDIATE FORMAT("""SELECT %s
FROM refcon.dataset.%s
""", v_columns, v_table_name);
END;
Here is how this works:
INFORMATION_SCHEMA.COLUMNS
contains the columns info for each table in the dataset. We get the column names and
keep only the columns that match our pattern.STARTS_WITH(STRING,PREFIX)
functions checks if the STRING starts with a given prefix ('col_1'
in this example).STRING_AGG
to get the column names as a comma separated string. i.e. col_1, col_11,col_12
and
set v_columns
variable.EXECUTE IMMEDIATE
to execute our dynamic query using the v_table
and v_columns
variables.FORMAT
function which formats a data type expression as a string. %s
will be replaced
with the variable values in the order of listing the variables.FORMAT
will be SELECT col_1,col_11,col_12 FROM refcon.dataset.table_1
.Alternatively:
We can use the LIKE
operator instead of 'STARTS_WITH'.
BEGIN
DECLARE v_table_name STRING;
DECLARE v_pattern STRING;
DECLARE v_columns STRING;
SET v_table_name = 'table_1';
SET v_pattern = 'col_1%';
SET v_columns = (WITH table_columns AS
(SELECT column_name
FROM refcon.dataset.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = v_table_name
AND column_name LIKE v_pattern)
SELECT STRING_AGG(column_name) AS columns
FROM table_columns);
EXECUTE IMMEDIATE FORMAT("""SELECT %s
FROM refcon.dataset.%s
""", v_columns, v_table_name);
END;
Here is how this works:
LIKE
operator.X LIKE Y
operator checks if the STRING in the first operand X matches a pattern specified by the second operand Y.
Expressions can contain these characters:'col_1%'
pattern will match any column name that starts with a col_1
.We wish to select columns whose names end with a given suffix.
In this example, we select all columns whose name ends with the suffix ‘_1’
.
BEGIN
DECLARE v_table_name STRING;
DECLARE v_suffix STRING;
DECLARE v_columns STRING;
SET v_table_name = 'table_1';
SET v_suffix = '_1';
SET v_columns = (WITH table_columns AS
(SELECT column_name
FROM refcon.dataset.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = v_table_name
AND ENDS_WITH(column_name, v_suffix))
SELECT STRING_AGG(column_name) AS columns
FROM table_columns);
EXECUTE IMMEDIATE FORMAT("""SELECT %s
FROM refcon.dataset.%s
""", v_columns, v_table_name);
END;
Here is how this works:
This works similarly to the scenario described above, except that we use ENDS_WITH(STRING,SUFFIX)
.
We wish to select columns whose names contain a given substring.
In this example, we select all columns whose name contains the substring l_1
.
BEGIN
DECLARE v_table_name STRING;
DECLARE v_substr STRING;
DECLARE v_columns STRING;
SET v_table_name = 'table_1';
SET v_substr = 'l_1';
SET v_columns = (WITH table_columns AS
(SELECT column_name
FROM refcon.dataset.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = v_table_name
AND CONTAINS_SUBSTR(column_name, v_substr))
SELECT STRING_AGG(column_name) AS columns
FROM table_columns);
EXECUTE IMMEDIATE FORMAT("""SELECT %s
FROM refcon.dataset.%s
""", v_columns, v_table_name);
END;
Here is how this works:
This works similarly to the scenario described above, except that we use CONTAINS_SUBSTR(STRING,SUBSTR)
.
We wish to select columns whose names match a given regular expression.
In this example we wish to select all columns whose names end with the digit 1 matched by the regular expression '1$'
.
BEGIN
DECLARE v_table_name STRING;
DECLARE v_regex STRING;
DECLARE v_columns STRING;
SET v_table_name = 'table_1';
SET v_regex = '1$';
SET v_columns = (WITH table_columns AS
(SELECT column_name
FROM refcon.dataset.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = v_table_name
AND REGEXP_CONTAINS(column_name, v_regex))
SELECT STRING_AGG(column_name) AS columns
FROM table_columns);
EXECUTE IMMEDIATE FORMAT("""SELECT %s
FROM refcon.dataset.%s
""", v_columns, v_table_name);
END;
Here is how this works:
This works similarly to the scenario described above, except that we use REGEXP_CONTAINS(STRING,REGEX)
.
We wish to select columns whose names include any one of a set of substrings.
In this example, we wish to select columns whose names contain any of the words ‘_3’
, or ‘l_4’
, or ‘1’
.
BEGIN
DECLARE v_table_name STRING;
DECLARE v_regex STRING;
DECLARE v_columns STRING;
SET v_table_name = 'table_1';
SET v_regex = '_3|l_4|1';
SET v_columns = (WITH table_columns AS
(SELECT column_name
FROM refcon.dataset.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = v_table_name
AND REGEXP_CONTAINS(column_name, v_regex))
SELECT STRING_AGG(column_name) AS columns
FROM table_columns);
EXECUTE IMMEDIATE FORMAT("""SELECT %s
FROM refcon.dataset.%s
""", v_columns, v_table_name);
END;
Here is how this works:
CONTAINS_SUBSTR()
or REGEXP_CONTAINS()
multiple times (once for each of the tokens then
combining the results), a relatively simpler way is to use REGEXP_CONTAINS()
once and leverage the or
’ing
capability of regular expressions.‘_3’
, or ‘l_4’
, or ‘1’
is '_3|l_4|1'
.REGEXP_CONTAINS()
as described in the RegEx scenario above.