Selecting by Name Pattern

Oftentimes, we wish to select columns whose names fit a particular pattern. This is helpful in situations where we would otherwise need to type in many similar column names explicitly or in situations where we don’t know the exact column names in advance (but we know the pattern they would follow).

Column names are strings. We can therefore apply string matching operations to them. We will cover the string matching operations most commonly used in a column selection context on this page. Please see String Operations for detailed coverage of string matching.

Starts With

We wish to select columns whose names start with a given prefix.

In this example, we select all columns whose name starts with the prefix ‘col_1’.

BEGIN
    DECLARE v_table_name STRING;
    DECLARE v_prefix STRING;
    DECLARE v_columns STRING;

    SET v_table_name = 'table_1';
    SET v_prefix = 'col_1';
    SET v_columns = (WITH table_columns AS
                              (SELECT column_name
                               FROM refcon.dataset.INFORMATION_SCHEMA.COLUMNS
                               WHERE table_name = v_table_name
                                 AND STARTS_WITH(column_name, v_prefix))
                     SELECT STRING_AGG(column_name) AS columns
                     FROM table_columns);
    EXECUTE IMMEDIATE FORMAT("""SELECT %s
                                FROM refcon.dataset.%s
                             """, v_columns, v_table_name);
END;

Here is how this works:

  • First we define and set our variables, so we can use this snippet with any table.
  • INFORMATION_SCHEMA.COLUMNS contains the columns info for each table in the dataset. We get the column names and keep only the columns that match our pattern.
  • STARTS_WITH(STRING,PREFIX) functions checks if the STRING starts with a given prefix ('col_1' in this example).
  • We apply STRING_AGG to get the column names as a comma separated string. i.e. col_1, col_11,col_12 and set v_columns variable.
  • We use EXECUTE IMMEDIATE to execute our dynamic query using the v_table and v_columns variables.
  • We achieve this using the FORMAT function which formats a data type expression as a string. %s will be replaced with the variable values in the order of listing the variables.
  • The output of FORMAT will be SELECT col_1,col_11,col_12 FROM refcon.dataset.table_1.

Alternatively:

We can use the LIKE operator instead of 'STARTS_WITH'.

BEGIN
    DECLARE v_table_name STRING;
    DECLARE v_pattern STRING;
    DECLARE v_columns STRING;

    SET v_table_name = 'table_1';
    SET v_pattern = 'col_1%';
    SET v_columns = (WITH table_columns AS
                              (SELECT column_name
                               FROM refcon.dataset.INFORMATION_SCHEMA.COLUMNS
                               WHERE table_name = v_table_name
                                 AND column_name LIKE v_pattern)
                     SELECT STRING_AGG(column_name) AS columns
                     FROM table_columns);
    EXECUTE IMMEDIATE FORMAT("""SELECT %s
                                FROM refcon.dataset.%s
                             """, v_columns, v_table_name);
END;

Here is how this works:

  • It works similarly to the START_WITH solution except we use LIKE operator.
  • X LIKE Y operator checks if the STRING in the first operand X matches a pattern specified by the second operand Y. Expressions can contain these characters:
    • A percent sign "%" matches any number of characters or bytes.
    • An underscore "_" matches a single character or byte.
  • 'col_1%' pattern will match any column name that starts with a col_1.

Ends With

We wish to select columns whose names end with a given suffix.

In this example, we select all columns whose name ends with the suffix ‘_1’.

BEGIN
    DECLARE v_table_name STRING;
    DECLARE v_suffix STRING;
    DECLARE v_columns STRING;

    SET v_table_name = 'table_1';
    SET v_suffix = '_1';
    SET v_columns = (WITH table_columns AS
                              (SELECT column_name
                               FROM refcon.dataset.INFORMATION_SCHEMA.COLUMNS
                               WHERE table_name = v_table_name
                                 AND ENDS_WITH(column_name, v_suffix))
                     SELECT STRING_AGG(column_name) AS columns
                     FROM table_columns);
    EXECUTE IMMEDIATE FORMAT("""SELECT %s
                                FROM refcon.dataset.%s
                             """, v_columns, v_table_name);
END;

Here is how this works:

This works similarly to the scenario described above, except that we use ENDS_WITH(STRING,SUFFIX).

Contains

We wish to select columns whose names contain a given substring.

In this example, we select all columns whose name contains the substring l_1.

BEGIN
    DECLARE v_table_name STRING;
    DECLARE v_substr STRING;
    DECLARE v_columns STRING;

    SET v_table_name = 'table_1';
    SET v_substr = 'l_1';
    SET v_columns = (WITH table_columns AS
                              (SELECT column_name
                               FROM refcon.dataset.INFORMATION_SCHEMA.COLUMNS
                               WHERE table_name = v_table_name
                                 AND CONTAINS_SUBSTR(column_name, v_substr))
                     SELECT STRING_AGG(column_name) AS columns
                     FROM table_columns);
    EXECUTE IMMEDIATE FORMAT("""SELECT %s
                                FROM refcon.dataset.%s
                             """, v_columns, v_table_name);
END;

Here is how this works:

This works similarly to the scenario described above, except that we use CONTAINS_SUBSTR(STRING,SUBSTR).

Matches RegEx

We wish to select columns whose names match a given regular expression.

In this example we wish to select all columns whose names end with the digit 1 matched by the regular expression '1$'.

BEGIN
    DECLARE v_table_name STRING;
    DECLARE v_regex STRING;
    DECLARE v_columns STRING;

    SET v_table_name = 'table_1';
    SET v_regex = '1$';
    SET v_columns = (WITH table_columns AS
                              (SELECT column_name
                               FROM refcon.dataset.INFORMATION_SCHEMA.COLUMNS
                               WHERE table_name = v_table_name
                                 AND REGEXP_CONTAINS(column_name, v_regex))
                     SELECT STRING_AGG(column_name) AS columns
                     FROM table_columns);
    EXECUTE IMMEDIATE FORMAT("""SELECT %s
                                FROM refcon.dataset.%s
                             """, v_columns, v_table_name);
END;

Here is how this works:

This works similarly to the scenario described above, except that we use REGEXP_CONTAINS(STRING,REGEX).

Multiple Tokens

We wish to select columns whose names include any one of a set of substrings.

In this example, we wish to select columns whose names contain any of the words ‘_3’, or ‘l_4’, or ‘1’ .

BEGIN
    DECLARE v_table_name STRING;
    DECLARE v_regex STRING;
    DECLARE v_columns STRING;

    SET v_table_name = 'table_1';
    SET v_regex = '_3|l_4|1';
    SET v_columns = (WITH table_columns AS
                              (SELECT column_name
                               FROM refcon.dataset.INFORMATION_SCHEMA.COLUMNS
                               WHERE table_name = v_table_name
                                 AND REGEXP_CONTAINS(column_name, v_regex))
                     SELECT STRING_AGG(column_name) AS columns
                     FROM table_columns);
    EXECUTE IMMEDIATE FORMAT("""SELECT %s
                                FROM refcon.dataset.%s
                             """, v_columns, v_table_name);
END;

Here is how this works:

  • Instead of executing CONTAINS_SUBSTR() or REGEXP_CONTAINS() multiple times (once for each of the tokens then combining the results), a relatively simpler way is to use REGEXP_CONTAINS() once and leverage the or’ing capability of regular expressions.
  • A regular expression that matches ‘_3’, or ‘l_4’, or ‘1’ is '_3|l_4|1'.
  • Now that we have the regular expression we need, we use REGEXP_CONTAINS() as described in the RegEx scenario above.
SQL
I/O