Column Selection

We wish to identify the columns on each of which we will apply filtering logic.

We will cover the following scenarios

  • All Columns where we cover how to apply a logical expression to all columns of a table and return the rows for which any of the columns satisfy the expression.
  • Explicit Selection where we cover how to apply a logical expression to each of a set of explicitly selected columns of a table (e.g. by spelling out the names of the columns of interest) and return the rows for which any of the columns satisfy the expression.
  • Implicit Selection where we cover how to apply a logical expression to each of a set of implicitly selected columns of a table (e.g. by selecting columns whose names contain a certain substring) and return the rows for which any of the columns satisfy the expression.

This section is complemented by

  • Relationship Specification where we cover how to combine the results of applying the specified function(s) to the specified column(s) in either an AND manner or an OR manner.

All Columns

We wish to apply a logical expression to every column and to return any row for which any column satisfies the logical expression.

In this example, we wish to return any row in the table dummy_table for which any column has a missing value NULL.

BEGIN
    DECLARE v_condition STRING;
    SET v_condition =
            (SELECT STRING_AGG(CONCAT(column_name, ' IS NULL '), 'OR ')
             FROM refcon.examples.INFORMATION_SCHEMA.COLUMNS
             WHERE table_name = 'dummy_table');
    EXECUTE IMMEDIATE FORMAT("""
                                SELECT *
                                FROM refcon.examples.dummy_table
                                WHERE %s
                             """, v_condition);

END;

Here is how this works:

  • First, we want to construct a null condition over all columns in the table.
  • INFORMATION_SCHEMA provides metadata for each table in the dataset and COLUMNS view returns the info for each column in the table.
  • We CONCAT the column name with IS NULL to get col_1 IS NULL.
  • We use STRING_AGG with OR as a separator to get col_1 IS NULL OR col_2 IS NULL OR ...
  • We use EXECUTE IMMEDIATE to execute our dynamic query using the v_condition variable.
  • We achieve this using the FORMAT function which formats a data type expression as a string. %s will be replaced with the generated null condition.

Explicit Selection

We wish to apply a logical expression to a set of explicitly specified column and to return any row for which any of those columns satisfies the logical expression.

In this example, we wish to return any row in the table dummy_table for which any column starting at col_1 and until col_5 has a missing value NULL.

BEGIN
    DECLARE v_start_position INT64;
    DECLARE v_end_position INT64;
    DECLARE v_table_name STRING;
    DECLARE v_condition STRING;

    SET v_start_position = 1;
    SET v_end_position = 5;
    SET v_table_name = 'dummy_table';
    SET v_condition =
            (SELECT STRING_AGG(CONCAT(column_name, ' IS NULL '), 'OR ')
             FROM refcon.examples.INFORMATION_SCHEMA.COLUMNS
             WHERE table_name = v_table_name
               AND ordinal_position >= v_start_position
               AND ordinal_position <= v_end_position);
    EXECUTE IMMEDIATE FORMAT("""SELECT *
                                FROM refcon.examples.%s
                                WHERE %s
                             """, v_table_name, v_condition);
END;

Here is how it works:

This works similarly to the scenario above, except we filter based on ordinal_position and keep only columns between col_1 and col_5.

Implicit Selection

We wish to apply a logical expression to a set of implicitly specified column and to return any row for which any of those columns satisfies the logical expression. Implicit column selection is when we do not spell out the column names or positions explicitly but rather identify the columns via a property of their name or their data.

In this example, we wish to return any row in the table dummy_table for which the value of all columns of type INT64 is bigger than 0.

BEGIN
    DECLARE v_table_name STRING;
    DECLARE v_condition STRING;
    DECLARE v_data_type STRING;
    SET v_table_name = 'dummy_table';
    SET v_data_type = 'INT64';
    SET v_condition =
            (SELECT STRING_AGG(CONCAT(column_name, ' > 0 '), 'AND ')
             FROM refcon.examples.INFORMATION_SCHEMA.COLUMNS
             WHERE table_name = v_table_name
               AND data_type = v_data_type);
    EXECUTE IMMEDIATE FORMAT("""SELECT *
                                FROM refcon.examples.%s
                                WHERE %s
                             """, v_table_name, v_condition);
END;

Here is how it works:

This works similarly to the scenario above, except we filter data type, and we change the IS NULL condition to > 0.

SQL
I/O