Selecting by Data Type

Often times we wish to select columns of one or more data types; for instance to apply a data type specific operation to all of them e.g. select columns with a float data type and then round each to 2 decimal places.

See General Operations for a coverage of data types and data type transformation.

Single Data Type

We wish to select all columns of one particular data type.

In this example, we wish to select all columns of an integer data type;

BEGIN
    DECLARE v_table_name STRING;
    DECLARE v_data_type STRING;
    DECLARE v_columns STRING;

    SET v_table_name = 'table_1';
    SET v_data_type = 'INT64';
    SET v_columns = (WITH table_columns AS
                              (SELECT column_name
                               FROM refcon.dataset.INFORMATION_SCHEMA.COLUMNS
                               WHERE table_name = v_table_name
                                 AND data_type = v_data_type)
                     SELECT STRING_AGG(column_name) AS columns
                     FROM table_columns);
    EXECUTE IMMEDIATE FORMAT("""SELECT %s
                                FROM refcon.dataset.%s
                             """, v_columns, v_table_name);
END;

Here is how this works:

  • First we define and set our variables, so we can use this snippet with any table.
  • INFORMATION_SCHEMA.COLUMNS contains the columns info for each table in the dataset including data type. We get the column names and keep only the columns that have an integer data type.
  • We apply STRING_AGG to get the column names as a comma separated string. i.e. col_1, col_5, col_6 and set v_columns variable.
  • We use EXECUTE IMMEDIATE to execute our dynamic query using the v_table and v_columns variables.
  • We achieve this using the FORMAT function which formats a data type expression as a string. %s will be replaced with the variable values in the order of listing the variables.
  • The output of FORMAT will be SELECT col_1, col_5, col_6 FROM refcon.dataset.table_1.

Multiple Data Types

We wish to select all columns of two or more data types.

In this example, we wish to select all columns of integer or string data types.

BEGIN
    DECLARE v_table_name STRING;
    DECLARE v_data_types ARRAY <STRING>;
    DECLARE v_columns STRING;

    SET v_table_name = 'table_1';
    SET v_data_types = ['INT64','STRING'];
    SET v_columns = (WITH table_columns AS
                              (SELECT column_name
                               FROM refcon.dataset.INFORMATION_SCHEMA.COLUMNS
                               WHERE table_name = v_table_name
                                 AND data_type IN UNNEST(v_data_types))
                     SELECT STRING_AGG(column_name) AS columns
                     FROM table_columns);
    EXECUTE IMMEDIATE FORMAT("""SELECT %s
                                FROM refcon.dataset.%s
                             """, v_columns, v_table_name);
END;

Here is how this works:

  • This works similarly to the scenario described above, except that we specify more than one data type using an array.
  • In UNNEST(v_data_types) we use UNNEST to convert an ARRAY into a set of values (rows) to which IN may be applied.
SQL
I/O