Exclude Columns

At times, we wish to exclude one or more columns and retain the rest. At other times the criteria to describe the columns that we wish to exclude are easier to formulate than their complement (the criteria for columns we wish to retain). In such situations we need to exclude some columns and return the rest. There are four common column exclusion scenarios which we cover in this section:

  1. Exclude by Name: Exclude one or more columns given their names.
  2. Exclude by Position: Exclude one or more columns given their positions.
  3. Select Complement: Exclude columns that do not match a condition.
  4. Exclude Data Type: Exclude columns of one or more data types.

By Name

We wish to exclude one or more columns of a table given their names and to return the table’s remaining columns.

In this example, we wish to exclude the columns named col_1 and col_5 and return the remaining columns.

SELECT * EXCEPT (col_1,col_5)
FROM refcon.dataset.table_1;

Here is how this works:

  • SELECT * means select all columns in the table .
  • EXCEPT (col_1,col_5) will instruct SELECT to exclude the columns we specify by name.

By Position

We wish to exclude one or more columns of a table given their positions (where 1 is the left most column) and to return the table’s remaining columns.

In this example, we wish to exclude the columns at the first and fifth positions and return the remaining columns.

BEGIN

    DECLARE v_column_positions ARRAY <INT64>;
    DECLARE v_table_name STRING;
    DECLARE v_columns STRING;

    SET v_column_positions = [1,5];
    SET v_table_name = 'table_1';
    SET v_columns =
            (WITH table_columns AS
                      (SELECT column_name
                       FROM refcon.dataset.INFORMATION_SCHEMA.COLUMNS
                       WHERE table_name = v_table_name
                         AND ordinal_position NOT IN UNNEST(v_column_positions))
             SELECT STRING_AGG(column_name) AS columns
             FROM table_columns);
    EXECUTE IMMEDIATE FORMAT("""SELECT %s 
                                FROM refcon.dataset.%s
                             """, v_columns, v_table_name);
END;

Here is how this works:

  • First we define and set our variables, so we can use this snippet with any table.
  • INFORMATION_SCHEMA.COLUMNS contains the columns info for each table in the dataset.
  • We get the column names and keep only the columns with the ordinal_position that does not equal any of the v_column_positions array elements.
  • To use the IN operator we need to UNNEST the array first.
  • We apply STRING_AGG to get the column names as a comma separated string. i.e. col_1,col_2,col_3,.. and set v_columns variable.
  • We use EXECUTE IMMEDIATE to execute our dynamic query using the v_table and v_columns variables.
  • We achieve this using the FORMAT function which formats a data type expression as a string. %s will be replaced with the variable values in the order of listing the variables.

Select Complement

We wish to exclude columns that match a condition and return the table’s remaining columns.

In this example, we exclude columns that end with 1 and return the remaining columns.

BEGIN
    DECLARE v_table_name STRING;
    DECLARE v_suffix STRING;
    DECLARE v_columns STRING;

    SET v_table_name = 'table_1';
    SET v_suffix = '1';
    SET v_columns = (WITH table_columns AS
                              (SELECT column_name
                               FROM refcon.dataset.INFORMATION_SCHEMA.COLUMNS
                               WHERE table_name = v_table_name
                                 AND NOT ENDS_WITH(column_name, v_suffix))
                     SELECT STRING_AGG(column_name) AS columns
                     FROM table_columns);
    EXECUTE IMMEDIATE FORMAT("""SELECT %s
                                FROM refcon.dataset.%s
                             """, v_columns, v_table_name);
END;

Here is how this works:

  • First we define and set our variables, so we can use this snippet with any table.
  • INFORMATION_SCHEMA.COLUMNS contains the columns info for each table in the dataset. We get the column names and keep only the columns that match our pattern.
  • ENDS_WITH(column_name, v_suffix) will return TRUE for any column that ends with '1'.
  • We use NOT operator to get the complement of our condition. i.e. all columns that do not end with '1'.
  • We apply STRING_AGG to get the column names as a comma separated string. i.e. col_1, col_11 and set v_columns variable.
  • We use EXECUTE IMMEDIATE to execute our dynamic query using the v_table and v_columns variables.
  • We achieve this using the FORMAT function which formats a data type expression as a string. %s will be replaced with the variable values in the order of listing the variables.

Exclude Data Type

We wish to exclude columns of one or more data types and return the data frame’s remaining columns.

In this example, we wish to exclude both columns of an integer data type and of a logical data type.

BEGIN
    DECLARE v_table_name STRING;
    DECLARE v_data_types ARRAY <STRING>;
    DECLARE v_columns STRING;

    SET v_table_name = 'table_1';
    SET v_data_types = ['INT64','BOOL'];
    SET v_columns = (WITH table_columns AS
                              (SELECT column_name
                               FROM refcon.dataset.INFORMATION_SCHEMA.COLUMNS
                               WHERE table_name = v_table_name
                                 AND data_type NOT IN UNNEST(v_data_types))
                     SELECT STRING_AGG(column_name) AS columns
                     FROM table_columns);
    EXECUTE IMMEDIATE FORMAT("""SELECT %s
                                FROM refcon.dataset.%s
                             """, v_columns, v_table_name);
END;

Here is how this works:

This works similarly to the "By Position" scenario described above, except that we use filter our based on data types instead of position.

SQL
I/O