Selecting by Position

We wish to select one or more column from a table, and we wish to identify the columns to be selected by specifying their position. The positions start at 1 for the left most column.

Single Column

We wish to select a single column from a table by specifying its position.

BEGIN

    DECLARE v_column_position INT64;
    DECLARE v_table_name STRING;
    DECLARE v_columns STRING;

    SET v_column_position = 1;
    SET v_table_name = 'table_1';
    SET v_columns =
            (SELECT column_name
             FROM refcon.dataset.INFORMATION_SCHEMA.COLUMNS
             WHERE table_name = v_table_name
               AND ordinal_position = v_column_position);
    EXECUTE IMMEDIATE FORMAT("""SELECT %s
                                FROM refcon.dataset.%s
                             """, v_columns, v_table_name);
END;

Here is how this works:

  • First we define and set our variables, so we can use this snippet with any table.
  • INFORMATION_SCHEMA.COLUMNS contains the columns info for each table in the dataset. We get the column names and keep only the column with the ordinal_position that equals 1.
  • We use EXECUTE IMMEDIATE to execute our dynamic query using the v_table and v_columns variables.
  • We achieve this using the FORMAT function which formats a data type expression as a string. %s will be replaced with the variable values in the order of listing the variables.
  • The output of FORMAT will be SELECT col_1 FROM refcon.dataset.table_1.

List of Columns

We wish to specify the columns that we wish to select by their position.

BEGIN
    DECLARE v_column_positions ARRAY <INT64>;
    DECLARE v_table_name STRING;
    DECLARE v_columns STRING;

    SET v_column_positions = [1,2,3];
    SET v_table_name = 'table_1';
    SET v_columns = (WITH table_columns AS
                              (SELECT column_name
                               FROM refcon.dataset.INFORMATION_SCHEMA.COLUMNS
                               WHERE table_name = v_table_name
                                 AND ordinal_position IN UNNEST(v_column_positions))
                     SELECT STRING_AGG(column_name) AS columns
                     FROM table_columns);
    EXECUTE IMMEDIATE FORMAT("""SELECT %s
                                FROM refcon.dataset.%s
                             """, v_columns, v_table_name);
END;

Here is how this works:

  • First we define and set our variables, so we can use this snippet with any table.
  • INFORMATION_SCHEMA.COLUMNS contains the columns info for each table in the dataset.
  • We get the column names and keep only the columns with the ordinal_position that equals one of the v_column_positions array elements.
  • To use the IN operator we need to UNNEST the array first.
  • We apply STRING_AGG to get the column names as a comma separated string. i.e. col_1,col_2,col_3 and set v_columns variable.
  • We use EXECUTE IMMEDIATE to execute our dynamic query using the v_table and v_columns variables.
  • We achieve this using the FORMAT function which formats a data type expression as a string. %s will be replaced with the variable values in the order of listing the variables.
  • The output of FORMAT will be SELECT col_1,col_2,col_3 FROM refcon.dataset.table_1.

Range of Columns

We wish to select every column between a given start column and end column including both start and end. We wish to specify the start and end column by their position.

BEGIN
    DECLARE v_start_position INT64;
    DECLARE v_end_position INT64;
    DECLARE v_table_name STRING;
    DECLARE v_columns STRING;

    SET v_start_position = 1;
    SET v_end_position = 3;
    SET v_table_name = 'table_1';
    SET v_columns = (WITH table_columns AS
                              (SELECT column_name
                               FROM refcon.dataset.INFORMATION_SCHEMA.COLUMNS
                               WHERE table_name = v_table_name
                                 AND ordinal_position >= v_start_position
                                 AND ordinal_position <= v_end_position)
                     SELECT STRING_AGG(column_name) AS columns
                     FROM table_columns);
    EXECUTE IMMEDIATE FORMAT("""SELECT %s
                                FROM refcon.dataset.%s
                             """, v_columns, v_table_name);
END;

Here is how this works:

This works similarly to the scenario described above, except that we filter using v_start_position and v_end_position variables.

Relative to End

We wish to select a subset of columns in a table by specifying their position. However, we wish to specify the position relative to the end of the table (the right end) rather than relative to the beginning (the left end).

In this example, we wish to select the right most column and the column that is the third from the right of a table.

BEGIN
    DECLARE v_start_position INT64;
    DECLARE v_end_position INT64;
    DECLARE v_table_name STRING;
    DECLARE v_offset INT64;
    DECLARE v_columns STRING;

    SET v_start_position = -3;
    SET v_end_position = -1;
    SET v_table_name = 'table_1';
    SET v_offset =
                (SELECT COUNT(1)
                 FROM refcon.dataset.INFORMATION_SCHEMA.COLUMNS
                 WHERE table_name = v_table_name
                   AND ordinal_position IS NOT NULL) + 1;
    SET v_columns =
            (WITH table_columns AS
                      (SELECT column_name
                       FROM refcon.dataset.INFORMATION_SCHEMA.COLUMNS
                       WHERE table_name = v_table_name
                         AND ordinal_position >= v_start_position + v_offset
                         AND ordinal_position <= v_end_position + v_offset)
             SELECT STRING_AGG(column_name) AS columns
             FROM table_columns);
    EXECUTE IMMEDIATE FORMAT("""SELECT %s
                                FROM refcon.dataset.%s
                             """, v_columns, v_table_name);

END;

Here is how this works:

  • First we get the column count in the table from INFORMATION_SCHEMA.COLUMNS. We only keep columns where ordinal_position is not NULL. This is to exclude any hidden columns such as partition_time.
  • We calculate the offset by adding 1 to the column count so the right most column index is -1. i.e. if the table has 10 columns, then -1 + 10 + 1 = 10 which is the right most column.
  • We get the column names and keep only the columns with the ordinal_position that lies between v_start_position + v_offset and v_end_position + v_offset.
  • We apply STRING_AGG to get the column names as a comma separated string. i.e. col_1,col_2,col_3 and set v_columns variable.
  • We use EXECUTE IMMEDIATE to execute our dynamic query using the v_table and v_columns variables.
  • We achieve this using the FORMAT function which formats a data type expression as a string. %s will be replaced with the variable values in the order of listing the variables.
SQL
I/O