Custom Sorting

We wish to sort the rows of a table by a transformation of the values of one or more columns instead of by the values themselves. A common example is we want to sort by the numeric part of a string column.

While we can create a new column by transforming existing columns, there are situations where we don’t wish to keep an intermediate value column that we only need for sorting.

We will cover the following custom sorting scenarios:

  • Single Column: We wish to sort the rows of a table by a transformation of one column. We will cover two common situations: (a) We wish to sort a string column in a case-insensitive manner and (b) We wish to sort by the numeric part of a string column.
  • Multiple Columns: We wish to sort the rows of a table by a transformation of multiple columns. In particular, we wish to apply the same transformation to each of a set of columns individually and then sort by the transformed columns.
  • Multiple Transformations: We wish to sort the rows of a table by transformations of multiple columns. In particular, we wish to apply a different transformation to each of a set of columns and then sort by the transformed columns.
  • Multivariate Transformation: We wish to sort the rows of a table by a function of multiple columns. In particular, we wish to sort by one variable which is the output of a function of multiple columns. For example, we wish to sort by the ratio of two columns.
  • Factor Sorting: A special case of custom sorting is the sorting of categorical variables e.g. t-shirt sizes S, M, L, XL.
  • Missing Values: Rows where the sorting columns have missing values are sorted at the end by default. In some situations, we may want to sort them differently e.g. at the beginning.

Single Column

We wish to sort the rows of a table by a transformation of one column. We will cover two common situations:

  • Case Insensitive: We wish to sort a string column in a case-insensitive manner.
  • Numeric String: We wish to sort by the numeric part of a string column.

Case Insensitive

We wish to sort by a string column in a case insensitive manner i.e. for the purpose of sorting, we wish to treat upper and lower case letters as the same. With string columns, sorting is case sensitive by default, meaning upper case text will appear first when sorting in ascending order and last when sorting in descending order.

In this example, we wish to sort the table dummy_table in ascending order of a string column with the name col_2 in a case insensitive manner.

SELECT *
FROM refcon.examples.dummy_table
ORDER BY LOWER(col_2);

Here is how this works:

  • To sort by the transformed values of a column, we simply apply the transformation we want after ORDER BY.
  • We use LOWER() to convert all characters to lowercase. For detailed coverage of working with strings in SQL see String Operations.
  • An all lower case version of col_2 is created, used for sorting, then discarded. The original values of col_2 stay as is.

Numeric String

We wish to sort by the numerical component of a string column. In addition, we wish to order the numeric characters in a numeric order, not a character order. For example, the ascending order should be ‘9’ then ‘10’ and not ‘10’ then ‘9’.

In this example, we wish to sort the table dummy_table in ascending order of the numerical substring of a string column with the name col_9 in numeric order (not character order). The values of col_9 are of the format order_<val> where val can be any integer e.g. ‘order_25'.

SELECT *
FROM refcon.examples.dummy_table
ORDER BY SAFE_CAST(REPLACE(col_9, 'order_', '') AS INT64);

Here is how this works:

  • To sort by the numeric part of a string, we extract the numeric part then convert its data type (type cast it) to an integer type then use that resulting numeric variable to sort the rows of the table.
  • As noted above, column col_9 is comprised of values of the format order_<val> where val can be any integer e.g. ‘order_25'. We use REPLACE() to remove the character prefix ‘order_’ from every value in the column col_9 . This leaves the numeric part e.g. ‘order_25’ becomes ‘25’. For detailed coverage of working with strings in SQL see String Operations.
  • We then use SAFE_CAST() to convert the data type of the remaining numeric string to an integer data type.
  • The output of the above two operations is an integer variable that is then used by ORDER BY to sort the rows of the table.
  • A transformed version of col_9 is created, used for sorting, then discarded. The original values of col_9 stay as is.

Multiple Columns

We wish to sort the rows of a table by a transformation of multiple columns. In particular, we wish to apply the same transformation to each of a set of columns individually then sort by the transformed columns.

In this example, we wish to sort the rows of a table by the lowercase transformation of columns col_2 and col_9.

SELECT *
FROM refcon.examples.dummy_table
ORDER BY LOWER(col_2), LOWER(col_9);

Here is how this works:

  • This works similarly to the Case Insensitive sorting example described above.
  • Each of the columns col_2 and col_9 is transformed individually and the table is arranged in ascending order of the lowercase transformation of col_2 then rows that have the same lowercase value for col_9 are sorted in ascending order of the lower case transformation of col_9.

Multiple Transformations

We wish to sort the rows of a table by transformations of multiple columns. In particular, we wish to apply a different transformation to each of a set of columns then sort by the transformed columns.

In this example, we wish to sort the rows of a table in ascending order of the lowercase transformation of the column col_2 and the suffix after the underscore in col_9.

SELECT *
FROM refcon.examples.dummy_table
ORDER BY LOWER(col_2), REGEXP_REPLACE(col_9, '.*_', '');

Here is how this works:

  • This works similarly to the Case Insensitive sorting example described above.
  • Each sorting variable after ORDER BY can be a distinct transformation of any one or more columns.
  • In this example, the first sorting variable is a lowercase transformation of col_2 and the second sorting variable is a substring of col_9 (whatever is after the underscore _).

Multivariate Transformation

We wish to sort the rows of a table by a function of multiple columns. In particular, we wish to sort by one variable which is the output of a function of multiple columns. For example, we wish to sort by the ratio of two columns.

In this example, we wish to sort the rows of a table by the ratio between two columns col_6 and col_3.

SELECT *
FROM refcon.examples.dummy_table
ORDER BY col_6 * 1.0 / col_3;

Here is how this works:

  • We can specify the transformation logic involving multiple columns after ORDER BY.
  • The values of col_6 are divided by the corresponding values of col_3. The resulting ratio is then used to sort the rows of the table in ascending order.

Factor Sorting

We wish to sort an ordinal categorical variable in a domain accurate manner, not in alphanumeric order. An ordinal categorical variable is one where there is a natural order e.g. responses in a survey. See Factor Operations for a coverage of working with factor data.

In this example, we have a table that has a column col_10 which is categorical variable holding t-shirt sizes XS, S, M, L, XL. We wish to sort the rows of the table in the natural order of the t-shirt sizes i.e. XS < S < M < L < XL.

Value Mapping

SELECT *
FROM refcon.examples.dummy_table
ORDER BY (CASE
              WHEN col_10 = 'XS' THEN 0
              WHEN col_10 = 'S' THEN 1
              WHEN col_10 = 'M' THEN 2
              WHEN col_10 = 'L' THEN 3
              WHEN col_10 = 'XL' THEN 4
              WHEN col_10 = 'XXL' THEN 5
              ELSE 6 END
             );

Here is how this works:

  • We used CASE WHEN to map values of the column col_10 to an integer that defines their sorting order. See General Operations for a coverage of conditional statements in SQL.
  • ORDER BY sorts the table based on the output of CASE WHEN.

Missing Values

Missing values in SQL come at the top when sorting in ASC order and at the bottom when sorting DESC order. In this section we cover how to change this behavior and show rows with missing values at the opposite side of the default position.

Ascending

We want to show missing values at the bottom (instead of at the top) when sorting the table in ASC based on col_5.

SELECT *
FROM refcon.examples.dummy_table
ORDER BY col_5 IS NULL, col_5;

Here is how this works:

  • We sort by col_5 IS NULL first. It evaluates to TRUE when the value of col_5 is NULL and FALSE otherwise. Since FALSE sorts higher than TRUE, all NULL values will be sorted at the end.
  • The second column to sort by is col_5. Therefore, rows where col_5 IS NULL is FALSE i.e. not NULL will then be sorted by the ascending value of col_5 which is the desired outcome.

Descending

We want to show missing values at the top (instead of at the bottom) when sorting the table in DESC based on col_5.

SELECT *
FROM refcon.examples.dummy_table
ORDER BY col_5 IS NOT NULL, col_5 DESC;

Here is how this works:

  • We sort by col_5 IS NOT NULL first. It evaluates to TRUE when the value of col_5 is not NULL and FALSE otherwise. Since FALSE sorts higher than TRUE, all NULL values will be sorted first.
  • The second column to sort by is col_5. Therefore, rows where col_5 IS NOT NULL is TRUE i.e. not NULL will then be sorted by the descending value of col_5 which is the desired outcome.
SQL
I/O