We wish to sort the rows of a table by a transformation of the values of one or more columns instead of by the values themselves. A common example is we want to sort by the numeric part of a string column.
While we can create a new column by transforming existing columns, there are situations where we don’t wish to keep an intermediate value column that we only need for sorting.
We will cover the following custom sorting scenarios:
We wish to sort the rows of a table by a transformation of one column. We will cover two common situations:
Case Insensitive
We wish to sort by a string column in a case insensitive manner i.e. for the purpose of sorting, we wish to treat upper and lower case letters as the same. With string columns, sorting is case sensitive by default, meaning upper case text will appear first when sorting in ascending order and last when sorting in descending order.
In this example, we wish to sort the table dummy_table
in ascending order of a string column with the name col_2
in
a case insensitive manner.
SELECT *
FROM refcon.examples.dummy_table
ORDER BY LOWER(col_2);
Here is how this works:
ORDER BY
.LOWER()
to convert all characters to lowercase. For detailed coverage of working with strings in SQL
see String Operations.col_2
is created, used for sorting, then discarded. The original values of col_2
stay
as is.Numeric String
We wish to sort by the numerical component of a string column. In addition, we wish to order the numeric characters in a
numeric order, not a character order. For example, the ascending order should be ‘9’
then ‘10’
and not ‘10’
then ‘9’
.
In this example, we wish to sort the table dummy_table
in ascending order of the numerical substring of a string
column with the name col_9
in numeric order (not character order). The values of col_9
are of the
format order_<val>
where val
can be any integer e.g. ‘order_25'
.
SELECT *
FROM refcon.examples.dummy_table
ORDER BY SAFE_CAST(REPLACE(col_9, 'order_', '') AS INT64);
Here is how this works:
col_9
is comprised of values of the format order_<val>
where val
can be any integer
e.g. ‘order_25'
. We use REPLACE()
to remove the character prefix ‘order_’
from every value in the column col_9
. This leaves the numeric part e.g. ‘order_25’
becomes ‘25’
. For detailed coverage of working with strings in SQL
see String Operations.SAFE_CAST()
to convert the data type of the remaining numeric string to an integer data type.ORDER BY
to sort the rows of the
table.col_9
is created, used for sorting, then discarded. The original values of col_9
stay as
is.We wish to sort the rows of a table by a transformation of multiple columns. In particular, we wish to apply the same transformation to each of a set of columns individually then sort by the transformed columns.
In this example, we wish to sort the rows of a table by the lowercase transformation of columns col_2
and col_9
.
SELECT *
FROM refcon.examples.dummy_table
ORDER BY LOWER(col_2), LOWER(col_9);
Here is how this works:
col_2
and col_9
is transformed individually and the table is arranged in ascending order of
the lowercase transformation of col_2
then rows that have the same lowercase value for col_9
are sorted in
ascending order of the lower case transformation of col_9
.We wish to sort the rows of a table by transformations of multiple columns. In particular, we wish to apply a different transformation to each of a set of columns then sort by the transformed columns.
In this example, we wish to sort the rows of a table in ascending order of the lowercase transformation of the
column col_2
and the suffix after the underscore in col_9
.
SELECT *
FROM refcon.examples.dummy_table
ORDER BY LOWER(col_2), REGEXP_REPLACE(col_9, '.*_', '');
Here is how this works:
ORDER BY
can be a distinct transformation of any one or more columns.col_2
and the second sorting variable
is a substring of col_9
(whatever is after the underscore _
).We wish to sort the rows of a table by a function of multiple columns. In particular, we wish to sort by one variable which is the output of a function of multiple columns. For example, we wish to sort by the ratio of two columns.
In this example, we wish to sort the rows of a table by the ratio between two columns col_6
and col_3
.
SELECT *
FROM refcon.examples.dummy_table
ORDER BY col_6 * 1.0 / col_3;
Here is how this works:
ORDER BY
.col_6
are divided by the corresponding values of col_3
. The resulting ratio is then used to sort the
rows of the table in ascending order.We wish to sort an ordinal categorical variable in a domain accurate manner, not in alphanumeric order. An ordinal categorical variable is one where there is a natural order e.g. responses in a survey. See Factor Operations for a coverage of working with factor data.
In this example, we have a table that has a column col_10
which is categorical variable holding t-shirt sizes XS, S,
M, L, XL. We wish to sort the rows of the table in the natural order of the t-shirt sizes i.e. XS < S < M < L < XL.
Value Mapping
SELECT *
FROM refcon.examples.dummy_table
ORDER BY (CASE
WHEN col_10 = 'XS' THEN 0
WHEN col_10 = 'S' THEN 1
WHEN col_10 = 'M' THEN 2
WHEN col_10 = 'L' THEN 3
WHEN col_10 = 'XL' THEN 4
WHEN col_10 = 'XXL' THEN 5
ELSE 6 END
);
Here is how this works:
CASE WHEN
to map values of the column col_10
to an integer that defines their sorting order.
See General Operations for a coverage of conditional statements in SQL.ORDER BY
sorts the table based on the output of CASE WHEN
.Missing values in SQL come at the top when sorting in ASC
order and at the bottom when sorting DESC
order. In this
section we cover how to change this behavior and show rows with missing values at the opposite side of the default
position.
Ascending
We want to show missing values at the bottom (instead of at the top) when sorting the table in ASC
based on col_5
.
SELECT *
FROM refcon.examples.dummy_table
ORDER BY col_5 IS NULL, col_5;
Here is how this works:
col_5 IS NULL
first. It evaluates to TRUE
when the value of col_5
is NULL
and FALSE
otherwise.
Since FALSE
sorts higher than TRUE
, all NULL
values will be sorted at the end.col_5
. Therefore, rows where col_5 IS NULL
is FALSE
i.e. not NULL
will then be
sorted by the ascending value of col_5
which is the desired outcome.Descending
We want to show missing values at the top (instead of at the bottom) when sorting the table in DESC
based on col_5
.
SELECT *
FROM refcon.examples.dummy_table
ORDER BY col_5 IS NOT NULL, col_5 DESC;
Here is how this works:
col_5 IS NOT NULL
first. It evaluates to TRUE
when the value of col_5
is not NULL
and FALSE
otherwise. Since FALSE
sorts higher than TRUE
, all NULL
values will be sorted first.col_5
. Therefore, rows where col_5 IS NOT NULL
is TRUE
i.e. not NULL
will then
be sorted by the descending value of col_5
which is the desired outcome.