For each group in a column, we wish to filter rows by the rank of a value of a specific column.
This section is organized as follows:
We wish to return the row (or rows) where a particular column has its largest value for each group.
In this example, we wish to return the row (or rows) of each group where the column col_1
has its max value for the
group. The groups are defined by the column col_10
.
SELECT *
FROM refcon.examples.dummy_table
QUALIFY DENSE_RANK() OVER (PARTITION BY col_10 ORDER BY col_1 DESC ) = 1;
Here is how this works:
DENSE_RANK()
returns a ranking of the rows by values of a given column. It is different from regular ranking in that
it leaves no gaps between ranks. See Ranking.PARTITION BY col_10
to calculate the rank per group and sort by col_1
in a descending manner i.e. rank 1 is
the largest, not the smallest value.QUALIFY = 1
evaluates to TRUE
for rows where the rank of the value of col_1
is any of its three
largest values for the group.Alternatively:
WITH group_max AS
(SELECT col_10, MAX(col_1) col_1_max
FROM refcon.examples.dummy_table
GROUP BY col_10)
SELECT *
FROM refcon.examples.dummy_table t
INNER JOIN group_max m
ON t.col_10 = m.col_10
AND t.col_1 = m.col_1_max;
Here is how this works:
MAX(col_1)
in a separate CTE by grouping by col_10
.group_max
with dummy_table on both col_10
and col_1_max
. This will only keep rows with a value
equal to the max of the group.We wish to return the row (or rows) where a particular column has its smallest value for each group.
In this example, we wish to return the row (or rows) of each group where the column col_1
has its smallest value for
the group. The groups are defined by the column col_10
.
SELECT *
FROM refcon.examples.dummy_table
QUALIFY DENSE_RANK() OVER (PARTITION BY col_10 ORDER BY col_1 ) = 1
Here is how this works:
This works similarly to the “Largest” scenario above except that we don't order using DESC
so we
get a ranking that starts at the smallest value i.e. rank 1 is the smallest, not the largest value.
We wish to return the row (or rows) where a particular column has its nth largest or smallest value for each group.
Largest
In this example, we wish to return the row (or rows) of each group where the column col_1
has its third-largest value
for the group. The groups are defined by the column col_10
.
SELECT *
FROM refcon.examples.dummy_table
QUALIFY DENSE_RANK() OVER (PARTITION BY col_10 ORDER BY col_1 DESC) = 3
Here is how this works:
This works similarly to the “Largest” scenario above except we filter using QUALIFY =3
.
Smallest
In this example, we wish to return the row (or rows) of each group where the column col_1
has its third-smallest value
for the group. The groups are defined by the column col_10
.
SELECT *
FROM refcon.examples.dummy_table
QUALIFY DENSE_RANK() OVER (PARTITION BY col_10 ORDER BY col_1 ) = 3
Here is how this works:
DESC
keyword after the ORDER BY
clause to order by col_1
in ascending manner. i.e. rank 1 is
the smallest, not the largest value.We wish to return the rows where a particular column has its largest values, defined by a count or a proportion, for each group.
Count
For each group, we wish to return the rows for which a particular column takes its nth largest values.
In this example, we wish to return the rows from each group where the column col_1
takes its three largest values. The
groups are defined by the column col_10
.
SELECT *
FROM refcon.examples.dummy_table
QUALIFY DENSE_RANK() OVER (PARTITION BY col_10 ORDER BY col_1 DESC) <= 3
Here is how this works:
col_1
takes its three
highest values (hence <=3
).QUALIFY
will return n
rows each with a
unique value for the column of interest (col_1
in this example).QUALIFY
will return a number of rows that is greater than n
and the
number of unique values returned will likely be smaller than n
.n
largest unique values then the
solution above is the way to go. Example input: 2, 2, 2, 3, 4. Example Output: 2, 2, 3, 3. For n = 2
n
including the duplicates, and we do not care
that we get n
unique values for the column of interest then we use ROW_NUMBER
instead of DENSE_RANK
. Example
input: 2, 2, 2, 3, 4. Example Output: 2, 2. For n = 2
.n
rows each with a unique value for the column of interest just
like we would get if there were no duplicates. Example input: 2, 2, 2, 3, 4. Example Output: 2, 3. For n = 2
.Alternatively:
WITH ranked_table AS
(SELECT *,
DENSE_RANK() OVER (PARTITION BY col_10 ORDER BY col_1 DESC) row_rank,
FROM refcon.examples.dummy_table)
SELECT t.* EXCEPT (row_rank)
FROM ranked_table t
WHERE row_rank <= 3
QUALIFY ROW_NUMBER() OVER (PARTITION BY col_10,row_rank ORDER BY row_rank) = 1;
Here is how this works:
DENSE_RANK
similar to the scenarios above.row_rank <= 3
evaluates to TRUE
for rows where the rank of the value of col_1
is any of
its three largest values for the group.QUALIFY
on the ranked table partition
by col_10
and row_rank
. Only the first largest value per row_rank
will evaluate to TRUE
.Proportion
For each group, we wish to return a proportion of the rows for which a particular column takes its largest values.
In this example, we wish to return rows from each group where the column col_1
takes a value within its 20% largest
values. The groups are defined by the column col_10
.
SELECT *
FROM refcon.examples.dummy_table
QUALIFY DENSE_RANK() OVER (PARTITION BY col_10 ORDER BY col_1 DESC)
< COUNT(DISTINCT col_1) OVER (PARTITION BY col_10) * 0.2;
Here is how this works:
COUNT(DISTINCT col_1) OVER (PARTITION BY col_10)
returns the number of unique values that the column col_1
takes
per group. See Uniqueness.0.2 * COUNT(DISTINCT col_1)
tells us how many values are 20% of the total number of possible values of the
column col_1
.QUALIFY
evaluates to TRUE
for rows where the rank of the value of col_1
is within the top
20% ranked values.We wish to return the rows where a particular column has its smallest values, defined by a count or a proportion, for each group.
In this example, we wish to return the rows from each group where the column col_1
takes its three smallest values.
The groups are defined by the column col_10
.
SELECT *
FROM refcon.examples.dummy_table
QUALIFY DENSE_RANK() OVER (PARTITION BY col_10 ORDER BY col_1) <= 3
Here is how this works:
DESC
keyword.For each group, we wish to return the rows where a particular column takes a value whose rank falls in a given range of value ranks.
In this example, we wish to return rows that take the second through to the fifth ranked possible values of the
column col_1
. The groups are defined by the column col_10
SELECT *
FROM refcon.examples.dummy_table
QUALIFY DENSE_RANK() OVER (PARTITION BY col_10 ORDER BY col_1 DESC)
BETWEEN 2 AND 5
Here is how this works:
BETWEEN 2 AND 5
evaluates to TRUE
for rows where the rank of the value
of col_1
is between 2 and 5 (inclusive) for the group.For each group, we wish to return the rows where a particular column takes a value whose rank is one of a set of value ranks.
In this example, for each group, we wish to return rows where the column col_1
takes values that are ranked first,
third or fifth among the possible values of the column col_1
. The groups are defined by the column col_10
SELECT *
FROM refcon.examples.dummy_table
QUALIFY DENSE_RANK() OVER (PARTITION BY col_10 ORDER BY col_1 DESC) IN (1, 3, 5)
Here is how this works:
IN (1, 3, 5)
evaluates to TRUE
for rows where the rank of the value
of col_1
take on the values 1, 3, 5.