We wish to obtain or learn about the unique values in data.
In particular, this section covers the following:
In each of those, we cover two scenarios:
We wish to obtain unique values.
One Column
We wish to obtain the unique values in one column of a table.
In this example, we wish to obtain the unique values of the column col_1
of the table table_1
.
SELECT DISTINCT col_1
FROM table_1;
Here is how this works:
SELECT DISTINCT
to obtain all unique values taken by a column
of a table.DISTINCT
.Multiple Columns
We wish to obtain the unique combinations of values of a set of columns of a table.
SELECT DISTINCT col_1, col_2
FROM table_1;
Here is how this works:
This works similarly to the solution above, except we pass multiple columns.
Extension: Unique Rows
SELECT DISTINCT *
FROM table_1;
Here is how this works:
We use SELECT DISTINCT *
to get unique rows in a table.
We wish to obtain the number of possible unique values.
One Column
We wish to obtain the number of possible unique values in a column of a table.
In this example, we have a table table_1
and we wish to obtain the number of unique values of the
column col_2
per group, where the groups are specified by the column col_1
.
SELECT col_1,
COUNT(DISTINCT col_2) col_2_n_distinct
FROM table_1
GROUP BY col_1;
Here is how this works:
GROUP BY
clause to obtain a summary for each group. In this example GROUP BY col_1
.
See Aggregating.COUNT(DISTINCT col_2)
to compute the number of unique values of the column col_2
for each group.Multiple Columns
We wish to obtain the number of possible unique combinations of a set of columns of a table.
In this example, we have a table table_1
and we wish to obtain the number of unique combinations
of the columns col_2
and col_3
per group, where the groups are specified by the column col_1
.
SELECT col_1,
COUNT(DISTINCT CONCAT(col_2, col_3)) AS col_2_3_n_distinct
FROM table_1
GROUP BY col_1;
Here is how this works:
We can't pass multiple columns to COUNT(DISTINCT )
, so we concatenate the values of col_2
and col_3
using CONCAT()
function. This will return the number of unique combinations of those
columns per group.
We wish to compute the frequency of occurrence of each unique value.
One Column
We wish to compute the frequency of occurrence of each unique value of a column of a table.
In this example, we wish to obtain the number of occurrences of each unique value of the
column col_1
.
SELECT col_1,
COUNT(col_1) n
FROM table_1
GROUP BY col_1
ORDER BY n DESC;
Here is how this works:
GROUP BY col_1
clause to obtain unique values for col_1
.
See Aggregating.COUNT(col_1)
to compute the number of occurrences for each value of the column col_1
.ORDER BY n DESC
.Extension: Most Frequent Value
We wish to obtain the most frequent value taken by a column of a table.
In this example, we have a table table_1
and we wish to obtain the most frequent value of the
column col_2
per group where the groups are defined by the value of the column col_1
.
WITH ranked_table AS
(SELECT col_1,
col_2,
ROW_NUMBER()
OVER (
PARTITION BY col_1
ORDER BY COUNT(col_2) DESC ) row_rank
FROM table_1
GROUP BY col_1, col_2)
SELECT col_1, col_2 AS col_2_mode
FROM ranked_table
WHERE row_rank = 1
Here is how this works:
GROUP BY col_1, col_2
to get the rank of each value in col_2
based on its frequency.
See Aggregating.ROW_NUMBER()
window function where we:PARTITION BY col_1
to get rank per group.ORDER BY COUNT(col_2) DESC
to order by frequency. The most frequent value will have 1 as the
row number.ranked_table
using WHERE row_rank = 1
to get the most frequent value of col_2
per group.Multiple Columns
We wish to compute the frequency of occurrence of each unique combination of values of a set of columns of a table.
In this example, we wish to obtain the number of occurrences of each unique combination of values of
the columns col_1
and col_2
.
SELECT col_1,
col_2,
COUNT(1) n
FROM table_1
GROUP BY col_1, col_2
ORDER BY n DESC ;
Here is how this works:
GROUP BY col_1, col_2
clause to obtain unique combinations for col_1
and col_2
.
See Aggregating.COUNT(1)
to compute the number of occurrences for each combination.ORDER BY n DESC
.
See Sorting.We wish to compute the ratio between the number of occurrences of each unique value to the total number of occurrences.
One Column
We wish to compute the ratio between the number of occurrences of each unique value of a column of a table to the length of the column.
In this example, we wish to obtain the ratio between the number of occurrences of each unique value
of the column col_1
of the table table_1
to the length of the column.
SELECT col_1,
COUNT(1) n,
COUNT(1) / (SUM(COUNT(1)) OVER ()) AS percent
FROM table_1
GROUP BY col_1;
Here is how this works:
COUNT(1)
to obtain the frequency of occurrence of each unique value as described
under Occurrence Frequency above.SUM(COUNT(1)) OVER ()
is a window function that counts the number of rows in the table and
ignores the GROUP BY
clause.col_1
. It has three
columns: col_1
, n
, and percent
. col_1
holds the unique values, n
holds the number of
occurrences of the corresponding unique value, and percent
holds the proportion of occurrence of
the corresponding unique value.Multiple Columns
We wish to compute the ratio between the number of occurrences of each unique combination of values of a set of columns of a table to the length of the table.
In this example, we wish to obtain the ratio between the number of occurrences of each unique
combination of the values of the columns col_1
and col_2
of the table table_1
to the length of
the table.
SELECT col_1,
col_2,
COUNT(1) / (SUM(COUNT(1)) OVER ()) AS percent
FROM table_1
GROUP BY col_1, col_2;
Here is how this works: