We wish to obtain or learn about the unique values in data.

In particular, this section covers the following:

- In
**Get Unique Values**, we cover how to obtain the actual unique values. - In
**Count Unique Value**, we cover how to obtain the number of possible unique values. - In
**Occurrence Frequency**, we cover how to compute the frequency of occurrence of each unique value. - In
**Occurrence Proportion**, we cover how to compute the ratio between the number of occurrences of each unique value to the total number of occurrences.

In each of those, we cover two scenarios:

**One Column**: The unique values a single column can take.**Multiple Columns**: The unique combinations a set of columns can take.

We wish to obtain unique values.

**One Column**

We wish to obtain the unique values in one column of a table.

In this example, we wish to obtain the unique values of the column `col_1`

of the table `table_1`

.

```
SELECT DISTINCT col_1
FROM table_1;
```

Here is how this works:

- We use the
`SELECT DISTINCT`

to obtain all unique values taken by a column of a table. - We specify the name of the column whose unique values we wish to obtain after
`DISTINCT`

.

**Multiple Columns**

We wish to obtain the unique combinations of values of a set of columns of a table.

```
SELECT DISTINCT col_1, col_2
FROM table_1;
```

Here is how this works:

This works similarly to the solution above, except we pass multiple columns.

*Extension: Unique Rows*

```
SELECT DISTINCT *
FROM table_1;
```

Here is how this works:

We use `SELECT DISTINCT *`

to get unique rows in a table.

We wish to obtain the number of possible unique values.

**One Column**

We wish to obtain the number of possible unique values in a column of a table.

In this example, we have a table `table_1`

and we wish to obtain the number of unique values of the
column `col_2`

per group, where the groups are specified by the column `col_1`

.

```
SELECT col_1,
COUNT(DISTINCT col_2) col_2_n_distinct
FROM table_1
GROUP BY col_1;
```

Here is how this works:

- We use
`GROUP BY`

clause to obtain a summary for each group. In this example`GROUP BY col_1`

. See Aggregating. - We use
`COUNT(DISTINCT col_2)`

to compute the number of unique values of the column`col_2`

for each group.

**Multiple Columns**

We wish to obtain the number of possible unique combinations of a set of columns of a table.

In this example, we have a table `table_1`

and we wish to obtain the number of unique combinations
of the columns `col_2`

and `col_3`

per group, where the groups are specified by the column `col_1`

.

```
SELECT col_1,
COUNT(DISTINCT CONCAT(col_2, col_3)) AS col_2_3_n_distinct
FROM table_1
GROUP BY col_1;
```

Here is how this works:

We can't pass multiple columns to `COUNT(DISTINCT )`

, so we concatenate the values of `col_2`

and `col_3`

using `CONCAT()`

function. This will return the number of unique combinations of those
columns per group.

We wish to compute the frequency of occurrence of each unique value.

**One Column**

We wish to compute the frequency of occurrence of each unique value of a column of a table.

In this example, we wish to obtain the number of occurrences of each unique value of the
column `col_1`

.

```
SELECT col_1,
COUNT(col_1) n
FROM table_1
GROUP BY col_1
ORDER BY n DESC;
```

Here is how this works:

- We use
`GROUP BY col_1`

clause to obtain unique values for`col_1`

. See Aggregating. - We use
`COUNT(col_1)`

to compute the number of occurrences for each value of the column`col_1`

. - Typically, when we look at unique value counts, we wish to see them sorted in descending order of
frequency. To do that, we use
`ORDER BY n DESC`

.

*Extension: Most Frequent Value*

We wish to obtain the most frequent value taken by a column of a table.

In this example, we have a table `table_1`

and we wish to obtain the most frequent value of the
column `col_2`

per group where the groups are defined by the value of the column `col_1`

.

```
WITH ranked_table AS
(SELECT col_1,
col_2,
ROW_NUMBER()
OVER (
PARTITION BY col_1
ORDER BY COUNT(col_2) DESC ) row_rank
FROM table_1
GROUP BY col_1, col_2)
SELECT col_1, col_2 AS col_2_mode
FROM ranked_table
WHERE row_rank = 1
```

Here is how this works:

- First we
`GROUP BY col_1, col_2`

to get the rank of each value in`col_2`

based on its frequency. See Aggregating. - We do this using
`ROW_NUMBER()`

window function where we:`PARTITION BY col_1`

to get rank per group.`ORDER BY COUNT(col_2) DESC`

to order by frequency. The most frequent value will have 1 as the row number.

- We then filter
`ranked_table`

using`WHERE row_rank = 1`

to get the most frequent value of`col_2`

per group. - See Ranking for a coverage of computing the ranks of values in a column.

**Multiple Columns**

We wish to compute the frequency of occurrence of each unique combination of values of a set of columns of a table.

In this example, we wish to obtain the number of occurrences of each unique combination of values of
the columns `col_1`

and `col_2`

.

```
SELECT col_1,
col_2,
COUNT(1) n
FROM table_1
GROUP BY col_1, col_2
ORDER BY n DESC ;
```

Here is how this works:

- We use
`GROUP BY col_1, col_2`

clause to obtain unique combinations for`col_1`

and`col_2`

. See Aggregating. - We use
`COUNT(1)`

to compute the number of occurrences for each combination. - Typically, when we look at unique value counts, we wish to see them sorted in descending order of
frequency. To do that, we use
`ORDER BY n DESC`

. See Sorting.

We wish to compute the ratio between the number of occurrences of each unique value to the total number of occurrences.

**One Column**

We wish to compute the ratio between the number of occurrences of each unique value of a column of a table to the length of the column.

In this example, we wish to obtain the ratio between the number of occurrences of each unique value
of the column `col_1`

of the table `table_1`

to the length of the column.

```
SELECT col_1,
COUNT(1) n,
COUNT(1) / (SUM(COUNT(1)) OVER ()) AS percent
FROM table_1
GROUP BY col_1;
```

Here is how this works:

- We use
`COUNT(1)`

to obtain the frequency of occurrence of each unique value as described under Occurrence Frequency above. `SUM(COUNT(1)) OVER ()`

is a window function that counts the number of rows in the table and ignores the`GROUP BY`

clause.- The output is a table with one row for each unique value in
`col_1`

. It has three columns:`col_1`

,`n`

, and`percent`

.`col_1`

holds the unique values,`n`

holds the number of occurrences of the corresponding unique value, and`percent`

holds the proportion of occurrence of the corresponding unique value.

**Multiple Columns**

We wish to compute the ratio between the number of occurrences of each unique combination of values of a set of columns of a table to the length of the table.

In this example, we wish to obtain the ratio between the number of occurrences of each unique
combination of the values of the columns `col_1`

and `col_2`

of the table `table_1`

to the length of
the table.

```
SELECT col_1,
col_2,
COUNT(1) / (SUM(COUNT(1)) OVER ()) AS percent
FROM table_1
GROUP BY col_1, col_2;
```

Here is how this works:

- This works similarly to the "One Column" solution above except we group by the names of the set of columns whose unique combinations we are interested in.
- See Inspecting Factor by Factor for how to create a cross-table between two columns.

SQL