In this page, we will cover the functions to use to carry out some of the most common data aggregation operations.

In its simplest form, a data aggregation function is one which takes multiple individual values and returns a single summary value; e.g. sum.

We have extended sections where we cover the data manipulation operations relevant to each data type quite extensively (see the data type operation sections in the navigation bar). Think of this page as a brief overview of the most common aggregation functions for each data type with links to the respective operations section for a coverage.

Note: In this page, we demonstrate the data aggregation operations in a scenario where a table
is grouped by one column. We can drop the `GROUP BY`

to aggregate over the entire table (without
grouping). We can also group by multiple columns or in myriad other ways (
See Grouping).

We wish to count the number of rows or the number of unique values in a group.

In this example, we wish to count the number of rows in each group and the number of unique values
of the column `col_2`

in each group. We wish to carry out the aggregation on the table `table_1`

grouped by the column `col_1`

.

```
SELECT col_1,
COUNT(1) AS row_count,
COUNT(DISTINCT col_2) col_2_vals
FROM table_1
GROUP BY col_1;
```

Here is how this works:

- The
`GROUP BY`

clause groups the rows of a table by the values in one or more columns. The rows are then assigned to a group based on the values in the grouping column(s). In this example, the rows are being grouped by the values in the col_1 column. `COUNT(1)`

gives the current group size. See Counting.`COUNT(DISTINCT col_2)`

gives the number of unique values. See Uniqueness.

We wish to aggregate data by computing an arithmatic operation, e.g. sum, for each group.

In this example, we wish to compute the sum of values of the column `col_2`

for each group and the
sum of the product of the values of the columns `col_3`

and `col_4`

for each group. We wish to carry
out the aggregation on the table `table_1`

grouped by the column `col_1`

.

```
SELECT col_1,
SUM(col_2) col_2_sum,
SUM(col_3 * col_4) col_3_4_ws
FROM table_1
GROUP BY col_1;
```

Here is how this works:

- The
`GROUP BY`

clause groups the rows of a table by the values in one or more columns. The rows are then assigned to a group based on the values in the grouping column(s). In this example, the rows are being grouped by the values in the col_1 column. - We then use the built-in function
`SUM()`

to compute the sum of values of`col_2`

. - In
`SUM(col_3 * col_4)`

, we first multiply the values of`col_3`

and`col_4`

for each row then sum the results (for each group). - For a coverage of arithmatic operations, see Arithmatic Operations.

We wish to aggregate data by computing summary statistics, e.g. the mean, for each group.

In this example we wish to compute the mean, and standard deviation of values of the
column `col_2`

for each group. We wish to carry out the aggregation on the table `table_1`

grouped
by the column `col_1`

.

```
SELECT col_1,
AVG(col_2) AS mean_col_2,
STDDEV(col_2) AS stddev_col_2
FROM table_1
GROUP BY col_1;
```

Here is how this works:

- The
`GROUP BY`

clause groups the rows of a table by the values in one or more columns. The rows are then assigned to a group based on the values in the grouping column(s). In this example, the rows are being grouped by the values in the col_1 column. - We then use the built-in functions
`AVG()`

and`STDDEV()`

to compute the mean, and standard deviation of values of the column`col_3`

. - For a coverage of arithmatic operations, see Arithmatic Operations.
- For a coverage of summary statistics, see Summary Statistics

We wish to aggregate data by computing the maximum or minimum value of a given column for each group.

In this example we wish to compute the minimum value of the numeric column `col_3`

for each group
and the maximum value of the numeric column `col_4`

for each group. We wish to carry out the
aggregation on the table `table_1`

grouped by the column `col_1`

.

```
SELECT col_1,
MIN(col_3) col_3_min,
MAX(col_4) col_4_max,
FROM table_1
GROUP BY col_1;
```

Here is how this works:

- The
`GROUP BY`

clause groups the rows of a table by the values in one or more columns. The rows are then assigned to a group based on the values in the grouping column(s). In this example, the rows are being grouped by the values in the col_1 column. - As their name implies, given a vector of values:
`MIN()`

returns the minimum value.`MAX()`

returns the maximum value.

`MAX()`

and the`MIN()`

belong to the larger class of ranking operations, which we cover in Ranking.- Note that here we are aggregating the values of individual columns in a group, to filter entire rows in a group based on the rank of the values of some columns, see Filtering Groups by Rank.

Unlike R and Python, some operations in SQL are considered a window function and cannot be used in an aggregation scenario. Median is a very common example and in this example we provide a blueprint for doing aggregations on top of window functions.

In this example we wish to compute the median of values of the column `col_2`

for each group. We
wish to carry out the aggregation on the table `table_1`

grouped by the column `col_1`

.

```
WITH table_2 AS (SELECT *, PERCENTILE_DISC(col_2, 0.5) OVER (PARTITION BY col_1) col_1_2_median
FROM table_1)
SELECT col_1,
MAX(col_1_2_median) col_2_median
FROM table_2
GROUP BY col_1;
```

Here is how this works:

- In
`table_2`

CTE , the`PERCENTILE_DISC()`

function is used to compute the median value (the 50th percentile) of the`col_2`

column, partitioned by`col_1`

. This means that the median value is calculated separately for each unique value of`col_1`

, and the resulting value is assigned to the`col_1_2_median`

column for each row in`table_2`

. - The last
`SELECT`

statement we group the results by`col_1`

. The`MAX()`

function is used to select the maximum value of`col_1_2_median`

for each group. We can replace`MAX`

by`MIN`

or`AVG`

since the value is of col_1_2_median is constant.

We wish to perform data aggregation operations on a column of a string data type.

In this example, we wish to summarize the table `table_1`

over groups defined by the column `col_1`

.
The aggregation operations we wish to carry out are: (1) Concatenate the unique values of the
column `col_2`

per group into a comma separated string. (2) Concatenate the unique values of the
column `col_2`

that matches a given regular expression for each group.

```
SELECT col_1,
STRING_AGG(DISTINCT col_2, ',') col_2_distinct,
STRING_AGG(DISTINCT CASE WHEN col_2 LIKE '%X%' THEN col_2 END, ',') col_2_match
FROM table_1
GROUP BY col_1;
```

- The
`GROUP BY`

clause groups the rows of a table by the values in one or more columns. The rows are then assigned to a group based on the values in the grouping column(s). In this example, the rows are being grouped by the values in the col_1 column. `STRING_AGG()`

function is used to concatenate a list of strings, separated by a specified delimiter. We use`DISTINCT`

to only concatenate unique values.`col_2_match`

includes only those values of`col_2`

that match a certain condition, specified using a`CASE`

expression. In this case, the condition is that the`col_2`

value contains the character`'X'`

. If the condition is met, the col_2 value is included in the concatenated string. Otherwise, it is not included.- For a coverage of string operations (including regular expressions), see String Operations.

We wish to perform data aggregation operations on a column of a logical data type; i.e. one that
takes a value of `TRUE`

or `FALSE`

.

In this example, for each group, where the groups are defined by the value of the column `col_1`

, we
wish to obtain the number and ratio of rows where the column `col_2`

is `TRUE`

.

```
SELECT col_1,
SUM(CAST(col_2 AS int64)) col_2_true_count,
AVG(CAST(col_2 AS int64)) col_2_true_rate
FROM table_1
GROUP BY col_1;
```

Here is how this works:

- The
`GROUP BY`

clause groups the rows of a table by the values in one or more columns. The rows are then assigned to a group based on the values in the grouping column(s). In this example, the rows are being grouped by the values in the col_1 column. - We cast
`col_2`

to integer inside`SUM()`

to obtain the number of`TRUE`

values. Summing a boolean column that is casted as an integer is basically counting the number of`TRUE`

values. When casting a boolean to an integer,`TRUE`

is regarded as`1`

and`FALSE`

is regarded as`0`

. - Similarly, we use
`AVG()`

to obtain the proportion of`TRUE`

values of`col_2`

for each group. Since`TRUE`

is regarded as`1`

and`FALSE`

is regarded as`0`

, taking the mean of a logical column is basically taking the ratio between the number of`TRUE`

values and the total number of values in the input vector. - This approach will ignore missing values since both
`SUM`

and`AVG`

ignore NULL values. This will affect the results of the ratio in this scenario. If we wish to include missing values see alternative solution below. - For a coverage of logical operations, see Logical Operations

*Alternatively:*

```
SELECT col_1,
COUNT(CASE WHEN col_2 THEN 1 END) col_2_true_count,
COUNT(CASE WHEN col_2 THEN 1 END) / COUNT(1) col_2_true_rate
FROM table_1
GROUP BY col_1;
```

Here is how this works:

`col_2_true_count`

column is computed using the`COUNT()`

function, which counts the number of rows in each group defined by col_1.- The
`CASE`

expression is used to count only those rows where`col_2`

is`TRUE`

. `col_2_true_rate`

is the count of`TRUE`

values divided by total row count.`COUNT(1)`

will count rows where the value of col_2 is`NULL`

.

We wish to group a table by some unit or interval of time e.g. group by year or weekday.

In this example, we wish to group the table `table_1`

by weekday inferred from the date-time
column `col_1`

. We then wish to compute the number of rows in each group and the number of unique
hours that `col_1`

takes in each group.

```
SELECT FORMAT_DATE('%A', col_1) AS weekday_name,
COUNT(1) row_count,
COUNT(DISTINCT EXTRACT(HOUR FROM col_1)) AS hour_count,
FROM table_1
GROUP BY FORMAT_DATE('%A', col_1);
```

Here is how this works:

`weekday_name`

column is computed using the`FORMAT_DATE()`

function, which formats a date value as a string using a specified format string. In this case, the format string is '%A', which returns the full name of the weekday (e.g., "Monday").- We use
`COUNT(1)`

to obtain the number of rows in each group. - The
`EXTRACT()`

function extracts the hour part of the`col_1`

date value as an integer. - We use
`COUNT(DISTINCT )`

to count the number of unique values of`EXTRACT(HOUR FROM col_1)`

for each group. - For a coverage of date time operations, see Date Time Operations.

SQL