Random Rows

We wish to obtain a random set of rows from a table.

By Count

We wish to get n random rows from a table.

Large Table

Similar to getting a proportion of rows, we just add a LIMIT clause.

SELECT *
FROM refcon.dataset.large_table_1 TABLESAMPLE SYSTEM (10 PERCENT)
LIMIT 10;

Small Table

If the table is not large then we can scan the full table and use the LIMIT clause.

SELECT *
FROM refcon.dataset.table_1
ORDER BY RAND()
LIMIT 10;

Here is how this works:

  • RAND() generates a random number between 0 and 1 for each row in the table.
  • LIMIT will return only 10 rows.
  • The query might return different results each time.

By Proportion

We wish to get a proportion (percent) of the rows of a table selected at random.

Large Table

The recommended way is using TABLESAMPLE clause for large tables (Size is in GBs) if sampling is meant for inspecting purposes only. Table sampling lets us query random subsets of data from large BigQuery tables. Sampling returns a variety of records while avoiding the costs associated with scanning and processing an entire table.

SELECT *
FROM refcon.dataset.large_table_1 TABLESAMPLE SYSTEM (10 PERCENT);

Here is how this works:

  • TABLESAMPLE returns a random subset of data from a table.
  • BigQuery does not cache the results of queries that include a TABLESAMPLE clause, so the query might return different results each time.
  • We set the proportion to 10 PERCENT.

Small Table

BigQuery splits tables into blocks if they are larger than about 1 GB. Smaller tables might consist of a single data block in which case TABLESAMPLE will return the full table. We use RAND() function to overcome this limitation.

SELECT *
FROM refcon.dataset.table_1
WHERE RAND() < 0.1;

Here is how this works:

  • RAND() generates a random number between 0 and 1 for each row in the table.
  • 10% of the rows will get a number less than 0.1 since it follows a .
  • The query might return different results each time.
SQL
I/O