We wish to obtain a random set of rows from a table.
We wish to get n
random rows from a table.
Large Table
Similar to getting a proportion of rows, we just add a LIMIT clause.
SELECT *
FROM refcon.dataset.large_table_1 TABLESAMPLE SYSTEM (10 PERCENT)
LIMIT 10;
Small Table
If the table is not large then we can scan the full table and use
the LIMIT
clause.
SELECT *
FROM refcon.dataset.table_1
ORDER BY RAND()
LIMIT 10;
Here is how this works:
RAND()
generates a random number between 0 and 1 for each row in the table.LIMIT
will return only 10 rows.We wish to get a proportion (percent) of the rows of a table selected at random.
Large Table
The recommended way is using TABLESAMPLE
clause for large tables (Size is in GBs) if sampling is
meant for inspecting
purposes only. Table sampling lets us query random subsets of data from large BigQuery tables.
Sampling returns a
variety of records while avoiding the costs associated with scanning and processing an entire table.
SELECT *
FROM refcon.dataset.large_table_1 TABLESAMPLE SYSTEM (10 PERCENT);
Here is how this works:
TABLESAMPLE
returns a random subset of data from a table.TABLESAMPLE
clause, so the query
might return
different results each time.Small Table
BigQuery splits tables into blocks if they are larger than about 1 GB. Smaller tables might consist
of a single data
block in which case TABLESAMPLE
will return the full table. We use RAND()
function to overcome
this limitation.
SELECT *
FROM refcon.dataset.table_1
WHERE RAND() < 0.1;
Here is how this works:
RAND()
generates a random number between 0 and 1 for each row in the table.