Random Rows

We wish to obtain a random set of rows from a data frame.

By Count

We wish to get n random rows from a data frame.

set.seed(1234)
df %>% slice_sample(n=10)

Here is how this works:

  • We pass a data frame df to the function slice_sample().
  • slice_sample(n=10) returns a random sample of 10 rows since we set n=10. If neither n nor prop is set, slice_sample() returns 1 row by default.
  • It is good practice to set a seed before any operation that generates random numbers, so that rerunning the same code later returns the same results. We do that here by passing a fixed seed of 1234 to set.seed(); any other integer would work equally well.
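
The effect of the seed can be checked with a small sketch. It assumes the built-in mtcars data set (32 rows) as a stand-in for df; the names sample_1 and sample_2 are illustrative only.

```r
library(dplyr)

# mtcars is a built-in data frame, used here as a stand-in for df.
set.seed(1234)
sample_1 <- mtcars %>% slice_sample(n = 10)

# Resetting the same seed before sampling again reproduces the same rows.
set.seed(1234)
sample_2 <- mtcars %>% slice_sample(n = 10)

nrow(sample_1)                 # number of rows drawn
identical(sample_1, sample_2)  # do both draws match?
```

Without the second call to set.seed(), the two samples would generally differ, because the random number generator continues from wherever the first draw left it.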

By Proportion

We wish to get a proportion (percent) of the rows of a data frame selected at random.

set.seed(1234)
df %>% slice_sample(prop=0.1)

Here is how this works:

  • This works similarly to the above except that we use the prop argument of slice_sample() instead of the n argument.
  • Setting prop=0.1 returns 10% of the rows of the data frame df. When 10% of the row count is not a whole number, the number of rows returned is rounded down.
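
As a minimal sketch of sampling by proportion, the example below builds a hypothetical 100-row data frame (the columns id and value are made up for illustration) and draws 10% of its rows.

```r
library(dplyr)

# A hypothetical 100-row data frame standing in for df.
df <- data.frame(id = 1:100, value = (1:100) * 2)

set.seed(1234)
sampled <- df %>% slice_sample(prop = 0.1)

nrow(sampled)               # 10% of 100 rows
anyDuplicated(sampled$id)   # 0 means no row was drawn twice
```

By default the rows are drawn without replacement, so no row should appear more than once in the result.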

Selected Columns

We wish to get a sample of rows of a data frame but return only a selected set of columns.

set.seed(1234)
df %>% select(col_1, col_3) %>% slice_sample(n=10)

Here is how this works:

  • We use select() to specify the columns of the data frame df that we wish to include in the output. In this example, the columns are col_1 and col_3. For detailed coverage, see Selecting by Name.
  • We then pass the output of select() to slice_sample(n=10) to get a random set of 10 rows.
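
The same pattern can be tried on the built-in mtcars data set. In this sketch the columns mpg and cyl are assumed stand-ins for col_1 and col_3, and the name sampled is illustrative.

```r
library(dplyr)

# mtcars stands in for df; mpg and cyl stand in for col_1 and col_3.
set.seed(1234)
sampled <- mtcars %>%
  select(mpg, cyl) %>%     # keep only the columns of interest
  slice_sample(n = 10)     # then draw 10 random rows

names(sampled)   # columns kept by select()
nrow(sampled)    # rows drawn by slice_sample()
```

Selecting columns first keeps the intermediate data frame small, which may help when df is wide.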