Random Rows

We wish to obtain a random set of rows from a data frame.

By Count

We wish to get n random rows from a data frame.

df.sample(n=10, random_state=1234)

Here is how this works:

  • Pandas data frames have a sample() method that returns a randomly selected set of rows.
  • We specify the number of rows to return by setting the argument n, which in this example we set to 10. If n is not set, sample() returns 1 row by default.
  • It is good practice to always set a seed when generating random numbers (or when using operations that rely on them) to ensure reproducibility, i.e. that running the same code later gives the same results. We do that here by passing a seed of 1234 (any integer works) to the random_state argument of sample().
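The sketch below shows both points on a small data frame. The data frame df is a hypothetical stand-in built only for illustration: two calls with the same random_state pick the same rows, and omitting n returns a single row.

```python
import pandas as pd

# Hypothetical example data frame with 100 rows (not from the original recipe)
df = pd.DataFrame({'col_1': range(100), 'col_2': range(100, 200)})

# Draw 10 random rows; passing the same seed twice selects the same rows
sample_a = df.sample(n=10, random_state=1234)
sample_b = df.sample(n=10, random_state=1234)
print(sample_a.index.equals(sample_b.index))  # True

# With n omitted, sample() returns a single row
print(len(df.sample(random_state=1234)))  # 1
```

By default rows are drawn without replacement, so the 10 sampled rows are all distinct.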

By Proportion

We wish to get a proportion (percent) of the rows of a data frame selected at random.

df.sample(frac=0.1, random_state=1234)

Here is how this works:

  • This works like sampling by count, except that to return a randomly selected proportion of the rows of a data frame, we set the frac argument of the sample() method instead of n.
  • Setting frac=0.1 returns 10% of the rows of the data frame df.
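As a quick check, here is a minimal sketch on a hypothetical 100-row data frame df (made up for illustration), where frac=0.1 yields 10 rows.

```python
import pandas as pd

# Hypothetical example data frame with 100 rows
df = pd.DataFrame({'col_1': range(100), 'col_2': range(100, 200)})

# Select 10% of the rows at random, with a fixed seed for reproducibility
sample = df.sample(frac=0.1, random_state=1234)
print(len(sample))  # 10
```

Note that n and frac are alternatives: pandas raises an error if both are passed in the same call.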

Selected Columns

We wish to get a randomly selected sample of rows of a data frame but return only a particular set of columns.

df.loc[:, ['col_1', 'col_3']].sample(n=10, random_state=1234)

Here is how this works:

  • We pass the names of the columns of interest, which in this example are ['col_1', 'col_3'], as a list of strings to the second argument of loc[]. To the first argument of loc[], we pass : to denote all rows. The preferred method to select columns by name in Pandas is .loc[] (see Selecting by Name).
  • We then apply sample() as described above to get a random set of rows.
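The sketch below applies the recipe to a hypothetical data frame df with columns col_1, col_2 and col_3 (invented for illustration), keeping col_1 and col_3 and sampling 5 rows.

```python
import pandas as pd

# Hypothetical example data frame with 20 rows and three columns
df = pd.DataFrame({
    'col_1': range(20),
    'col_2': range(20, 40),
    'col_3': range(40, 60),
})

# Keep only col_1 and col_3 (all rows), then sample 5 rows at random
result = df.loc[:, ['col_1', 'col_3']].sample(n=5, random_state=1234)
print(list(result.columns))  # ['col_1', 'col_3']
print(len(result))  # 5
```

Since sample() picks rows by position and the row count is unchanged by column selection, sampling first and then selecting the columns with the same seed should return the same rows.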