We wish to obtain a random set of rows from a data frame.
We wish to get n random rows from a data frame.
df.sample(n=10, random_state=1234)
Here is how this works:
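We use the sample() method, which returns a randomly selected set of rows.
We specify the number of rows we wish to get via the argument n, which in this example we set to 10. If n is not set, sample() returns 1 row by default.
To make the result reproducible, i.e. to get the same random rows every time the code runs, we pass a seed, 1234 (but it can be any integer), to the random_state argument of sample().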
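As a minimal sketch of the call above, here it is on a small made-up data frame (df and its columns col_1, col_2 and col_3 are illustrative assumptions, not data from this text). It checks three things: n controls how many rows come back, the same seed gives the same rows, and omitting n returns a single row.

```python
import pandas as pd

# Hypothetical 20-row data frame, for illustration only
df = pd.DataFrame({
    "col_1": range(20),
    "col_2": [f"item_{i}" for i in range(20)],
    "col_3": [i * 1.5 for i in range(20)],
})

# Draw 10 random rows twice with the same seed
sample_a = df.sample(n=10, random_state=1234)
sample_b = df.sample(n=10, random_state=1234)

print(len(sample_a))              # 10
print(sample_a.equals(sample_b))  # True: same seed, same rows

# Rows are drawn without replacement by default, so no row repeats
print(sample_a.index.is_unique)   # True

# With neither n nor frac given, sample() returns a single row
print(len(df.sample(random_state=1234)))  # 1
```

Changing 1234 to another integer gives a different, but equally reproducible, set of rows.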
We wish to get a proportion (percent) of the rows of a data frame selected at random.
df.sample(frac=0.1, random_state=1234)
Here is how this works:
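We specify the proportion of rows we wish to get via the frac argument of the sample() method.
In this example, frac=0.1 returns 10% of the rows of the data frame df. As above, random_state=1234 makes the sample reproducible.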
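A minimal sketch, assuming a hypothetical 50-row data frame, shows how frac translates into a row count: 10% of 50 rows is 5 rows. It also shows frac=1, which returns every row in a random order and is a common way to shuffle a data frame.

```python
import pandas as pd

# Hypothetical 50-row data frame, for illustration only
df = pd.DataFrame({"col_1": range(50), "col_2": range(50, 100)})

# frac=0.1 requests 10% of the rows: 0.1 * 50 = 5 rows
sampled = df.sample(frac=0.1, random_state=1234)
print(len(sampled))  # 5

# frac=1 returns all rows, in a random order
shuffled = df.sample(frac=1, random_state=1234)
print(len(shuffled) == len(df))  # True
```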
We wish to get a randomly selected sample of rows of a data frame but return only a particular set of columns.
df.loc[:, ['col_1', 'col_3']].sample(n=10, random_state=1234)
Here is how this works:
We pass the names of the columns we wish to keep, ['col_1', 'col_3'], as a list of strings to the second argument of loc[]. To the first argument of loc[], we pass : to denote all rows. The preferred method to select columns by name in Pandas is .loc[] (see Selecting by Name).
We then apply sample() as described above to get a random set of rows.