We wish to obtain a range of rows of a data frame. This is commonly referred to as slicing.
We wish to get a range of rows between a given start position and end position.
df.iloc[0:9]
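To try this out, here is a minimal runnable sketch. The 12-row data frame, its column names (col_1, col_2) and its values are assumptions made purely for illustration.

```python
import pandas as pd

# Hypothetical 12-row data frame; names and values are illustrative only.
df = pd.DataFrame({
    'col_1': range(100, 112),
    'col_2': list('abcdefghijkl'),
})

# Take the rows at positions 0 through 8; position 9 is not included.
first_nine = df.iloc[0:9]

print(len(first_nine))            # 9
print(first_nine.index.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```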
Here is how this works:
iloc is an indexer, which is why it is followed by indexing brackets: iloc[].
iloc[] accepts a range of row positions written as start:end and returns the corresponding rows.
The range includes start but excludes end. In this example, df.iloc[0:9] therefore returns the nine rows at positions 0 through 8, but not the row at position 9.

We wish to get a range of rows (a slice) relative to the bottom of the data frame.
df.iloc[-7:-2]
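As a quick check of how negative positions resolve, the sketch below runs the same slice on a hypothetical 10-row data frame whose single column col_1 holds the values 0 to 9, so each value equals its row position. The data is an assumption made for illustration only.

```python
import pandas as pd

# Hypothetical 10-row data frame: the value in col_1 equals the row position.
df = pd.DataFrame({'col_1': range(10)})

# -7 is the seventh-to-last row (position 3); -2 is the second-to-last row
# (position 8), which is excluded because the end of a slice is exclusive.
bottom_slice = df.iloc[-7:-2]
print(bottom_slice['col_1'].tolist())  # [3, 4, 5, 6, 7]

# Only one of the two bounds needs to be negative.
all_but_last = df.iloc[0:-1]
print(len(all_but_last))               # 9
```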
Here is how this works:
We apply iloc[] to the data frame df, as we did above, to obtain a range of rows specified by start:end.
A negative position counts from the bottom of the data frame, so -2 in the example above refers to the second-to-last row. Because end is excluded, df.iloc[-7:-2] returns five rows, from the seventh-to-last through the third-to-last.
Both start and end can be negative, or just one of the two, e.g. df.iloc[0:-1] to obtain all but the last row.

Often we need the data frame to be sorted in a certain way before we take a slice. In other words, we wish to sort the data frame by a particular column (or set of columns) and then take a slice.
df.sort_values(by='col_1').iloc[4:8]
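The following sketch runs the same chain on a small hypothetical data frame (the values in col_1 and col_2 are made up for illustration). It suggests that iloc[] slices by position in the sorted order, while each row keeps its original index label.

```python
import pandas as pd

# Hypothetical unsorted data; values are illustrative only.
df = pd.DataFrame({
    'col_1': [50, 10, 90, 30, 70, 20, 80, 40, 60, 0],
    'col_2': list('abcdefghij'),
})

# Sort ascending by col_1, then take positions 4 through 7 of the sorted rows.
sorted_slice = df.sort_values(by='col_1').iloc[4:8]

print(sorted_slice['col_1'].tolist())  # [40, 50, 60, 70]
print(sorted_slice.index.tolist())     # [7, 0, 8, 4] (original labels kept)
```

If a fresh 0-based index is wanted after sorting, calling reset_index(drop=True) on the result is one option.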
Here is how this works:
We sort the data frame df by the values of col_1 in ascending order using sort_values(). For more details, see Sorting.
We then apply iloc[] as described above to extract a range of rows from the sorted data frame.

We wish to get a range of rows from a data frame but return only a particular set of columns.
df.loc[:, ['col_1', 'col_3']].iloc[0:9]
Here is how this works:
We pass the columns we wish to select, ['col_1', 'col_3'], as a list of strings to the second argument of loc[]. To the first argument of loc[], we pass : to denote all rows. The preferred method to select columns by name in Pandas is .loc[] (see Selecting).
We then apply iloc[start:end] as described above to get a range of rows.
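The two steps above can be sketched on a hypothetical three-column data frame. The column col_2 and all values are assumptions added for illustration, so that there is a column to leave out.

```python
import pandas as pd

# Hypothetical 12-row data frame with three columns; values are illustrative.
df = pd.DataFrame({
    'col_1': range(12),
    'col_2': range(12, 24),
    'col_3': range(24, 36),
})

# Step 1: keep all rows (:) but only the two named columns.
# Step 2: take the rows at positions 0 through 8.
subset = df.loc[:, ['col_1', 'col_3']].iloc[0:9]

print(subset.shape)          # (9, 2)
print(list(subset.columns))  # ['col_1', 'col_3']
```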