Range of Rows

We wish to obtain a range of rows of a data frame. This is commonly referred to as slicing.

Range

We wish to get a range of rows between a given start position and end position.

df.iloc[0:9]

Here is how this works:

  • iloc is an indexer, which is why it is followed by indexing brackets, iloc[], rather than parentheses.
  • iloc[] can take a range of row positions specified as start:end and returns the corresponding range of rows.
  • Indexing by position in pandas includes the start but excludes the end. Therefore, in this example, df.iloc[0:9] returns the nine rows at positions 0 through 8, but not the row at position 9.
  • Recall that positions in Python are zero-based, so position 0 refers to the first row.
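The start-inclusive, end-exclusive behaviour described above can be checked with a minimal sketch. The 12-row data frame and its column names are made up for illustration and are not part of the original example.

```python
import pandas as pd

# Hypothetical 12-row data frame, used only for illustration.
df = pd.DataFrame({"col_1": range(12), "col_2": list("abcdefghijkl")})

# Positions 0 through 8 are returned; position 9 is excluded.
first_nine = df.iloc[0:9]

# Omitting the start is equivalent to starting at position 0.
same_rows = df.iloc[:9]

print(len(first_nine))
print(first_nine.equals(same_rows))
```

Because the end is excluded, the number of rows returned is simply end minus start, provided the data frame has at least that many rows.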

From End

We wish to get a range of rows (a slice) relative to the end of the data frame.

df.iloc[-7:-2]

Here is how this works:

  • We apply iloc[] to the data frame df as we did above to obtain a range of rows specified by start:end.
  • In pandas, negative positions count from the end: -1 is the last row and -2 is the second-to-last row. Because the end of a slice is excluded, df.iloc[-7:-2] returns five rows, from the seventh-to-last through the third-to-last.
  • Either start or end, or both, can be negative. For example, df.iloc[0:-1] returns all but the last row.
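As a rough illustration of negative positions, the sketch below uses a hypothetical 10-row data frame whose values equal the row positions, so the result of each slice is easy to read off.

```python
import pandas as pd

# Hypothetical data frame: the value in col_1 equals the row position (0..9).
df = pd.DataFrame({"col_1": range(10)})

# Seventh-to-last row through third-to-last row (the end, -2, is excluded).
middle = df.iloc[-7:-2]

# Everything except the last row.
all_but_last = df.iloc[:-1]

# The last three rows: omitting the end runs the slice to the bottom.
last_three = df.iloc[-3:]

print(middle["col_1"].tolist())
print(last_three["col_1"].tolist())
```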

Sorted Slice

We often need the data frame to be sorted in a certain way before we take a slice. In other words, we wish to sort the data frame by a particular column (or set of columns) and then take a range of rows from the sorted result.

df.sort_values(by='col_1').iloc[4:8]

Here is how this works:

  • We first sort df by the values of col_1 in ascending order using sort_values(). For more details see Sorting.
  • We then apply iloc[] as described above to extract a range of rows.
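The following sketch, built on made-up data, shows that iloc[] works on row positions in the sorted result and not on the original index labels, which travel with their rows.

```python
import pandas as pd

# Hypothetical, deliberately unsorted data for illustration.
df = pd.DataFrame({
    "col_1": [50, 10, 90, 30, 70, 20, 80, 40, 60, 0],
    "col_2": list("abcdefghij"),
})

# Sort ascending by col_1, then take the rows at positions 4 through 7
# of the sorted result.
sorted_slice = df.sort_values(by="col_1").iloc[4:8]

print(sorted_slice["col_1"].tolist())
print(sorted_slice.index.tolist())  # original labels, now out of order
```

If the original labels are not needed afterwards, chaining reset_index(drop=True) after the slice is one way to renumber the rows from zero.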

Selected Columns

We wish to get a range of rows from a data frame but return only a particular set of columns.

df.loc[:, ['col_1', 'col_3']].iloc[0:9]

Here is how this works:

  • loc[] takes a row indexer followed by a column indexer. As the row indexer we pass : to denote all rows, and as the column indexer we pass the names of the columns of interest as a list of strings, which in this example is ['col_1', 'col_3']. The preferred method to select columns by name in pandas is .loc[] (see Selecting).
  • We then apply iloc[start:end] as described above to get a range of rows.
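A minimal sketch of the two-step selection, assuming a hypothetical three-column data frame. The alternative ordering at the end (rows first, then columns) is shown only for comparison and is not taken from the original example.

```python
import pandas as pd

# Hypothetical three-column data frame for illustration.
df = pd.DataFrame({
    "col_1": range(12),
    "col_2": range(12, 24),
    "col_3": range(24, 36),
})

# Select the columns by name, then the first nine rows by position.
subset = df.loc[:, ["col_1", "col_3"]].iloc[0:9]

# Slicing the rows first and selecting the columns second gives the same result.
alternative = df.iloc[0:9].loc[:, ["col_1", "col_3"]]

print(subset.shape)
print(subset.equals(alternative))
```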