Specific Rows

We wish to obtain specific rows of a data frame by specifying their positions (row number).

df.iloc[[1, 3, 7]]

Here is how this works:

  • We apply iloc[] to our data frame df.
  • We pass to iloc[] a list of the zero-based positions of the rows of interest, which in this case is [1, 3, 7], i.e. the second, fourth, and eighth rows.
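A minimal, self-contained sketch of the above, assuming a small made-up data frame (the column values and the index labels "r0" to "r9" are hypothetical). The index labels are deliberately not 0 to 9, which shows that iloc[] selects by position and ignores the labels.

```python
import pandas as pd

# Hypothetical 10-row data frame whose index labels differ from row positions.
df = pd.DataFrame(
    {
        "col_1": range(10),
        "col_2": list("abcdefghij"),
        "col_3": [x * 1.5 for x in range(10)],
    },
    index=[f"r{i}" for i in range(10)],
)

# Select the rows at zero-based positions 1, 3 and 7.
rows = df.iloc[[1, 3, 7]]
print(rows)

# The result follows the order of the list we pass in.
reordered = df.iloc[[7, 1]]
print(reordered)
```

Because the labels are "r1", "r3" and "r7" while the positions are 1, 3 and 7, the output makes it clear that iloc[] works purely by position.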

Selected Columns

We wish to get specific rows of a data frame while including only a selected set of columns.

df.loc[:, ['col_1', 'col_3']].iloc[[1, 3, 7]]

Here is how this works:

  • We pass the names of the columns of interest, which in this example are ['col_1', 'col_3'], as a list of strings to the second argument of loc[]. To the first argument of loc[], we pass : to denote all rows. The preferred method to select columns by name in Pandas is .loc[] (see Selecting by Name).
  • We then use iloc[] to select rows by their position, as we did above.
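The two steps can be sketched separately as follows, assuming a small made-up data frame (its values are hypothetical). Splitting the chain into named intermediate results shows what each indexer contributes.

```python
import pandas as pd

# Hypothetical 10-row data frame with the default 0..9 index.
df = pd.DataFrame(
    {
        "col_1": range(10),
        "col_2": list("abcdefghij"),
        "col_3": [x * 1.5 for x in range(10)],
    }
)

# Step 1: keep all rows (:) but only the named columns.
cols = df.loc[:, ["col_1", "col_3"]]

# Step 2: from that narrower frame, keep the rows at positions 1, 3 and 7.
subset = cols.iloc[[1, 3, 7]]
print(subset)

# For reading, the order of the two steps does not change the result.
same = df.iloc[[1, 3, 7]].loc[:, ["col_1", "col_3"]]
print(subset.equals(same))
```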

An alternative approach is:

col_ids = [df.columns.get_loc(x) for x in ['col_1', 'col_3']]
df.iloc[[1, 3, 7], col_ids]

Here is how this works:

  • Although chaining multiple indexers is safe for data inspection, it is generally discouraged, particularly when assigning values, because the assignment may act on an intermediate copy rather than on df itself.
  • One way to avoid chaining is to convert column names into column positions via get_loc().
  • We then use a single iloc[] indexer both to select the rows and to select the columns.
  • Although we could identify a column's position manually, it is safer to do so via get_loc(), because column names are usually stable while column positions can change (e.g. when columns are added, dropped, or reordered).
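A minimal sketch of why get_loc() is safer than a hard-coded position, assuming a small made-up data frame (its values are hypothetical). It checks that the single-iloc[] version matches the chained version, then reorders the columns to show that looking up positions by name still picks the right ones.

```python
import pandas as pd

# Hypothetical 10-row data frame.
df = pd.DataFrame(
    {
        "col_1": range(10),
        "col_2": list("abcdefghij"),
        "col_3": [x * 1.5 for x in range(10)],
    }
)

# Convert the column names into their current positions.
col_ids = [df.columns.get_loc(x) for x in ["col_1", "col_3"]]

# One iloc[] call selects rows and columns together.
subset = df.iloc[[1, 3, 7], col_ids]

# It matches the chained version from the previous approach.
chained = df.loc[:, ["col_1", "col_3"]].iloc[[1, 3, 7]]
print(subset.equals(chained))

# If the column order changes, a hard-coded [0, 2] would now select
# different columns, but positions looked up by name stay correct.
moved = df[["col_3", "col_2", "col_1"]]
moved_ids = [moved.columns.get_loc(x) for x in ["col_1", "col_3"]]
print(moved.iloc[[1, 3, 7], moved_ids].equals(subset))
```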

Single Value

We wish to obtain the value of a particular column at a particular row. In this example we wish to obtain the value of col_1 at the third row (zero-based position 2).

value = df.iloc[2].loc['col_1']

Here is how this works:

  • We use iloc[2] to extract the third row as a Series.
  • We then use loc['col_1'] to extract the value at index 'col_1'. This works because df.iloc[2] returns a Series whose index is the column names of df.
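A minimal sketch of the two steps, assuming a small made-up data frame with mixed column types (the values and the index labels "w" to "z" are hypothetical). It prints the intermediate row to show that its index holds the column names.

```python
import pandas as pd

# Hypothetical data frame with an integer column and a string column.
df = pd.DataFrame(
    {"col_1": [10, 20, 30, 40], "col_2": ["a", "b", "c", "d"]},
    index=["w", "x", "y", "z"],
)

# The row at position 2 (the third row), returned as a Series.
row = df.iloc[2]
print(row.index.tolist())  # ['col_1', 'col_2']

# Look up the entry labeled 'col_1' in that Series.
value = row.loc["col_1"]
print(value)
```

Since the whole row is returned as a single Series, columns of different types are combined into a common dtype (object in this example), which is one reason to prefer selecting a single value directly when the row itself is not needed.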

An alternative approach is:

col_id = df.columns.get_loc('col_1')
value = df.iloc[2, col_id]

See the discussion of the alternative approach under Selected Columns above.
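A minimal sketch comparing the two single-value approaches, assuming the same kind of small made-up data frame (its values are hypothetical). Both lookups return the same entry; the get_loc() version uses one positional indexer instead of chaining.

```python
import pandas as pd

# Hypothetical data frame with an integer column and a string column.
df = pd.DataFrame(
    {"col_1": [10, 20, 30, 40], "col_2": ["a", "b", "c", "d"]},
    index=["w", "x", "y", "z"],
)

# Translate the column name into its position, then index by position only.
col_id = df.columns.get_loc("col_1")
value = df.iloc[2, col_id]

# The chained approach returns the same entry.
print(value == df.iloc[2].loc["col_1"])
```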
