Selecting by Position

We wish to select one or more columns from a data frame and we wish to identify the columns to be selected by specifying their position. The positions start at 0 for the leftmost column.

Single Column

We wish to select and extract a single column from a data frame by specifying its position.

When selecting a single column, we can choose to return that one column either as a Series or as a DataFrame. The choice of which is appropriate depends on whether the operation(s) we wish to run afterwards expect a Series or a DataFrame.

As Series

We wish to select a single column from a data frame by specifying its position and to return the selected column as a Series.

col = df.iloc[:, 1]

Here is how this works:

  • We use iloc[] to select columns by their position. Note that iloc[] indexes by integer position and not by the labels of the Index.
  • In Pandas (like NumPy and Python in general) indexing is 0-based. Therefore, in this example, 1 refers to the second column from the left.
  • iloc[] can subset both rows and columns in the same command via position i.e. iloc[row_selection, column_selection].
    • To select all rows for a selected set of columns we have to pass a : as the first argument (before the comma).
    • We pass the position of the column we wish to select as a single integer (not a list) to the second argument of iloc[]; which here is 1.
  • df.iloc[:, 1] returns a Series.
  • Selecting a column by position cannot be done with the square bracket operator []. We must use the iloc[] operator.
  • If the position passed to iloc[] is out of bounds (an integer greater than or equal to the number of columns in the DataFrame) we get an IndexError.
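
The points above can be checked with a minimal sketch. The data frame and its column names ("a", "b", "c") are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

# Select the second column from the left; the result is a Series.
col = df.iloc[:, 1]
is_series = isinstance(col, pd.Series)
col_name = col.name  # the label of the selected column

# With three columns, valid positions are 0, 1 and 2; position 3 is out of bounds.
try:
    df.iloc[:, 3]
    out_of_bounds_raised = False
except IndexError:
    out_of_bounds_raised = True
```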

As Data Frame

We wish to select a single column from a data frame by specifying its position and to return the selected column as a data frame of one column.

df_2 = df.iloc[:, [1]]

Here is how this works:

  • We use iloc[] to select columns by their position. Note that iloc[] indexes by integer position and not by the labels of the Index.
  • Note that indexing is 0-based. So in this example 1 refers to the second column.
  • iloc[] can subset both rows and columns in the same command via position i.e. iloc[row_selection, column_selection].
    • To select all rows for a selected set of columns we have to pass a : as the first argument (before the comma).
    • We pass the position of the column we wish to select as a list to the second argument of iloc[]; which here is [1]. Wrapping the position in a list is what makes the result a DataFrame.
  • df.iloc[:, [1]] returns a DataFrame with one column.
  • If the position passed to iloc[] is out of bounds (an integer greater than or equal to the number of columns in the DataFrame) we get an IndexError.
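
To make the difference between the two return types concrete, here is a small sketch that selects the same column both ways. The data frame is hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

as_series = df.iloc[:, 1]    # integer position: one-dimensional Series
as_frame = df.iloc[:, [1]]   # list with one position: DataFrame with one column

series_shape = as_series.shape   # one dimension: rows only
frame_shape = as_frame.shape     # two dimensions: rows and columns
```

Both results hold the same values; only the container differs, which matters when the next operation expects one type or the other.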

List of Columns

Given a data frame, we wish to return another data frame that is comprised of a subset of the columns of the original data frame. We wish to specify the columns that we wish to select by their position.

df_2 = df.iloc[:, [0, 1]]

Here is how this works:

  • We use iloc[] to select columns by their position (not by the index).
  • iloc[] can subset both rows and columns in the same command via position i.e. iloc[row_selection, column_selection].
    • To select all rows for a selected set of columns we have to pass a : as the first argument (before the comma).
    • We pass the positions of the columns we wish to select as a list to the second argument of iloc[]; which here is [0, 1].
  • If any of the positions in the list passed to iloc[] is out of bounds, we get an IndexError.
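
As a minimal sketch of the list-based selection described above, assuming a hypothetical three-column data frame:

```python
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

# Select the first two columns; the result is a DataFrame.
df_2 = df.iloc[:, [0, 1]]
selected = list(df_2.columns)

# One out-of-bounds position in the list is enough to raise IndexError.
try:
    df.iloc[:, [0, 5]]
    out_of_bounds_raised = False
except IndexError:
    out_of_bounds_raised = True
```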

Range of Columns

Given a data frame, we wish to return another data frame that is comprised of a range of columns from the original data frame between a given start position and a given end position.

df.iloc[:, 1:5]

Here is how this works:

  • When selecting columns by position, Pandas uses a half-open interval i.e. the column at the start position is included but the column at the end position is not included.
  • Out-of-range start and end positions are handled gracefully; i.e. if the start, the end, or both are out of bounds, no error is raised and only the columns that fall within the range are returned. If the whole range lies beyond the last column, a data frame with no columns is returned.
  • We can’t use the bracket operator [] to select a range of columns because a range inside the bracket operator [] denotes slicing rows. For instance df[0:3] returns the first three rows.
  • iloc[] offers the following convenience features:
    • If the start column of the range is the very first column we can skip passing its position e.g. we can use df.iloc[:, :5] instead of df.iloc[:, 0:5].
    • Similarly if the end position is the last column we can skip passing its position e.g. df.iloc[:, 1:] instead of df.iloc[:, 1:5] (where the example data frame has 5 columns).
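
The slicing behaviour described above can be sketched as follows. The five-column data frame and its column names are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical five-column data frame; the names are illustrative only.
df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9], "d": [10, 11, 12], "e": [13, 14, 15]}
)

middle = df.iloc[:, 1:3]      # positions 1 and 2; position 3 is excluded
from_start = df.iloc[:, :3]   # same as df.iloc[:, 0:3]
to_end = df.iloc[:, 1:]       # same as df.iloc[:, 1:5] for five columns
clipped = df.iloc[:, 3:100]   # end is out of bounds: no error, stops at the last column
empty = df.iloc[:, 10:20]     # whole range beyond the last column: no columns
first_rows = df[0:2]          # plain [] with a slice selects rows, not columns
```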

Relative to End

We wish to select a subset of columns of a data frame by specifying their position. However, we wish to specify the position relative to the end of the data frame (the right end) rather than relative to the beginning (the left end).

In this example, we wish to select the rightmost column and the column that is third from the right of a data frame.

df.iloc[:, [-3, -1]]

Here is how this works:

  • iloc[] supports indexing relative to the end of the data frame via negative integers starting at -1, which refers to the rightmost column.
  • If the position passed to iloc[] is out of bounds (a negative integer whose magnitude is greater than the number of columns in the DataFrame) we get an IndexError.
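
A minimal sketch of negative positions, assuming a hypothetical five-column data frame:

```python
import pandas as pd

# Hypothetical five-column data frame; the names are illustrative only.
df = pd.DataFrame(
    {"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8], "e": [9, 10]}
)

# -1 is the rightmost column and -3 is the third column from the right.
from_end = df.iloc[:, [-3, -1]]
selected = list(from_end.columns)

# With five columns, -5 is the leftmost column and -6 is out of bounds.
leftmost_name = df.iloc[:, -5].name
try:
    df.iloc[:, -6]
    out_of_bounds_raised = False
except IndexError:
    out_of_bounds_raised = True
```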