We wish to select one or more columns from a data frame and to identify the columns to be selected by specifying their position. The positions start at 0 for the leftmost column.
We wish to select and extract a single column from a data frame by specifying its position.
When selecting a single column, we can choose to return that one column either as a Series or as a DataFrame. The choice of which is appropriate depends on whether the operation(s) we wish to run afterwards expect a Series or a DataFrame.
As Series
We wish to select a single column from a data frame by specifying its position and to return the selected column as a Series.
col = df.iloc[:, 1]
Here is how this works:
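- We use iloc[] to select columns by their position. Note that iloc[] indexes by position and not by the labels held in the Index.
- 1 refers to the second column from the left.
- iloc[] can subset both rows and columns in the same command via position, i.e. iloc[row_selection, column_selection].
- To select all rows, we pass a bare colon (:) as the first argument (before the comma).
- We pass the position of the column we wish to select as the second argument to iloc[], which here is 1.
- df.iloc[:, 1] returns a Series.
- We cannot select a column by position with the bracket operator []. We must use the iloc[] operator.
- If the position passed to iloc[] is out of bounds (an integer equal to or larger than the number of columns in the DataFrame) we get an IndexError.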
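The points above can be checked with a minimal sketch. The data frame and its column names (a, b, c) are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical example data; the column names are assumptions for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

# Position 1 is the second column from the left, returned as a Series.
col = df.iloc[:, 1]
print(type(col).__name__)  # Series
print(col.name)            # b

# The bracket operator looks up the label 1, not position 1, so it fails here.
try:
    df[1]
    bracket_failed = False
except KeyError:
    bracket_failed = True

# With three columns, the valid positions are 0, 1 and 2.
try:
    df.iloc[:, 3]
    out_of_bounds = False
except IndexError:
    out_of_bounds = True

print(bracket_failed, out_of_bounds)  # True True
```

Note that df[1] fails here only because no column carries the label 1; on a data frame whose columns are labeled with integers it would select by label instead.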
As Data Frame
We wish to select a single column from a data frame by specifying its position and to return the selected column as a data frame of one column.
df_2 = df.iloc[:, [1]]
Here is how this works:
- We use iloc[] to select columns by their position. Note that iloc[] indexes by position and not by the labels held in the Index.
- 1 refers to the second column.
- iloc[] can subset both rows and columns in the same command via position, i.e. iloc[row_selection, column_selection].
- To select all rows, we pass a bare colon (:) as the first argument (before the comma).
- We pass a list holding the position of the column we wish to select as the second argument to iloc[], which here is [1].
- df.iloc[:, [1]] returns a DataFrame with one column.
- If the position passed to iloc[] is out of bounds (an integer equal to or larger than the number of columns in the DataFrame) we get an IndexError.
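To see the difference that the list brackets make, here is a small sketch that compares the two return types side by side. The data frame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical example data; the column names are assumptions for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

as_series = df.iloc[:, 1]    # scalar position -> Series
as_frame = df.iloc[:, [1]]   # list of one position -> DataFrame

print(type(as_series).__name__, as_series.shape)  # Series (3,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (3, 1)
print(list(as_frame.columns))                     # ['b']
```

The one-column DataFrame keeps its column label and two-dimensional shape, which matters for operations that expect a DataFrame.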
Given a data frame, we wish to return another data frame that consists of a subset of the columns of the original data frame. We wish to specify the columns to select by their position.
df_2 = df.iloc[:, [0, 1]]
Here is how this works:
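- We use iloc[] to select columns by their position (not by the index).
- iloc[] can subset both rows and columns in the same command via position, i.e. iloc[row_selection, column_selection].
- To select all rows, we pass a bare colon (:) as the first argument (before the comma).
- We pass a list of the positions of the columns we wish to select as the second argument to iloc[], which here is [0, 1].
- If any position passed to iloc[] is out of bounds, we get an IndexError.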
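As a rough illustration of selecting several columns by position, the sketch below uses a hypothetical data frame. It also shows that the columns come back in the order the positions are listed.

```python
import pandas as pd

# Hypothetical example data; the column names are assumptions for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

first_two = df.iloc[:, [0, 1]]
print(list(first_two.columns))  # ['a', 'b']

# Columns are returned in the order given in the list.
reordered = df.iloc[:, [2, 0]]
print(list(reordered.columns))  # ['c', 'a']

# A single out-of-bounds position in the list is enough to raise IndexError.
try:
    df.iloc[:, [0, 7]]
    list_out_of_bounds = False
except IndexError:
    list_out_of_bounds = True

print(list_out_of_bounds)  # True
```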
Given a data frame, we wish to return another data frame that consists of a range of columns from the original data frame between a given start position and a given end position.
df.iloc[:, 1:5]
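A minimal sketch of this kind of slice on a hypothetical five-column data frame (the column names a to e are assumptions for illustration) shows which columns a given start and end position return:

```python
import pandas as pd

# Hypothetical five-column data frame; the names are assumptions for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8], "e": [9, 10]})

# Start position 1 is included, end position 4 is excluded.
middle = df.iloc[:, 1:4]
print(list(middle.columns))  # ['b', 'c', 'd']

# With five columns, 1:5 runs from the second column through the last one.
to_end = df.iloc[:, 1:5]
print(list(to_end.columns))  # ['b', 'c', 'd', 'e']

# Omitting the start or the end of the slice gives the same results.
print(df.iloc[:, :5].equals(df.iloc[:, 0:5]))  # True
print(df.iloc[:, 1:].equals(df.iloc[:, 1:5]))  # True
```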
Here is how this works:
- The column at the start position is included but the column at the end position is not included.
- We cannot use the bracket operator [] to select a range of columns because a range inside the bracket operator [] denotes slicing rows. For instance df[0:3] returns the first three rows.
- iloc[] offers the following convenience features:
  - To start from the first column, we can omit the start position and write df.iloc[:, :5] instead of df.iloc[:, 0:5].
  - To continue through the last column, we can omit the end position and write df.iloc[:, 1:] instead of df.iloc[:, 1:5] (where the example data frame has 5 columns).
We wish to select a subset of columns of a data frame by specifying their position. However, we wish to specify the position relative to the end of the data frame (the right end) rather than relative to the beginning (the left end).
In this example, we wish to select the rightmost column and the column that is third from the right of a data frame.
df.iloc[:, [-3, -1]]
Here is how this works:
- iloc[] supports indexing relative to the end of the data frame via negative integers starting at -1, which refers to the rightmost column.
- If a position passed to iloc[] is out of bounds (a negative integer whose absolute value is larger than the number of columns in the DataFrame) we get an IndexError.
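Negative positions can be tried out with a short sketch on a hypothetical five-column data frame (the column names a to e are assumptions for illustration):

```python
import pandas as pd

# Hypothetical five-column data frame; the names are assumptions for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8], "e": [9, 10]})

# -1 is the rightmost column, -3 is the third column from the right.
from_right = df.iloc[:, [-3, -1]]
print(list(from_right.columns))  # ['c', 'e']

# With five columns, the valid negative positions are -1 through -5.
try:
    df.iloc[:, [-6]]
    negative_out_of_bounds = False
except IndexError:
    negative_out_of_bounds = True

print(negative_out_of_bounds)  # True
```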