We wish to select one or more columns from a data frame and to identify the columns to be selected by specifying their position. The positions start at 1 for the leftmost column.
We wish to select and extract a single column from a data frame by specifying its position.
When selecting a single column, we have the choice of returning that as a vector or as a data frame. The choice of which is appropriate depends on whether the operation(s) we wish to run afterwards expect a vector or a data frame.
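To make the two return types concrete, here is a minimal sketch. The data frame `df` and its column names are hypothetical and exist only for illustration; the calls `select()` and `pull()` are the ones described in the sections that follow.

```r
library(dplyr)

# Hypothetical data frame used only for illustration
df <- data.frame(
  id = 1:3,
  score = c(10, 20, 30),
  group = c("a", "b", "a")
)

# select() keeps the result as a one-column data frame
as_df <- df %>% select(2)

# pull() extracts the same column as a plain vector
as_vec <- df %>% pull(2)

is.data.frame(as_df)  # TRUE
is.numeric(as_vec)    # TRUE

# A function that expects a vector, such as mean(), works directly on as_vec
mean(as_vec)          # 20
```

If the next step is another data frame operation, the `select()` form is usually the more convenient one; if it is a vector function such as `mean()`, the `pull()` form avoids an extra extraction step.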
As Data Frame
We wish to select a single column from a data frame by specifying its position and to return the selected column as a data frame of one column.
df_2 = df %>% select(2)
Here is how this works:
We pass the data frame df to the function select().
We pass to select() the position of the column we wish to select; here 2, i.e. the second column from the left.
select() can take as input column names (see Selecting by Name) or column positions.
The output (df_2) is a data frame that contains one column, which is the second column from the left in the original data frame df.
As Vector
We wish to select a single column from a data frame by specifying its position and to return the selected column as a vector.
col = df %>% pull(2)
Here is how this works:
We pass the data frame df to the function pull().
We pass to pull() the position of the column we wish to select; here 2.
pull() returns the selected column as a vector (not a data frame, which is what select() returns).
An alternative is df[[2]]; however, we recommend using pull() for its versatility and how it fits in a chain.
Given a data frame, we wish to return another data frame that comprises a subset of the columns of the original data frame. We wish to specify the columns to select by their position.
df_2 = df %>% select(1, 2)
Here is how this works:
We pass the data frame df to the function select().
We pass to select() the positions of the columns we wish to select, separated by commas; here 1, 2.
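As a small illustration of selecting several positions at once, the sketch below uses a hypothetical four-column data frame (the column names are made up). It also shows that the positions need not be adjacent and that the output follows the order in which the positions are given.

```r
library(dplyr)

# Hypothetical data frame used only for illustration
df <- data.frame(a = 1:2, b = 3:4, c = 5:6, d = 7:8)

# The first two columns, as in the example above
df_2 <- df %>% select(1, 2)
names(df_2)  # "a" "b"

# Non-adjacent positions; the output keeps the order given
df_3 <- df %>% select(4, 1)
names(df_3)  # "d" "a"
```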
Given a data frame, we wish to return another data frame that comprises a range of columns from the original data frame, i.e. we wish to return every column between a given start column and end column, including both start and end. We wish to specify the start and end columns by their position.
df_2 = df %>% select(2:5)
Here is how this works:
We pass the data frame df to the function select().
We pass to select() the positions of the start and end columns of the range we wish to extract, written as 2:5; here the start column is at position 2 (2nd from the left) and the end column is at position 5 (5th from the left).
The column at the start position, the column at the end position, and all columns in between are returned as a data frame. In this example, the columns at positions 2, 3, 4 and 5 in the data frame df are returned (and saved in the data frame df_2).
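The range selection can be checked on a small example. The six-column data frame below is hypothetical and only serves to show which columns 2:5 picks out.

```r
library(dplyr)

# Hypothetical data frame used only for illustration
df <- data.frame(a = 1, b = 2, c = 3, d = 4, e = 5, f = 6)

# Positions 2 through 5, both ends included
df_2 <- df %>% select(2:5)
names(df_2)  # "b" "c" "d" "e"

# Listing the positions one by one selects the same columns
df_3 <- df %>% select(2, 3, 4, 5)
identical(names(df_2), names(df_3))  # TRUE
```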
We wish to select a subset of columns of a data frame by specifying their position. However, we wish to specify the position relative to the end of the data frame (the right end) rather than relative to the beginning (the left end).
In this example, we wish to select the rightmost column and the column that is third from the right of a data frame.
df_2 = df %>% select(last_col(offset = 2), last_col())
Here is how this works:
We use last_col() to index relative to the right end of the data frame.
Calling last_col() without passing any arguments refers to the rightmost column of the data frame.
To refer to a column further to the left, we use the offset argument of last_col(). In this example, we use last_col(offset = 2) to refer to the third column from the right end.
A negative position cannot be used for this purpose, because select() interprets the minus sign as a request to drop a column (see Exclude Columns).
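The following sketch, on a hypothetical five-column data frame, shows how the offset counts from the right and contrasts it with what a negative position does in select().

```r
library(dplyr)

# Hypothetical data frame used only for illustration
df <- data.frame(a = 1, b = 2, c = 3, d = 4, e = 5)

# offset = 0 (the default) is the rightmost column, offset = 2 is two further left
df_2 <- df %>% select(last_col(offset = 2), last_col())
names(df_2)  # "c" "e"

# A negative position drops that column instead of counting from the right
df_3 <- df %>% select(-1)
names(df_3)  # "b" "c" "d" "e"
```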