Selecting by Position

We wish to select one or more columns from a data frame and to identify the columns to be selected by specifying their position. Positions start at 1 for the leftmost column.

Single Column

We wish to select and extract a single column from a data frame by specifying its position.

When selecting a single column, we have the choice of returning it as a vector or as a data frame. Which is appropriate depends on whether the operation(s) we wish to run afterwards expect a vector or a data frame.

As Data Frame

We wish to select a single column from a data frame by specifying its position and to return the selected column as a data frame of one column.

df_2 = df %>% select(2)

Here is how this works:

  • We pass the data frame df to the function select().
  • We pass to select() the position of the column we wish to select; here 2, i.e. the second column from the left.
  • select() can take as input column names (see Selecting by Name) or column positions.
  • The output (here saved in df_2) is a data frame that contains one column, which is the second column from the left in the original data frame df.
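The behavior described above can be checked with a minimal sketch. It assumes dplyr is installed; the data frame df and its column names are made up for illustration.

```r
library(dplyr)

# Hypothetical example data (not from the original text).
df = tibble(id = 1:3, name = c("a", "b", "c"), score = c(9.5, 7.0, 8.2))

# Select the second column from the left by position.
df_2 = df %>% select(2)

is.data.frame(df_2)  # TRUE: the result is still a data frame
names(df_2)          # "name"
ncol(df_2)           # 1
```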

As Vector

We wish to select a single column from a data frame by specifying its position and to return the selected column as a vector.

col = df %>% pull(2)

Here is how this works:

  • We pass the data frame df to the function pull().
  • We pass to pull() the position of the column we wish to select; here 2.
  • pull() returns the selected column as a vector (unlike select(), which returns a data frame).
  • Alternatively, we could use df[[2]]; however, we recommend pull() for its versatility and because it fits naturally in a chain.
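As a rough illustration of the points above, the sketch below compares pull() with df[[2]]. It assumes dplyr is installed; the data frame df is made up for illustration.

```r
library(dplyr)

# Hypothetical example data (not from the original text).
df = tibble(id = 1:3, name = c("a", "b", "c"), score = c(9.5, 7.0, 8.2))

# Extract the second column as a plain vector.
col = df %>% pull(2)

is.data.frame(col)       # FALSE: pull() returns a vector
is.character(col)        # TRUE: the "name" column is a character vector
identical(col, df[[2]])  # TRUE: same result as base R double-bracket indexing

# pull() also accepts a negative position, counted from the right.
last = df %>% pull(-1)   # the "score" column
```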

List of Columns

Given a data frame, we wish to return another data frame that consists of a subset of the columns of the original data frame. We wish to specify the columns to select by their position.

df_2 = df %>% select(1, 2)

Here is how this works:

  • We pass the data frame df to the function select().
  • We pass to select() the positions of the columns we wish to select, separated by commas; here 1, 2.
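A small sketch of selecting several columns by position follows. It assumes dplyr is installed; the data frame df is made up for illustration, and the non-adjacent and reordered selections are extra cases added here to show that the positions need not be consecutive.

```r
library(dplyr)

# Hypothetical example data (not from the original text).
df = tibble(id = 1:3, name = c("a", "b", "c"), score = c(9.5, 7.0, 8.2))

# First and second columns, as in the text.
df_2 = df %>% select(1, 2)
names(df_2)  # "id" "name"

# Positions need not be adjacent.
df_3 = df %>% select(1, 3)
names(df_3)  # "id" "score"

# Columns are returned in the order the positions are given.
df_4 = df %>% select(3, 1)
names(df_4)  # "score" "id"
```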

Range of Columns

Given a data frame, we wish to return another data frame that consists of a range of columns from the original data frame, i.e. every column between a given start column and end column, including both the start and the end. We wish to specify the start and end columns by their position.

df_2 = df %>% select(2:5)

Here is how this works:

  • We pass the data frame df to the function select().
  • We pass to select() the positions of the start and end columns of the range we wish to extract, separated by a colon; here the start column is at position 2 (2nd from the left) and the end column is at position 5 (5th from the left).
  • The column at the start position, the column at the end position, and all columns in between are returned as a data frame. In this example columns at positions 2, 3, 4 and 5 in the data frame df are returned (and saved in the data frame df_2).
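The range selection described above can be sketched as follows. It assumes dplyr is installed; the six-column data frame df is made up for illustration.

```r
library(dplyr)

# Hypothetical one-row example data with six columns (not from the original text).
df = tibble(a = 1, b = 2, c = 3, d = 4, e = 5, f = 6)

# Columns at positions 2 through 5, both ends included.
df_2 = df %>% select(2:5)
names(df_2)  # "b" "c" "d" "e"

# The range is shorthand for listing each position explicitly.
df_3 = df %>% select(2, 3, 4, 5)
identical(names(df_2), names(df_3))  # TRUE
```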

Relative to End

We wish to select a subset of columns of a data frame by specifying their position. However, we wish to specify the position relative to the end of the data frame (the right end) rather than relative to the beginning (the left end).

In this example, we wish to select the rightmost column of a data frame and the column that is third from the right.

df_2 = df %>% select(last_col(offset = 2), last_col())

Here is how this works:

  • We use the selection helper last_col() to index relative to the right end of the data frame.
  • last_col() without any arguments refers to the rightmost column of the data frame.
  • We can refer to a column relative to the right end of the data frame by using the offset argument of last_col(), which counts how many columns to step back from the last one. In this example, we use last_col(offset = 2) to refer to the third column from the right end.
  • A common mistake is to attempt to use the minus sign to select from the end (which is the right way in other tools such as pandas in Python). In R, select() interprets the minus sign as a request to drop a column (see Exclude Columns).
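The sketch below illustrates last_col() and contrasts it with the minus sign. It assumes dplyr is installed; the five-column data frame df is made up for illustration.

```r
library(dplyr)

# Hypothetical one-row example data with five columns (not from the original text).
df = tibble(a = 1, b = 2, c = 3, d = 4, e = 5)

# Third column from the right, then the rightmost column.
df_2 = df %>% select(last_col(offset = 2), last_col())
names(df_2)  # "c" "e"

# A minus sign does not count from the end; it drops the column at that position.
df_3 = df %>% select(-1)
names(df_3)  # "b" "c" "d" "e"
```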