Set Names

We wish to rename columns of a data frame by their position without providing the current name(s).

Where possible, we should rename columns by mapping each current name to its desired name. Sometimes, though, we need to identify the columns to rename by their position instead of their current name; e.g. when the current names are invalid or too long.

Note that referring to columns by their position is fragile and should be handled with care; otherwise, we risk mislabeling columns if we get their positions wrong.

In this section, we will cover three common scenarios of column name setting:

  • All Columns: Set the names of all columns given their positions and the desired names.
  • One Column: Set the name of one column given its position and the desired name.
  • Some Columns: Set the names of some columns given their positions and the desired names.

All Columns

We wish to rename all columns by providing a set of new column names in the right order.

In this example, we have a data frame of three columns and we wish to set their names to col_a, col_b, and col_c.

df_2 = df %>% 
  set_names(c('col_a', 'col_b', 'col_c'))

Here is how this works:

  • We use the function set_names() from purrr (re-exported from rlang) to assign new names to columns given a vector of names in the right order. The base R equivalent is setNames().
  • The passed vector of column names must have as many elements as there are columns in the data frame; otherwise, an error is raised.
  • We need to be especially careful to get the order of columns right when setting all column names, so we do not accidentally mislabel columns.
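
As a minimal sketch of the points above, the snippet below renames all columns of a small toy data frame. The data frame df and its original column names (x, y, z) are assumptions made for illustration only. The explicit length check is optional; it simply fails early with a clear message if the number of names does not match the number of columns.

```r
library(dplyr)  # provides the %>% pipe
library(purrr)  # provides set_names()

# Hypothetical toy data frame; the original column names are placeholders.
df <- data.frame(x = 1:3, y = c('a', 'b', 'c'), z = c(TRUE, FALSE, TRUE))

new_names <- c('col_a', 'col_b', 'col_c')

# Optional guard: one new name per column.
stopifnot(length(new_names) == ncol(df))

df_2 <- df %>%
  set_names(new_names)

names(df_2)
# "col_a" "col_b" "col_c"
```

The input data frame df is left unchanged; only df_2 carries the new names.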

One Column

We wish to set the name for a specific column given its position.

In this example, we have a data frame of three columns, and we wish to change the name of the third column to col_c.

df_2 = df %>%
  rename_with(~'col_c', 3)

Here is how this works:

  • We use the function rename_with() from dplyr to set the name of a particular column given its index. In this case, we set the name of the 3rd column to col_c. See Implicit Naming.
  • We are passing to rename_with():
    • A one-sided formula, ~'col_c', which rename_with() converts to an anonymous function. The function ignores the current name it receives and simply returns the desired name.
    • The second argument selects the column(s) we wish to rename. In this case, we select a single column by its position, which is 3.
  • The output data frame df_2 will be the same as the input data frame df except that the third column will have its name changed to col_c.
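
To make this concrete, here is a small sketch on a toy data frame. The data frame df and its original column names (x, y, z) are assumptions for illustration. The second half shows an alternative that should give the same result: rename() accepts a column position on the right-hand side of new_name = position.

```r
library(dplyr)

# Hypothetical toy data frame; the original column names are placeholders.
df <- data.frame(x = 1:3, y = c('a', 'b', 'c'), z = c(TRUE, FALSE, TRUE))

# Rename only the column at position 3; the formula returns a constant name.
df_2 <- df %>%
  rename_with(~'col_c', 3)

names(df_2)
# "x" "y" "col_c"

# Alternative: rename() with the position on the right-hand side.
df_3 <- df %>%
  rename(col_c = 3)

identical(names(df_2), names(df_3))
# TRUE
```

The first two columns keep their original names in both cases.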

Some Columns

We wish to set the name for some columns given their position.

In this example, we wish to change the name of the first and third columns to col_a and col_c respectively.

df_2 = df %>%
  rename_with(~c('col_a', 'col_c'), c(1, 3))

Here is how this works:

  • We use the function rename_with() from dplyr to set the names of particular columns given their index. In this case, we set the name of the 1st and 3rd columns to col_a and col_c. See Implicit Naming.
  • We are passing to rename_with():
    • A one-sided formula, ~c('col_a', 'col_c'), which rename_with() converts to an anonymous function that returns the desired names. It must return exactly as many names as there are selected columns.
    • The second argument selects the columns we wish to rename. In this case, we select two columns by their positions, which are c(1, 3).
  • The output data frame df_2 will be the same as the input data frame df except that the first and third columns will have their names changed to col_a and col_c respectively.
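
The following sketch applies the same call to a toy data frame. As before, the data frame df and its original column names (x, y, z) are assumptions for illustration. The new names are paired with the selected positions in the order given, so the first name goes to position 1 and the second to position 3.

```r
library(dplyr)

# Hypothetical toy data frame; the original column names are placeholders.
df <- data.frame(x = 1:3, y = c('a', 'b', 'c'), z = c(TRUE, FALSE, TRUE))

# Rename the columns at positions 1 and 3; column 2 is left untouched.
df_2 <- df %>%
  rename_with(~c('col_a', 'col_c'), c(1, 3))

names(df_2)
# "col_a" "y" "col_c"
```

Keeping the positions and the new names next to each other in the call, as here, makes it easier to check by eye that each name is paired with the intended column.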