Appending

We wish to combine multiple data frames into a single data frame vertically where rows are appended to each other, or horizontally, where columns are combined side by side. Concatenating rows together is referred to as Union in SQL.

This section is organized as follows:

  • Append Rows where we cover how to append two or more data frames vertically. i.e. concatenate rows while aligning common columns.
  • Append Columns where we cover how to append two or more data frames horizontally. This is similar to Joining but can only be done based on the index. We recommend using the methods covered in Joining to append data frames horizontally.

Append Rows

We wish to append two data frames together vertically where rows are appended to each other and columns with the same name in both data frames are aligned based on their names. This is also referred to as Concatenating or Union in SQL.

In this example, we wish to append df_1 and df_2 and return a single data frame with all rows and columns from both data frames.

df = df_1 %>% bind_rows(df_2)

Here is how this works:

  • We use the bind_rows() function from the dplyr package to append rows from df_1 and df_2 vertically.
  • Columns present in one data frame only will be filled with NA for rows coming from the other data frame.
  • The output data frame df will contain all the rows from both df_1 and df_2, appended one after the other and aligned based on column names. If a column only exists in one data frame, it will be added to the left of the output data frame and filled with NA for the rows coming from the other data frame.

Extension: More than 2 Data Frames

We wish to append 3 or more data frames vertically.

In this example, we wish to append df_1, df_2, and df_3 and return a single data frame with all rows and columns from both data frames.

df_1 = data.frame(col_1 = c(1, 2, 3, 4),
                  col_2 = c('A', 'B', 'C', 'D'),
                  col_3 = c(25, 30, 35, 40))

df_2 = data.frame(col_1 = c(2, 3, 4, 5),
                  col_4 = c('W', 'X', 'Y', 'Z'))

df_3 = data.frame(col_1 = c(3, 4, 5, 6),
                  col_5 = c('10', '20', '30', '40'))

df = bind_rows(list(df_1, df_2, df_3))

Here is how this works:

  • We use the bind_rows() function from the dplyr package to append rows from df_1, df_2, and df_3 vertically.
  • While we can chain bind_rows() using %>%, we can pass a list of data frames to append, in case the number of data frames is large. In this example list(df_1, df_2, df_3)

Extension: Data Frame Identifier

We wish to create a new column of identifiers to link each row to its original data frame.

In this example, we wish to append df_1 and df_2 vertically and add a new column named source to the output data frame df.

df = bind_rows(list(a = df_1, b = df_2), .id = "source")

Here is how this works:

  • We use the bind_rows() function from the dplyr package to append rows from df_1 and df_2 vertically.
  • We pass a named list of data frames to identify each data frame. In this example list(a = df_1, b = df_2).
  • We pass the name of the new column to the .id argument, in this example .id = "source".
  • The source column will have the value a for rows coming from df_1 and b for rows coming from df_2.

Append Columns

We wish to append two or more data frames horizontally. This is similar to Joining but can only be done based on the row position. We think this is often rather unsafe and recommend using the methods covered in Joining to append data frames horizontally.

In this example, we wish to append df_1 and df_2 horizontally and align rows based on position in the data frame. This means all data frames must have the same number of rows.

df = df_1 %>% bind_cols(df_2)

Here is how this works:

  • We use the bind_cols() function from the dplyr package to append columns from df_1 and df_2 horizontally.
  • The output data frame df will contain all the rows from both df_1 and df_2, appended horizontally and aligned based on the row position. All columns from both data frames will be in the output data frame.
  • We recommend using methods covered in Joining to perform column appending.
R
I/O