We wish to combine multiple data frames into a single data frame vertically where rows are appended to each other, or horizontally, where columns are combined side by side. Concatenating rows together is referred to as Union in SQL.
This section is organized as follows:
We wish to append two data frames together vertically where rows are appended to each other and columns with the same name in both data frames are aligned based on their names. This is also referred to as Concatenating or Union in SQL.
In this example, we wish to append df_1
and df_2
and return a single data frame with all rows and
columns from both data frames.
df = df_1 %>% bind_rows(df_2)
Here is how this works:
bind_rows()
function from the dplyr
package to append rows from df_1
and df_2
vertically.NA
for rows coming from the other
data frame.df
will contain all the rows from both df_1
and df_2
, appended one
after the other and aligned based on column names. If a column only exists in one data frame, it
will be added to the left of the output data frame and filled with NA
for the rows coming from
the other data frame.Extension: More than 2 Data Frames
We wish to append 3 or more data frames vertically.
In this example, we wish to append df_1
, df_2
, and df_3
and return a single data frame with all
rows and columns from both data frames.
df_1 = data.frame(col_1 = c(1, 2, 3, 4),
col_2 = c('A', 'B', 'C', 'D'),
col_3 = c(25, 30, 35, 40))
df_2 = data.frame(col_1 = c(2, 3, 4, 5),
col_4 = c('W', 'X', 'Y', 'Z'))
df_3 = data.frame(col_1 = c(3, 4, 5, 6),
col_5 = c('10', '20', '30', '40'))
df = bind_rows(list(df_1, df_2, df_3))
Here is how this works:
bind_rows()
function from the dplyr
package to append rows from df_1
, df_2
,
and df_3
vertically.bind_rows()
using %>%
, we can pass a list of data frames to append, in case
the number of data frames is large. In this example list(df_1, df_2, df_3)
Extension: Data Frame Identifier
We wish to create a new column of identifiers to link each row to its original data frame.
In this example, we wish to append df_1
and df_2
vertically and add a new column named source
to
the output data frame df
.
df = bind_rows(list(a = df_1, b = df_2), .id = "source")
Here is how this works:
bind_rows()
function from the dplyr
package to append rows from df_1
and df_2
vertically.list(a = df_1, b = df_2)
..id
argument, in this example .id = "source"
.source
column will have the value a
for rows coming from df_1
and b
for rows coming
from df_2
.We wish to append two or more data frames horizontally. This is similar to Joining but can only be done based on the row position. We think this is often rather unsafe and recommend using the methods covered in Joining to append data frames horizontally.
In this example, we wish to append df_1
and df_2
horizontally and align rows based on position in
the data frame. This means all data frames must have the same number of rows.
df = df_1 %>% bind_cols(df_2)
Here is how this works:
bind_cols()
function from the dplyr
package to append columns from df_1
and df_2
horizontally.df
will contain all the rows from both df_1
and df_2
, appended
horizontally and aligned based on the row position. All columns from both data frames will be in
the output data frame.