We wish to combine multiple data frames into a single data frame vertically where rows are appended to each other, or horizontally, where columns are combined side by side. Concatenating rows together is referred to as Union in SQL.
This section is organized as follows:
We wish to append two data frames together vertically where rows are appended to each other and columns with the same name in both data frames are aligned based on their names. This is also referred to as Concatenating or Union in SQL.
In this example, we wish to append df_1
and df_2
and return a single data frame with all rows
and
columns from both data frames.
df = pd.concat([df_1, df_2], ignore_index=True)
Here is how this works:
concat()
Pandas function to append rows from df_1
and df_2
vertically.concat()
, in this example [df_1, df_2]
.concat()
will keep the original index. We set ignore_index=True
to reset the
index in case the original index is not meaningful.NA
for rows coming from the other
data frame.df
will contain all the rows from both df_1
and df_2
, appended one
after the other and aligned based on column names. If a column only exists in one data frame, it
will be added to the left of the output data frame and filled with NA
for the rows coming from
the other data frame.Extension: Data Frame Identifier
We wish to create a new column of identifiers to link each row to its original data frame.
In this example, we wish to append df_1
and df_2
vertically and add a new column named source
to
the output data frame df
.
df = pd.concat([df_1, df_2], keys=['df_1', 'df_2'])
df['source'] = df.index.get_level_values(0)
Here is how this works:
concat()
. In this example keys=['df_1', 'df_2']
. This will create a MultiIndex
where
level 0 is the keys and level 1 is the original index.df['source'] = df.index.get_level_values(0)
we convert level 0 of the MultiIndex to a column
named source.df
is similar to the Append Rows solution above with a new
column named source
that links each row to the original data frame using the keys df_1
and df_2
.Extension: Chain Concat
wish to apply the concat()
function in a chainable manner.
In this example, we wish to append df_1
and df_2
and return a single data frame with all rows
from
both data frames and rename col_5
from df_2
to col_4
.
df = df_1. \
pipe(lambda x: pd.concat([x, df_2], ignore_index=True)). \
rename(columns={'col_5': 'col_4'})
Here is how it works:
pipe()
function from Pandas to chain together operations (like the %>%
operation in
R).df_1 (as x)
and df_2
to concat()
and then we
rename col_5
using rename()
. See Renaming.col_5
renamed
to col_4
.We wish to append two or more data frames horizontally. This is similar to joining but can only be done based on the index. We recommend using the methods covered in Joining to append data frames horizontally.
In this example, we wish to append df_1
and df_2
horizontally and align rows based on the index
of
the data frames.
df = pd.concat([df_1, df_2], axis='columns', ignore_index=True)
Here is how this works:
concat()
Pandas function to append rows from df_1
and df_2
horizontally.concat()
in this example [df_1, df_2]
.(axis='rows')
. We set the axis argument to axis='columns'
to append columns based on the index.concat()
will keep the original index. We set ignore_index=True
to reset the
index in case the original index is not meaningful.df
will contain all the rows from both df_1
and df_2
, appended
horizontally and aligned based on the index. All columns from both data frames will be in the
output data frame.