We wish to join two data frames based on specific columns.
This section is organized as follows:
This section is complemented by:
We wish to join two data frames based on all common columns between them. This is often referred to as a Natural Join.
In this example, we wish to join df_1
and df_2
based on the columns named col_1
and col_2
in
both data frames.
df = df_1.merge(df_2, how='inner')
Here is how this works:
merge()
function in pandas to combine df_1
and df_2
based on the common
column col_1
.inner
as the how
argument.
See Join Types.merge()
will join the two data frames on all the columns with the same name in both
data frames.df
will contain the rows that have the same value for both col_1
and col_2
in both df_1
and df_2
and all columns in both data frames with only one col_1
and col_2
.We wish to join two data frames based on one column.
Common Column Name
We wish to join two data frames based on one column that have the same name in the data frames to be joined.
In this example, we wish to join df_1
and df_2
based on a column named col_1
in both
data frames.
df = df_1.merge(df_2, how='inner', on='col_1')
Here is how this works:
merge()
function in pandas to combine df_1
and df_2
based on the common
column col_1
.inner
as the how
argument.
See Join Types.merge()
will join the two data frames on all the columns with the same name in both
data frames. To change this behavior, we specify the column we wish to join on by passing its name
as a string to the on
argument; which in this example is on=’col_1’
.df
will contain the rows that have the same value for col_1
in
both df_1
and df_2
and all columns in both data frames with only one col_1
.Different Column Name
We wish to join two data frames based on one column that has a different name in the data frames.
In this example, we wish to join df_1
and df_2
based on col_1_x
from df_1
and col_1_y
in df_2
only.
df = df_1.merge(df_2, how='inner', left_on='col_1_x', right_on='col_1_y')
Here is how this works:
merge()
function in pandas to combine df_1
and df_2
based on col_1_x
and col_1_y
.inner
as the how
argument.
See Join Types.left_on
and right_on
arguments to specify the column we wish to use in the join
condition. In this example, df_1
is the left data frame and df_2
is the right one.df
will contain the rows that have the same value for col_1_x
and col_1_y
and all columns in both data frames.We wish to join two data frames based on multiple columns.
Common Columns Names
We wish to join two tables based on multiple columns that have the same name in the data frames to be joined.
In this example, we wish to join df_1
and df_2
based on the columns named col_1
and col_2
in both data frames.
df = df_1.merge(df_2, how='inner', on=['col_1', 'col_2'])
Here is how this works:
on
argument.df
will contain the rows that have the same value for col_1
and col_2
in both df_1
and df_2
and all columns in both data frames with one instance of each of col_1
and col_2
.Different Columns Names
We wish to join two data frames based on multiple columns, while each of the join columns has a different name in either data frame.
In this example, we wish to join df_1
and df_2
based on col_1_x
and col_2_x
from df_1
and col_1_y
and col_2_y
in df_2
.
df = df_1.merge(df_2, how='inner',
left_on=['col_1_x', 'col_2_x'],
right_on=['col_1_y', 'col_2_y'])
Here is how this works:
left_on
and right_on
arguments.df
will contain the rows that have the same value for all the specified
join columns. All columns from both data frames will be in the output data frame df
.We wish to join two data frames based on their index.
In this example, we wish to join df_1
and df_2
based on their index.
df = df_1.merge(df_2, how='inner', left_index=True, right_index=True)
Here is how this works:
merge()
function in pandas to combine df_1
and df_2
based on the index.inner
as the how
argument.
See Join Types.left_index
and right_index
arguments to True
which means the two data frames
will be joined based on the index. In this example, df_1
is the left data frame and df_2
is
the right one.df
will contain the rows that have the same index value in df_1
and df_2
.Extension: Join on MultiIndex
df = df_1.merge(df_2, how='inner', left_index=True, right_index=True)
Here is how this works: This works similarly to the scenario above except here the data frames have a MultiIndex.
Extension: Join a Single Index to a MultiIndex
df = df_1.merge(df_2, how='inner', left_index=True, right_index=True)
Here is how this works:
We wish to join two data frames based on the index from one data frame and a column from the other data frame.
df = df_1.merge(df_2, how='inner', left_index=True, right_on='col_1')
Here is how this works:
merge()
function in pandas to combine df_1
and df_2
based on the index.inner
as the how
argument.
See Join Types.left_index
to True
and
we pass col_1
from df_2
as the right_on
argument.df
will contain the rows that have the same value as an index in df_1
and col_1
in df_2
.Extension: Join on MultiIndex
df = df_1.merge(df_2, how='inner',
left_index=True,
right_on=['col_3', 'col_4'])
Here is how this works:
df_1
has MultiIndex.df_2
that maps to the MultiIndex hierarchy to the right_on
argument.