Joining tables is a common task in data analysis and manipulation, but it can also present some special cases that we need to be mindful of and handle as appropriate for the purpose at hand.
In this section, we will discuss some of these special cases, including:
This section is complemented by:
We wish to rename columns that have the same name in both tables being joined.
In this example, we wish to rename col_2
coming from table_1
to col_2_x
and rename col_2
coming from table_2
to col_2_y
.
SELECT t1.col_1,
t1.col_2 AS col_2_x,
t1.col_3,
t2.col_2 AS col_2_y
FROM table_1 AS t1
INNER JOIN table_2 AS t2 ON
t1.col_1 = t2.col_1;
Here is how this works:
INNER JOIN
operation in SQL to combine table_1
and table_2
based on the common
column col_1
.AS
, in this example table_1
alias is t1
and table_2
alias is t_2
. We recommend using table alias always when performing a join, so we can easily
select the columns we wish to keep from each table and write clear join conditions.ON
clause, in this example t1.col_1 = t2.col_1;
.
See Join Columns to learn how to
change this behavior._1
and _2
as suffixes to resolve duplicate column
names if we select all columns using .*
notation.
See Selecting By NameSELECT
statement, and we rename the
columns using AS
alias. See Renaming.col_1
in
both table_1
and table_2
and the selected columns after renaming.We wish to join two tables and handle duplicate values in the join columns by removing them. If the desired output is a Cartesian product, then there is no need to remove duplicate values.
In this example, we wish to drop rows with duplicate values in col_1
in table_1
and then perform
the join operation.
WITH table_1_distinct AS
(SELECT *
FROM table_1
QUALIFY ROW_NUMBER() OVER (PARTITION BY col_1) = 1)
SELECT t1.col_1,
t1.col_2,
t1.col_3,
t2.col_4
FROM table_1_distinct AS t1
INNER JOIN table_2 AS t2 ON t1.col_1 = t2.col_1;
Here is how this works:
QUALIFY ROW_NUMBER()
function to drop rows with duplicate values from the
column col_1
of the table table_1
.
See Uniqueness.col_1
in
both table_1
and table_2
after dropping rows with duplicate values in table_1
.We wish to join two tables with missing values in the join columns. NULL
values should not be
matched because NULL
values are not comparable. That said, we should be mindful that
ignoring NULL
values will have different outcomes based on the join type.
NULL
values for the columns from the right table.NULL
values for the columns from the left table.NULL
values for the columns from the other table.This is the default behoving in SQL and can be changed by adding a specific join condition to check
for NULL
values.
In this example, we wish to join table_1
and table_2
based on col_1
and consider NULL
values
as equal.
SELECT t1.col_1,
t1.col_2,
t1.col_3,
t2.col_4
FROM table_1 AS t1
INNER JOIN table_2 AS t2 ON
t1.col_1 = t2.col_1 OR
(t1.col_1 IS NULL AND t2.col_1 IS NULL);
Here is how this works:
INNER JOIN
operation in SQL to combine table_1
and table_2
based on the common
column col_1
.col_1
will be combined. Sice NULL
will be ignored
in SQL, we use an OR
condition to check if col_1
is NULL
in both
tables (t1.col_1 IS NULL AND t2.col_1 IS NULL)
.col_1
in both table_1
and table_2
or rows that have NULL
value in col_1
in both tables. Note that this will
generate a cartesian product between rows with NULL
values in col_1
.We wish to join two tables on a column with a different data type.
In this example, we wish to join table_1
and table_2
on col_1
which is of type integer
in table_1
and of type string in table_2
.
SELECT t1.col_1,
t1.col_2,
t1.col_3,
t2.col_4
FROM table_1 AS t1
INNER JOIN table_2 AS t2 ON
t1.col_1 = SAFE_CAST(t2.col_1 AS int64)
Here is how this works:
We cast col_1
to integer before joining using SAFE_CAST(t2.col_1 AS int64)
function.
See Type Casting.