Filtering Joins

Filtering joins allow us to filter rows from a table based on the presence or absence of matches in another table.

There are two types of filtering joins which are:

  1. Semi Join which allows us to filter (return) rows of a table that have a match in another table.
  2. Anti Join which allows us to filter (return) rows of a table that have no match in another table.

These operations are useful in some filtering scenarios and also for debugging when carrying out table join operations ( see Joining).

Semi Join

We wish to return rows of a table that have a match in another table.

In this example, we wish to return rows in table table_1 that match rows in another table table_1. A row in table_1 is considered a match to a row in table_1 if the combination of values of columns col_1 and col_2 in both tables are a match.

SELECT t.*
FROM refcon.dataset.table_1 t
         INNER JOIN refcon.dataset.table_1_1 t1 ON
            t.col_1 = t1.col_1 AND
            t.col_2 = t1.col_2

Here is how this works:

  • INNER JOIN will only return rows where col_1 and col_2 have the same value in both tables.
  • We use t.* to keep only columns from table_1.

Alternatively:

One Join Column

SELECT *
FROM refcon.dataset.table_1
WHERE col_1 IN (SELECT col_1 FROM refcon.dataset.table_1_1)

Here is how this works:

  • The approach we use here involves using the IN method to identify rows in the table table_1 that have a match in the table table_1.
  • We get the values of col_1 from table_1 in a subquery, and then we apply the IN operator.
  • In this case, the join columns have the same names in both tables. Had they been named differently we simply use the corresponding column names for each table e.g. WHERE col_1 IN (SELECT col_2 FROM refcon.dataset.table_1_1).

Anti Join

We wish to return rows of a table that do not have a match in another table.

In this example, we have two tables table_1 and table_1 and we wish to retain rows that are in table_1 but not in table_1. A row in table_1 is considered a match to a row in table_1 if the values of columns col_1 and col_2 in both tables are a match.

SELECT t.*
FROM refcon.dataset.table_1 t
         LEFT JOIN refcon.dataset.table_1_1 t1 ON
            t.col_1 = t1.col_1 AND
            t.col_2 = t1.col_2
WHERE t1.col_1 IS NULL;

Here is how this works:

  • There is no dedicated anti join function in SQL.
  • We use a LEFT JOIN to join the two tables on col_1 and col_2.
  • Rows that don't match on col_1 and col_2 will have NULL values for all columns coming from table_1.
  • We filter using one of our joining columns using WHERE t1.col_1 IS NULL which will filter our matching rows.

Alternatively:

One Join Column

SELECT *
FROM refcon.dataset.table_1
WHERE col_1 NOT IN (SELECT col_1 FROM refcon.dataset.table_1_1)

Here is how this works:

  • This is the same as the alternative semi join solution described above.
  • We add a NOT operator to take the complement.
SQL
I/O