LEFT OUTER JOIN
Contents
import sqlalchemy
engine = sqlalchemy.create_engine('postgresql://bob:1234@localhost:5432/dvdrental')
engine.execution_options(isolation_level="AUTOCOMMIT")
%load_ext sql
%sql $engine.url
'Connected: bob@dvdrental'
LEFT OUTER JOIN#
Another very used join operation is LEFT OUTER JOIN
, useful in those cases where we want to pull all the rows from a table, plus the rows in the second table with matching values in the column being used for the join. The thing is that, since we’re pulling all the rows from the first table, some of the rows won’t have a corresponding one in the second table; we’ll get null
for those rows. Check the diagram:
If that’s still confusing check this one out:
The end result shows all the values from the left table, even though the yellow row doesn’t have a match in the table to the right (the gap represents a null
).
Example 1#
Let’s say we want to get data from the film
and inventory
tables:
The
film
table contains the movie titles.The
inventory
table contains the copies of the movies kept at the stores.
But the thing is that the some rows in the film
table may not have a corresponding one in the inventory
table; think for example the situation where all the copies have been rented. It may also be possible that there are several copies available for a movie title
in the same or different stores.
From the
film
table we want:first_name
,last_name
.From the
payment
table we want:amount
.
Let’s take a look at the columns of the tables we’ll be working with:
%%sql
SELECT *
FROM film
WHERE FALSE;
* postgresql://bob:***@localhost:5432/dvdrental
0 rows affected.
film_id | title | description | release_year | language_id | rental_duration | rental_rate | length | replacement_cost | rating | last_update | special_features | fulltext |
---|
Now let’s take a look to the first few rows in the inventory
table:
%%sql
SELECT *
FROM inventory
ORDER BY film_id ASC
LIMIT 9;
* postgresql://bob:***@localhost:5432/dvdrental
9 rows affected.
inventory_id | film_id | store_id | last_update |
---|---|---|---|
1 | 1 | 1 | 2006-02-15 10:09:17 |
3 | 1 | 1 | 2006-02-15 10:09:17 |
2 | 1 | 1 | 2006-02-15 10:09:17 |
5 | 1 | 2 | 2006-02-15 10:09:17 |
6 | 1 | 2 | 2006-02-15 10:09:17 |
7 | 1 | 2 | 2006-02-15 10:09:17 |
8 | 1 | 2 | 2006-02-15 10:09:17 |
4 | 1 | 1 | 2006-02-15 10:09:17 |
9 | 2 | 2 | 2006-02-15 10:09:17 |
As you can see, the movie with film_id
1 has several copies in both stores. Let’s write a query to get information from these both tables:
%%sql
SELECT film.film_id, film.title, inventory_id
FROM film
LEFT JOIN inventory ON inventory.film_id = film.film_id
ORDER BY film_id ASC
LIMIT 9;
* postgresql://bob:***@localhost:5432/dvdrental
9 rows affected.
film_id | title | inventory_id |
---|---|---|
1 | Academy Dinosaur | 1 |
1 | Academy Dinosaur | 2 |
1 | Academy Dinosaur | 3 |
1 | Academy Dinosaur | 4 |
1 | Academy Dinosaur | 5 |
1 | Academy Dinosaur | 6 |
1 | Academy Dinosaur | 7 |
1 | Academy Dinosaur | 8 |
2 | Ace Goldfinger | 9 |
As you can see, we got multiple rows for the same title
due to the fact of several copies of the same movie.
Note that we wrote
LEFT JOIN
and notLEFT OUTER JOIN
, because by default, left joins are outer.
Now let’s say we want to order films that we don’t have in our inventory, meaning we’re interested in finding out about the movie titles whose inventory_id
is null
(because we want to order them form our distributor, to keep our customers happy).
%%sql
SELECT film.film_id, film.title, inventory_id
FROM film
LEFT JOIN inventory ON inventory.film_id = film.film_id
WHERE inventory_id IS null
LIMIT 10;
* postgresql://bob:***@localhost:5432/dvdrental
10 rows affected.
film_id | title | inventory_id |
---|---|---|
802 | Sky Miracle | None |
497 | Kill Brotherhood | None |
801 | Sister Freddy | None |
359 | Gladiator Westward | None |
325 | Floats Garden | None |
33 | Apollo Teen | None |
198 | Crystal Breaking | None |
419 | Hocus Frida | None |
332 | Frankenstein Stranger | None |
712 | Raiders Antitrust | None |