import sqlalchemy
engine = sqlalchemy.create_engine('postgresql://bob:1234@localhost:5432/dvdrental')
engine.execution_options(isolation_level="AUTOCOMMIT")
%load_ext sql
%sql $engine.url
'Connected: bob@dvdrental'

LEFT OUTER JOIN#

Another very used join operation is LEFT OUTER JOIN, useful in those cases where we want to pull all the rows from a table, plus the rows in the second table with matching values in the column being used for the join. The thing is that, since we’re pulling all the rows from the first table, some of the rows won’t have a corresponding one in the second table; we’ll get null for those rows. Check the diagram:

left outer join

If that’s still confusing check this one out:

left outer join

The end result shows all the values from the left table, even though the yellow row doesn’t have a match in the table to the right (the gap represents a null).

Example 1#

Let’s say we want to get data from the film and inventory tables:

  • The film table contains the movie titles.

  • The inventory table contains the copies of the movies kept at the stores.

But the thing is that the some rows in the film table may not have a corresponding one in the inventory table; think for example the situation where all the copies have been rented. It may also be possible that there are several copies available for a movie title in the same or different stores.

  • From the film table we want: first_name, last_name.

  • From the payment table we want: amount.

Let’s take a look at the columns of the tables we’ll be working with:

%%sql
SELECT *
FROM film
WHERE FALSE;
 * postgresql://bob:***@localhost:5432/dvdrental
0 rows affected.
film_id title description release_year language_id rental_duration rental_rate length replacement_cost rating last_update special_features fulltext

Now let’s take a look to the first few rows in the inventory table:

%%sql
SELECT *
FROM inventory
ORDER BY film_id ASC
LIMIT 9;
 * postgresql://bob:***@localhost:5432/dvdrental
9 rows affected.
inventory_id film_id store_id last_update
1 1 1 2006-02-15 10:09:17
3 1 1 2006-02-15 10:09:17
2 1 1 2006-02-15 10:09:17
5 1 2 2006-02-15 10:09:17
6 1 2 2006-02-15 10:09:17
7 1 2 2006-02-15 10:09:17
8 1 2 2006-02-15 10:09:17
4 1 1 2006-02-15 10:09:17
9 2 2 2006-02-15 10:09:17

As you can see, the movie with film_id 1 has several copies in both stores. Let’s write a query to get information from these both tables:

%%sql
SELECT film.film_id, film.title, inventory_id
FROM film
LEFT JOIN inventory ON inventory.film_id = film.film_id
ORDER BY film_id ASC
LIMIT 9;
 * postgresql://bob:***@localhost:5432/dvdrental
9 rows affected.
film_id title inventory_id
1 Academy Dinosaur 1
1 Academy Dinosaur 2
1 Academy Dinosaur 3
1 Academy Dinosaur 4
1 Academy Dinosaur 5
1 Academy Dinosaur 6
1 Academy Dinosaur 7
1 Academy Dinosaur 8
2 Ace Goldfinger 9

As you can see, we got multiple rows for the same title due to the fact of several copies of the same movie.

Note that we wrote LEFT JOIN and not LEFT OUTER JOIN, because by default, left joins are outer.

Now let’s say we want to order films that we don’t have in our inventory, meaning we’re interested in finding out about the movie titles whose inventory_id is null (because we want to order them form our distributor, to keep our customers happy).

%%sql
SELECT film.film_id, film.title, inventory_id
FROM film
LEFT JOIN inventory ON inventory.film_id = film.film_id
WHERE inventory_id IS null
LIMIT 10;
 * postgresql://bob:***@localhost:5432/dvdrental
10 rows affected.
film_id title inventory_id
802 Sky Miracle None
497 Kill Brotherhood None
801 Sister Freddy None
359 Gladiator Westward None
325 Floats Garden None
33 Apollo Teen None
198 Crystal Breaking None
419 Hocus Frida None
332 Frankenstein Stranger None
712 Raiders Antitrust None