import sqlalchemy
engine = sqlalchemy.create_engine('postgresql://bob:1234@localhost:5432/dvdrental')
engine.execution_options(isolation_level="AUTOCOMMIT")
%load_ext sql
%sql $engine.url
'Connected: bob@dvdrental'

The JOIN clause (Inner Joins)#

So far, our queries have only accessed one table at a time, but probaby the strongest feature of SQL is to be able to access data from several tables from the same query. In order to do that we use the JOIN clause. The basic syntax looks like this:

SELECT table_A.column_1, table_B.some_column
FROM table_A
INNER JOIN table_B ON table_A.primary_key = table_B.foreign_key;

Let’s disect the query above:

  1. First of all, note that in the SELECT statement we’re selecting columns from both table_A and table_B.

  2. FROM table_A INNER JOIN table_B indicates the tables we want to pull data from.

  3. Lastly, ON table_A.primary_key = table_B.foreign_key we’re selecting two columns. The final result will be the rows where the values of these two columns are common in both tables.

For example, if table_A looks like this:

customer_id

first_name

last_name

country_id

1

Lynda

Chu

3

2

John

Doe

2

3

Bob

Ross

2

And table_B like this:

country_id

country_name

1

USA

2

UK

3

China

We could write the following query, to fetch data from both tables:

SELECT table_A.first_name, table_B.country_name
FROM table_A
INNER JOIN table_B ON table_A.country_name = table_B.country_name;

The query above is using the country_id column to JOIN both tables. This column is defined in table_A as a foreign key and in table_B as a primary key. The result of the query above would look like this:

first_name

country_name

Lynda

China

John

USA

Bob

USA

As you may have notice, when writing the column names we have specify the table name as well (with dot notation). That’s because both tables could have columns with the same names, so we have to be specific about what table/column we mean.

The INNER JOIN clause returns rows from both tables (or one if we want) that have corresponding rows in both tables. By corresponding rows we mean rows with the same value. That’s why we have to be specific about the table/column that contain the corresponding values.

Example 1#

Let’s say we want to get data from a couple of tables:

  • From the customer table we want: first_name, last_name.

  • From the payment table we want: amount.

%%sql
SELECT customer.customer_id, first_name, last_name, amount
FROM customer
INNER JOIN payment ON customer.customer_id = payment.customer_id
ORDER BY customer.customer_id ASC
LIMIT 5;
 * postgresql://bob:***@localhost:5432/dvdrental
5 rows affected.
customer_id first_name last_name amount
1 Mary Smith 5.99
1 Mary Smith 0.99
1 Mary Smith 9.99
1 Mary Smith 4.99
1 Mary Smith 4.99

As you can see, we get multiple rows for Aaron Selby (and the other customers as well). Wouldn’t be cool, if we tune the query to group the payments (amount) in a list, so we’d get a single customer per row?

%%sql
SELECT customer.customer_id, first_name, last_name, array_agg(amount) AS payments
FROM customer
INNER JOIN payment ON customer.customer_id = payment.customer_id
GROUP BY customer.customer_id
ORDER BY customer_id ASC
LIMIT 5;
 * postgresql://bob:***@localhost:5432/dvdrental
5 rows affected.
customer_id first_name last_name payments
1 Mary Smith [Decimal('5.99'), Decimal('0.99'), Decimal('9.99'), Decimal('4.99'), Decimal('4.99'), Decimal('0.99'), Decimal('3.99'), Decimal('4.99'), Decimal('3.99'), Decimal('0.99'), Decimal('4.99'), Decimal('0.99'), Decimal('0.99'), Decimal('2.99'), Decimal('0.99'), Decimal('1.99'), Decimal('2.99'), Decimal('5.99'), Decimal('5.99'), Decimal('5.99'), Decimal('4.99'), Decimal('4.99'), Decimal('7.99'), Decimal('2.99'), Decimal('4.99'), Decimal('4.99'), Decimal('0.99'), Decimal('0.99'), Decimal('2.99'), Decimal('2.99')]
2 Patricia Johnson [Decimal('2.99'), Decimal('0.99'), Decimal('0.99'), Decimal('5.99'), Decimal('6.99'), Decimal('2.99'), Decimal('2.99'), Decimal('2.99'), Decimal('4.99'), Decimal('5.99'), Decimal('4.99'), Decimal('4.99'), Decimal('2.99'), Decimal('6.99'), Decimal('4.99'), Decimal('5.99'), Decimal('5.99'), Decimal('5.99'), Decimal('2.99'), Decimal('5.99'), Decimal('4.99'), Decimal('10.99'), Decimal('0.99'), Decimal('6.99'), Decimal('6.99'), Decimal('2.99')]
3 Linda Williams [Decimal('8.99'), Decimal('6.99'), Decimal('6.99'), Decimal('2.99'), Decimal('5.99'), Decimal('4.99'), Decimal('8.99'), Decimal('2.99'), Decimal('8.99'), Decimal('0.99'), Decimal('2.99'), Decimal('4.99'), Decimal('4.99'), Decimal('5.99'), Decimal('10.99'), Decimal('7.99'), Decimal('6.99'), Decimal('4.99'), Decimal('4.99'), Decimal('2.99'), Decimal('1.99'), Decimal('3.99'), Decimal('2.99'), Decimal('4.99')]
4 Barbara Jones [Decimal('4.99'), Decimal('0.99'), Decimal('2.99'), Decimal('0.99'), Decimal('0.99'), Decimal('5.99'), Decimal('0.99'), Decimal('2.99'), Decimal('4.99'), Decimal('2.99'), Decimal('8.99'), Decimal('1.99'), Decimal('2.99'), Decimal('6.99'), Decimal('4.99'), Decimal('2.99'), Decimal('1.99'), Decimal('2.99'), Decimal('2.99'), Decimal('3.99'), Decimal('5.99'), Decimal('5.99')]
5 Elizabeth Brown [Decimal('3.99'), Decimal('2.99'), Decimal('4.99'), Decimal('2.99'), Decimal('4.99'), Decimal('4.99'), Decimal('0.99'), Decimal('4.99'), Decimal('4.99'), Decimal('3.99'), Decimal('9.99'), Decimal('2.99'), Decimal('1.99'), Decimal('0.99'), Decimal('6.99'), Decimal('6.99'), Decimal('2.99'), Decimal('0.99'), Decimal('4.99'), Decimal('2.99'), Decimal('4.99'), Decimal('5.99'), Decimal('1.99'), Decimal('0.99'), Decimal('8.99'), Decimal('3.99'), Decimal('4.99'), Decimal('0.99'), Decimal('0.99'), Decimal('3.99'), Decimal('2.99'), Decimal('1.99'), Decimal('4.99'), Decimal('3.99'), Decimal('0.99')]

Example 2#

Let’s say we want to pull the following data:

  • From the payment table we want: payment_id, amount.

  • From the staff table we want the first_name and last_name of the staff member who processed the payment.

So we need to find a column to connect these two tables. Let’s take a look at payment first:

%%sql
SELECT * FROM payment WHERE false;
 * postgresql://bob:***@localhost:5432/dvdrental
0 rows affected.
payment_id customer_id staff_id rental_id amount payment_date

Great, we have the payment_id and the amount of each payment. There’s also a column named staff_id. Wouldn’t be great if this column was also available in the staff table? Let’s take a look:

%%sql
SELECT * FROM staff WHERE false;
 * postgresql://bob:***@localhost:5432/dvdrental
0 rows affected.
staff_id first_name last_name address_id email store_id active username password last_update picture

Guess what! There it is, the staff_id column exists in this table as well (it’s the primary key). Let’s create the join then:

%%sql
SELECT payment_id, amount, first_name, last_name
FROM payment
INNER JOIN staff ON staff.staff_id = payment.staff_id
LIMIT 10;
 * postgresql://bob:***@localhost:5432/dvdrental
10 rows affected.
payment_id amount first_name last_name
17503 7.99 Jon Stephens
17504 1.99 Mike Hillyer
17505 7.99 Mike Hillyer
17506 2.99 Jon Stephens
17507 7.99 Jon Stephens
17508 5.99 Mike Hillyer
17509 5.99 Jon Stephens
17510 5.99 Mike Hillyer
17511 2.99 Mike Hillyer
17512 4.99 Jon Stephens

Example 3#

Let’s say we want to check the inventory of movies at one of the stores, and we want to know how many copies of each title we can find at the store:

%%sql
SELECT title, count(title) AS copies_at_store1
FROM inventory
INNER JOIN film ON inventory.film_id = film.film_id
WHERE store_id = 1
GROUP BY title
ORDER BY title
LIMIT 10;
 * postgresql://bob:***@localhost:5432/dvdrental
10 rows affected.
title copies_at_store1
Academy Dinosaur 4
Affair Prejudice 4
Agent Truman 3
Airplane Sierra 2
Alabama Devil 3
Aladdin Calendar 4
Alamo Videotape 4
Alaska Phantom 3
Alien Center 2
Alley Evolution 2

Example 4#

Let’s say we want to list our movies in the film table, and we want to see the language as well:

%%sql
SELECT film.title, language.name AS movie_language
FROM film
INNER JOIN language ON language.language_id = film.language_id
LIMIT 10;
 * postgresql://bob:***@localhost:5432/dvdrental
10 rows affected.
title movie_language
Chamber Italian English
Grosse Wonderful English
Airport Pollock English
Bright Encounters English
Academy Dinosaur English
Ace Goldfinger English
Adaptation Holes English
Affair Prejudice English
African Egg English
Agent Truman English