The JOIN clause (Inner Joins)
Contents
import sqlalchemy
engine = sqlalchemy.create_engine('postgresql://bob:1234@localhost:5432/dvdrental')
engine.execution_options(isolation_level="AUTOCOMMIT")
%load_ext sql
%sql $engine.url
'Connected: bob@dvdrental'
The JOIN clause (Inner Joins)#
So far, our queries have only accessed one table at a time, but probaby the strongest feature of SQL is to be able to access data from several tables from the same query. In order to do that we use the JOIN clause. The basic syntax looks like this:
SELECT table_A.column_1, table_B.some_column
FROM table_A
INNER JOIN table_B ON table_A.primary_key = table_B.foreign_key;
Let’s disect the query above:
First of all, note that in the
SELECT
statement we’re selecting columns from bothtable_A
andtable_B
.FROM table_A INNER JOIN table_B
indicates the tables we want to pull data from.Lastly,
ON table_A.primary_key = table_B.foreign_key
we’re selecting two columns. The final result will be the rows where the values of these two columns are common in both tables.
For example, if table_A
looks like this:
customer_id |
first_name |
last_name |
country_id |
---|---|---|---|
1 |
Lynda |
Chu |
3 |
2 |
John |
Doe |
2 |
3 |
Bob |
Ross |
2 |
And table_B
like this:
country_id |
country_name |
---|---|
1 |
USA |
2 |
UK |
3 |
China |
We could write the following query, to fetch data from both tables:
SELECT table_A.first_name, table_B.country_name
FROM table_A
INNER JOIN table_B ON table_A.country_name = table_B.country_name;
The query above is using the country_id
column to JOIN
both tables. This column is defined in table_A
as a foreign key and in table_B
as a primary key. The result of the query above would look like this:
first_name |
country_name |
---|---|
Lynda |
China |
John |
USA |
Bob |
USA |
As you may have notice, when writing the column names we have specify the table name as well (with dot notation). That’s because both tables could have columns with the same names, so we have to be specific about what table/column we mean.
The
INNER JOIN
clause returns rows from both tables (or one if we want) that have corresponding rows in both tables. By corresponding rows we mean rows with the same value. That’s why we have to be specific about the table/column that contain the corresponding values.
Example 1#
Let’s say we want to get data from a couple of tables:
From the
customer
table we want:first_name
,last_name
.From the
payment
table we want:amount
.
%%sql
SELECT customer.customer_id, first_name, last_name, amount
FROM customer
INNER JOIN payment ON customer.customer_id = payment.customer_id
ORDER BY customer.customer_id ASC
LIMIT 5;
* postgresql://bob:***@localhost:5432/dvdrental
5 rows affected.
customer_id | first_name | last_name | amount |
---|---|---|---|
1 | Mary | Smith | 5.99 |
1 | Mary | Smith | 0.99 |
1 | Mary | Smith | 9.99 |
1 | Mary | Smith | 4.99 |
1 | Mary | Smith | 4.99 |
As you can see, we get multiple rows for Aaron Selby (and the other customers as well). Wouldn’t be cool, if we tune the query to group the payments (amount
) in a list, so we’d get a single customer per row?
%%sql
SELECT customer.customer_id, first_name, last_name, array_agg(amount) AS payments
FROM customer
INNER JOIN payment ON customer.customer_id = payment.customer_id
GROUP BY customer.customer_id
ORDER BY customer_id ASC
LIMIT 5;
* postgresql://bob:***@localhost:5432/dvdrental
5 rows affected.
customer_id | first_name | last_name | payments |
---|---|---|---|
1 | Mary | Smith | [Decimal('5.99'), Decimal('0.99'), Decimal('9.99'), Decimal('4.99'), Decimal('4.99'), Decimal('0.99'), Decimal('3.99'), Decimal('4.99'), Decimal('3.99'), Decimal('0.99'), Decimal('4.99'), Decimal('0.99'), Decimal('0.99'), Decimal('2.99'), Decimal('0.99'), Decimal('1.99'), Decimal('2.99'), Decimal('5.99'), Decimal('5.99'), Decimal('5.99'), Decimal('4.99'), Decimal('4.99'), Decimal('7.99'), Decimal('2.99'), Decimal('4.99'), Decimal('4.99'), Decimal('0.99'), Decimal('0.99'), Decimal('2.99'), Decimal('2.99')] |
2 | Patricia | Johnson | [Decimal('2.99'), Decimal('0.99'), Decimal('0.99'), Decimal('5.99'), Decimal('6.99'), Decimal('2.99'), Decimal('2.99'), Decimal('2.99'), Decimal('4.99'), Decimal('5.99'), Decimal('4.99'), Decimal('4.99'), Decimal('2.99'), Decimal('6.99'), Decimal('4.99'), Decimal('5.99'), Decimal('5.99'), Decimal('5.99'), Decimal('2.99'), Decimal('5.99'), Decimal('4.99'), Decimal('10.99'), Decimal('0.99'), Decimal('6.99'), Decimal('6.99'), Decimal('2.99')] |
3 | Linda | Williams | [Decimal('8.99'), Decimal('6.99'), Decimal('6.99'), Decimal('2.99'), Decimal('5.99'), Decimal('4.99'), Decimal('8.99'), Decimal('2.99'), Decimal('8.99'), Decimal('0.99'), Decimal('2.99'), Decimal('4.99'), Decimal('4.99'), Decimal('5.99'), Decimal('10.99'), Decimal('7.99'), Decimal('6.99'), Decimal('4.99'), Decimal('4.99'), Decimal('2.99'), Decimal('1.99'), Decimal('3.99'), Decimal('2.99'), Decimal('4.99')] |
4 | Barbara | Jones | [Decimal('4.99'), Decimal('0.99'), Decimal('2.99'), Decimal('0.99'), Decimal('0.99'), Decimal('5.99'), Decimal('0.99'), Decimal('2.99'), Decimal('4.99'), Decimal('2.99'), Decimal('8.99'), Decimal('1.99'), Decimal('2.99'), Decimal('6.99'), Decimal('4.99'), Decimal('2.99'), Decimal('1.99'), Decimal('2.99'), Decimal('2.99'), Decimal('3.99'), Decimal('5.99'), Decimal('5.99')] |
5 | Elizabeth | Brown | [Decimal('3.99'), Decimal('2.99'), Decimal('4.99'), Decimal('2.99'), Decimal('4.99'), Decimal('4.99'), Decimal('0.99'), Decimal('4.99'), Decimal('4.99'), Decimal('3.99'), Decimal('9.99'), Decimal('2.99'), Decimal('1.99'), Decimal('0.99'), Decimal('6.99'), Decimal('6.99'), Decimal('2.99'), Decimal('0.99'), Decimal('4.99'), Decimal('2.99'), Decimal('4.99'), Decimal('5.99'), Decimal('1.99'), Decimal('0.99'), Decimal('8.99'), Decimal('3.99'), Decimal('4.99'), Decimal('0.99'), Decimal('0.99'), Decimal('3.99'), Decimal('2.99'), Decimal('1.99'), Decimal('4.99'), Decimal('3.99'), Decimal('0.99')] |
Example 2#
Let’s say we want to pull the following data:
From the
payment
table we want:payment_id
,amount
.From the
staff
table we want thefirst_name
andlast_name
of the staff member who processed the payment.
So we need to find a column to connect these two tables. Let’s take a look at payment
first:
%%sql
SELECT * FROM payment WHERE false;
* postgresql://bob:***@localhost:5432/dvdrental
0 rows affected.
payment_id | customer_id | staff_id | rental_id | amount | payment_date |
---|
Great, we have the payment_id
and the amount
of each payment. There’s also a column named staff_id
. Wouldn’t be great if this column was also available in the staff
table? Let’s take a look:
%%sql
SELECT * FROM staff WHERE false;
* postgresql://bob:***@localhost:5432/dvdrental
0 rows affected.
staff_id | first_name | last_name | address_id | store_id | active | username | password | last_update | picture |
---|
Guess what! There it is, the staff_id
column exists in this table as well (it’s the primary key). Let’s create the join then:
%%sql
SELECT payment_id, amount, first_name, last_name
FROM payment
INNER JOIN staff ON staff.staff_id = payment.staff_id
LIMIT 10;
* postgresql://bob:***@localhost:5432/dvdrental
10 rows affected.
payment_id | amount | first_name | last_name |
---|---|---|---|
17503 | 7.99 | Jon | Stephens |
17504 | 1.99 | Mike | Hillyer |
17505 | 7.99 | Mike | Hillyer |
17506 | 2.99 | Jon | Stephens |
17507 | 7.99 | Jon | Stephens |
17508 | 5.99 | Mike | Hillyer |
17509 | 5.99 | Jon | Stephens |
17510 | 5.99 | Mike | Hillyer |
17511 | 2.99 | Mike | Hillyer |
17512 | 4.99 | Jon | Stephens |
Example 3#
Let’s say we want to check the inventory of movies at one of the stores, and we want to know how many copies of each title we can find at the store:
%%sql
SELECT title, count(title) AS copies_at_store1
FROM inventory
INNER JOIN film ON inventory.film_id = film.film_id
WHERE store_id = 1
GROUP BY title
ORDER BY title
LIMIT 10;
* postgresql://bob:***@localhost:5432/dvdrental
10 rows affected.
title | copies_at_store1 |
---|---|
Academy Dinosaur | 4 |
Affair Prejudice | 4 |
Agent Truman | 3 |
Airplane Sierra | 2 |
Alabama Devil | 3 |
Aladdin Calendar | 4 |
Alamo Videotape | 4 |
Alaska Phantom | 3 |
Alien Center | 2 |
Alley Evolution | 2 |
Example 4#
Let’s say we want to list our movies in the film
table, and we want to see the language as well:
%%sql
SELECT film.title, language.name AS movie_language
FROM film
INNER JOIN language ON language.language_id = film.language_id
LIMIT 10;
* postgresql://bob:***@localhost:5432/dvdrental
10 rows affected.
title | movie_language |
---|---|
Chamber Italian | English |
Grosse Wonderful | English |
Airport Pollock | English |
Bright Encounters | English |
Academy Dinosaur | English |
Ace Goldfinger | English |
Adaptation Holes | English |
Affair Prejudice | English |
African Egg | English |
Agent Truman | English |