Subqueries
Contents
import sqlalchemy
engine = sqlalchemy.create_engine('postgresql://bob:1234@localhost:5432/dvdrental')
engine.execution_options(isolation_level="AUTOCOMMIT")
%load_ext sql
%sql $engine.url
'Connected: bob@dvdrental'
Subqueries#
A subquery allows us to use multiple SELECT statements nested within each other. Think of it as having a query within a query. For example, imagine we want to list the names of the hotels with prices below average:
SELECT
name
FROM
hotel
WHERE price < (SELECT AVG (price) FROM hotel);
In the query above, the subquery in parentheses (SELECT AVG (price) FROM hotel
) allows us to get the average price of all rows in the hotel
table. We use that value in the outer query to filter the hotel names whose price is below average.
An example#
Let’s see an example similar to the one described above, but this time on the dvdrental
database. Let’s say we want to list all the movies in the film
database whose price is below average:
%%sql
SELECT film_id, title, rental_rate
FROM film
WHERE rental_rate < (SELECT avg(rental_rate) FROM film)
LIMIT 10;
* postgresql://bob:***@localhost:5432/dvdrental
10 rows affected.
film_id | title | rental_rate |
---|---|---|
1 | Academy Dinosaur | 0.99 |
11 | Alamo Videotape | 0.99 |
12 | Alaska Phantom | 0.99 |
213 | Date Speed | 0.99 |
14 | Alice Fantasia | 0.99 |
17 | Alone Trip | 0.99 |
18 | Alter Victory | 0.99 |
19 | Amadeus Holy | 0.99 |
23 | Anaconda Confessions | 0.99 |
26 | Annie Identity | 0.99 |
We’ve used the result of the subquery (SELECT avg(rental_rate) FROM film)
to filter the results of the outer query.
Without Subqueries#
To better understand the benefit of using subqueries, let’s see how we could achieve the same result in two separate queries:
As you can see, we get a lot of decimals, which we can round
by using one of the mathematical functions available in PostgreSQL:
%%sql
SELECT avg(rental_rate) FROM film;
* postgresql://bob:***@localhost:5432/dvdrental
1 rows affected.
avg |
---|
2.9800000000000000 |
Once we’ve found the average price of the movies in the film
table, we’d use the resulting value in a second query:
%%sql
SELECT film_id, title, rental_rate
FROM film
WHERE rental_rate < 2.98
LIMIT 10;
* postgresql://bob:***@localhost:5432/dvdrental
10 rows affected.
film_id | title | rental_rate |
---|---|---|
1 | Academy Dinosaur | 0.99 |
11 | Alamo Videotape | 0.99 |
12 | Alaska Phantom | 0.99 |
213 | Date Speed | 0.99 |
14 | Alice Fantasia | 0.99 |
17 | Alone Trip | 0.99 |
18 | Alter Victory | 0.99 |
19 | Amadeus Holy | 0.99 |
23 | Anaconda Confessions | 0.99 |
26 | Annie Identity | 0.99 |
The result is identical but it requires us to write down the result of the first query (or store it programatically in a variable) and then use it in a second query. That would affect the performance of our application (two trips to the database).
Another Example#
Let’s say we want to obtain the list of films that have returned between May 29th and May 30th. Let’s explore the tables involved:
%%sql
SELECT * FROM rental
WHERE false;
* postgresql://bob:***@localhost:5432/dvdrental
0 rows affected.
rental_id | rental_date | inventory_id | customer_id | return_date | staff_id | last_update |
---|
%%sql
SELECT * FROM inventory
WHERE false;
* postgresql://bob:***@localhost:5432/dvdrental
0 rows affected.
inventory_id | film_id | store_id | last_update |
---|
%%sql
SELECT * FROM film
WHERE false;
* postgresql://bob:***@localhost:5432/dvdrental
0 rows affected.
film_id | title | description | release_year | language_id | rental_duration | rental_rate | length | replacement_cost | rating | last_update | special_features | fulltext |
---|
We want to join the rental
and the inventory
tables on the inventory_id
column, getting a list of the movies (their film_id
actually) with the returned date mentioned at the beginning:
%%sql
SELECT inventory.film_id
FROM rental
JOIN inventory ON inventory.inventory_id = rental.inventory_id
WHERE
return_date BETWEEN '2005-05-29' AND '2005-05-30'
LIMIT 10;
* postgresql://bob:***@localhost:5432/dvdrental
10 rows affected.
film_id |
---|
870 |
971 |
573 |
288 |
89 |
681 |
858 |
776 |
257 |
397 |
Great! We got the list of film_id
we wanted (limited to 10 rows for practical reasons). But we were interested in the movie names… How about we use the query above as a subquery?
%%sql
SELECT film_id, title FROM film
WHERE film_id IN
(SELECT inventory.film_id FROM rental
INNER JOIN inventory ON inventory.inventory_id = rental.inventory_id
WHERE return_date BETWEEN '2005-05-29' AND '2005-05-30')
LIMIT 10;
* postgresql://bob:***@localhost:5432/dvdrental
10 rows affected.
film_id | title |
---|---|
307 | Fellowship Autumn |
255 | Driving Polish |
388 | Gunfight Moon |
130 | Celebrity Horn |
563 | Massacre Usual |
397 | Hanky October |
898 | Tourist Pelican |
228 | Detective Vision |
347 | Games Bowfinger |
1000 | Zorro Ark |