import sqlalchemy
engine = sqlalchemy.create_engine('postgresql://bob:1234@localhost:5432/dvdrental')
engine.execution_options(isolation_level="AUTOCOMMIT")
%load_ext sql
%sql $engine.url
'Connected: bob@dvdrental'

Subqueries#

A subquery allows us to use multiple SELECT statements nested within each other. Think of it as having a query within a query. For example, imagine we want to list the names of the hotels with prices below average:

SELECT
    name
FROM
    hotel
WHERE price < (SELECT AVG (price) FROM hotel);

In the query above, the subquery in parentheses (SELECT AVG (price) FROM hotel) allows us to get the average price of all rows in the hotel table. We use that value in the outer query to filter the hotel names whose price is below average.

An example#

Let’s see an example similar to the one described above, but this time on the dvdrental database. Let’s say we want to list all the movies in the film database whose price is below average:

%%sql
SELECT film_id, title, rental_rate
FROM film
WHERE rental_rate < (SELECT avg(rental_rate) FROM film)
LIMIT 10;
 * postgresql://bob:***@localhost:5432/dvdrental
10 rows affected.
film_id title rental_rate
1 Academy Dinosaur 0.99
11 Alamo Videotape 0.99
12 Alaska Phantom 0.99
213 Date Speed 0.99
14 Alice Fantasia 0.99
17 Alone Trip 0.99
18 Alter Victory 0.99
19 Amadeus Holy 0.99
23 Anaconda Confessions 0.99
26 Annie Identity 0.99

We’ve used the result of the subquery (SELECT avg(rental_rate) FROM film) to filter the results of the outer query.

Without Subqueries#

To better understand the benefit of using subqueries, let’s see how we could achieve the same result in two separate queries:

As you can see, we get a lot of decimals, which we can round by using one of the mathematical functions available in PostgreSQL:

%%sql
SELECT avg(rental_rate) FROM film;
 * postgresql://bob:***@localhost:5432/dvdrental
1 rows affected.
avg
2.9800000000000000

Once we’ve found the average price of the movies in the film table, we’d use the resulting value in a second query:

%%sql
SELECT film_id, title, rental_rate
FROM film
WHERE rental_rate < 2.98
LIMIT 10;
 * postgresql://bob:***@localhost:5432/dvdrental
10 rows affected.
film_id title rental_rate
1 Academy Dinosaur 0.99
11 Alamo Videotape 0.99
12 Alaska Phantom 0.99
213 Date Speed 0.99
14 Alice Fantasia 0.99
17 Alone Trip 0.99
18 Alter Victory 0.99
19 Amadeus Holy 0.99
23 Anaconda Confessions 0.99
26 Annie Identity 0.99

The result is identical but it requires us to write down the result of the first query (or store it programatically in a variable) and then use it in a second query. That would affect the performance of our application (two trips to the database).

Another Example#

Let’s say we want to obtain the list of films that have returned between May 29th and May 30th. Let’s explore the tables involved:

%%sql
SELECT * FROM rental
WHERE false;
 * postgresql://bob:***@localhost:5432/dvdrental
0 rows affected.
rental_id rental_date inventory_id customer_id return_date staff_id last_update
%%sql
SELECT * FROM inventory
WHERE false;
 * postgresql://bob:***@localhost:5432/dvdrental
0 rows affected.
inventory_id film_id store_id last_update
%%sql
SELECT * FROM film
WHERE false;
 * postgresql://bob:***@localhost:5432/dvdrental
0 rows affected.
film_id title description release_year language_id rental_duration rental_rate length replacement_cost rating last_update special_features fulltext

We want to join the rental and the inventory tables on the inventory_id column, getting a list of the movies (their film_id actually) with the returned date mentioned at the beginning:

%%sql
SELECT inventory.film_id
FROM rental
JOIN inventory ON inventory.inventory_id = rental.inventory_id
WHERE
    return_date BETWEEN '2005-05-29' AND '2005-05-30'
LIMIT 10;
 * postgresql://bob:***@localhost:5432/dvdrental
10 rows affected.
film_id
870
971
573
288
89
681
858
776
257
397

Great! We got the list of film_id we wanted (limited to 10 rows for practical reasons). But we were interested in the movie names… How about we use the query above as a subquery?

%%sql
SELECT film_id, title FROM film
WHERE film_id IN
    (SELECT inventory.film_id FROM rental
        INNER JOIN inventory ON inventory.inventory_id = rental.inventory_id
        WHERE return_date BETWEEN '2005-05-29' AND '2005-05-30')
LIMIT 10;
 * postgresql://bob:***@localhost:5432/dvdrental
10 rows affected.
film_id title
307 Fellowship Autumn
255 Driving Polish
388 Gunfight Moon
130 Celebrity Horn
563 Massacre Usual
397 Hanky October
898 Tourist Pelican
228 Detective Vision
347 Games Bowfinger
1000 Zorro Ark