The SELECT statement
Contents
The SELECT statement#
Before going into details, let’s create a connection to our database:
import sqlalchemy
engine = sqlalchemy.create_engine('postgresql://bob:1234@localhost:5432/dvdrental')
engine.execution_options(isolation_level="AUTOCOMMIT")
%load_ext sql
%sql $engine.url
'Connected: bob@dvdrental'
Querying All columns#
Most of the times, the SELECT statement is used to fetch rows from a given table. To query data from all the columns in a given table we would use the following syntax:
SELECT * FROM table_name;
If for some reason, you don’t remember the names of the tables in a given database, you can use the
\dt
meta-command. Or if you’re using pgAdmin, you can right-click onsql-lab > Databases > dvdrental > Schemas > public > Tables
.
In the query above we’re using the *
special character. For example, let’s pull all the columns from the actor
table:
%%sql
SELECT * FROM actor LIMIT 5;
* postgresql://bob:***@localhost:5432/dvdrental
5 rows affected.
actor_id | first_name | last_name | last_update |
---|---|---|---|
1 | Penelope | Guiness | 2013-05-26 14:47:57.620000 |
2 | Nick | Wahlberg | 2013-05-26 14:47:57.620000 |
3 | Ed | Chase | 2013-05-26 14:47:57.620000 |
4 | Jennifer | Davis | 2013-05-26 14:47:57.620000 |
5 | Johnny | Lollobrigida | 2013-05-26 14:47:57.620000 |
By the way, in the query above, we can see we’re using the LIMIT statement in order to limit the number of rows in the result.
Querying Some Columns#
For tables tables with small amount of columns, using the *
in the SELECT
statement is just fine. But for tables with a lot of columns, this is not a good practice. It’d increase the amount of data that our application has to deal with, slowing it down. It’s better to be specific about what columns we’re interested in pulling out. The general syntax for doing that is quite simple:
SELECT column1, column2 FROM table_name;
As you can see, we have to separate the column names by a comma (,
). For example, let’s say we want to pull out the first_name
and last_name
of the first five actors in the actor
table:
%%sql
SELECT first_name, last_name FROM actor LIMIT 5;
* postgresql://bob:***@localhost:5432/dvdrental
5 rows affected.
first_name | last_name |
---|---|
Penelope | Guiness |
Nick | Wahlberg |
Ed | Chase |
Jennifer | Davis |
Johnny | Lollobrigida |
A trick to get the Column Names#
Often times, we won’t remember the names of the columns in a given table. In that case, the following query may come in handy:
%%sql
SELECT * FROM actor WHERE false;
* postgresql://bob:***@localhost:5432/dvdrental
0 rows affected.
actor_id | first_name | last_name | last_update |
---|
Don’t worry too much right now about the meaning of the WHERE clause ;-)
The DISTINCT clause#
Another useful clause we can use with the WHERE statement is the DISTINCT
clause. For example, let’s say we’re interested in the rate for renting movies; if we do:
%%sql
SELECT rental_rate FROM film LIMIT 5;
* postgresql://bob:***@localhost:5432/dvdrental
5 rows affected.
rental_rate |
---|
4.99 |
4.99 |
4.99 |
4.99 |
0.99 |
We can see that among the first movies, we have different rates (4.99
and 0.99
) but we don’t want to check all of the rows to see how many different rates are available. To easily find out how many different rates exist in this table, we could add a little tweak to our last statemement:
%%sql
SELECT DISTINCT rental_rate FROM film;
* postgresql://bob:***@localhost:5432/dvdrental
3 rows affected.
rental_rate |
---|
2.99 |
4.99 |
0.99 |
As we can see, now it’s clear that there are only three type of rates. Another example, imagine we want to know how many movie ratings are available in the USA:
%%sql
SELECT DISTINCT rating FROM film;
* postgresql://bob:***@localhost:5432/dvdrental
5 rows affected.
rating |
---|
G |
PG |
PG-13 |
R |
NC-17 |
The WHERE clause#
Sometimes we want to filter the results of our query to the rows that satisfy a certain condition. The WHERE clause is use to specify that condition. This is the general syntax for it:
SELECT column_1, column_2
FROM table_name
WHERE condition;
Comparison Operators#
As you can see, the WHERE
clause appears after the FROM
clause of the SELECT
statement. condition
represents an expression that evaluates to a boolean value. PostgreSQL provides the standard comparison operators to construct the conditions.
Operator |
Description |
---|---|
|
Equal |
|
Not Equal |
|
Not Equal |
|
Not Equal |
|
Greater than |
|
Greater than or equal |
|
Less than |
|
Less than or equal |
For example, let’s say we want to get all the customers whose first names are Jamie:
%%sql
SELECT last_name, first_name
FROM customer
WHERE first_name = 'Jamie';
* postgresql://bob:***@localhost:5432/dvdrental
2 rows affected.
last_name | first_name |
---|---|
Rice | Jamie |
Waugh | Jamie |
Apparently, we have two Jamies among our customer base.
Comparison Predicates#
On top of the comparison operator we’ve seen above, we can also use comparison predicates to build our conditions (boolean expressions). In this category, the most useful are the ones in the table below (check documentation for more):
Predicate |
Equivalent |
---|---|
|
|
|
|
|
|
|
|
If you are wondering why would you use IS DISTINCT FROM
instead of the simple !=
, that’s because the latter will return null (instead of a boolean value) when any of the operands is NULL
. For example, 7 != NULL
evaluates to NULL
instead of false!.
Logical Operators#
Let’s say we want to find Jamie Rice. PostgreSQL has available the standard logical operators AND
and OR
:
%%sql
SELECT last_name, first_name
FROM customer
WHERE first_name = 'Jamie' AND last_name = 'Rice';
* postgresql://bob:***@localhost:5432/dvdrental
1 rows affected.
last_name | first_name |
---|---|
Rice | Jamie |
Let’s say we found the wallet of a customer and we want to get her email, to let her know we have her wallet:
%%sql
SELECT email
FROM customer
WHERE first_name = 'Nancy'
AND last_name = 'Thomas';
* postgresql://bob:***@localhost:5432/dvdrental
1 rows affected.
nancy.thomas@sakilacustomer.org |
The count function#
The count function returns the number of rows returned by a SELECT
statement. The simplest way to use this function would be:
SELECT count(*)
FROM table;
That would return the total amount of rows in a table:
%%sql
SELECT count(*)
FROM customer;
* postgresql://bob:***@localhost:5432/dvdrental
1 rows affected.
count |
---|
599 |
We could also pass an expression to the count()
function; for example, let’s say we want to count the amount of different amounts that customers have paid:
%%sql
SELECT count(DISTINCT amount)
FROM payment;
* postgresql://bob:***@localhost:5432/dvdrental
1 rows affected.
count |
---|
19 |
ORDER BY#
When we query data from a table, PostgreSQL returns the rows in the order they were inserted. We can change that using the ORDER BY statement. For example, let’s say we want to order the customers by their first name:
%%sql
SELECT *
FROM customer
ORDER BY first_name
LIMIT 5;
* postgresql://bob:***@localhost:5432/dvdrental
5 rows affected.
customer_id | store_id | first_name | last_name | address_id | activebool | create_date | last_update | active | |
---|---|---|---|---|---|---|---|---|---|
375 | 2 | Aaron | Selby | aaron.selby@sakilacustomer.org | 380 | True | 2006-02-14 | 2013-05-26 14:49:45.738000 | 1 |
367 | 1 | Adam | Gooch | adam.gooch@sakilacustomer.org | 372 | True | 2006-02-14 | 2013-05-26 14:49:45.738000 | 1 |
525 | 2 | Adrian | Clary | adrian.clary@sakilacustomer.org | 531 | True | 2006-02-14 | 2013-05-26 14:49:45.738000 | 1 |
217 | 2 | Agnes | Bishop | agnes.bishop@sakilacustomer.org | 221 | True | 2006-02-14 | 2013-05-26 14:49:45.738000 | 1 |
389 | 1 | Alan | Kahn | alan.kahn@sakilacustomer.org | 394 | True | 2006-02-14 | 2013-05-26 14:49:45.738000 | 1 |
As you can see, the names are sorted alphabetically, in ascending order (default). But we can change the default (ASC
) by using the DESC
clause:
%%sql
SELECT *
FROM customer
ORDER BY first_name DESC
LIMIT 5;
* postgresql://bob:***@localhost:5432/dvdrental
5 rows affected.
customer_id | store_id | first_name | last_name | address_id | activebool | create_date | last_update | active | |
---|---|---|---|---|---|---|---|---|---|
479 | 1 | Zachary | Hite | zachary.hite@sakilacustomer.org | 484 | True | 2006-02-14 | 2013-05-26 14:49:45.738000 | 1 |
174 | 2 | Yvonne | Watkins | yvonne.watkins@sakilacustomer.org | 178 | True | 2006-02-14 | 2013-05-26 14:49:45.738000 | 1 |
190 | 2 | Yolanda | Weaver | yolanda.weaver@sakilacustomer.org | 194 | True | 2006-02-14 | 2013-05-26 14:49:45.738000 | 1 |
212 | 2 | Wilma | Richards | wilma.richards@sakilacustomer.org | 216 | True | 2006-02-14 | 2013-05-26 14:49:45.738000 | 1 |
219 | 2 | Willie | Howell | willie.howell@sakilacustomer.org | 223 | True | 2006-02-14 | 2013-05-26 14:49:45.738000 | 1 |
We could apply the ORDER BY
clause to several columns, for example:
%%sql
SELECT first_name, last_name
FROM customer
ORDER BY first_name ASC,
last_name DESC
LIMIT 5 OFFSET 325;
* postgresql://bob:***@localhost:5432/dvdrental
5 rows affected.
first_name | last_name |
---|---|
Keith | Rico |
Kelly | Torres |
Kelly | Knott |
Ken | Prewitt |
Kenneth | Gooden |
In the query above, we’re pulling out all customers ordered by first_name
in ascending order then ordering the results by last_name
in descending order. As you can see, there are two customers named Kelly which are ordered by last_name
in descending order (change the order to ASC
to see the result).
In the example above, we’re selecting the
first_name
andlast_name
, but we could also be using the*
and still be able to sort by any column we wanted to (note that’s a PostgreSQL feature not present in other RDBMS like MySQL).
The IN operator#
Whenever we want to check if a value is included in a list of values, we use the IN operator. The syntax is as follows:
value IN (value1,value2,...)
Instead of a list of values, we could also use a subquery in the parentheses:
value IN (SELECT column_1 FROM table_name)
In any case, the whole thing evaluates to a boolean value (true
or false
).
For example, let’s say we want to see the countries of a few country_id
values we have:
%%sql
SELECT country_id, country
FROM country
WHERE country_id IN (16, 20, 23);
* postgresql://bob:***@localhost:5432/dvdrental
3 rows affected.
country_id | country |
---|---|
16 | Brunei |
20 | Canada |
23 | China |
As you can see, the IN
statement comes in handy, whenever we have to check for rows that are not in sequential order. Another way of writing the statement above would be:
%%sql
SELECT country_id, country
FROM country
WHERE country_id = 16
OR country_id = 20
OR country_id = 23;
* postgresql://bob:***@localhost:5432/dvdrental
3 rows affected.
country_id | country |
---|---|
16 | Brunei |
20 | Canada |
23 | China |
As you can imagine, the longest the list of values to check for, the most useful the IN
statement becomes. Also, the speed we get when using the IN
operator is better.
The LIKE operator#
The LIKE operator allows us to match patterns in strings. Some examples:
The expression
'abc' LIKE 'abc'
will evaluate totrue
The expression
'abc' LIKE 'a%'
will evaluate totrue
The expression
'abc' LIKE '_b_'
will evaluate totrue
The expression
'abc' LIKE 'c'
will evaluate tofalse
As you may have deducted from the examples above:
An underscore (
_
) in a pattern matches any single character.A percent sign (
%
) matches any sequence of zero or more characters.
For example, let’s say we want to find a customer by name, but we only remember that her first name ended in ley:
%%sql
SELECT first_name, last_name
FROM customer
WHERE first_name LIKE '%ley';
* postgresql://bob:***@localhost:5432/dvdrental
5 rows affected.
first_name | last_name |
---|---|
Shirley | Allen |
Ashley | Richardson |
Stanley | Scroggins |
Bradley | Motley |
Wesley | Bull |
The
ILIKE
operator would allow us to match patterns in a case-insensitive way (LIKE
is case-sensitive).