The SELECT statement#

Before going into details, let’s create a connection to our database:

import sqlalchemy
engine = sqlalchemy.create_engine('postgresql://bob:1234@localhost:5432/dvdrental')
engine.execution_options(isolation_level="AUTOCOMMIT")
%load_ext sql
%sql $engine.url

'Connected: bob@dvdrental'

Querying All columns#

Most of the times, the SELECT statement is used to fetch rows from a given table. To query data from all the columns in a given table we would use the following syntax:

SELECT * FROM table_name;

If for some reason, you don’t remember the names of the tables in a given database, you can use the \dt meta-command. Or if you’re using pgAdmin, you can right-click on sql-lab > Databases > dvdrental > Schemas > public > Tables.

In the query above we’re using the * special character. For example, let’s pull all the columns from the actor table:

%%sql
SELECT * FROM actor LIMIT 5;

 * postgresql://bob:***@localhost:5432/dvdrental
5 rows affected.

actor_id	first_name	last_name	last_update
1	Penelope	Guiness	2013-05-26 14:47:57.620000
2	Nick	Wahlberg	2013-05-26 14:47:57.620000
3	Ed	Chase	2013-05-26 14:47:57.620000
4	Jennifer	Davis	2013-05-26 14:47:57.620000
5	Johnny	Lollobrigida	2013-05-26 14:47:57.620000

By the way, in the query above, we can see we’re using the LIMIT statement in order to limit the number of rows in the result.

Querying Some Columns#

For tables tables with small amount of columns, using the * in the SELECT statement is just fine. But for tables with a lot of columns, this is not a good practice. It’d increase the amount of data that our application has to deal with, slowing it down. It’s better to be specific about what columns we’re interested in pulling out. The general syntax for doing that is quite simple:

SELECT column1, column2 FROM table_name;

As you can see, we have to separate the column names by a comma (,). For example, let’s say we want to pull out the first_name and last_name of the first five actors in the actor table:

%%sql
SELECT first_name, last_name FROM actor LIMIT 5;

 * postgresql://bob:***@localhost:5432/dvdrental
5 rows affected.

first_name	last_name
Penelope	Guiness
Nick	Wahlberg
Ed	Chase
Jennifer	Davis
Johnny	Lollobrigida

A trick to get the Column Names#

Often times, we won’t remember the names of the columns in a given table. In that case, the following query may come in handy:

%%sql
SELECT * FROM actor WHERE false;

 * postgresql://bob:***@localhost:5432/dvdrental
0 rows affected.

actor_id	first_name	last_name	last_update

Don’t worry too much right now about the meaning of the WHERE clause ;-)

The DISTINCT clause#

Another useful clause we can use with the WHERE statement is the DISTINCT clause. For example, let’s say we’re interested in the rate for renting movies; if we do:

%%sql
SELECT rental_rate FROM film LIMIT 5;

 * postgresql://bob:***@localhost:5432/dvdrental
5 rows affected.

rental_rate
4.99
4.99
4.99
4.99
0.99

We can see that among the first movies, we have different rates (4.99 and 0.99) but we don’t want to check all of the rows to see how many different rates are available. To easily find out how many different rates exist in this table, we could add a little tweak to our last statemement:

%%sql
SELECT DISTINCT rental_rate FROM film;

 * postgresql://bob:***@localhost:5432/dvdrental
3 rows affected.

rental_rate
2.99
4.99
0.99

As we can see, now it’s clear that there are only three type of rates. Another example, imagine we want to know how many movie ratings are available in the USA:

%%sql
SELECT DISTINCT rating FROM film;

 * postgresql://bob:***@localhost:5432/dvdrental
5 rows affected.

rating
G
PG
PG-13
R
NC-17

The WHERE clause#

Sometimes we want to filter the results of our query to the rows that satisfy a certain condition. The WHERE clause is use to specify that condition. This is the general syntax for it:

SELECT column_1, column_2
FROM table_name
WHERE condition;

Comparison Operators#

As you can see, the WHERE clause appears after the FROM clause of the SELECT statement. condition represents an expression that evaluates to a boolean value. PostgreSQL provides the standard comparison operators to construct the conditions.

Operator	Description
`=`	Equal
`!=`	Not Equal
`<>`	Not Equal
`!=`	Not Equal
`>`	Greater than
`>=`	Greater than or equal
`<`	Less than
`<=`	Less than or equal

For example, let’s say we want to get all the customers whose first names are Jamie:

%%sql
SELECT last_name, first_name
FROM customer
WHERE first_name = 'Jamie';

 * postgresql://bob:***@localhost:5432/dvdrental
2 rows affected.

last_name	first_name
Rice	Jamie
Waugh	Jamie

Apparently, we have two Jamies among our customer base.

Comparison Predicates#

On top of the comparison operator we’ve seen above, we can also use comparison predicates to build our conditions (boolean expressions). In this category, the most useful are the ones in the table below (check documentation for more):

Predicate	Equivalent
`val BETWEEN low AND hi`	`WHERE val >= low AND val <= hi`
`val NOT BETWEEN low AND hi`	`WHERE val < low OR val > hi`
`valA IS DISTINCT FROM valb`	`valA != valb`
`valA IS NOT DISTINCT FROM valb`	`valA = valb`

If you are wondering why would you use IS DISTINCT FROM instead of the simple !=, that’s because the latter will return null (instead of a boolean value) when any of the operands is NULL. For example, 7 != NULL evaluates to NULL instead of false!.

Logical Operators#

Let’s say we want to find Jamie Rice. PostgreSQL has available the standard logical operators AND and OR:

%%sql
SELECT last_name, first_name
FROM customer
WHERE first_name = 'Jamie' AND last_name = 'Rice';

 * postgresql://bob:***@localhost:5432/dvdrental
1 rows affected.

last_name	first_name
Rice	Jamie

Let’s say we found the wallet of a customer and we want to get her email, to let her know we have her wallet:

%%sql
SELECT email
FROM customer
WHERE first_name = 'Nancy'
  AND last_name = 'Thomas';

 * postgresql://bob:***@localhost:5432/dvdrental
1 rows affected.

email
nancy.thomas@sakilacustomer.org

The count function#

The count function returns the number of rows returned by a SELECT statement. The simplest way to use this function would be:

SELECT count(*)
FROM table;

That would return the total amount of rows in a table:

%%sql
SELECT count(*)
FROM customer;

 * postgresql://bob:***@localhost:5432/dvdrental
1 rows affected.

count
599

We could also pass an expression to the count() function; for example, let’s say we want to count the amount of different amounts that customers have paid:

%%sql
SELECT count(DISTINCT amount)
FROM payment;

 * postgresql://bob:***@localhost:5432/dvdrental
1 rows affected.

count
19

ORDER BY#

When we query data from a table, PostgreSQL returns the rows in the order they were inserted. We can change that using the ORDER BY statement. For example, let’s say we want to order the customers by their first name:

%%sql
SELECT *
FROM customer
ORDER BY first_name
LIMIT 5;

 * postgresql://bob:***@localhost:5432/dvdrental
5 rows affected.

customer_id	store_id	first_name	last_name	email	address_id	activebool	create_date	last_update	active
375	2	Aaron	Selby	aaron.selby@sakilacustomer.org	380	True	2006-02-14	2013-05-26 14:49:45.738000	1
367	1	Adam	Gooch	adam.gooch@sakilacustomer.org	372	True	2006-02-14	2013-05-26 14:49:45.738000	1
525	2	Adrian	Clary	adrian.clary@sakilacustomer.org	531	True	2006-02-14	2013-05-26 14:49:45.738000	1
217	2	Agnes	Bishop	agnes.bishop@sakilacustomer.org	221	True	2006-02-14	2013-05-26 14:49:45.738000	1
389	1	Alan	Kahn	alan.kahn@sakilacustomer.org	394	True	2006-02-14	2013-05-26 14:49:45.738000	1

As you can see, the names are sorted alphabetically, in ascending order (default). But we can change the default (ASC) by using the DESC clause:

%%sql
SELECT *
FROM customer
ORDER BY first_name DESC
LIMIT 5;

 * postgresql://bob:***@localhost:5432/dvdrental
5 rows affected.

customer_id	store_id	first_name	last_name	email	address_id	activebool	create_date	last_update	active
479	1	Zachary	Hite	zachary.hite@sakilacustomer.org	484	True	2006-02-14	2013-05-26 14:49:45.738000	1
174	2	Yvonne	Watkins	yvonne.watkins@sakilacustomer.org	178	True	2006-02-14	2013-05-26 14:49:45.738000	1
190	2	Yolanda	Weaver	yolanda.weaver@sakilacustomer.org	194	True	2006-02-14	2013-05-26 14:49:45.738000	1
212	2	Wilma	Richards	wilma.richards@sakilacustomer.org	216	True	2006-02-14	2013-05-26 14:49:45.738000	1
219	2	Willie	Howell	willie.howell@sakilacustomer.org	223	True	2006-02-14	2013-05-26 14:49:45.738000	1

We could apply the ORDER BY clause to several columns, for example:

%%sql
SELECT first_name, last_name
FROM customer
ORDER BY first_name ASC,
         last_name DESC
LIMIT 5 OFFSET 325;

 * postgresql://bob:***@localhost:5432/dvdrental
5 rows affected.

first_name	last_name
Keith	Rico
Kelly	Torres
Kelly	Knott
Ken	Prewitt
Kenneth	Gooden

In the query above, we’re pulling out all customers ordered by first_name in ascending order then ordering the results by last_name in descending order. As you can see, there are two customers named Kelly which are ordered by last_name in descending order (change the order to ASC to see the result).

In the example above, we’re selecting the first_name and last_name, but we could also be using the * and still be able to sort by any column we wanted to (note that’s a PostgreSQL feature not present in other RDBMS like MySQL).

The IN operator#

Whenever we want to check if a value is included in a list of values, we use the IN operator. The syntax is as follows:

value IN (value1,value2,...)

Instead of a list of values, we could also use a subquery in the parentheses:

value IN (SELECT column_1 FROM table_name)

In any case, the whole thing evaluates to a boolean value (true or false).

For example, let’s say we want to see the countries of a few country_id values we have:

%%sql
SELECT country_id, country
FROM country
WHERE country_id IN (16, 20, 23);

 * postgresql://bob:***@localhost:5432/dvdrental
3 rows affected.

country_id	country
16	Brunei
20	Canada
23	China

As you can see, the IN statement comes in handy, whenever we have to check for rows that are not in sequential order. Another way of writing the statement above would be:

%%sql
SELECT country_id, country
FROM country
WHERE country_id = 16
   OR country_id = 20
   OR country_id = 23;

 * postgresql://bob:***@localhost:5432/dvdrental
3 rows affected.

country_id	country
16	Brunei
20	Canada
23	China

As you can imagine, the longest the list of values to check for, the most useful the IN statement becomes. Also, the speed we get when using the IN operator is better.

The LIKE operator#

The LIKE operator allows us to match patterns in strings. Some examples:

The expression 'abc' LIKE 'abc' will evaluate to true
The expression 'abc' LIKE 'a%' will evaluate to true
The expression 'abc' LIKE '_b_' will evaluate to true
The expression 'abc' LIKE 'c' will evaluate to false

As you may have deducted from the examples above:

An underscore (_) in a pattern matches any single character.
A percent sign (%) matches any sequence of zero or more characters.

For example, let’s say we want to find a customer by name, but we only remember that her first name ended in ley:

%%sql
SELECT first_name, last_name
FROM customer
WHERE first_name LIKE '%ley';

 * postgresql://bob:***@localhost:5432/dvdrental
5 rows affected.

first_name	last_name
Shirley	Allen
Ashley	Richardson
Stanley	Scroggins
Bradley	Motley
Wesley	Bull

The ILIKE operator would allow us to match patterns in a case-insensitive way (LIKE is case-sensitive).

SQL Lab

The SELECT statement

Contents