import sqlalchemy
engine = sqlalchemy.create_engine('postgresql://bob:1234@localhost:5432/test_joins')
engine.execution_options(isolation_level="AUTOCOMMIT")
%load_ext sql
%sql $engine.url
'Connected: bob@test_joins'

Unions#

The UNION operator allows us to combine the results of two or more SELECT statements into a single result set. The general syntax for combining two SELECT expressions would be:

SELECT column_1, column_2
FROM table_1
UNION
SELECT column_1, column_2
FROM table_2;

There are a couple of restrictions when using unions:

  • Both queries must return the same number of columns.

  • The corresponding columns in the queries must have compatible data types.

Let’s see it in action with a couple of examples using the small database we used to explain joins.

%%sql
SELECT * name FROM "TableA"
UNION
SELECT * name FROM "TableB";
 * postgresql://bob:***@localhost:5432/test_joins
(psycopg2.errors.SyntaxError) syntax error at or near "name"
LINE 1: SELECT * name FROM "TableA"
                 ^

[SQL: SELECT * name FROM "TableA"
UNION
SELECT * name FROM "TableB";]
(Background on this error at: https://sqlalche.me/e/14/f405)

As you can see, since both tables have the same number of columns and the data types in them are equal, we get all the rows in both tables without issue.

Ordering the results#

By the way, the UNION operator may place the rows in the first query before, after or between the rows in the result set of the second query. That can easily be remedied using the ORDER BY clause:

%%sql
SELECT id, name FROM "TableA"
UNION
SELECT id, name FROM "TableB"
ORDER BY id;
 * postgresql://bob:***@localhost:5432/test_joins
8 rows affected.
id name
1 Rutabaga
1 Pirate
2 Pirate
2 Monkey
3 Ninja
3 Darth Vader
4 Spaguetti
4 Ninja
## UNION ALL
The `UNION` operator removes all duplicate rows from the results, for example:
  Cell In[4], line 2
    The `UNION` operator removes all duplicate rows from the results, for example:
        ^
SyntaxError: invalid syntax
%%sql
SELECT id name FROM "TableA"
UNION
SELECT id name FROM "TableB";
   postgresql://bob:***@localhost:5432/dvdrental
 * postgresql://bob:***@localhost:5432/test_joins
4 rows affected.
name
2
3
4
1

As you can see, we don’t get the rows for the id columns repeated. A way around that could be using UNION ALL:

%%sql
SELECT id name FROM "TableA"
UNION ALL
SELECT id name FROM "TableB";
   postgresql://bob:***@localhost:5432/dvdrental
 * postgresql://bob:***@localhost:5432/test_joins
8 rows affected.
name
1
2
3
4
1
2
3
4