JOIN clause

Join is a fundamental relational operation used to horizontally join two tables or relations. Relationships are called the left and right sides of the join depending on how they are written in the JOIN clause. Each row of the result contains columns from both relations.

The join uses a rule to match pairs of rows from each relationship. Often this is a predicate, but other rules may be specified.

Syntax

The following syntax variants may be used:

SELECT ...
FROM <object_ref1> [
                     {
                       INNER
                       | { LEFT | RIGHT | FULL } [ OUTER ]
                     }
                   ]
                   JOIN <object_ref2>
  [ ON <condition> ]
[ ... ]
SELECT *
FROM <object_ref1> [
                     {
                       INNER
                       | { LEFT | RIGHT | FULL } [ OUTER ]
                     }
                   ]
                   JOIN <object_ref2>
  [ USING( <column_list> ) ]
[ ... ]
SELECT ...
FROM <object_ref1> [
                     {
                       | NATURAL [ { LEFT | RIGHT | FULL } [ OUTER ] ]
                       | CROSS
                     }
                   ]
                   JOIN <object_ref2>
[ ... ]

Parameters

  • <object_ref1> and <object_ref2> +. Each <object_ref> represents a table or table-like data source.


  • JOIN
    A JOIN keyword indicating that tables should be joined. JOIN is combined with other keywords (such as INNER or OUTER) to specify the type of join.


  • ON <condition>
    A boolean expression that specifies the conditions of the connection.
    For a detailed description, see Conditional connection.


  • USING <column_list>
    A list of columns by which the join is performed. The columns must be of the same name in the tables being joined.
    For a detailed description, see Conditional connection.

OUTER JOIN — external connection

Strings that have no matches can be returned if an OUTER join is specified. External joins can have one of the types:

  • LEFT (All rows from the left relationship occur at least once)

  • RIGHT (All rows from the right relationship occur at least once)

  • FULL (All strings from both relations occur at least once)

A connection that is not outer (OUTER) is inner (INNER) — only those strings that satisfy the condition are returned.

If an unpaired row is returned, the attributes of the other table are set to NULL.

CROSS JOIN — cross-join (Cartesian product)

The simplest type of join is the CROSS JOIN. There are no conditions for this join type and it simply returns all possible pairs.

Let’s return all pairs of strings:

SELECT a.*, b.*
FROM a
CROSS JOIN b;

This is equivalent to a query that simply omits the JOIN condition:

SELECT a.*, b.*
FROM a, b;

Conditional connection

Most conjunctions are specified by a predicate that connects attributes on one side to attributes on the other side. Conditions can be explicitly specified using the ON and WHERE operators.

Let’s show the use of conditional joins on the example of two tables — capitals with countries and their capitals and population with countries and their population (in millions).

CREATE TABLE capitals (cap_country VARCHAR, capital VARCHAR);
CREATE TABLE population (pop_country VARCHAR, population_mil BIGINT);
INSERT INTO capitals VALUES
    ('Russia', 'Moscow'),
    ('Italy', 'Rome'),
    ('Spain', 'Madrid'),
    ('France', 'Paris');
INSERT INTO population VALUES
    ('Russia', 143),
    ('Spain', 48),
    ('Brazil', 211);

Let’s output the countries with their capitals and populations using JOIN with the ON operator:

SELECT *
FROM capitals
JOIN population ON (cap_country = pop_country);
+-------------+---------+-------------+----------------+
| cap_country | capital | pop_country | population_mil |
+-------------+---------+-------------+----------------+
| Russia      | Moscow  | Russia      | 143            |
+-------------+---------+-------------+----------------+
| Spain       | Madrid  | Spain       | 48             |
+-------------+---------+-------------+----------------+

Now let’s output the same data using JOIN with the WHERE operator:

SELECT t1.*, t2.*
FROM capitals t1, population t2
WHERE t1.cap_country = t2.pop_country
+-------------+---------+-------------+----------------+
| cap_country | capital | pop_country | population_mil |
+-------------+---------+-------------+----------------+
| Russia      | Moscow  | Russia      | 143            |
+-------------+---------+-------------+----------------+
| Spain       | Madrid  | Spain       | 48             |
+-------------+---------+-------------+----------------+

Not only equality but also other predicates can be used with the WHERE operator.

If the names of columns in two tables are the same and the values in them must be equal, you can use a simpler syntax USING.

Let’s define the same tables, but with the same name columns country:

CREATE TABLE capitals (country VARCHAR, capital VARCHAR);
CREATE TABLE population (country VARCHAR, population_mil BIGINT);
INSERT INTO capitals VALUES
    ('Russia', 'Moscow'),
    ('Italy', 'Rome'),
    ('Spain', 'Madrid'),
    ('France', 'Paris');
INSERT INTO population VALUES
    ('Russia', 143),
    ('Spain', 48),
    ('Brazil', 211);

Now output the countries with their capital and population through a simpler query with USING:

SELECT *
FROM capitals
JOIN population USING (country);
+---------+---------+----------------+
| country | capital | population_mil |
+---------+---------+----------------+
| Russia  | Moscow  | 143            |
+---------+---------+----------------+
| Spain   | Madrid  | 48             |
+---------+---------+----------------+

NATURAL JOIN — natural connection

Natural joins join two tables based on columns with the same names.

Let’s show the use of this type of join on the example of the same pair of tables with the same name columns country.

To join tables based on their common column, execute the command:

SELECT *
FROM capitals
NATURAL JOIN population;
+---------+---------+---------+----------------+
| country | capital | country | population_mil |
+---------+---------+---------+----------------+
| Russia  | Moscow  | Russia  | 143            |
+---------+---------+---------+----------------+
| Spain   | Madrid  | Spain   | 48             |
+---------+---------+---------+----------------+

Note that only rows that had the same country attribute in both tables were included in the result.

We can perform a similar query using the JOIN clause with the USING keyword:

SELECT *
FROM capitals
JOIN population
USING (country);
+---------+---------+----------------+
| country | capital | population_mil |
+---------+---------+----------------+
| Russia  | Moscow  | 143            |
+---------+---------+----------------+
| Spain   | Madrid  | 48             |
+---------+---------+----------------+

Note that in this case the same-named columns from different tables will "collapse" into one.

SEMI JOIN and ANTI JOIN — semi-joins and anti-joins

Semi joins return rows from the left table that have at least one match in the right table.

Anti-joins return rows from the left table that have no matches in the right table.

When using semi-joins or anti-joins, the result will never contain more rows than the left table. Semi-joins provide the same logic as the IN operator. Anti-joins provide the same logic as the NOT IN operator, except that anti-joins ignore NULL values from the right table.

Example of a semi-join

Let’s derive a list of such country-capital pairs from the capitals table for which the country name is present in the population table:

SELECT *
FROM capitals
SEMI JOIN population
    USING (country);
+---------+---------+
| country | capital |
+---------+---------+
| Russia  | Moscow  |
+---------+---------+
| Spain   | Madrid  |
+---------+---------+

This request is equivalent to the following:

SELECT *
FROM capitals
WHERE country IN (SELECT country FROM population);
+---------+---------+
| country | capital |
+---------+---------+
| Russia  | Moscow  |
+---------+---------+
| Spain   | Madrid  |
+---------+---------+

Example of an anti-coupling

Let’s output a list of such country-capital pairs from the capitals table for which the country name is not in the population table:

SELECT *
FROM capitals
ANTI JOIN population
    USING (country);
+---------+---------+
| country | capital |
+---------+---------+
| Italy   | Rome    |
+---------+---------+
| France  | Paris   |
+---------+---------+

This request is equivalent to the following:

SELECT *
FROM capitals
WHERE country NOT IN
    (SELECT country FROM population WHERE country IS NOT NULL);
+---------+---------+
| country | capital |
+---------+---------+
| Italy   | Rome    |
+---------+---------+
| France  | Paris   |
+---------+---------+

Closed connection (Self-Join)

Tengri allows you to use a closed join (joining a table to itself) for all types of joins. Note that tables must be specified via aliases, using the same table name without aliases will cause an error:

CREATE TABLE t (x INTEGER);
SELECT * FROM t JOIN t USING(x);
Binder Error:
Duplicate alias "t" in query!

Adding aliases allows the request to be processed successfully:

CREATE TABLE t (num INTEGER);
SELECT * FROM t t1 JOIN t t2 USING(num);
+-----+
| num |
+-----+