Skip to main content

FROM / JOIN

The FROM clause specifies the source of the data on which the remainder of the query should operate. Logically, the FROM clause is where the query starts execution. The FROM clause can contain a single table, a combination of multiple tables that are joined together using JOIN clauses, or another SELECT query inside a subquery node. SereneDB also has an optional FROM-first syntax which enables you to also query without a SELECT statement.

Examples

Select all columns from the table called tbl:

Query
SELECT *FROM tbl;
Result
 id | name     | col1 | col2 | some column name | i | a  | b  | c  | number | number1 | number2----+----------+------+------+------------------+---+----+----+----+--------+---------+---------  1 | mark     |    1 |    2 | alpha            | 1 | 10 | 20 | 30 |     10 |     100 |    1000  3 | bookmark |    4 |    5 | beta             | 2 | 11 | 21 | 31 |     20 |     200 |    2000  7 | other    |    9 |   16 | gamma            | 3 | 12 | 22 | 32 |   NULL |     300 |    3000

Select all columns from the table using the FROM-first syntax:

Query
FROM tblSELECT *;
Result
 id | name     | col1 | col2 | some column name | i | a  | b  | c  | number | number1 | number2----+----------+------+------+------------------+---+----+----+----+--------+---------+---------  1 | mark     |    1 |    2 | alpha            | 1 | 10 | 20 | 30 |     10 |     100 |    1000  3 | bookmark |    4 |    5 | beta             | 2 | 11 | 21 | 31 |     20 |     200 |    2000  7 | other    |    9 |   16 | gamma            | 3 | 12 | 22 | 32 |   NULL |     300 |    3000

Select all columns using the FROM-first syntax and omitting the SELECT clause:

Query
FROM tbl;
Result
 id | name     | col1 | col2 | some column name | i | a  | b  | c  | number | number1 | number2----+----------+------+------+------------------+---+----+----+----+--------+---------+---------  1 | mark     |    1 |    2 | alpha            | 1 | 10 | 20 | 30 |     10 |     100 |    1000  3 | bookmark |    4 |    5 | beta             | 2 | 11 | 21 | 31 |     20 |     200 |    2000  7 | other    |    9 |   16 | gamma            | 3 | 12 | 22 | 32 |   NULL |     300 |    3000

Select all columns from the table called tbl through an alias tn:

Query
SELECT tn.*FROM tbl tn;
Result
 id | name     | col1 | col2 | some column name | i | a  | b  | c  | number | number1 | number2----+----------+------+------+------------------+---+----+----+----+--------+---------+---------  1 | mark     |    1 |    2 | alpha            | 1 | 10 | 20 | 30 |     10 |     100 |    1000  3 | bookmark |    4 |    5 | beta             | 2 | 11 | 21 | 31 |     20 |     200 |    2000  7 | other    |    9 |   16 | gamma            | 3 | 12 | 22 | 32 |   NULL |     300 |    3000

Use a prefix alias:

Query
SELECT tn.*FROM tn: tbl;
Result
 id | name     | col1 | col2 | some column name | i | a  | b  | c  | number | number1 | number2----+----------+------+------+------------------+---+----+----+----+--------+---------+---------  1 | mark     |    1 |    2 | alpha            | 1 | 10 | 20 | 30 |     10 |     100 |    1000  3 | bookmark |    4 |    5 | beta             | 2 | 11 | 21 | 31 |     20 |     200 |    2000  7 | other    |    9 |   16 | gamma            | 3 | 12 | 22 | 32 |   NULL |     300 |    3000

Select all columns from the table tbl in the schema schema_name:

Query
SELECT *FROM schema_name.tbl;
Result
 id | name     | col1 | col2 | some column name | i | a  | b  | c  | number | number1 | number2----+----------+------+------+------------------+---+----+----+----+--------+---------+---------  1 | mark     |    1 |    2 | alpha            | 1 | 10 | 20 | 30 |     10 |     100 |    1000  3 | bookmark |    4 |    5 | beta             | 2 | 11 | 21 | 31 |     20 |     200 |    2000  7 | other    |    9 |   16 | gamma            | 3 | 12 | 22 | 32 |   NULL |     300 |    3000

Select the column i from the table function range, where the first column of the range function is renamed to i:

Query
SELECT t.iFROM range(100) AS t(i);
Result
 i----  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99

Select all columns from the CSV file called test.csv:

Query
SELECT *FROM 'test.csv';
Result
 id | name  | value----+-------+-------  1 | alpha |    10  2 | beta  |    20

Select all columns from a subquery:

Query
SELECT *FROM (SELECT * FROM tbl);
Result
 id | name     | col1 | col2 | some column name | i | a  | b  | c  | number | number1 | number2----+----------+------+------+------------------+---+----+----+----+--------+---------+---------  1 | mark     |    1 |    2 | alpha            | 1 | 10 | 20 | 30 |     10 |     100 |    1000  3 | bookmark |    4 |    5 | beta             | 2 | 11 | 21 | 31 |     20 |     200 |    2000  7 | other    |    9 |   16 | gamma            | 3 | 12 | 22 | 32 |   NULL |     300 |    3000

Select the entire row of the table as a struct:

Query
SELECT tFROM t;
Result
 t-------- (1,10) (2,20)

Select the entire row of the subquery as a struct (i.e., a single column):

Query
SELECT tFROM (SELECT unnest(generate_series(41, 43)) AS x, 'hello' AS y) t;
Result
 t------------ (41,hello) (42,hello) (43,hello)

Join two tables together:

Query
SELECT *FROM tblJOIN other_table  ON tbl.key = other_table.key;
Result
 key | value    | key | description-----+----------+-----+-------------   1 | left one |   1 | right one   2 | left two |   2 | right two

Select a 10% sample from a table:

Query
SELECT *FROM tblTABLESAMPLE 10%;

Select a sample of 10 rows from a table:

Query
SELECT *FROM tblTABLESAMPLE 10 ROWS;
Result
 id | name     | col1 | col2 | some column name | i | a  | b  | c  | number | number1 | number2----+----------+------+------+------------------+---+----+----+----+--------+---------+---------  1 | mark     |    1 |    2 | alpha            | 1 | 10 | 20 | 30 |     10 |     100 |    1000  3 | bookmark |    4 |    5 | beta             | 2 | 11 | 21 | 31 |     20 |     200 |    2000  7 | other    |    9 |   16 | gamma            | 3 | 12 | 22 | 32 |   NULL |     300 |    3000

Use the FROM-first syntax with WHERE clause and aggregation:

Query
FROM range(100) AS t(i)SELECT sum(t.i)WHERE i % 2 = 0;
Result
 sum------ 2450

Table Functions

Some functions in SereneDB return entire tables rather than individual values. These functions are accordingly called table functions and can be used with a FROM clause like regular table references. Examples include read_csv, read_parquet, range, generate_series, repeat, unnest, and glob (note that some of the examples here can be used as both scalar and table functions).

For example,

Query
SELECT *FROM 'test.csv';
Result
 id | name  | value----+-------+-------  1 | alpha |    10  2 | beta  |    20

is implicitly translated to a call of the read_csv table function:

Query
SELECT *FROM read_csv('test.csv');
Result
 id | name  | value----+-------+-------  1 | alpha |    10  2 | beta  |    20

All table functions support a WITH ORDINALITY suffix, which extends the returned table by an integer column ordinality that enumerates the generated rows starting at 1.

Query
SELECT *FROM read_csv('test.csv') WITH ORDINALITY;
Result
 id | name  | value | ordinality----+-------+-------+------------  1 | alpha |    10 |          1  2 | beta  |    20 |          2

Note that the same result could be achieved using the row_number window function. In the presence of joins, however, WITH ORDINALITY allows enumerating one side of the join instead of the final result set, without having to resort to sub-queries.

Joins

Joins are a fundamental relational operation used to connect two tables or relations horizontally. The relations are referred to as the left and right sides of the join based on how they are written in the join clause. Each result row has the columns from both relations.

A join uses a rule to match pairs of rows from each relation. Often this is a predicate, but there are other implied rules that may be specified.

Outer Joins

Rows that do not have any matches can still be returned if an OUTER join is specified. Outer joins can be one of:

  • LEFT (All rows from the left relation appear at least once)
  • RIGHT (All rows from the right relation appear at least once)
  • FULL (All rows from both relations appear at least once)

A join that is not OUTER is INNER (only rows that get paired are returned).

When an unpaired row is returned, the attributes from the other table are set to NULL.

Cross Product Joins (Cartesian Product)

The simplest type of join is a CROSS JOIN. There are no conditions for this type of join, and it just returns all the possible pairs.

Return all pairs of rows:

Query
SELECT a.*, b.*FROM aCROSS JOIN b;
Result
 id | x   | id | x----+-----+----+----  1 | 100 |  1 | 10  2 | 200 |  1 | 10  1 | 100 |  2 | 20  2 | 200 |  2 | 20

This is equivalent to omitting the JOIN clause:

Query
SELECT a.*, b.*FROM a, b;
Result
 id | x   | id | x----+-----+----+----  1 | 100 |  1 | 10  2 | 200 |  1 | 10  1 | 100 |  2 | 20  2 | 200 |  2 | 20

Conditional Joins

Most joins are specified by a predicate that connects attributes from one side to attributes from the other side. The conditions can be explicitly specified using an ON clause with the join (clearer) or implied by the WHERE clause (old-fashioned).

We use the l_regions and the l_nations tables from the TPC-H schema:

Query
CREATE TABLE l_regions (    r_regionkey INTEGER NOT NULL PRIMARY KEY,    r_name      CHAR(25) NOT NULL,    r_comment   VARCHAR(152));
CREATE TABLE l_nations (    n_nationkey INTEGER NOT NULL PRIMARY KEY,    n_name      CHAR(25) NOT NULL,    n_regionkey INTEGER NOT NULL,    n_comment   VARCHAR(152),    FOREIGN KEY (n_regionkey) REFERENCES l_regions(r_regionkey));

Return the regions for the nations:

Query
SELECT n.*, r.*FROM l_nations nJOIN l_regions r ON (n_regionkey = r_regionkey);
Result
 n_nationkey | n_name    | n_regionkey | n_comment                                                                                                   | r_regionkey | r_name  | r_comment-------------+-----------+-------------+-------------------------------------------------------------------------------------------------------------+-------------+---------+--------------------------------------------------------------------------------------------------------------------           0 | ALGERIA   |           0 |  haggle. carefully final deposits detect slyly agai                                                         |           0 | AFRICA  | lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to           1 | ARGENTINA |           1 | al foxes promise slyly according to the regular accounts. bold requests alon                                |           1 | AMERICA | hs use ironic, even requests. s           2 | BRAZIL    |           1 | y alongside of the pending deposits. carefully special packages are about the ironic forges. slyly special  |           1 | AMERICA | hs use ironic, even requests. s           3 | CHINA     |           2 | even packages wake furiously alongside of the regular courts. final requests haggle carefully.              |           2 | ASIA    | ges. thinly even pinto beans ca

If the column names are the same and are required to be equal, then the simpler USING syntax can be used:

Query
CREATE TABLE l_regions (regionkey INTEGER NOT NULL PRIMARY KEY,                        name      CHAR(25) NOT NULL,                        comment   VARCHAR(152));
CREATE TABLE l_nations (nationkey INTEGER NOT NULL PRIMARY KEY,                        name      CHAR(25) NOT NULL,                        regionkey INTEGER NOT NULL,                        comment   VARCHAR(152),                        FOREIGN KEY (regionkey) REFERENCES l_regions(regionkey));

Return the regions for the nations:

Query
SELECT n.*, r.*FROM l_nations nJOIN l_regions r USING (regionkey);
Result
 nationkey | name      | regionkey | comment                                                                                                     | regionkey | name    | comment-----------+-----------+-----------+-------------------------------------------------------------------------------------------------------------+-----------+---------+--------------------------------------------------------------------------------------------------------------------         0 | ALGERIA   |         0 |  haggle. carefully final deposits detect slyly agai                                                         |         0 | AFRICA  | lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to         1 | ARGENTINA |         1 | al foxes promise slyly according to the regular accounts. bold requests alon                                |         1 | AMERICA | hs use ironic, even requests. s         2 | BRAZIL    |         1 | y alongside of the pending deposits. carefully special packages are about the ironic forges. slyly special  |         1 | AMERICA | hs use ironic, even requests. s         3 | CHINA     |         2 | even packages wake furiously alongside of the regular courts. final requests haggle carefully.              |         2 | ASIA    | ges. thinly even pinto beans ca

The expressions do not have to be equalities – any predicate can be used:

Return the pairs of jobs where one ran longer but cost less:

Query
SELECT s1.t_id, s2.t_idFROM west s1, west s2WHERE s1.time > s2.time  AND s1.cost < s2.cost;
Result
 t_id | t_id------+------    2 |    1    4 |    1    4 |    2    4 |    3

Natural Joins

Natural joins join two tables based on attributes that share the same name.

For example, take the following example with cities, airport codes and airport names. Note that both tables are intentionally incomplete, i.e., they do not have a matching pair in the other table.

Query
CREATE TABLE city_airport (city_name VARCHAR, iata VARCHAR);
CREATE TABLE airport_names (iata VARCHAR, airport_name VARCHAR);
INSERT INTO city_airport VALUES    ('Amsterdam', 'AMS'),    ('Rotterdam', 'RTM'),    ('Eindhoven', 'EIN'),    ('Groningen', 'GRQ');
INSERT INTO airport_names VALUES    ('AMS', 'Amsterdam Airport Schiphol'),    ('RTM', 'Rotterdam The Hague Airport'),    ('MST', 'Maastricht Aachen Airport');

To join the tables on their shared IATA attributes, run:

Query
SELECT *FROM city_airportNATURAL JOIN airport_names;
Result
 iata | city_name | airport_name------+-----------+----------------------------- AMS  | Amsterdam | Amsterdam Airport Schiphol RTM  | Rotterdam | Rotterdam The Hague Airport

Note that only rows where the same iata attribute was present in both tables were included in the result.

We can also express this query using the vanilla JOIN clause with the USING keyword:

Query
SELECT *FROM city_airportJOIN airport_namesUSING (iata);
Result
 iata | city_name | airport_name------+-----------+----------------------------- AMS  | Amsterdam | Amsterdam Airport Schiphol RTM  | Rotterdam | Rotterdam The Hague Airport

Semi and Anti Joins

Semi joins return rows from the left table that have at least one match in the right table. Anti joins return rows from the left table that have no matches in the right table. When using a semi or anti join the result will never have more rows than the left hand side table. Semi joins provide the same logic as the IN operator statement. Anti joins provide the same logic as the NOT IN operator, except anti joins ignore NULL values from the right table.

Semi Join Example

Return a list of city–airport code pairs from the city_airport table where the airport name is available in the airport_names table:

Query
SELECT *FROM city_airportSEMI JOIN airport_names    USING (iata);
Result
 iata | city_name------+----------- AMS  | Amsterdam RTM  | Rotterdam

This query is equivalent to:

Query
SELECT *FROM city_airportWHERE iata IN (SELECT iata FROM airport_names);
Result
 city_name | iata-----------+------ Amsterdam | AMS Rotterdam | RTM

Anti Join Example

Return a list of city–airport code pairs from the city_airport table where the airport name is not available in the airport_names table:

Query
SELECT *FROM city_airportANTI JOIN airport_names    USING (iata);
Result
 iata | city_name------+----------- EIN  | Eindhoven GRQ  | Groningen

This query is equivalent to:

Query
SELECT *FROM city_airportWHERE iata NOT IN (SELECT iata FROM airport_names WHERE iata IS NOT NULL);
Result
 city_name | iata-----------+------ Eindhoven | EIN Groningen | GRQ

Lateral Joins

The LATERAL keyword allows subqueries in the FROM clause to refer to previous subqueries. This feature is also known as a lateral join.

Query
SELECT *FROM range(3) t(i), LATERAL (SELECT i + 1) t2(j);
Result
 i | j---+--- 0 | 1 1 | 2 2 | 3

Lateral joins are a generalization of correlated subqueries, as they can return multiple values per input value rather than only a single value.

Query
SELECT *FROM    generate_series(0, 1) t(i),    LATERAL (SELECT i + 10 UNION ALL SELECT i + 100) t2(j);
Result
 i | j---+----- 0 |  10 0 | 100 1 | 101 1 |  11

It may be helpful to think about LATERAL as a loop where we iterate through the rows of the first subquery and use it as input to the second (LATERAL) subquery. In the examples above, we iterate through table t and refer to its column i from the definition of table t2. The rows of t2 form column j in the result.

It is possible to refer to multiple attributes from the LATERAL subquery. Using the table from the first example:

Query
CREATE TABLE t1 AS    SELECT *    FROM range(3) t(i), LATERAL (SELECT i + 1) t2(j);
SELECT *    FROM t1, LATERAL (SELECT i + j) t2(k)    ORDER BY ALL;
Result
 i | j | k---+---+--- 0 | 1 | 1 1 | 2 | 3 2 | 3 | 5

Positional Joins

When working with data frames or other embedded tables of the same size, the rows may have a natural correspondence based on their physical order. In scripting languages, this is easily expressed using a loop:

for (i = 0; i < n; i++) {
f(t1.a[i], t2.b[i]);
}

It is difficult to express this in standard SQL because relational tables are not ordered, but imported tables such as data frames or disk files (like CSVs or Parquet files) do have a natural ordering.

Connecting them using this ordering is called a positional join:

Query
CREATE TABLE t1 (x INTEGER);
CREATE TABLE t2 (s VARCHAR);
INSERT INTO t1 VALUES (1), (2), (3);
INSERT INTO t2 VALUES ('a'), ('b');
SELECT *FROM t1POSITIONAL JOIN t2;
Result
 x | s---+------ 1 | a 2 | b 3 | NULL

Positional joins are always FULL OUTER joins, i.e., the resulting table has the length of the longer input table and the missing entries are filled with NULL values.

As-Of Joins

A common operation when working with temporal or similarly-ordered data is to find the nearest (first) event in a reference table (such as prices). This is called an as-of join:

Attach prices to stock trades:

Query
SELECT t.*, p.priceFROM trades tASOF JOIN prices p       ON t.symbol = p.symbol AND t.when >= p.when;
Result
 symbol | when                | quantity | price--------+---------------------+----------+------- AAPL   | 2025-01-01 10:00:00 |        5 |   100 MSFT   | 2025-01-01 10:15:00 |        2 |   200

The ASOF join requires at least one inequality condition on the ordering field. The inequality can be any inequality condition (>=, >, <=, <) on any data type, but the most common form is >= on a temporal type. Any other conditions must be equalities (or NOT DISTINCT). This means that the left/right order of the tables is significant.

ASOF joins each left side row with at most one right side row. It can be specified as an OUTER join to find unpaired rows (e.g., trades without prices or prices which have no trades.)

Attach prices or NULLs to stock trades:

Query
SELECT *FROM trades tASOF LEFT JOIN prices p            ON t.symbol = p.symbol           AND t.when >= p.when;
Result
 symbol | when                | quantity | symbol | when                | price | ticker--------+---------------------+----------+--------+---------------------+-------+-------- AAPL   | 2025-01-01 10:00:00 |        5 | AAPL   | 2025-01-01 09:30:00 |   100 | AAPL MSFT   | 2025-01-01 10:15:00 |        2 | MSFT   | 2025-01-01 09:45:00 |   200 | MSFT

ASOF joins can also specify join conditions on matching column names with the USING syntax, but the last attribute in the list must be the inequality, which will be greater than or equal to (>=):

Query
SELECT *FROM trades tASOF JOIN prices p USING (symbol, "when");
Result
 symbol | when                | quantity | price | ticker--------+---------------------+----------+-------+-------- AAPL   | 2025-01-01 10:00:00 |        5 |   100 | AAPL MSFT   | 2025-01-01 10:15:00 |        2 |   200 | MSFT

Returns symbol, trades.when, price (but NOT prices.when):

If you combine USING with a SELECT * like this, the query will return the left side (probe) column values for the matches, not the right side (build) column values. To get the prices times in the example, you will need to list the columns explicitly:

Query
SELECT t.symbol, t.when AS trade_when, p.when AS price_when, priceFROM trades tASOF LEFT JOIN prices p USING (symbol, "when");
Result
 symbol | trade_when          | price_when          | price--------+---------------------+---------------------+------- AAPL   | 2025-01-01 10:00:00 | 2025-01-01 09:30:00 |   100 MSFT   | 2025-01-01 10:15:00 | 2025-01-01 09:45:00 |   200

Self-Joins

SereneDB allows self-joins for all types of joins. Note that tables need to be aliased, using the same table name without aliases will result in an error:

Query
CREATE TABLE t (x INTEGER);
SELECT * FROM t JOIN t USING(x);
Result
db error: ERROR: Ambiguous reference to table "(varies).public.t" (duplicate alias "(varies).public.t", explicitly alias one of the tables using "AS my_alias")

Adding the aliases allows the query to parse successfully:

Query
SELECT * FROM t AS t1 JOIN t AS t2 USING(x);
Result
x

Shorthands in the JOIN Clause

You can specify column names in the JOIN clause:

Query
CREATE TABLE t1 (x INTEGER);
CREATE TABLE t2 (y INTEGER);
INSERT INTO t1 VALUES (1), (2), (4);
INSERT INTO t2 VALUES (2), (3);
SELECT * FROM t1 NATURAL JOIN t2 t2(x);
Result
 x--- 2

You can also use the VALUES clause in the JOIN clause:

Query
SELECT * FROM t1 NATURAL JOIN (VALUES (2), (4)) _(x);
Result
 x--- 2 4

FROM-First Syntax

SereneDB's SQL supports the FROM-first syntax, i.e., it allows putting the FROM clause before the SELECT clause or completely omitting the SELECT clause. We use the following example to demonstrate it:

Query
CREATE TABLE tbl AS    SELECT *    FROM (VALUES ('a'), ('b')) t1(s), range(1, 3) t2(i);

FROM-First Syntax with a SELECT Clause

The following statement demonstrates the use of the FROM-first syntax:

Query
FROM tblSELECT i, s;
Result
 i | s---+--- 1 | a 1 | b 2 | a 2 | b

This is equivalent to:

Query
SELECT i, sFROM tbl;
Result
 i | s---+--- 1 | a 1 | b 2 | a 2 | b

FROM-First Syntax without a SELECT Clause

The following statement demonstrates the use of the optional SELECT clause:

Query
FROM tbl;
Result
 s | i---+--- a | 1 b | 1 a | 2 b | 2

This is equivalent to:

Query
SELECT *FROM tbl;
Result
 s | i---+--- a | 1 b | 1 a | 2 b | 2

Syntax