Skip to main content

Subqueries

Subqueries are parenthesized query expressions that appear as part of a larger, outer query. Subqueries are usually based on SELECT ... FROM, but in SereneDB other query constructs such as PIVOT can also appear as a subquery.

Scalar Subquery

Scalar subqueries are subqueries that return a single value. They can be used anywhere where an expression can be used. If a scalar subquery returns more than a single value, an error is raised (unless scalar_subquery_error_on_multiple_rows is set to false, in which case a row is selected randomly).

Consider the following table:

Grades

gradecourse
7Math
9Math
8CS
Query
CREATE TABLE grades (grade INTEGER, course VARCHAR);
INSERT INTO grades VALUES (7, 'Math'), (9, 'Math'), (8, 'CS');

We can run the following query to obtain the minimum grade:

Query
SELECT min(grade) FROM grades;
Result
 min-----   7

By using a scalar subquery in the WHERE clause, we can figure out for which course this grade was obtained:

Query
SELECT course FROM grades WHERE grade = (SELECT min(grade) FROM grades);
Result
 course-------- Math

ARRAY Subqueries

Subqueries that return multiple values can be wrapped with ARRAY to collect all results in a list.

Query
SELECT ARRAY(SELECT grade FROM grades) AS all_grades;
Result
 all_grades------------ {7,9,8}

Subquery Comparisons: ALL, ANY and SOME

In the section on scalar subqueries, a scalar expression was compared directly to a subquery using the equality comparison operator (=). Such direct comparisons only make sense with scalar subqueries.

Scalar expressions can still be compared to single-column subqueries returning multiple rows by specifying a quantifier. Available quantifiers are ALL, ANY and SOME. The quantifiers ANY and SOME are equivalent.

ALL

The ALL quantifier specifies that the comparison as a whole evaluates to true when the individual comparison results of the expression at the left hand side of the comparison operator with each of the values from the subquery at the right hand side of the comparison operator all evaluate to true:

Query
SELECT 6 <= ALL (SELECT grade FROM grades) AS adequate;
Result
 adequate---------- t

because 6 is less than or equal to each of the subquery results 7, 8 and 9.

However, the following query

Query
SELECT 8 >= ALL (SELECT grade FROM grades) AS excellent;
Result
 excellent----------- f

because 8 is not greater than or equal to the subquery result 9. And thus, because not all comparisons evaluate to true, >= ALL as a whole evaluates to false.

ANY

The ANY quantifier specifies that the comparison as a whole evaluates to true when at least one of the individual comparison results evaluates to true. For example:

Query
SELECT 5 >= ANY (SELECT grade FROM grades) AS fail;
Result
 fail------ f

because no result of the subquery is less than or equal to 5.

The quantifier SOME may be used instead of ANY: ANY and SOME are interchangeable.

EXISTS

The EXISTS operator tests for the existence of any row inside the subquery. It returns either true when the subquery returns one or more records, and false otherwise. The EXISTS operator is generally the most useful as a correlated subquery to express semijoin operations. However, it can be used as an uncorrelated subquery as well.

For example, we can use it to figure out if there are any grades present for a given course:

Query
SELECT EXISTS (FROM grades WHERE course = 'Math') AS math_grades_present;
Result
 math_grades_present--------------------- t
Query
SELECT EXISTS (FROM grades WHERE course = 'History') AS history_grades_present;
Result
 history_grades_present------------------------ f

NOT EXISTS

The NOT EXISTS operator tests for the absence of any row inside the subquery. It returns either true when the subquery returns an empty result, and false otherwise. The NOT EXISTS operator is generally the most useful as a correlated subquery to express antijoin operations. For example, to find rows in Person with no matching row in interest:

Query
CREATE TABLE Person (id BIGINT, name VARCHAR);
CREATE TABLE interest (PersonId BIGINT, topic VARCHAR);
INSERT INTO Person VALUES (1, 'Jane'), (2, 'Joe');
INSERT INTO interest VALUES (2, 'Music');
SELECT *FROM PersonWHERE NOT EXISTS (FROM interest WHERE interest.PersonId = Person.id);
Result
 id | name----+------  1 | Jane

IN Operator

The IN operator checks containment of the left expression inside the result defined by the subquery or the set of expressions on the right hand side (RHS). The IN operator returns true if the expression is present in the RHS, false if the expression is not in the RHS and the RHS has no NULL values, or NULL if the expression is not in the RHS and the RHS has NULL values.

We can use the IN operator in a similar manner as we used the EXISTS operator:

Query
SELECT 'Math' IN (SELECT course FROM grades) AS math_grades_present;
Result
 math_grades_present--------------------- t

Correlated Subqueries

All the subqueries presented here so far have been uncorrelated subqueries, where the subqueries themselves are entirely self-contained and can be run without the parent query. There exists a second type of subqueries called correlated subqueries. For correlated subqueries, the subquery uses values from the parent subquery.

Conceptually, the subqueries are run once for every single row in the parent query. Perhaps a simple way of envisioning this is that the correlated subquery is a function that is applied to every row in the source dataset.

For example, suppose that we want to find the minimum grade for every course. We could do that as follows:

Query
SELECT *FROM grades grades_parentWHERE grade =    (SELECT min(grade)     FROM grades     WHERE grades.course = grades_parent.course);
Result
 grade | course-------+--------     7 | Math     8 | CS

The subquery uses a column from the parent query (grades_parent.course). Conceptually, we can see the subquery as a function where the correlated column is a parameter to that function. We can evaluate that function for every row by projecting the correlated subquery alongside the grade:

Query
SELECT    course,    grade,    (SELECT min(grade) FROM grades sub WHERE sub.course = g.course) AS course_minFROM grades gORDER BY course, grade;
Result
 course | grade | course_min--------+-------+------------ CS     |     8 |          8 Math   |     7 |          7 Math   |     9 |          7

For Math the function returns 7 and for CS it returns 8. Comparing each result against the grade in that row, the row (Math, 9) is filtered out by the query above, as 9 <> 7.

Returning Each Row of the Subquery as a Struct

Using the name of a subquery in the SELECT clause (without referring to a specific column) turns each row of the subquery into a struct whose fields correspond to the columns of the subquery. For example:

Query
SELECT tFROM (SELECT unnest(generate_series(41, 43)) AS x, 'hello' AS y) t;
Result
 t------------ (41,hello) (42,hello) (43,hello)