GROUPING SETS
GROUPING SETS, ROLLUP and CUBE can be used in the GROUP BY clause to perform a grouping over multiple dimensions within the same query.
Note that this syntax is not compatible with GROUP BY ALL.
Examples
Compute the average income along the provided four different dimensions:
-- the syntax () denotes the empty set (i.e., computing an ungrouped aggregate)SELECT city, street_name, avg(income)FROM addressesGROUP BY GROUPING SETS ((city, street_name), (city), (street_name), ()); city | street_name | avg----------+-------------+------- Lakeside | Birch Ln | 75000 Lakeside | NULL | 75000 Linden | Birch Ln | 60000 Linden | NULL | 55000 Linden | Spruce Blvd | 50000 Mallard | Lake View | 40000 Mallard | NULL | 65000 Mallard | Pond Road | 90000 NULL | Birch Ln | 70000 NULL | Lake View | 40000 NULL | NULL | 65000 NULL | Pond Road | 90000 NULL | Spruce Blvd | 50000Compute the average income along the same dimensions:
SELECT city, street_name, avg(income)FROM addressesGROUP BY CUBE (city, street_name); city | street_name | avg----------+-------------+------- Lakeside | Birch Ln | 75000 Lakeside | NULL | 75000 Linden | Birch Ln | 60000 Linden | NULL | 55000 Linden | Spruce Blvd | 50000 Mallard | Lake View | 40000 Mallard | NULL | 65000 Mallard | Pond Road | 90000 NULL | Birch Ln | 70000 NULL | Lake View | 40000 NULL | NULL | 65000 NULL | Pond Road | 90000 NULL | Spruce Blvd | 50000Compute the average income along the dimensions (city, street_name), (city) and ():
SELECT city, street_name, avg(income)FROM addressesGROUP BY ROLLUP (city, street_name); city | street_name | avg----------+-------------+------- Lakeside | Birch Ln | 75000 Lakeside | NULL | 75000 Linden | Birch Ln | 60000 Linden | NULL | 55000 Linden | Spruce Blvd | 50000 Mallard | Lake View | 40000 Mallard | NULL | 65000 Mallard | Pond Road | 90000 NULL | NULL | 65000Description
GROUPING SETS performs the same aggregate across different GROUP BY clauses in a single query.
CREATE TABLE students (course VARCHAR, type VARCHAR);
INSERT INTO students (course, type)VALUES ('CS', 'Bachelor'), ('CS', 'Bachelor'), ('CS', 'PhD'), ('Math', 'Masters'), ('CS', NULL), ('CS', NULL), ('Math', NULL);SELECT course, type, count(*)FROM studentsGROUP BY GROUPING SETS ((course, type), course, type, ()); course | type | count--------+----------+------- CS | Bachelor | 2 CS | NULL | 2 CS | NULL | 5 CS | PhD | 1 Math | Masters | 1 Math | NULL | 1 Math | NULL | 2 NULL | Bachelor | 2 NULL | Masters | 1 NULL | NULL | 3 NULL | NULL | 7 NULL | PhD | 1In the above query, we group across four different sets: course, type, course, type and () (the empty group). The result contains NULL for a group which is not in the grouping set for the result, i.e., the above query is equivalent to the following statement of UNION ALL clauses:
-- Group by course, type:SELECT course, type, count(*)FROM studentsGROUP BY course, typeUNION ALL-- Group by type:SELECT NULL AS course, type, count(*)FROM studentsGROUP BY typeUNION ALL-- Group by course:SELECT course, NULL AS type, count(*)FROM studentsGROUP BY courseUNION ALL-- Group by nothing:SELECT NULL AS course, NULL AS type, count(*)FROM students; course | type | count--------+----------+------- CS | Bachelor | 2 CS | NULL | 2 CS | NULL | 5 CS | PhD | 1 Math | Masters | 1 Math | NULL | 1 Math | NULL | 2 NULL | Bachelor | 2 NULL | Masters | 1 NULL | NULL | 3 NULL | NULL | 7 NULL | PhD | 1CUBE and ROLLUP are syntactic sugar to easily produce commonly used grouping sets.
The ROLLUP clause will produce all “sub-groups” of a grouping set, e.g., ROLLUP (country, city, zip) produces the grouping sets (country, city, zip), (country, city), (country), (). This can be useful for producing different levels of detail of a group by clause. This produces n+1 grouping sets where n is the amount of terms in the ROLLUP clause.
CUBE produces grouping sets for all combinations of the inputs, e.g., CUBE (country, city, zip) will produce (country, city, zip), (country, city), (country, zip), (city, zip), (country), (city), (zip), (). This produces 2^n grouping sets.
Identifying Grouping Sets with GROUPING_ID()
The super-aggregate rows generated by GROUPING SETS, ROLLUP and CUBE can often be identified by NULL-values returned for the respective column in the grouping. But if the columns used in the grouping can themselves contain actual NULL-values, then it can be challenging to distinguish whether the value in the resultset is a “real” NULL-value coming out of the data itself, or a NULL-value generated by the grouping construct. The GROUPING_ID() or GROUPING() function is designed to identify which groups generated the super-aggregate rows in the result.
GROUPING_ID() is an aggregate function that takes the column expressions that make up the grouping(s). It returns a BIGINT value. The return value is 0 for the rows that are not super-aggregate rows. But for the super-aggregate rows, it returns an integer value that identifies the combination of expressions that make up the group for which the super-aggregate is generated. At this point, an example might help. Consider the following query:
WITH days AS ( SELECT year("generate_series") AS y, quarter("generate_series") AS q, month("generate_series") AS m FROM generate_series(DATE '2023-01-01', DATE '2023-12-31', INTERVAL 1 DAY))SELECT y, q, m, GROUPING_ID(y, q, m) AS "grouping_id()"FROM daysGROUP BY GROUPING SETS ( (y, q, m), (y, q), (y), ())ORDER BY y, q, m; y | q | m | grouping_id()------+------+------+--------------- 2023 | 1 | 1 | 0 2023 | 1 | 2 | 0 2023 | 1 | 3 | 0 2023 | 1 | NULL | 1 2023 | 2 | 4 | 0 2023 | 2 | 5 | 0 2023 | 2 | 6 | 0 2023 | 2 | NULL | 1 2023 | 3 | 7 | 0 2023 | 3 | 8 | 0 2023 | 3 | 9 | 0 2023 | 3 | NULL | 1 2023 | 4 | 10 | 0 2023 | 4 | 11 | 0 2023 | 4 | 12 | 0 2023 | 4 | NULL | 1 2023 | NULL | NULL | 3 NULL | NULL | NULL | 7In this example, the lowest level of grouping is at the month level, defined by the grouping set (y, q, m). Result rows corresponding to that level are simply aggregate rows and the GROUPING_ID(y, q, m) function returns 0 for those. The grouping set (y, q) results in super-aggregate rows over the month level, leaving a NULL-value for the m column, and for which GROUPING_ID(y, q, m) returns 1. The grouping set (y) results in super-aggregate rows over the quarter level, leaving NULL-values for the m and q column, for which GROUPING_ID(y, q, m) returns 3. Finally, the () grouping set results in one super-aggregate row for the entire resultset, leaving NULL-values for y, q and m and for which GROUPING_ID(y, q, m) returns 7.
To understand the relationship between the return value and the grouping set, you can think of GROUPING_ID(y, q, m) writing to a bitfield, where the first bit corresponds to the last expression passed to GROUPING_ID(), the second bit to the one-but-last expression passed to GROUPING_ID(), and so on. This may become clearer by casting GROUPING_ID() to BIT:
WITH days AS ( SELECT year("generate_series") AS y, quarter("generate_series") AS q, month("generate_series") AS m FROM generate_series(DATE '2023-01-01', DATE '2023-12-31', INTERVAL 1 DAY))SELECT y, q, m, GROUPING_ID(y, q, m) AS "grouping_id(y, q, m)", right(GROUPING_ID(y, q, m)::BIT::VARCHAR, 3) AS "y_q_m_bits"FROM daysGROUP BY GROUPING SETS ( (y, q, m), (y, q), (y), ())ORDER BY y, q, m; y | q | m | grouping_id(y, q, m) | y_q_m_bits------+------+------+----------------------+------------ 2023 | 1 | 1 | 0 | 000 2023 | 1 | 2 | 0 | 000 2023 | 1 | 3 | 0 | 000 2023 | 1 | NULL | 1 | 001 2023 | 2 | 4 | 0 | 000 2023 | 2 | 5 | 0 | 000 2023 | 2 | 6 | 0 | 000 2023 | 2 | NULL | 1 | 001 2023 | 3 | 7 | 0 | 000 2023 | 3 | 8 | 0 | 000 2023 | 3 | 9 | 0 | 000 2023 | 3 | NULL | 1 | 001 2023 | 4 | 10 | 0 | 000 2023 | 4 | 11 | 0 | 000 2023 | 4 | 12 | 0 | 000 2023 | 4 | NULL | 1 | 001 2023 | NULL | NULL | 3 | 011 NULL | NULL | NULL | 7 | 111Note that the number of expressions passed to GROUPING_ID(), or the order in which they are passed is independent from the actual group definitions appearing in the GROUPING SETS-clause (or the groups implied by ROLLUP and CUBE). As long as the expressions passed to GROUPING_ID() are expressions that appear somewhere in the GROUPING SETS-clause, GROUPING_ID() will set a bit corresponding to the position of the expression whenever that expression is rolled up to a super-aggregate.