Skip to main content

Geometry

NameDescription
GEOMETRYGeospatial entity

The GEOMETRY data type is used to store and manipulate geometric objects, such as points, lines, and polygons.

Types of Geometries

Conceptually, the GEOMETRY type follows the core data model defined in the Simple Features standard, which is widely used in geospatial databases and GIS software. A GEOMETRY value can therefore represent 7 types of shapes:

Geometry TypeDescription
PointA single location in space, defined by its coordinates (e.g., longitude and latitude).
LineStringA sequence of points connected by straight lines, representing a path or route.
PolygonA set of closed rings defined by a sequence of points, representing an area such as a country border or a building footprint. The first ring is the "shell", and "interior" rings represent holes in the polygon.
MultiPointA collection of points.
MultiLineStringA collection of LineStrings.
MultiPolygonA collection of Polygons.
GeometryCollectionA collection of different geometry types, allowing for complex geometries that combine points, lines, and polygons or even other nested geometry collections.

The textual representation of geometries uses "Well-Known Text" (WKT) format. Geometries can be cast to and from WKT strings, so you can use string literals to create geometries directly in SQL statements.

In the following example, we create a GEOMETRY column with the 7 different types of supported geometries:

Query
CREATE TABLE geometries (    id INTEGER,    geom GEOMETRY);
INSERT INTO geometries VALUES  (1, 'POINT (30 10)'),  (2, 'LINESTRING (30 10, 10 30, 40 40)'),  (3, 'POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))'),  (4, 'MULTIPOINT ((10 40), (40 30), (20 20), (30 10))'),  (5, 'MULTILINESTRING ((10 10, 20 20, 10 40), (40 40, 30 30, 40 20))'),  (6, 'MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10,15 5)))'),  (7, 'GEOMETRYCOLLECTION (POINT(40 10), LINESTRING(10 10,20 20,10 40), POLYGON((40 40,20 45,45 30,40 40)))');
SELECT id, geom::TEXT AS geom FROM geometries ORDER BY id;
Result
 id | geom----+--------------------------------------------------------------------------------------------------------------  1 | POINT (30 10)  2 | LINESTRING (30 10, 10 30, 40 40)  3 | POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))  4 | MULTIPOINT (10 40, 40 30, 20 20, 30 10)  5 | MULTILINESTRING ((10 10, 20 20, 10 40), (40 40, 30 30, 40 20))  6 | MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5)))  7 | GEOMETRYCOLLECTION (POINT (40 10), LINESTRING (10 10, 20 20, 10 40), POLYGON ((40 40, 20 45, 45 30, 40 40)))

Multi-Dimensional Geometries

The GEOMETRY type is primarily used to model shapes in two dimensions (e.g. X/Y or longitude/latitude), but it also supports shapes with additional vertex dimensions such as Z for elevation or M for "measure", or both.

The vertex dimensions of a GEOMETRY value must be consistent across all vertices. For example, if one vertex has X, Y, and Z coordinates, then all other vertices in that geometry must also have X, Y, and Z coordinates. This means that you cannot have a mix of 2D and 3D vertices within the same geometry. This also applies for collections of geometries, such as MULTIPOINT or GEOMETRYCOLLECTION, where all geometries within the collection must have the same vertex dimensions.

Functions that operate on GEOMETRY values typically ignore any additional dimensions beyond the X and Y unless explicitly specified, but they can still be stored and can be retrieved if needed.

In the following example, we create a GEOMETRY table with 2D, 3D(Z), 3D(M) and 4D(ZM) points:

Query
CREATE TABLE points (    id INTEGER,    geom GEOMETRY);
INSERT INTO points VALUES  (1, 'POINT (30 10)'),  (2, 'POINT Z (30 10 5)'),  (3, 'POINT M (30 10 1)'),  (4, 'POINT ZM (30 10 5 1)');
SELECT id, geom::TEXT AS geom FROM points ORDER BY id;
-- But we cannot mix different vertex dimensions within the same geometry!INSERT INTO points VALUES  (5, 'MULTIPOINT (POINT (30 10), POINT Z (30 10 5))');
Result
 id | geom----+----------------------  1 | POINT (30 10)  2 | POINT Z (30 10 5)  3 | POINT M (30 10 1)  4 | POINT ZM (30 10 5 1)
db error: ERROR: Failed to parse geometry: Expected number at offset 12

Empty Geometries

Geometries can also be "empty" (e.g., POINT EMPTY, LINESTRING EMPTY, MULTIPOLYGON EMPTY, etc.) which means they don't contain any vertices. Empty geometries are still valid geometries and can be used in spatial operations, but they are mostly useful for representing the result of topological operations that don't have a valid geometrical representation (e.g., the intersection of two non-overlapping geometries is an empty geometry).

Geometry Storage

Shredding and Compression

The GEOMETRY type supports a storage optimization called "shredding", which improves compression for geometry columns where all values share the same geometry type and vertex dimensions.

When a row group qualifies, SereneDB splits the geometry segment within the row group into primitive STRUCT, LIST, and DOUBLE segments that can be compressed independently using lightweight algorithms - far more efficiently than storing variable-size binary blobs.

The shredded layout depends on the geometry type:

  • POINT - STRUCT(X DOUBLE, Y DOUBLE) (and/or Z, M)
  • LINESTRING - STRUCT(X DOUBLE, Y DOUBLE)[]
  • POLYGON - STRUCT(X DOUBLE, Y DOUBLE)[][]
  • MULTIPOINT, MULTILINESTRING, MULTIPOLYGON - same as above, with one additional level of list nesting

Row groups are not shredded if they contain GEOMETRYCOLLECTIONs, any EMPTY geometries, or multiple geometry sub-types.

Additionally, row groups are not shredded if they fall below the minimum size threshold (default: ~25% of the maximum row group size, i.e., 30,000 rows).

This threshold is configurable via the geometry_minimum_shredding_size setting. Set it to 0 to always shred, or -1 to disable shredding entirely.

Query
-- Disable shredding for geometry columnsSET geometry_minimum_shredding_size = -1;
-- Always shred geometry columns regardless of row group sizeSET geometry_minimum_shredding_size = 0;

The primary benefit of shredding is significantly improved compression, but in the future we plan to add ways to expose the shredded representation directly to the execution engine without having to "reassemble" the geometry back into binary again.

The following example illustrates the effects of shredding on the storage footprint of a GEOMETRY column.

Query
-- Attach a persistent database with storage version v1.5ATTACH 'geometry_db.db' as geometry_db (STORAGE_VERSION 'v1.5.0');
USE geometry_db;
-- Disable shredding completely and create a table with 1 million 2D pointsSET geometry_minimum_shredding_size = -1;
CREATE OR REPLACE TABLE points AS SELECT printf('POINT (%d %d)', x, y)::GEOMETRY AS geomFROM range(0, 1000) AS rx(x), range(0, 1000) AS ry(y);
-- Checkpoint the database to persist the data and storage layout to diskCHECKPOINT;
-- Attach a second databaseATTACH 'shredded_db.db' as shredded_db (STORAGE_VERSION 'v1.5.0');
USE shredded_db;
-- This time, set the minimum shredding size to 0 to always shred geometry columns,-- and create the same table with 1 million 2D pointsSET geometry_minimum_shredding_size = 0;
CREATE OR REPLACE TABLE points AS SELECT printf('POINT (%d %d)', x, y)::GEOMETRY AS geomFROM range(0, 1000) AS rx(x), range(0, 1000) AS ry(y);
-- Checkpoint to persist the data and storage layout to disk, and apply shreddingCHECKPOINT;
-- Now check the storage layout and memory usage of the geometry column in both attached databasesSELECT database_name, database_size FROM pragma_database_size();
-- The geometry column in `geometry_db` is stored as regular GEOMETRY segmentsSELECT DISTINCT(segment_type) FROM pragma_storage_info('geometry_db.points');
-- While the geometry column in `shredded_db` is decomposed into primitive DOUBLE segments,-- which can be compressed much more efficiently!SELECT DISTINCT(segment_type) FROM pragma_storage_info('shredded_db.points');

Geometry Statistics

GEOMETRY columns contain geometry-specific statistics that track the bounding box of the geometries in each row group, as well as the set of geometry types and vertex dimensions that are present within the row group.

You can inspect the statistics of a column using the stats() function:

Query
CREATE TABLE geometries as select 'POINT Z (30 10 5)'::GEOMETRY as geom;
SELECT stats(geom) AS geom_stats FROM geometries;
Result
 geom_stats-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- {"approx_unique":1,"extent":{"x_max":30.0,"x_min":30.0,"y_max":10.0,"y_min":10.0,"z_max":5.0,"z_min":5.0},"has_empty_geom":false,"has_empty_part":false,"has_no_null":true,"has_non_empty_geom":true,"has_non_empty_part":true,"has_null":false}

These statistics can be used by the query optimizer to skip row groups that don't match the geometry type or vertex dimensions required by a query, or to speed up spatial predicates by first checking if the bounding box of the geometries in the row group overlaps with the bounding box of the query geometry.

Currently, only the && operator, which is used to check if the bounding box of a geometry intersects the bounding box of another geometry, can take advantage of geometry statistics when used in a WHERE clause. There is ongoing work to add support for more statistics-based optimizations to spatial functions, such as ST_Intersects, ST_Distance, etc.

Persisting geometry statistics is only possible in storage versions v1.5 and above, and so if you are using an older storage version, the geometry statistics will turn into "unknown" statistics when checkpointing. In other words, the bounding box will be set to an infinitely large bounding box and all geometry types and vertex dimensions will be marked as maybe present, which means that the execution engine will not be able to do any optimizations based on the geometry statistics.

Coordinate Reference Systems

As far as the execution engine is concerned, geometries are considered to exist in a Cartesian coordinate system. In practice, however, most geospatial data is associated with a specific Coordinate Reference System (CRS) that defines how the coordinates relate to real-world locations on the Earth's surface.

A helpful analogy is to think of CRSs as the equivalent of "time zones", but for geospatial data. Just like how time zones define how local time relates to a standard reference time (e.g., UTC), CRSs define how the coordinates of a geometry relate to a standard reference system (e.g., WGS 84). CRSs are usually either geographic (e.g., WGS 84, which uses latitude and longitude) or projected (e.g., UTM, which uses linear units like meters).

When working with geospatial data, it's important to be aware of the CRS associated with different datasets. Performing spatial operations on geometries in different CRSs without proper transformation will most likely lead to incorrect results.

How are Coordinate Reference Systems Stored in SereneDB?

To avoid these kinds of mistakes, SereneDB makes it possible to explicitly associate a CRS with a GEOMETRY column.

This is done by passing a CRS "identifier" as a parameter of the GEOMETRY type. For example, a column of type GEOMETRY('OGC:CRS84') stores geometries that are associated with the "OGC CRS84" coordinate reference system.

CRS identifiers in SereneDB are always strings. OGC:CRS84 is the identifier for a common geographic coordinate system spanning the whole globe where the X coordinate represents longitude and the Y coordinate represents latitude. SereneDB only knows this because the identifier 'OGC:CRS84' is registered as a known CRS in the system catalog.

Only a handful of common CRSs are registered as known in this build of SereneDB. Registering additional CRSs, such as the over 7000 CRSs from the EPSG Geodetic Parameter Dataset, is not available in this build.

You can list all available CRSs known to SereneDB using the duckdb_coordinate_systems() function:

Query
SELECT crs_name, auth_name, auth_codeFROM duckdb_coordinate_systems()ORDER BY crs_name;
Result
 crs_name  | auth_name | auth_code-----------+-----------+----------- OGC:CRS83 | OGC       | CRS83 OGC:CRS84 | OGC       | CRS84

Handling Unknown Coordinate Reference Systems

As mentioned above, only coordinate systems that are registered in the system catalog (and therefore "known" to SereneDB) can be used when creating GEOMETRY columns. If you try to create a GEOMETRY column with an unknown CRS identifier, either manually or by importing an external geospatial dataset, the statement will fail with an error.

Query
SELECT 'POINT(1 2)'::GEOMETRY('FOO:1337') AS my_point;
Result
db error: ERROR: Encountered unrecognized coordinate system 'FOO:1337' when trying to create GEOMETRY typeThe coordinate system definition may be incomplete or invalid. Your options are as follows:* Load an extension that can identify this coordinate system* Provide a full coordinate system definition in e.g. "PROJJSON" or "WKT2" format* Set the 'ignore_unknown_crs' configuration option to drop the coordinate system from the resulting geometry type and make this error go away

This restriction exists because SereneDB needs the complete CRS definition, not just an identifier, to perform coordinate transformations and to export to formats that embed CRS metadata, such as GeoParquet. Without a system catalog entry, there is no way to resolve an identifier to its full definition.

You can set the ignore_unknown_crs configuration option to true to simply skip any unknown CRSs and create GEOMETRY columns without CRS instead.

Query
-- Ignore any unknown CRS identifiersSET ignore_unknown_crs = true;
SELECT 'POINT(1 2)'::GEOMETRY('FOO:1337')::TEXT AS my_point;
Result
 my_point------------- POINT (1 2)

Alternatively, if you are trying to define a GEOMETRY column yourself, you can provide a complete CRS definition in WKT or PROJJSON format instead of a shorthand identifier as the CRS parameter. However, as complete CRS definitions are usually very large, this gets unwieldy very quickly and is not recommended for interactive use.

It is currently not possible to define a custom CRS from within SQL, or to persist custom CRS definitions in a database such that SereneDB can use them to resolve CRS identifiers for geometry columns, but this is something we are considering for the future.

Working with Geometries in Different Coordinate Reference Systems

One benefit of tracking CRSs as part of the type system is that it prevents a lot of common mistakes that can occur when working with geometries from different coordinate systems. Most spatial functions that operate on multiple GEOMETRY values verify that all input expressions have the same CRS before performing the operation. Similarly, GEOMETRY columns can only be implicitly cast to and from other GEOMETRY columns if the source or the target don't have a CRS specified.

Converting a geometry from one CRS to another with ST_Transform(geom, crs) is not available in this build of SereneDB.

You can also use the ST_SetCRS(geom, crs) function to assign a CRS to a geometry that doesn't have one, or to reassign a CRS without transforming coordinates (e.g., when the data is already in the correct coordinate system but lacks the correct CRS).

Query
SELECT ST_SetCRS('POINT(4.897070 52.377956)'::GEOMETRY, 'OGC:CRS84')::TEXT AS with_crs;
Result
 with_crs--------------------------- POINT (4.89707 52.377956)

Or if you want to remove the CRS from a geometry, you can either just cast to GEOMETRY, or set the CRS to '':

Query
SELECT 'POINT(4.897070 52.377956)'::GEOMETRY('OGC:CRS84')::GEOMETRY::TEXT AS no_crs;
SELECT ST_SetCRS('POINT(4.897070 52.377956)'::GEOMETRY('OGC:CRS84'), '')::TEXT AS no_crs;
Result
 no_crs--------------------------- POINT (4.89707 52.377956)
 no_crs--------------------------- POINT (4.89707 52.377956)

You can of course also use ST_CRS(geom) to retrieve the CRS of a geometry:

Query
SELECT ST_CRS('POINT(4.897070 52.377956)'::GEOMETRY('OGC:CRS84')) AS crs;
USE $__DATABASE__;
DETACH IF EXISTS db161;
DETACH IF EXISTS db162;
Result
 crs----------- OGC:CRS84

Functions