Edit this page

Indexing Views

An inverted index can be built over a view, not just a base table. This is how SereneDB searches data it does not own a primary copy of — most importantly external files (Parquet, CSV, JSON on disk or S3) exposed through a view.

A view has no primary key of its own, so two things have to be resolved: a row identity (so each indexed document can be located again) and, when a query needs a column the index does not hold, a way to fetch that column on demand. SereneDB handles both through a fast-path source.

Counts, relevance scores and any indexed or INCLUDEd column come straight from the frozen index; any other column is fetched live from the source by row identity at query time. That split — frozen postings, live values — is the whole model.

All examples below use this setup — a base table and a view over it:

Query

CREATE TABLE base_docs (id INTEGER PRIMARY KEY, body VARCHAR, extra VARCHAR);
CREATE TEXT SEARCH DICTIONARY view_en (    template = 'text',    locale = 'en_US.UTF-8',    case = 'lower',    stemming = false,    accent = false,    frequency = true,    position = true);
INSERT INTO base_docs VALUES    (1, 'quick brown fox', 'alpha'),    (2, 'lazy dog sleeps', 'beta'),    (3, 'quick red fox', 'gamma');
CREATE VIEW v_docs AS SELECT id, body, extra FROM base_docs;
CREATE INDEX v_docs_idx ON v_docs USING inverted (id, body view_en);

Row identity

How the index derives a stable per-row identity depends on the view body:

View body	Row identity	Explicit `pk`?
`SELECT … FROM base_table` (has `PRIMARY KEY`)	The base table's primary key	Auto
`SELECT … FROM base_table` (no PK)	The hidden row id	Auto
`SELECT … FROM read_parquet('file')`	The reader's `file_row_number`	Auto
`read_csv` / `read_json` single file	The byte offset within the file	Auto
Reader over a glob (`'…/*.parquet'`)	`(file_index, position)` pair	Auto
Attached DuckDB / Iceberg table	The source row id / `(file_index, row)`	Auto
Generic body (inline `VALUES`, `UNION ALL`, joins)	None detectable	Required — `WITH (pk = '…')`

Fast-path sources

A fast-path source is a view body SereneDB recognizes well enough to both derive the row identity and re-read columns by it later. The recognized sources are: SereneDB base tables, read_parquet / read_csv / read_json (and their *_auto / parquet_scan / read_ndjson variants), Iceberg tables, attached DuckDB tables and read_text / read_blob.

The view body may shape the source freely and still qualify as fast-path:

a column subset, reordering or renaming (SELECT body, id FROM …, SELECT a AS x FROM …);
a cast on the indexed column (SELECT id::BIGINT, body FROM …);
an indexed expression over source columns (upper(body), (json ->> 'b'));
a WHERE / ORDER BY / LIMIT in the view body (the index then captures only the rows the view emits).

A body with no fast-path leaf — inline VALUES, a UNION ALL, a join — is a generic view: it must be given WITH (pk = '…') and supports only the non-materializing queries below.

What runs without materialization

These queries are answered entirely from the index, never touching the source — so they work on every view shape, including generic views:

COUNT(*) / COUNT(1), with or without a @@ filter;
full-text @@ filters and secondary scalar filters (=, <, BETWEEN, IN) that get pushed into the index scan;
relevance scores — BM25, TFIDF — and the index tableoid;
ts_offsets;
projections of indexed columns and of INCLUDEd columns and aggregates over them.

Query

SELECT count(*) FROM v_docs_idx WHERE body @@ ts_phrase('quick');

Result

 count-------     2

Query

SELECT id FROM v_docs_idxWHERE body @@ ts_phrase('quick')ORDER BY BM25(v_docs_idx.tableoid) DESC, id;

Result

 id----  1  3

Materializing real columns

Selecting a real source column that is neither indexed nor INCLUDEd materializes it: the index hands back the row identities for the matches, and SereneDB re-reads those columns from the source through the fast-path lookup. The column values are not stored in the index — they are fetched live at query time:

Query

SELECT id, extra FROM v_docs_idx WHERE body @@ ts_phrase('quick') ORDER BY id;

Result

 id | extra----+-------  1 | alpha  3 | gamma

Because the read is live, if the underlying source changed after the index was built, materialized values reflect the current source: rows deleted from the source come back as NULL, and edited content shows its new value (or raises an error if a file became unreadable). Counts and scores, by contrast, still reflect the frozen build-time snapshot.

Generic views

A generic view (no fast-path source) still indexes and answers the non-materializing queries, but it must be given an explicit primary key:

Query

SELECT count(*) FROM u_docs_idx WHERE body @@ ts_phrase('quick');

Result

 count-------     1

Selecting its real columns is not supported and raises an error:

Query

SELECT id, body FROM u_docs_idx WHERE body @@ ts_phrase('quick');

Result

db error: ERROR: materialising real columns from this view-backed inverted index is not yet supported -- view body must be a simple `SELECT * FROM <reader>(literal_args)` over a recognised fast-path source (read_parquet/csv/json/...)

`INCLUDE` columns

INCLUDEd columns on a view are stored in the index's columnstore, so they are returned without materializing the source — the same as on a base table. Use INCLUDE for columns you frequently return but never search, to avoid the per-row source lookup.

Snapshot and isolation

A view-backed index is a static snapshot: its postings are captured at CREATE INDEX time and are not refreshed when the source changes (there is no background DML tracking as there is for base tables). To pick up new data, rebuild the index. A reader transaction keeps a consistent view of the index even if the underlying view is dropped concurrently.

Row identity​

Fast-path sources​

What runs without materialization​

Materializing real columns​

Generic views​

INCLUDE columns​

Snapshot and isolation​

See also​

Row identity

Fast-path sources

What runs without materialization

Materializing real columns

Generic views

`INCLUDE` columns

Snapshot and isolation

See also