Indexing External Data

SereneDB can create inverted indexes over external tables — Parquet, CSV or ORC files on local disk or S3 — enabling full-text search, phrase queries, fuzzy matching and relevance ranking without importing data into the database.

Create an external table

Define a table backed by files on S3:

CREATE TABLE logs (
    id INTEGER,
    timestamp TIMESTAMP,
    level TEXT,
    service TEXT,
    message TEXT
) USING EXTERNAL WITH (
    PATH = 's3://my-bucket/logs/2025/*.parquet',
    FORMAT = 'Parquet',
    S3_ACCESS_KEY = 'your-access-key',
    S3_SECRET_KEY = 'your-secret-key',
    S3_REGION = 'us-east-1'
);

Local files work the same way:

CREATE TABLE articles (
    id INTEGER,
    title TEXT,
    body TEXT,
    category TEXT
) USING EXTERNAL WITH (
    PATH = '/data/articles.parquet',
    FORMAT = 'Parquet'
);

Create a text search dictionary

Define how text columns should be tokenized:

CREATE TEXT SEARCH DICTIONARY english_dict (
    TEMPLATE = 'text',
    LOCALE = 'en_US.UTF-8',
    CASE = 'lower',
    STEMMING = true,
    ACCENT = false,
    FREQUENCY = true,
    POSITION = true
);
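The CASE and ACCENT options control how tokens are normalized before they enter the index. As a generic illustration of case folding and accent folding — not SereneDB's internal tokenizer, and assuming ACCENT = false means accent marks are stripped — a minimal sketch:

```python
import unicodedata

def normalize_token(token, case_lower=True, keep_accents=False):
    """Apply case folding and optional accent stripping to one token."""
    if case_lower:
        token = token.lower()
    if not keep_accents:
        # Decompose characters, then drop the combining marks (the accents).
        decomposed = unicodedata.normalize("NFD", token)
        token = "".join(c for c in decomposed if not unicodedata.combining(c))
    return token

print(normalize_token("Café"))  # -> "cafe"
```

Queries normalized the same way will then match regardless of case or accents.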

Build an inverted index

Create the index over the external table, then run VACUUM (UPDATE_INDEXES) to perform the initial build. Columns listed without a dictionary are indexed as-is (exact matching); columns followed by a dictionary name get full-text processing:

CREATE INDEX logs_search ON logs
USING inverted (id, timestamp, level, service, message english_dict);

VACUUM (UPDATE_INDEXES) logs;

Query the index

Once built, query the index by name — just like any other inverted index:

-- Full-text search
SELECT id, timestamp, message
FROM logs_search
WHERE PHRASE(message, 'connection timeout')
ORDER BY BM25() DESC
LIMIT 20;
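The BM25() ranking used above is the standard BM25 relevance formula: term frequency weighted by inverse document frequency, normalized by document length. A self-contained sketch of that formula for intuition — an illustration, not SereneDB's implementation:

```python
import math

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against query_terms with classic BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # Document frequency of each query term across the corpus.
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    scores = []
    for d in docs:
        s = 0.0
        for t in query_terms:
            tf = d.count(t)
            if tf == 0:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

docs = [
    "connection timeout while contacting upstream".split(),
    "request completed successfully".split(),
    "retrying after connection timeout".split(),
]
scores = bm25_scores(["connection", "timeout"], docs)
```

A document containing no query term scores zero, and between two documents with the same term counts, the shorter one ranks higher.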

-- Combine search with exact filters
SELECT id, timestamp, level, message
FROM logs_search
WHERE PHRASE(message, 'out of memory') AND level = 'ERROR'
ORDER BY timestamp DESC;

-- Fuzzy matching for typo tolerance
SELECT id, service, message
FROM logs_search
WHERE LEVENSHTEIN_MATCH(service, 'postgre', 1)
ORDER BY id;
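LEVENSHTEIN_MATCH(column, term, n) matches values within edit distance n of the term, where Levenshtein distance counts single-character insertions, deletions, and substitutions. A standard dynamic-programming implementation, for reference:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]

print(levenshtein("postgre", "postgres"))  # -> 1
```

With a threshold of 1, as in the query above, 'postgres' matches 'postgre' (one insertion) but 'postgresql' (distance 3) does not.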

Refresh the index

When the underlying files change, update the index:

VACUUM (UPDATE_INDEXES) logs;

Use cases

  • Log analysis — search through terabytes of Parquet log files on S3 without loading them into the database.
  • Data lake search — add full-text search to your existing data lake files.
  • Archival queries — index historical data that rarely changes but needs to be searchable.
  • Hybrid pipelines — combine external data with local tables in the same query.

See also