Edit this page

Maintenance & Introspection

An inverted index is eventually consistent. Writes to the base table are buffered and become searchable only after the index is refreshed; segments are compacted in the background to keep queries fast; and per-term statistics used by scorers can be recomputed. This page covers that lifecycle and how to inspect an index.

Visibility and the refresh model

A newly written row is not searchable until a refresh publishes it to readers. There are two ways a refresh happens:

Automatically — a background thread refreshes each index on an interval (refresh_interval, default 1000 ms).
Explicitly — run VACUUM (REFRESH_TABLE) to publish pending writes immediately. This is what you want right after a bulk load, before querying:

Query

VACUUM (REFRESH_TABLE) my_logs;
SELECT id FROM logs_idx WHERE msg @@ 'disk' ORDER BY id;

Result

 id----  1

Background intervals

Three index WITH options control the background lifecycle (set at CREATE INDEX); 0 disables each:

Option	Default	Controls
`refresh_interval`	`1000`	Milliseconds between automatic refreshes
`compaction_interval`	`1000`	Milliseconds between automatic segment compactions
`cleanup_interval_step`	`1`	Commit ticks between cleanup passes

Manual maintenance with `VACUUM`

Each operation comes in a family scoped to a single index, a table, a schema, a database, or instance-wide (*_ALL, which takes no argument). At most one maintenance option may appear per VACUUM statement.

Operation	Forms	Effect
Refresh	`REFRESH_INDEX` · `REFRESH_TABLE` · `REFRESH_SCHEMA` · `REFRESH_DATABASE` · `REFRESH_ALL`	Publish pending writes to readers
Compact	`COMPACT_INDEX` · `COMPACT_TABLE` · `COMPACT_SCHEMA` · `COMPACT_DATABASE` · `COMPACT_ALL`	Merge segments to reclaim space and speed queries
Recompute statistics	`RECOMPUTE_STATS_COLUMN` · `RECOMPUTE_STATS_TABLE` · `RECOMPUTE_STATS_SCHEMA` · `RECOMPUTE_STATS_DATABASE` · `RECOMPUTE_STATS_ALL`	Rebuild the term statistics used by relevance scoring and planning

Query

VACUUM (COMPACT_TABLE) my_logs;

Query

VACUUM (RECOMPUTE_STATS_TABLE) my_logs;

Recompute statistics after large changes in data distribution so that relevance scores and planning stay accurate.

Rebuilding

A view- or external-data-backed index is a static snapshot taken at CREATE INDEX time — it does not track later changes to its source. To pick up new data, rebuild it with DROP INDEX followed by CREATE INDEX.

Schema changes on an indexed table

An inverted index pins every column it reads, including columns reached only through an indexed expression — for example a struct sub-field such as (s['id']::INTEGER) pins the whole s column.

RENAME column, table or index — allowed; the index keys columns by id and follows the rename automatically.
ADD COLUMN — allowed; it is index-neutral (existing rows are backfilled).
ALTER COLUMN … TYPE on a pinned column — rejected; the index stores values of the old type. Drop the index first, change the type, then recreate it.
ADD / DROP / RENAME of a struct field on a pinned column — rejected for the same reason, even when the indexed expression targets a different sub-field (the whole column is pinned). Drop the index first.
DROP COLUMN on a pinned column — allowed; it cascade-drops every index that covers the column.

Performance

Tuning is mostly about the background cadence and segment layout; use only the options that exist:

Refresh vs. compaction cadence — lower refresh_interval for fresher results, raise it (or 0) to reduce overhead on write-heavy tables; compaction_interval / cleanup_interval_step govern how aggressively segments merge.
Row-group sizes — row_group_size (stored INCLUDEd columns) and norm_row_group_size (norm columns) control the columnstore batch size.
Build then index — for a bulk load, create the table, load the data, then create the index; this produces a more compact index than loading into an already-indexed table.
Top-K — set optimize_top_k to accelerate ORDER BY <scorer> … LIMIT k.

Session settings

Beyond the per-index WITH options, a few sdb_-prefixed session settings tune search at query time. Set them per connection with SET (and restore with RESET); they affect only the current session:

Setting	Default	Effect
`sdb_disable_top_k_optimization`	`false`	When `true`, the optimizer does not pull `ORDER BY <scorer> DESC LIMIT k` into the index scan, so WAND top-K pruning never engages. Useful to A/B the optimization or work around a plan regression.
`sdb_scored_terms_limit`	`1024`	Maximum number of terms considered for scoring in multi-term filters. Higher values give more accurate IDF-style scoring at the cost of memory and per-query work; `0` disables scored-term collection entirely.
`sdb_ef_search`	`0`	Per-session override for the HNSW search-time neighbourhood size (`efSearch`) used by vector search. Higher = better recall, slower queries; `0` uses the query's `LIMIT` (Top-K) value instead.

SET sdb_ef_search = 64;          -- widen the HNSW search beam for this session
SET sdb_scored_terms_limit = 4096;
-- … run queries …
RESET sdb_ef_search;             -- back to the default

Introspection

List the inverted indexes with the duckdb_indexes() table function:

Query

SELECT index_name, table_nameFROM duckdb_indexes()WHERE table_name = 'my_logs';

Result

 index_name | table_name------------+------------ logs_idx   | my_logs

The standard pg_indexes view also lists them (alongside the primary-key index), with the CREATE INDEX statement in its indexdef column:

Query

SELECT indexnameFROM pg_indexesWHERE tablename = 'my_logs'ORDER BY indexname;

Result

 indexname-------------- logs_idx my_logs_pkey

Visibility and the refresh model​

Background intervals​

Manual maintenance with VACUUM​

Rebuilding​

Schema changes on an indexed table​

Performance​

Session settings​

Introspection​

See also​

Visibility and the refresh model

Background intervals

Manual maintenance with `VACUUM`

Rebuilding

Schema changes on an indexed table

Performance

Session settings

Introspection

See also