Maintenance & Introspection
An inverted index is eventually consistent. Writes to the base table are buffered and become searchable only after the index is refreshed; segments are compacted in the background to keep queries fast; and per-term statistics used by scorers can be recomputed. This page covers that lifecycle and how to inspect an index.
Visibility and the refresh model
A newly written row is not searchable until a refresh publishes it to readers. There are two ways a refresh happens:
- Automatically — a background thread refreshes each index on an interval (
refresh_interval, default 1000 ms). - Explicitly — run
VACUUM (REFRESH_TABLE)to publish pending writes immediately. This is what you want right after a bulk load, before querying:
VACUUM (REFRESH_TABLE) my_logs;
SELECT id FROM logs_idx WHERE msg @@ 'disk' ORDER BY id; id---- 1Background intervals
Three index WITH options control the background lifecycle (set at CREATE INDEX); 0 disables each:
| Option | Default | Controls |
|---|---|---|
refresh_interval | 1000 | Milliseconds between automatic refreshes |
compaction_interval | 1000 | Milliseconds between automatic segment compactions |
cleanup_interval_step | 1 | Commit ticks between cleanup passes |
Manual maintenance with VACUUM
Each operation comes in a family scoped to a single index, a table, a schema, a database, or instance-wide (*_ALL, which takes no argument). At most one maintenance option may appear per VACUUM statement.
| Operation | Forms | Effect |
|---|---|---|
| Refresh | REFRESH_INDEX · REFRESH_TABLE · REFRESH_SCHEMA · REFRESH_DATABASE · REFRESH_ALL | Publish pending writes to readers |
| Compact | COMPACT_INDEX · COMPACT_TABLE · COMPACT_SCHEMA · COMPACT_DATABASE · COMPACT_ALL | Merge segments to reclaim space and speed queries |
| Recompute statistics | RECOMPUTE_STATS_COLUMN · RECOMPUTE_STATS_TABLE · RECOMPUTE_STATS_SCHEMA · RECOMPUTE_STATS_DATABASE · RECOMPUTE_STATS_ALL | Rebuild the term statistics used by relevance scoring and planning |
VACUUM (COMPACT_TABLE) my_logs;VACUUM (RECOMPUTE_STATS_TABLE) my_logs;Recompute statistics after large changes in data distribution so that relevance scores and planning stay accurate.
Rebuilding
A view- or external-data-backed index is a static snapshot taken at CREATE INDEX time — it does not track later changes to its source. To pick up new data, rebuild it with DROP INDEX followed by CREATE INDEX.
Schema changes on an indexed table
An inverted index pins every column it reads, including columns reached only through an indexed expression — for example a struct sub-field such as (s['id']::INTEGER) pins the whole s column.
RENAMEcolumn, table or index — allowed; the index keys columns by id and follows the rename automatically.ADD COLUMN— allowed; it is index-neutral (existing rows are backfilled).ALTER COLUMN … TYPEon a pinned column — rejected; the index stores values of the old type. Drop the index first, change the type, then recreate it.ADD/DROP/RENAMEof a struct field on a pinned column — rejected for the same reason, even when the indexed expression targets a different sub-field (the whole column is pinned). Drop the index first.DROP COLUMNon a pinned column — allowed; it cascade-drops every index that covers the column.
Performance
Tuning is mostly about the background cadence and segment layout; use only the options that exist:
- Refresh vs. compaction cadence — lower
refresh_intervalfor fresher results, raise it (or0) to reduce overhead on write-heavy tables;compaction_interval/cleanup_interval_stepgovern how aggressively segments merge. - Row-group sizes —
row_group_size(storedINCLUDEd columns) andnorm_row_group_size(norm columns) control the columnstore batch size. - Build then index — for a bulk load, create the table, load the data, then create the index; this produces a more compact index than loading into an already-indexed table.
- Top-K — set
optimize_top_kto accelerateORDER BY <scorer> … LIMIT k.
Session settings
Beyond the per-index WITH options, a few sdb_-prefixed session settings tune search at query time. Set them per connection with SET (and restore with RESET); they affect only the current session:
| Setting | Default | Effect |
|---|---|---|
sdb_disable_top_k_optimization | false | When true, the optimizer does not pull ORDER BY <scorer> DESC LIMIT k into the index scan, so WAND top-K pruning never engages. Useful to A/B the optimization or work around a plan regression. |
sdb_scored_terms_limit | 1024 | Maximum number of terms considered for scoring in multi-term filters. Higher values give more accurate IDF-style scoring at the cost of memory and per-query work; 0 disables scored-term collection entirely. |
sdb_ef_search | 0 | Per-session override for the HNSW search-time neighbourhood size (efSearch) used by vector search. Higher = better recall, slower queries; 0 uses the query's LIMIT (Top-K) value instead. |
SET sdb_ef_search = 64; -- widen the HNSW search beam for this session
SET sdb_scored_terms_limit = 4096;
-- … run queries …
RESET sdb_ef_search; -- back to the default
Introspection
List the inverted indexes with the duckdb_indexes() table function:
SELECT index_name, table_nameFROM duckdb_indexes()WHERE table_name = 'my_logs'; index_name | table_name------------+------------ logs_idx | my_logsThe standard pg_indexes view also lists them (alongside the primary-key index), with the CREATE INDEX statement in its indexdef column:
SELECT indexnameFROM pg_indexesWHERE tablename = 'my_logs'ORDER BY indexname; indexname-------------- logs_idx my_logs_pkeySee also
- Inverted Index · Ranking
VACUUM— full statement referenceCREATE INDEX … USING inverted— indexWITHoptions