Skip to main content

What to Index

Deciding what goes into an inverted index — which columns, whether to index a raw column or an expression over it, what to merely store for retrieval, and how big the result will be — is the main modeling decision. This page covers those choices. For the exact CREATE INDEX grammar, see the statement reference.

Indexed vs. INCLUDEd columns

A column can play one of two roles:

  • Indexed (USING inverted (col …)) — searchable with @@. The column's tokens go into the inverted structure.
  • INCLUDEd (INCLUDE (col)) — stored but not searchable. The raw value is kept in the index's columnstore so a query against the index can return it without a second lookup against the base table.

Use INCLUDE for columns you frequently return or filter on but never full-text search — ids, URLs, prices, timestamps, JSON payloads. Numeric and temporal indexed columns are stored in a columnar form too, so they support fast range filtering and sorting alongside search.

Indexing expressions

You can index an expression over one or more columns, not just a bare column. The query must use the same expression. For example, index lower(s) to get case-insensitive exact matching on a verbatim column:

Query
CREATE INDEX notes_idx ON notes    USING inverted (id, (lower(s)) kw);
SELECT id, s FROM notes_idx WHERE lower(s) @@ 'hello' ORDER BY id;
Result
 id | s----+-------  1 | HELLO

Indexed expressions must be deterministic and reference at least one column. Aggregates, subqueries and volatile functions are rejected at CREATE INDEX:

Query
CREATE INDEX metrics_idx ON metrics USING inverted (id, (sum(n)));
Result
db error: ERROR: aggregate functions are not allowed in index expressions
Query
CREATE INDEX metrics_idx2 ON metrics USING inverted (id, (n + random()));
Result
db error: ERROR: Index keys cannot contain expressions with side effects.

Generated columns

A generated column is indexed like any other column — index the column itself rather than repeating its expression. A STORED generated column works directly:

Query
CREATE INDEX line_items_idx ON line_items    USING inverted (id, price_with_tax);
SELECT id, price_with_tax FROM line_items_idxWHERE price_with_tax @@ ts_le(150)ORDER BY id;
Result
 id | price_with_tax----+----------------  1 |            110

Indexing JSON

Semi-structured JSON is indexed by its sub-fields. Prefer the shredded VARIANT type — it is the fastest way to store, search and return JSON — and index typed sub-field extractions such as (doc['title']::VARCHAR).

The dedicated Indexing JSON page covers the full treatment:

Indexing arrays

A TEXT[] / VARCHAR[] column is indexed element-by-element: a row matches if any element matches. Tokenization and the operator class apply to each element just as for a scalar text column:

Query
CREATE INDEX articles_tags_idx ON articles_tags    USING inverted (id, tags kw);
SELECT id, tags FROM articles_tags_idx WHERE tags @@ 'blue' ORDER BY id;
Result
 id | tags----+------------  1 | {red,blue}

Sizing and cost

The index only stores what you ask it to — keep it lean:

  • Feature flags (frequency, position, offset, norm) each enlarge the index. Enable only what your queries need — see Text Analysis.
  • INCLUDE codecs: each INCLUDEd column can set a compression codec (uncompressed, bitpacking, alp, rle, fsst).
  • Vector columns: HNSW m and ef_construction trade recall against index size and build time — see Vector Search.
  • Prefer INCLUDE over post-hoc materialization for columns you routinely return.

Limitations

  • Partial indexes (CREATE INDEX … WHERE …) are not supported.
  • Composite ROW(...) / STRUCT columns cannot be indexed; list the columns individually instead.
  • The same indexed expression cannot be listed twice with different dictionaries.
  • HUGEINT, DECIMAL, UUID and INTERVAL columns cannot be indexed.

See also