Skip to main content

Ranking

Matching answers a yes/no question — does a row match? Ranking answers a different one — how well does it match? — and orders the results accordingly. The two are separate steps: a WHERE col @@ query filter selects the matching rows, and a scorer in ORDER BY sorts them by relevance.

Scoring with BM25

Okapi BM25 is the standard relevance scorer. Use it in ORDER BY, highest score first:

Query
SELECT id, title, BM25(movies_idx.tableoid) AS relevanceFROM movies_idxWHERE description @@ ts_phrase('alien')ORDER BY relevance DESC, id;
Result
 id | title                         | relevance----+-------------------------------+-----------  7 | Star Trek: The Motion Picture | 1.9693236

Other scorers

SereneDB ships several scorers; TFIDF is the classic alternative:

Query
SELECT id, title, TFIDF(movies_idx.tableoid) AS relevanceFROM movies_idxWHERE description @@ ts_phrase('alien')ORDER BY relevance DESC, id;
Result
 id | title                         | relevance----+-------------------------------+-----------  7 | Star Trek: The Motion Picture | 1.2527629
ScorerParameters (defaults)Notes
BM25k1 (1.2), b (0.75)Okapi BM25; b = 0 disables length normalization (BM15)
TFIDFwith_norms (false)Classic TF-IDF
lm_jmlambda (0.1)Language model, Jelinek-Mercer smoothing
lm_dirichletmu (2000)Language model, Dirichlet smoothing
indri_dirichletmu (2000)Indri-style Dirichlet (no floor clamp)
dfimeasure ('standardized')Divergence-from-independence; also 'saturated', 'chi_squared'
raw_tf / raw_boost / raw_dlRaw term frequency, boost and document length

Boosting

The ^ operator multiplies a sub-query's contribution to the score, so you can weight some clauses above others. Here a title match is boosted so it outranks a description-only match:

Query
SELECT id, titleFROM movies_idxWHERE title @@ ('alien'::tsquery ^ 5.0) OR description @@ 'galaxy'ORDER BY BM25(movies_idx.tableoid) DESC, id;
Result
 id | title----+-------------------------------  8 | Alien  7 | Star Trek: The Motion Picture

Top-K queries and WAND pruning

The common shape ORDER BY <scorer> DESC LIMIT k returns the best k matches. Building the index with the optimize_top_k index option enables WAND pruning, which skips candidates that provably cannot reach the top k:

CREATE INDEX movies_idx ON movies
USING inverted (id, description ranking_dict)
WITH (optimize_top_k = 'bm25(1.2, 0.75)');
Query
SELECT id, titleFROM movies_idxWHERE description @@ 'galaxy'ORDER BY BM25(movies_idx.tableoid) DESC, idLIMIT 1;
Result
 id | title----+-------------------------------  7 | Star Trek: The Motion Picture

Pruning engages only when all of the following hold; otherwise the query still runs correctly, just without the optimization:

  • the query is ORDER BY <scorer>(idx.tableoid) DESC with a LIMIT;
  • the scorer matches the one named in optimize_top_k exactly (a different scorer falls back to a full scan);
  • the filter is a single term or an OR of terms (not a phrase, AND or NOT).

You can confirm pruning is active in the query plan — EXPLAIN shows Top: k, optimized on the scan. The sdb_disable_top_k_optimization and sdb_scored_terms_limit session settings tune this at query time.

Tie-breaking

Scores can tie. Add further ORDER BY columns after the scorer for a deterministic order — typically the primary key:

SELECT id, title
FROM movies_idx
WHERE description @@ 'galaxy'
ORDER BY BM25(movies_idx.tableoid) DESC, id;

Combining ranked queries

To blend a lexical (BM25) ranking with a vector ranking — or any two ranked result sets — use Reciprocal Rank Fusion, which combines ranks rather than incomparable scores. See Hybrid Search and the Reciprocal Rank Fusion recipe.

See also