Ranking
Matching answers a yes/no question — does a row match? Ranking answers a different one — how well does it match? — and orders the results accordingly. The two are separate steps: a WHERE col @@ query filter selects the matching rows, and a scorer in ORDER BY sorts them by relevance.
Scoring with BM25
Okapi BM25 is the standard relevance scorer. Use it in ORDER BY, highest score first:
SELECT id, title, BM25(movies_idx.tableoid) AS relevanceFROM movies_idxWHERE description @@ ts_phrase('alien')ORDER BY relevance DESC, id; id | title | relevance----+-------------------------------+----------- 7 | Star Trek: The Motion Picture | 1.9693236Other scorers
SereneDB ships several scorers; TFIDF is the classic alternative:
SELECT id, title, TFIDF(movies_idx.tableoid) AS relevanceFROM movies_idxWHERE description @@ ts_phrase('alien')ORDER BY relevance DESC, id; id | title | relevance----+-------------------------------+----------- 7 | Star Trek: The Motion Picture | 1.2527629| Scorer | Parameters (defaults) | Notes |
|---|---|---|
BM25 | k1 (1.2), b (0.75) | Okapi BM25; b = 0 disables length normalization (BM15) |
TFIDF | with_norms (false) | Classic TF-IDF |
lm_jm | lambda (0.1) | Language model, Jelinek-Mercer smoothing |
lm_dirichlet | mu (2000) | Language model, Dirichlet smoothing |
indri_dirichlet | mu (2000) | Indri-style Dirichlet (no floor clamp) |
dfi | measure ('standardized') | Divergence-from-independence; also 'saturated', 'chi_squared' |
raw_tf / raw_boost / raw_dl | — | Raw term frequency, boost and document length |
Boosting
The ^ operator multiplies a sub-query's contribution to the score, so you can weight some clauses above others. Here a title match is boosted so it outranks a description-only match:
SELECT id, titleFROM movies_idxWHERE title @@ ('alien'::tsquery ^ 5.0) OR description @@ 'galaxy'ORDER BY BM25(movies_idx.tableoid) DESC, id; id | title----+------------------------------- 8 | Alien 7 | Star Trek: The Motion PictureTop-K queries and WAND pruning
The common shape ORDER BY <scorer> DESC LIMIT k returns the best k matches. Building the index with the optimize_top_k index option enables WAND pruning, which skips candidates that provably cannot reach the top k:
CREATE INDEX movies_idx ON movies
USING inverted (id, description ranking_dict)
WITH (optimize_top_k = 'bm25(1.2, 0.75)');
SELECT id, titleFROM movies_idxWHERE description @@ 'galaxy'ORDER BY BM25(movies_idx.tableoid) DESC, idLIMIT 1; id | title----+------------------------------- 7 | Star Trek: The Motion PicturePruning engages only when all of the following hold; otherwise the query still runs correctly, just without the optimization:
- the query is
ORDER BY <scorer>(idx.tableoid) DESCwith aLIMIT; - the scorer matches the one named in
optimize_top_kexactly (a different scorer falls back to a full scan); - the filter is a single term or an
ORof terms (not a phrase,ANDorNOT).
You can confirm pruning is active in the query plan — EXPLAIN shows Top: k, optimized on the scan. The sdb_disable_top_k_optimization and sdb_scored_terms_limit session settings tune this at query time.
Tie-breaking
Scores can tie. Add further ORDER BY columns after the scorer for a deterministic order — typically the primary key:
SELECT id, title
FROM movies_idx
WHERE description @@ 'galaxy'
ORDER BY BM25(movies_idx.tableoid) DESC, id;
Combining ranked queries
To blend a lexical (BM25) ranking with a vector ranking — or any two ranked result sets — use Reciprocal Rank Fusion, which combines ranks rather than incomparable scores. See Hybrid Search and the Reciprocal Rank Fusion recipe.
See also
- Full-Text Search — matching and query construction
- Scoring functions — full scorer reference
- Text Analysis — the
frequencyflag scoring requires - BM25/TFIDF Ranking — parameter tuning