BM25/TFIDF Ranking
Relevance scoring ranks search results by how well they match a query. The two most common algorithms are BM25 and TF-IDF, both based on term frequency and inverse document frequency.
See Setup for the shared dataset used in all examples.
How ranking works
Both algorithms rely on two statistical measures:
- Term frequency (TF) — how often a search term appears in a given document. A document mentioning "galaxy" five times is considered more relevant than one mentioning it once.
- Inverse document frequency (IDF) — how rare the term is across all indexed documents. Common words like "the" appear everywhere and carry little signal. A rare term like "paleontologist" is a much stronger match indicator.
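The two measures can be sketched in a few lines of Python. This is an illustrative model, not the index's implementation. The corpus is made up and tokens are split on whitespace. IDF uses ln(N / df), one common textbook form; the exact IDF formula behind BM25 and TFIDF here is not specified on this page.

```python
import math

# Hypothetical toy corpus: each document is a list of lowercase tokens.
docs = [
    "galaxy after galaxy the galaxy map shows every galaxy and galaxy".split(),
    "the galaxy of stars".split(),
    "the paleontologist found the fossil".split(),
    "the crew lands on the planet".split(),
]

def term_frequency(term, doc):
    """Raw number of times `term` occurs in one document."""
    return doc.count(term)

def inverse_document_frequency(term, corpus):
    """ln(N / df): a term found in every document scores 0, rare terms score high."""
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / df) if df else 0.0

for term in ("galaxy", "the", "paleontologist"):
    print(term,
          [term_frequency(term, d) for d in docs],
          round(inverse_document_frequency(term, docs), 3))
# galaxy [5, 1, 0, 0] 0.693
# the [1, 1, 2, 2] 0.0
# paleontologist [0, 0, 1, 0] 1.386
```

Here "the" occurs in every document, so its IDF is 0. "paleontologist" occurs in only one document and gets the highest IDF.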
BM25 vs TF-IDF
The key difference is that BM25 adds document length normalization and term frequency saturation on top of TF-IDF. In practice this means BM25 handles varying document lengths better — a short document with two mentions of a term can rank above a long document with three.
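The length effect described above can be reproduced with a minimal sketch that compares a plain tf × idf score against a BM25 term score. It makes three assumptions:

- The IDF is held at a hypothetical 1.0 for both documents, so only the term-frequency part differs.
- The average document length is a hypothetical 100 tokens.
- BM25 uses the standard Okapi form. The index's exact formulas may differ.

```python
def tfidf_term(tf, idf):
    # Linear in tf and blind to document length.
    return tf * idf

def bm25_term(tf, idf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    # Standard Okapi BM25 (assumed form): k1 saturates tf, b penalizes long documents.
    length_norm = 1 - b + b * doc_len / avg_doc_len
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)

IDF = 1.0      # hypothetical, identical for both documents
AVG_LEN = 100  # hypothetical average document length in tokens

short_doc = {"tf": 2, "doc_len": 50}   # two mentions, half the average length
long_doc = {"tf": 3, "doc_len": 300}   # three mentions, three times the average length

for name, d in (("short", short_doc), ("long", long_doc)):
    print(name,
          "tfidf:", tfidf_term(d["tf"], IDF),
          "bm25:", round(bm25_term(d["tf"], IDF, d["doc_len"], AVG_LEN), 3))
# short tfidf: 2.0 bm25: 1.6
# long tfidf: 3.0 bm25: 1.1
```

TF-IDF ranks the long document first because it has more mentions. BM25 reverses the order once document length is taken into account.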
| | TF-IDF | BM25 |
|---|---|---|
| Length normalization | No | Yes (parameter b) |
| TF saturation | No — score grows linearly | Yes — diminishing returns (parameter k1) |
| Reads norms from index | No | Yes |
| Performance | Faster — fewer index reads | Slightly slower due to norm lookups |
| Best for | Uniform-length documents, latency-sensitive workloads | Mixed-length documents, general-purpose ranking |
Use BM25 as the default. Use TF-IDF when your documents are roughly the same length or when you need lower scoring latency — TF-IDF is faster because it does not need to read document length norms from the index.
BM25 scoring
Use BM25(<index>.tableoid) in the SELECT and ORDER BY clauses to rank results by relevance:
```sql
SELECT id, title, BM25(movies_idx.tableoid) AS relevance
FROM movies_idx
WHERE description @@ ts_phrase('alien')
ORDER BY relevance DESC, id;
```

```
 id |             title             | relevance
----+-------------------------------+-----------
  7 | Star Trek: The Motion Picture | 3.2757118
```

Custom parameters
Pass k1 and b to tune the ranking:
```sql
SELECT id, title, BM25(movies_idx.tableoid, 1.2, 0.75) AS relevance
FROM movies_idx
WHERE description @@ ts_phrase('galaxy')
ORDER BY relevance DESC, id;
```

```
 id |             title             | relevance
----+-------------------------------+-----------
  7 | Star Trek: The Motion Picture | 2.4358742
  8 | Alien                         | 2.4358742
```

| Parameter | Default | Description |
|---|---|---|
| k1 | 1.2 | Term frequency saturation. Higher values increase the impact of term frequency |
| b | 0.75 | Document length normalization. 0 disables normalization, 1 fully normalizes |
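Both parameters act on the tf-dependent factor of the BM25 term score. The sketch below assumes the standard Okapi BM25 form, because this page does not spell out the exact formula the index implements. It leaves out the IDF factor, which does not depend on k1 or b.

```python
def bm25_tf_part(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """tf-dependent factor of the standard Okapi BM25 term score (assumed form).

    Full term score: IDF(t) * tf * (k1 + 1) / (tf + k1 * norm),
    where norm = 1 - b + b * doc_len / avg_doc_len.
    """
    norm = 1 - b + b * doc_len / avg_doc_len
    return tf * (k1 + 1) / (tf + k1 * norm)

# k1: the factor rises with tf but levels off toward k1 + 1; larger k1 levels off later.
for k1 in (0.5, 1.2, 2.0):
    factors = [round(bm25_tf_part(tf, 100, 100, k1=k1), 2) for tf in (1, 2, 5, 20)]
    print(f"k1={k1}: {factors}")

# b: how strongly a document twice the average length is penalized (tf = 3).
for b in (0.0, 0.75, 1.0):
    print(f"b={b}: {bm25_tf_part(3, 200, 100, b=b):.3f}")
```

At tf = 1 with norm = 1 (b = 0, or a document of exactly average length), the factor is (k1 + 1) / (1 + k1) = 1 for every k1. Under this model, if 'alien' occurs once in the matching description, that would explain why the two b = 0 queries for 'alien' on this page return the same score (1.9924302).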
Lower k1 so that repeated occurrences of a term add less to the score (term frequency saturates sooner), and set b = 0 to disable length normalization:
```sql
SELECT id, title, BM25(movies_idx.tableoid, 0.5, 0) AS relevance
FROM movies_idx
WHERE description @@ ts_phrase('alien')
ORDER BY relevance DESC, id;
```

```
 id |             title             | relevance
----+-------------------------------+-----------
  7 | Star Trek: The Motion Picture | 1.9924302
```

Increase k1 to reward documents that mention the term many times:
```sql
SELECT id, title, BM25(movies_idx.tableoid, 2.0, 0.75) AS relevance
FROM movies_idx
WHERE description @@ ts_phrase('film')
ORDER BY relevance DESC, id;
```

```
 id |    title     | relevance
----+--------------+-----------
  9 | Café Society | 3.8228743
```

Named variants
Specific combinations of k1 and b produce well-known BM25 variants:
| Variant | Parameters | Behavior |
|---|---|---|
| BM25 | BM25(1.2, 0.75) | Default — balanced saturation and length normalization |
| BM15 | BM25(1.2, 0) | No length normalization (b=0). Treats all documents equally regardless of length |
| BM11 | BM25(1.2, 1) | Full length normalization (b=1). Strongly penalizes long documents |
| BM0 | BM25(0, 0) | Pure IDF — term frequency is ignored entirely. Only document rarity matters |
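The variants in the table differ only in the (k1, b) pair passed to the same formula. The sketch below makes these assumptions:

- BM25 uses the standard Okapi term score.
- The IDF is a hypothetical 2.0 and the average length a hypothetical 100 tokens.
- It compares a document half the average length with one four times the average length, both containing the term three times.

```python
# (k1, b) pairs from the variants table
VARIANTS = {
    "BM25": (1.2, 0.75),
    "BM15": (1.2, 0.0),
    "BM11": (1.2, 1.0),
    "BM0": (0.0, 0.0),
}

def bm25_term(tf, idf, doc_len, avg_doc_len, k1, b):
    """Standard Okapi BM25 score for one term (assumed form). Requires tf >= 1."""
    length_norm = 1 - b + b * doc_len / avg_doc_len
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)

IDF = 2.0      # hypothetical IDF of the query term
AVG_LEN = 100  # hypothetical average document length
scores = {}
for name, (k1, b) in VARIANTS.items():
    short = bm25_term(3, IDF, 50, AVG_LEN, k1, b)   # half the average length
    long_ = bm25_term(3, IDF, 400, AVG_LEN, k1, b)  # four times the average length
    scores[name] = (short, long_)
    print(f"{name}: short={short:.3f} long={long_:.3f}")
```

The results differ by variant:

- BM0 returns the IDF unchanged for both documents.
- BM15 scores the two documents identically.
- BM11 widens the gap between short and long documents compared with default BM25.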
```sql
-- BM15: no length normalization (b = 0)
SELECT id, title, BM25(movies_idx.tableoid, 1.2, 0) AS relevance
FROM movies_idx
WHERE description @@ ts_phrase('alien')
ORDER BY relevance DESC, id;
```

```
 id |             title             | relevance
----+-------------------------------+-----------
  7 | Star Trek: The Motion Picture | 1.9924302
```

Combine with filters
```sql
SELECT id, title, genre, BM25(movies_idx.tableoid) AS relevance
FROM movies_idx
WHERE description @@ ts_phrase('film') AND genre @@ 'drama'
ORDER BY relevance DESC, id;
```

```
 id |    title     | genre | relevance
----+--------------+-------+-----------
  9 | Café Society | drama | 3.2757118
```

Combine with analytics
```sql
SELECT genre,
       COUNT(*) AS matches,
       ROUND(AVG(relevance)::NUMERIC, 3) AS avg_relevance
FROM (
    SELECT genre, BM25(movies_idx.tableoid) AS relevance
    FROM movies_idx
    WHERE description @@ ts_phrase('biggest blockbuster')
) ranked
GROUP BY genre
ORDER BY avg_relevance DESC, genre;
```

```
   genre   | matches | avg_relevance
-----------+---------+---------------
 adventure |       1 |         4.872
 comedy    |       1 |         4.872
```

Pagination with stable ordering
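When paginating, add a unique tiebreaker column such as id to the ORDER BY. Rows with equal relevance scores otherwise have no guaranteed order, so they can repeat or disappear between pages: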
```sql
SELECT id, title, BM25(movies_idx.tableoid) AS relevance
FROM movies_idx
WHERE description @@ ts_phrase('alien')
ORDER BY relevance DESC, id
LIMIT 10 OFFSET 0;
```

```
 id |             title             | relevance
----+-------------------------------+-----------
  7 | Star Trek: The Motion Picture | 3.2757118
```

TFIDF scoring
Use TFIDF(<index>.tableoid) as an alternative scoring function:
```sql
SELECT id, title, TFIDF(movies_idx.tableoid) AS relevance
FROM movies_idx
WHERE description @@ ts_phrase('alien')
ORDER BY relevance DESC, id;
```

```
 id |             title             | relevance
----+-------------------------------+-----------
  7 | Star Trek: The Motion Picture | 1.8718022
```

With normalization
Pass true to enable normalization:
```sql
SELECT id, title, TFIDF(movies_idx.tableoid, true) AS relevance
FROM movies_idx
WHERE description @@ ts_phrase('alien')
ORDER BY relevance DESC, id;
```

```
 id |             title             | relevance
----+-------------------------------+-----------
  7 | Star Trek: The Motion Picture | 1.8718022
```

Custom scoring
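Combine relevance scores with other columns for domain-specific ranking. The example below boosts longer films by multiplying the BM25 score by LOG(runtime + 1). In PostgreSQL, LOG with a single argument is the base-10 logarithm: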
```sql
SELECT id, title,
       BM25(movies_idx.tableoid) AS relevance,
       runtime,
       BM25(movies_idx.tableoid) * LOG(runtime + 1) AS custom_score
FROM movies_idx
WHERE description @@ ts_phrase('alien')
ORDER BY custom_score DESC, id;
```

```
 id |             title             | relevance | runtime |   custom_score
----+-------------------------------+-----------+---------+-------------------
  7 | Star Trek: The Motion Picture | 3.2757118 |     132 | 6.957125828299511
```

Dictionary requirements
To use scoring functions, your dictionary must have FREQUENCY = true:
```sql
CREATE TEXT SEARCH DICTIONARY ranking_dict (
    template = 'text',
    locale = 'en_US.UTF-8',
    case = 'lower',
    stemming = true,
    accent = false,
    frequency = true,
    position = true
);
```

The FREQUENCY flag stores term frequency data in the index, which BM25 and TF-IDF need for scoring.
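Conceptually, a frequency-enabled index stores, for every term, the documents it appears in along with a per-document occurrence count. Without those counts the index only records which documents match, and tf cannot be computed. The sketch below is a simplified in-memory model for illustration only, not the extension's storage format. It lowercases tokens like the dictionary above but does no stemming. The per-document lengths it records stand in for the length "norms" that BM25 reads.

```python
from collections import Counter, defaultdict

def build_index(docs):
    """Map term -> {doc_id: term frequency}, plus per-document token counts."""
    postings = defaultdict(dict)
    doc_lengths = {}
    for doc_id, text in docs.items():
        tokens = text.lower().split()  # lowercase only; no stemming in this sketch
        doc_lengths[doc_id] = len(tokens)
        for term, count in Counter(tokens).items():
            postings[term][doc_id] = count
    return postings, doc_lengths

# Hypothetical documents, not the Setup dataset.
docs = {1: "Alien life in a distant galaxy", 2: "A galaxy of galaxy maps"}
postings, lengths = build_index(docs)
print(postings["galaxy"])  # {1: 1, 2: 2}
print(lengths)             # {1: 6, 2: 5}
```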
See also
- Phrase and Proximity Search — finding phrase matches to rank
- CREATE TEXT SEARCH DICTIONARY — frequency and position flags