BM25/TFIDF Ranking

Relevance scoring ranks search results by how well they match a query. The two most common algorithms are BM25 and TF-IDF, both based on term frequency and inverse document frequency.

See Setup for the shared dataset used in all examples.

How ranking works

Both algorithms rely on two statistical measures:

  • Term frequency (TF) — how often a search term appears in a given document. A document mentioning "galaxy" five times is considered more relevant than one mentioning it once.
  • Inverse document frequency (IDF) — how rare the term is across all indexed documents. Common words like "the" appear everywhere and carry little signal. A rare term like "paleontologist" is a much stronger match indicator.
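The two measures above can be sketched in a few lines of Python. This is an illustration only: the three-document corpus is hypothetical, and the smoothed IDF formula is the form commonly paired with BM25, not necessarily the exact one the index uses.

```python
import math

# Hypothetical three-document corpus, already tokenized and lowercased.
docs = {
    1: "the galaxy is vast and the galaxy is old".split(),
    2: "the paleontologist studied the fossil".split(),
    3: "the crew left the galaxy".split(),
}

def term_frequency(term, tokens):
    """Raw count of `term` in a single document."""
    return tokens.count(term)

def inverse_document_frequency(term, corpus):
    """Smoothed IDF: ln(1 + (N - n + 0.5) / (n + 0.5)).

    N is the number of documents, n the number containing the term.
    Assumption: this is the common BM25-style smoothing, used here for illustration.
    """
    n_docs = len(corpus)
    n_containing = sum(1 for tokens in corpus.values() if term in tokens)
    return math.log(1 + (n_docs - n_containing + 0.5) / (n_containing + 0.5))

# TF is per document; IDF is per term across the whole corpus.
tf_galaxy = {doc_id: term_frequency("galaxy", tokens) for doc_id, tokens in docs.items()}
idf_the = inverse_document_frequency("the", docs)
idf_galaxy = inverse_document_frequency("galaxy", docs)
idf_paleontologist = inverse_document_frequency("paleontologist", docs)

print(tf_galaxy)
print(idf_the, idf_galaxy, idf_paleontologist)
```

In this toy corpus "the" occurs in every document and gets the lowest IDF, while "paleontologist" occurs in only one and gets the highest.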

BM25 vs TF-IDF

The key difference is that BM25 adds document length normalization and term frequency saturation on top of TF-IDF. In practice this means BM25 handles varying document lengths better — a short document with two mentions of a term can rank above a long document with three.
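The difference can be made concrete with the textbook Okapi BM25 formula for a single term. The sketch below assumes that formula and uses made-up document lengths and a fixed IDF of 1.0; the engine's internal scoring may differ in detail.

```python
def tfidf_score(tf, idf):
    """Plain TF-IDF: grows linearly with term frequency and ignores document length."""
    return tf * idf

def bm25_score(tf, idf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Textbook Okapi BM25 contribution of one term (assumed form, for illustration)."""
    # Length normalization factor: 1.0 for an average-length document.
    norm = 1 - b + b * doc_len / avg_doc_len
    # Saturating TF component: bounded above by k1 + 1.
    return idf * tf * (k1 + 1) / (tf + k1 * norm)

idf = 1.0        # hypothetical IDF, identical for both documents
avg_len = 100    # hypothetical average document length in tokens

short_doc = {"tf": 2, "len": 20}    # two mentions in a short document
long_doc = {"tf": 3, "len": 400}    # three mentions in a long document

tfidf_short = tfidf_score(short_doc["tf"], idf)
tfidf_long = tfidf_score(long_doc["tf"], idf)
bm25_short = bm25_score(short_doc["tf"], idf, short_doc["len"], avg_len)
bm25_long = bm25_score(long_doc["tf"], idf, long_doc["len"], avg_len)

print("TF-IDF:", tfidf_short, tfidf_long)
print("BM25:  ", round(bm25_short, 3), round(bm25_long, 3))
```

Under these assumptions TF-IDF ranks the long document first because it has more mentions, while BM25 ranks the short document first because its mentions are denser relative to its length.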

                        | TF-IDF                                                | BM25
------------------------+-------------------------------------------------------+-------------------------------------------------
 Length normalization   | No                                                    | Yes (parameter b)
 TF saturation          | No; score grows linearly                              | Yes; diminishing returns (parameter k1)
 Reads norms from index | No                                                    | Yes
 Performance            | Faster; fewer index reads                             | Slightly slower due to norm lookups
 Best for               | Uniform-length documents, latency-sensitive workloads | Mixed-length documents, general-purpose ranking

Use BM25 as the default. Use TF-IDF when your documents are roughly the same length or when you need lower scoring latency — TF-IDF is faster because it does not need to read document length norms from the index.

BM25 scoring

Use BM25(<index>.tableoid) in the SELECT and ORDER BY clauses to rank results by relevance:

Query
SELECT id, title, BM25(movies_idx.tableoid) AS relevance
FROM movies_idx
WHERE description @@ ts_phrase('alien')
ORDER BY relevance DESC, id;
Result
 id | title                         | relevance
----+-------------------------------+-----------
  7 | Star Trek: The Motion Picture | 3.2757118

Custom parameters

Pass k1 and b to tune the ranking:

Query
SELECT id, title, BM25(movies_idx.tableoid, 1.2, 0.75) AS relevance
FROM movies_idx
WHERE description @@ ts_phrase('galaxy')
ORDER BY relevance DESC, id;
Result
 id | title                         | relevance
----+-------------------------------+-----------
  7 | Star Trek: The Motion Picture | 2.4358742
  8 | Alien                         | 2.4358742

 Parameter | Default | Description
-----------+---------+---------------------------------------------------------------------------------
 k1        | 1.2     | Term frequency saturation. Higher values increase the impact of term frequency.
 b         | 0.75    | Document length normalization. 0 disables normalization, 1 fully normalizes.
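To see what k1 controls, it helps to look at the term-frequency component on its own. The sketch below assumes the textbook BM25 form with length normalization switched off (b = 0); the function name is illustrative and not part of the product API.

```python
def saturation(tf, k1):
    """TF component of textbook BM25 with b = 0: tf * (k1 + 1) / (tf + k1).

    The value is 1.0 for a single occurrence and approaches k1 + 1
    as tf grows, so k1 sets how much repeated mentions can add.
    """
    return tf * (k1 + 1) / (tf + k1)

for k1 in (0.5, 1.2, 2.0):
    curve = [round(saturation(tf, k1), 3) for tf in (1, 2, 5, 10, 100)]
    print(f"k1={k1}: {curve}")
```

With a low k1 the curve flattens almost immediately, so extra mentions add little. With a higher k1 the score keeps rising for longer before it levels off just below k1 + 1.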

Lower k1 so that the presence of a term matters more than how often it repeats, and set b = 0 to disable length normalization:

Query
SELECT id, title, BM25(movies_idx.tableoid, 0.5, 0) AS relevance
FROM movies_idx
WHERE description @@ ts_phrase('alien')
ORDER BY relevance DESC, id;
Result
 id | title                         | relevance
----+-------------------------------+-----------
  7 | Star Trek: The Motion Picture | 1.9924302

Increase k1 to reward documents that mention the term many times:

Query
SELECT id, title, BM25(movies_idx.tableoid, 2.0, 0.75) AS relevance
FROM movies_idx
WHERE description @@ ts_phrase('film')
ORDER BY relevance DESC, id;
Result
 id | title        | relevance
----+--------------+-----------
  9 | Café Society | 3.8228743

Named variants

Specific combinations of k1 and b produce well-known BM25 variants:

 Variant | Parameters      | Behavior
---------+-----------------+-----------------------------------------------------------------------------------
 BM25    | BM25(1.2, 0.75) | Default: balanced saturation and length normalization
 BM15    | BM25(1.2, 0)    | No length normalization (b=0). Treats all documents equally regardless of length
 BM11    | BM25(1.2, 1)    | Full length normalization (b=1). Strongly penalizes long documents
 BM0     | BM25(0, 0)      | Pure IDF: term frequency is ignored entirely. Only term rarity matters

Query
-- BM15: no length normalization (b = 0)
SELECT id, title, BM25(movies_idx.tableoid, 1.2, 0) AS relevance
FROM movies_idx
WHERE description @@ ts_phrase('alien')
ORDER BY relevance DESC, id;
Result
 id | title                         | relevance
----+-------------------------------+-----------
  7 | Star Trek: The Motion Picture | 1.9924302
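The named variants are just parameter presets, which the following sketch expresses as a lookup table applied to the textbook BM25 term-frequency component. The document lengths are hypothetical and the helper names are illustrative; the engine's own computation is not shown here.

```python
# (k1, b) presets for the named variants.
VARIANTS = {
    "BM25": (1.2, 0.75),
    "BM15": (1.2, 0.0),
    "BM11": (1.2, 1.0),
    "BM0": (0.0, 0.0),
}

def tf_component(tf, doc_len, avg_doc_len, k1, b):
    """Textbook BM25 TF component; multiply by IDF to get a term score."""
    if tf == 0:
        return 0.0
    norm = 1 - b + b * doc_len / avg_doc_len
    return tf * (k1 + 1) / (tf + k1 * norm)

# Hypothetical long document: three mentions, four times the average length.
tf, doc_len, avg_doc_len = 3, 400, 100

scores = {
    name: tf_component(tf, doc_len, avg_doc_len, k1, b)
    for name, (k1, b) in VARIANTS.items()
}
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

For this long document BM11 gives the lowest TF component, BM15 ignores the length entirely, and BM0 collapses to 1.0 so that only the IDF factor would remain.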

Combine with filters

Query
SELECT id, title, genre, BM25(movies_idx.tableoid) AS relevance
FROM movies_idx
WHERE description @@ ts_phrase('film') AND genre @@ 'drama'
ORDER BY relevance DESC, id;
Result
 id | title        | genre | relevance
----+--------------+-------+-----------
  9 | Café Society | drama | 3.2757118

Combine with analytics

Query
SELECT genre, COUNT(*) AS matches, ROUND(AVG(relevance)::NUMERIC, 3) AS avg_relevance
FROM (
    SELECT genre, BM25(movies_idx.tableoid) AS relevance
    FROM movies_idx
    WHERE description @@ ts_phrase('biggest blockbuster')
) ranked
GROUP BY genre
ORDER BY avg_relevance DESC, genre;
Result
 genre     | matches | avg_relevance
-----------+---------+---------------
 adventure |       1 |         4.872
 comedy    |       1 |         4.872

Pagination with stable ordering

When paginating, add a tiebreaker column to ensure consistent ordering across pages:

Query
SELECT id, title, BM25(movies_idx.tableoid) AS relevance
FROM movies_idx
WHERE description @@ ts_phrase('alien')
ORDER BY relevance DESC, id
LIMIT 10 OFFSET 0;
Result
 id | title                         | relevance
----+-------------------------------+-----------
  7 | Star Trek: The Motion Picture | 3.2757118
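The reason for the tiebreaker is easiest to see with tied scores. The sketch below mimics ORDER BY relevance DESC, id with LIMIT and OFFSET in plain Python; the two tied rows reuse the scores from the 'galaxy' example on this page, and the third row is hypothetical.

```python
rows = [
    {"id": 8, "title": "Alien", "relevance": 2.4358742},
    {"id": 7, "title": "Star Trek: The Motion Picture", "relevance": 2.4358742},
    {"id": 3, "title": "Hypothetical third match", "relevance": 1.5},  # made-up row
]

def page(rows, limit, offset):
    """Sort by relevance descending, then id ascending, and slice one page."""
    ordered = sorted(rows, key=lambda r: (-r["relevance"], r["id"]))
    return ordered[offset:offset + limit]

# One row per page, to make the effect of ties visible.
first = page(rows, limit=1, offset=0)
second = page(rows, limit=1, offset=1)
print([r["id"] for r in first], [r["id"] for r in second])
```

Because id breaks the tie, the same row lands on the same page no matter what order the rows arrive in. Without it, two rows with equal relevance could swap places between requests, so one might appear twice and the other not at all.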

TFIDF scoring

Use TFIDF(<index>.tableoid) as an alternative scoring function:

Query
SELECT id, title, TFIDF(movies_idx.tableoid) AS relevance
FROM movies_idx
WHERE description @@ ts_phrase('alien')
ORDER BY relevance DESC, id;
Result
 id | title                         | relevance
----+-------------------------------+-----------
  7 | Star Trek: The Motion Picture | 1.8718022

With normalization

Pass true to enable normalization:

Query
SELECT id, title, TFIDF(movies_idx.tableoid, true) AS relevance
FROM movies_idx
WHERE description @@ ts_phrase('alien')
ORDER BY relevance DESC, id;
Result
 id | title                         | relevance
----+-------------------------------+-----------
  7 | Star Trek: The Motion Picture | 1.8718022

Custom scoring

Combine relevance scores with other columns for domain-specific ranking:

Query
SELECT id, title, BM25(movies_idx.tableoid) AS relevance, runtime,
       BM25(movies_idx.tableoid) * LOG(runtime + 1) AS custom_score
FROM movies_idx
WHERE description @@ ts_phrase('alien')
ORDER BY custom_score DESC, id;
Result
 id | title                         | relevance | runtime | custom_score
----+-------------------------------+-----------+---------+-------------------
  7 | Star Trek: The Motion Picture | 3.2757118 |     132 | 6.957125828299511
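The custom score in this result can be checked by hand. The sketch below assumes that single-argument LOG is the base-10 logarithm, as it is in PostgreSQL; the displayed result is consistent with that assumption.

```python
import math

def custom_score(relevance, runtime):
    """Relevance weighted by log10(runtime + 1), assuming LOG is base 10."""
    return relevance * math.log10(runtime + 1)

# Values taken from the result row above.
score = custom_score(3.2757118, 132)
print(score)
```

The logarithm dampens the runtime, so a film that is twice as long gets only a modest boost instead of twice the score. The computed value agrees with the displayed custom_score to about six decimal places; the small remainder is likely because the displayed relevance is rounded.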

Dictionary requirements

To use scoring functions, your dictionary must have FREQUENCY = true:

Query
CREATE TEXT SEARCH DICTIONARY ranking_dict (
    template = 'text',
    locale = 'en_US.UTF-8',
    case = 'lower',
    stemming = true,
    accent = false,
    frequency = true,
    position = true
);

The FREQUENCY flag stores term frequency data in the index, which BM25 and TF-IDF need for scoring.
