
Highlighting

Highlighting marks the query terms inside a matching document's text so a search UI can show why a result matched — the bold terms in a results list or a short "keyword in context" snippet (passage) drawn from a long field. It is a presentation step applied to the rows a query already matched: it does not change which rows match or how they rank.

Highlighting Functions

SereneDB highlights matches in two steps: ts_offsets locates the matched tokens, then ts_highlight wraps them in markup. Used together, they produce search-result snippets. For the common case, ts_highlight(column) does both at once.

Function | Description
ts_highlight(text, offsets [, options]) | Wrap the tokens at the given offsets in markup.
ts_highlight(column [, options]) | Sugar form: derive offsets from the FTS predicate automatically.
ts_offsets(column [, limit]) | Byte offsets of the matched tokens for an indexed column.
ts_offsets(dictionary, text, query) | Compute offsets on the fly for arbitrary text.

ts_highlight(text, offsets [, options])

Wraps the tokens of text located at offsets in markup. offsets is an INTEGER[] of interleaved start, end byte pairs — typically the output of ts_offsets. By default each match is wrapped in <b>...</b>:

Parameter | Type | Description
text | VARCHAR | The document text to render. NULL text yields NULL.
offsets | INTEGER[] | Interleaved start, end byte pairs, sorted ascending by start. Must have an even length; NULL yields NULL.
options | VARCHAR | Optional comma-separated key = value string. Must be a constant; per-row option expressions are rejected at bind time.

Query
SELECT ts_highlight('the quick brown fox', [4, 9]) AS snippet;
Result
 snippet
----------------------------
 the <b>quick</b> brown fox
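
To make the offset convention concrete, here is a minimal Python sketch of the wrapping step. It is an illustration only, not SereneDB's implementation: the function name wrap_offsets is hypothetical, and the reading of each pair as a half-open [start, end) byte range is inferred from the example above, where [4, 9] covers "quick".

```python
def wrap_offsets(text, offsets, start_sel="<b>", stop_sel="</b>"):
    """Wrap each [start, end) byte range of `text` in markup (illustrative only)."""
    data = text.encode("utf-8")  # offsets are byte positions, not character positions
    out = []
    cursor = 0
    # Walk the interleaved list two values at a time: (start, end), (start, end), ...
    for start, end in zip(offsets[0::2], offsets[1::2]):
        out.append(data[cursor:start].decode("utf-8"))  # text before the match
        out.append(start_sel + data[start:end].decode("utf-8") + stop_sel)
        cursor = end
    out.append(data[cursor:].decode("utf-8"))  # text after the last match
    return "".join(out)


print(wrap_offsets("the quick brown fox", [4, 9]))  # the <b>quick</b> brown fox
```

Working on the encoded bytes matters for non-ASCII text: in "naïve fox" the word "fox" starts at byte 7, not character 6, because "ï" occupies two bytes in UTF-8.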

The optional third argument is a string of comma-separated key = value options:

Option | Default | Description
StartSel | <b> | Markup inserted before each matched range.
StopSel | </b> | Markup inserted after each matched range.
HighlightAll | false | Highlight the whole text verbatim, skipping passage selection.
MaxWords | 35 | Hard cap on tokens per fragment. Must be a positive integer.
MaxFragments | 0 | 0 returns the single best-scoring fragment; a value > 0 returns up to that many top-scored fragments.
FragmentDelimiter | " ... " | String used to join multiple fragments.
MaxOffsets | 0 | Caps how many offset pairs are considered per document (0 means unlimited).

By default (HighlightAll = false) SereneDB picks the best-scoring passage around the matches and clips it to MaxWords tokens. StartSel and StopSel override the surrounding markup — here, <mark> tags for a custom search UI:

Query
SELECT ts_highlight('the quick brown fox', [4, 9], 'StartSel=<mark>, StopSel=</mark>') AS snippet;
Result
 snippet
----------------------------------
 the <mark>quick</mark> brown fox
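
The options string can be pictured as a set of overrides layered on the documented defaults. The Python sketch below is a hypothetical model, not SereneDB's parser: parse_options and DEFAULTS are illustrative names, keys are assumed to be case-sensitive, and values are kept as plain strings. The "did you mean" hint for unknown keys uses the standard library's difflib.

```python
import difflib

# Defaults copied from the options table; values kept as strings for simplicity.
DEFAULTS = {
    "StartSel": "<b>",
    "StopSel": "</b>",
    "HighlightAll": "false",
    "MaxWords": "35",
    "MaxFragments": "0",
    "FragmentDelimiter": " ... ",
    "MaxOffsets": "0",
}


def parse_options(options=""):
    """Parse 'Key=value, Key=value' into a dict layered over DEFAULTS."""
    parsed = dict(DEFAULTS)
    for item in options.split(","):
        if not item.strip():
            continue  # tolerate an empty string or a trailing comma
        key, sep, value = item.partition("=")
        key = key.strip()
        if not sep:
            raise ValueError(f"expected key = value, got {item.strip()!r}")
        if key not in DEFAULTS:
            close = difflib.get_close_matches(key, list(DEFAULTS), n=1)
            hint = f"; did you mean {close[0]!r}?" if close else ""
            raise ValueError(f"unknown option {key!r}{hint}")
        # Simplification: surrounding whitespace in values is dropped.
        parsed[key] = value.strip()
    if int(parsed["MaxWords"]) <= 0:
        raise ValueError("MaxWords must be a positive integer")
    return parsed


opts = parse_options("StartSel=<mark>, StopSel=</mark>")
print(opts["StartSel"], opts["StopSel"], opts["MaxWords"])  # <mark> </mark> 35
```

Splitting on commas means this sketch cannot represent a value that itself contains a comma; how SereneDB handles that case is not covered here.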

How it works.

  • Passage selection vs. whole text. With HighlightAll = false the function returns the single best-scoring window of up to MaxWords tokens around the matches — ideal for a long body field where you want a focused snippet. Set HighlightAll = true to wrap every match in the full, unclipped text instead:

    Query
    SELECT ts_highlight(
        'First sentence here. The quick brown fox is in this sentence. Third one.',
        [25, 30],
        'HighlightAll=true'
    ) AS snippet;
    Result
     snippet
    ---------------------------------------------------------------------------------
     First sentence here. The <b>quick</b> brown fox is in this sentence. Third one.
  • Multiple fragments. Raising MaxFragments returns several top-scored passages joined by FragmentDelimiter (" ... " by default) — useful when matches are scattered through a document. Adjacent matched tokens merge into one wrapped span, and trailing sentence punctuation is omitted (the analyzer does not tokenize it):

    Query
    SELECT ts_highlight(
        'A quick fox runs. Slow turtle naps. Another quick fox.',
        [2, 7, 8, 11, 44, 49, 50, 53],
        'MaxFragments=2'
    ) AS snippet;
    Result
     snippet
    ------------------------------------------------------
     A <b>quick fox</b> runs ... Another <b>quick fox</b>
  • Validation. offsets must have an even length and be sorted ascending by start, and each start must be ≤ end and within the document — malformed pairs are rejected. Unknown option keys raise an error with a "did you mean" suggestion, and options must be a constant expression.
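
The validation rules and the merging of adjacent matches described above can be sketched in Python as follows. This is a hedged illustration: validate_offsets and merge_adjacent are hypothetical helpers, and the merge criterion used here (two ranges merge when only whitespace separates them) is an assumption chosen to reproduce the "quick fox" example, not a documented rule.

```python
def validate_offsets(text, offsets):
    """Return (start, end) pairs, or raise ValueError if a listed rule is broken."""
    doc_len = len(text.encode("utf-8"))  # offsets are byte positions
    if len(offsets) % 2 != 0:
        raise ValueError("offsets must have an even length")
    pairs = list(zip(offsets[0::2], offsets[1::2]))
    prev_start = -1
    for start, end in pairs:
        if start > end:
            raise ValueError(f"start {start} is greater than end {end}")
        if start < 0 or end > doc_len:
            raise ValueError(f"pair ({start}, {end}) falls outside the document")
        if start < prev_start:
            raise ValueError("pairs must be sorted ascending by start")
        prev_start = start
    return pairs


def merge_adjacent(text, pairs):
    """Merge ranges separated only by whitespace (assumed criterion)."""
    data = text.encode("utf-8")
    merged = []
    for start, end in pairs:
        # An empty or whitespace-only gap means the two matches are adjacent.
        if merged and data[merged[-1][1]:start].strip() == b"":
            merged[-1] = (merged[-1][0], max(end, merged[-1][1]))
        else:
            merged.append((start, end))
    return merged


doc = "A quick fox runs. Slow turtle naps. Another quick fox."
pairs = validate_offsets(doc, [2, 7, 8, 11, 44, 49, 50, 53])
print(merge_adjacent(doc, pairs))  # [(2, 11), (44, 53)]
```

The two merged ranges correspond to the two "quick fox" spans in the MaxFragments example.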

ts_highlight(column [, options]) — virtual-column form

In the common case — highlighting the same column you searched — pass the indexed column directly and SereneDB synthesizes the offsets for you from the query predicate. This is exactly equivalent to ts_highlight(column, ts_offsets(column) [, options]) but spares you spelling out the offset call:

Query
SELECT id, ts_highlight(body) AS snippet
FROM passages_idx
WHERE body @@ ts_phrase('quick')
ORDER BY id;
Result
 id | snippet
----+----------------------------------------------------
  1 | the <b>quick</b> brown fox jumps over the lazy dog
  2 | a <b>quick</b> red fox runs fast
  3 | The <b>quick</b> brown fox is in this sentence

The sugar form takes the same options string. It requires an inverted-index scan with an FTS predicate on the column (WHERE body @@ ...); calling it on a plain scan or on a literal raises an error.

ts_offsets(column [, limit])

Returns the byte offsets of the matched tokens as interleaved start, end pairs — the building block for ts_highlight. The optional limit caps the number of pairs returned per row.

Parameter | Type | Description
column | indexed column | The FTS-indexed column whose matches to locate (column-reference form, inside an index scan).
limit | INTEGER | Optional cap on pairs returned per row.
dictionary, text, query | VARCHAR, VARCHAR, TSQUERY | Standalone form: compute offsets for arbitrary text against query using the analyzer of dictionary.

Query
SELECT id, ts_offsets(docs_idx.body) AS offsets FROM docs_idx WHERE body @@ ts_phrase('fox') ORDER BY id;
Result
 id | offsets
----+---------
  1 | {12,15}
  2 | {10,13}

The standalone three-argument form computes offsets for arbitrary text — no index scan required — by analyzing text with a named dictionary and matching it against query:

Query
SELECT ts_offsets('passages_en', 'the quick brown fox', 'quick'::TSQUERY) AS offsets;
Result
 offsets
---------
 {4,9}

Stored vs. on-the-fly offsets — the offset feature flag

ts_offsets can source offsets two ways. When the column's text search dictionary is created with offset = true (which in turn requires position = true), the byte offsets are stored in the inverted index and read back directly — fast at query time, at the cost of a larger index. Otherwise SereneDB derives the offsets on the fly, re-analyzing the matched text for each result — no extra storage, but more work per query. The standalone form ts_offsets(dictionary, text, query) always computes offsets on the fly.
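
The trade-off between the two sources can be pictured with a small Python model. It is a sketch under stated assumptions: a lowercasing word-splitting regex stands in for a real text search dictionary, a plain dict stands in for the inverted index, and all function names are hypothetical.

```python
import re


def tokens_with_offsets(text):
    """Yield (token, start_byte, end_byte) using a toy lowercase word analyzer."""
    for m in re.finditer(r"\w+", text):
        start = len(text[:m.start()].encode("utf-8"))  # character -> byte position
        end = start + len(m.group().encode("utf-8"))
        yield m.group().lower(), start, end


def offsets_on_the_fly(text, term):
    """Query-time path: re-analyze the text for every result row."""
    found = []
    for token, start, end in tokens_with_offsets(text):
        if token == term.lower():
            found += [start, end]
    return found


def build_stored_offsets(text):
    """Index-time path: analyze once and keep every token's byte ranges."""
    stored = {}
    for token, start, end in tokens_with_offsets(text):
        stored.setdefault(token, []).extend([start, end])
    return stored


doc = "the quick brown fox"
stored = build_stored_offsets(doc)        # paid once, makes the "index" larger
print(offsets_on_the_fly(doc, "quick"))   # [4, 9]
print(stored.get("quick", []))            # [4, 9]
```

Both paths yield the same pairs; the difference is whether the analysis cost is paid once at indexing time (more storage) or on every query (more CPU per result).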

Coming from Elasticsearch

Elasticsearch / OpenSearch highlighting options map onto SereneDB's ts_highlight as follows:

Elasticsearch / OpenSearch | SereneDB
pre_tags / post_tags | StartSel / StopSel
number_of_fragments (1) | ts_highlight (default, MaxFragments = 0)
number_of_fragments (> 1) | MaxFragments > 0 (joined by FragmentDelimiter)
fragment_size (characters) | MaxWords (tokens, not characters)
number_of_fragments = 0 (whole field) | HighlightAll = true
term offsets | ts_offsets
index_options: offsets / term vectors | dictionary offset = true
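
For a migration script, the mapping above could be applied mechanically. The Python helper below is a hypothetical sketch (es_highlight_to_serenedb is not part of either product) that turns a subset of Elasticsearch per-field highlight settings into a ts_highlight options string. It deliberately skips fragment_size, because a character budget has no exact token-count equivalent, and it leaves an unset number_of_fragments at SereneDB's default.

```python
def es_highlight_to_serenedb(es_field_opts):
    """Translate a subset of Elasticsearch highlight settings to an options string."""
    parts = []
    pre = es_field_opts.get("pre_tags")    # Elasticsearch takes a list of tags
    post = es_field_opts.get("post_tags")
    if pre:
        parts.append(f"StartSel={pre[0]}")   # only the first tag is used
    if post:
        parts.append(f"StopSel={post[0]}")
    n = es_field_opts.get("number_of_fragments")
    if n == 0:
        parts.append("HighlightAll=true")    # whole field, no passage selection
    elif n is not None and n > 1:
        parts.append(f"MaxFragments={n}")
    # n == 1 maps to the default MaxFragments = 0 (single best fragment),
    # so nothing is emitted for it.
    return ", ".join(parts)


print(es_highlight_to_serenedb(
    {"pre_tags": ["<mark>"], "post_tags": ["</mark>"], "number_of_fragments": 3}
))  # StartSel=<mark>, StopSel=</mark>, MaxFragments=3
```

The resulting string can be passed as the options argument of ts_highlight; choosing a MaxWords value to replace fragment_size remains a manual decision.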

Notable differences. SereneDB caps fragments by token count (MaxWords), whereas Elasticsearch caps them by character count (fragment_size). SereneDB also returns only the top-scoring fragments and has no equivalent of the following Elasticsearch features:

Elasticsearch / OpenSearch | SereneDB
per-fragment order / pagination | none (top fragments only)
highlight_query (highlight a different query) | none (highlights the search predicate)
highlighter types unified / fvh / plain | single highlighter
