Highlighting
Highlighting marks the query terms inside a matching document's text so a search UI can show why a result matched — the bold terms in a results list or a short "keyword in context" snippet (passage) drawn from a long field. It is a presentation step applied to the rows a query already matched: it does not change which rows match or how they rank.
Highlighting Functions
SereneDB highlights matches in two steps: ts_offsets locates the matched tokens, then ts_highlight wraps them in markup. Used together they produce search-result snippets. For the common case ts_highlight(column) does both at once.
| Function | Description |
|---|---|
| `ts_highlight(text, offsets [, options])` | Wrap the tokens at the given offsets in markup. |
| `ts_highlight(column [, options])` | Sugar form: derive offsets from the FTS predicate automatically. |
| `ts_offsets(column [, limit])` | Byte offsets of the matched tokens for an indexed column. |
| `ts_offsets(dictionary, text, query)` | Compute offsets on the fly for arbitrary text. |
ts_highlight(text, offsets [, options])
Wraps the tokens of text located at offsets in markup. offsets is an INTEGER[] of interleaved start, end byte pairs — typically the output of ts_offsets. By default each match is wrapped in <b>...</b>:
| Parameter | Type | Description |
|---|---|---|
| `text` | `VARCHAR` | The document text to render. `NULL` text yields `NULL`. |
| `offsets` | `INTEGER[]` | Interleaved start, end byte pairs, sorted ascending by start. Must have an even length; `NULL` yields `NULL`. |
| `options` | `VARCHAR` | Optional comma-separated `key = value` string. Must be a constant; per-row option expressions are rejected at bind time. |
```sql
SELECT ts_highlight('the quick brown fox', [4, 9]) AS snippet;
```

```
 snippet
----------------------------
 the <b>quick</b> brown fox
```

The optional third argument is a string of comma-separated key = value options:
| Option | Default | Description |
|---|---|---|
| `StartSel` | `<b>` | Markup inserted before each matched range. |
| `StopSel` | `</b>` | Markup inserted after each matched range. |
| `HighlightAll` | `false` | Highlight the whole text verbatim, skipping passage selection. |
| `MaxWords` | `35` | Hard cap on tokens per fragment. Must be a positive integer. |
| `MaxFragments` | `0` | `0` returns the single best-scoring fragment; a value > 0 returns up to that many top-scored fragments. |
| `FragmentDelimiter` | `" ... "` | String used to join multiple fragments. |
| `MaxOffsets` | `0` | Caps how many offset pairs are considered per document (`0` means unlimited). |
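On the client side, the options string can be assembled and sanity-checked before it is sent. The following is a minimal Python sketch, not part of SereneDB: `build_options` is a hypothetical helper, the option names and defaults are copied from the table above, and it assumes keys are matched exactly as written and that values contain no commas (how such values would be escaped is not covered here). It also imitates the "did you mean" hint for unknown keys using the standard-library `difflib`.

```python
import difflib

# Option names and defaults copied from the options table (client-side copy).
KNOWN_OPTIONS = {
    "StartSel": "<b>",
    "StopSel": "</b>",
    "HighlightAll": "false",
    "MaxWords": "35",
    "MaxFragments": "0",
    "FragmentDelimiter": " ... ",
    "MaxOffsets": "0",
}


def build_options(**options):
    """Join keyword arguments into a 'Key=value, Key=value' options string."""
    parts = []
    for key, value in options.items():
        if key not in KNOWN_OPTIONS:
            # Imitate a "did you mean" hint with a fuzzy match on known keys.
            hint = difflib.get_close_matches(key, list(KNOWN_OPTIONS), n=1)
            suggestion = f" (did you mean {hint[0]!r}?)" if hint else ""
            raise ValueError(f"unknown option {key!r}{suggestion}")
        if isinstance(value, bool):
            # Render Python booleans in the lowercase form used by the table.
            value = "true" if value else "false"
        parts.append(f"{key}={value}")
    return ", ".join(parts)


print(build_options(StartSel="<mark>", StopSel="</mark>"))
```

The resulting string is what would be passed as the constant `options` argument; because the argument must be constant, it has to be built before the query is issued rather than computed per row.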
By default (HighlightAll = false) SereneDB picks the best-scoring passage around the matches and clips it to MaxWords tokens. StartSel and StopSel override the surrounding markup — here, <mark> tags for a custom search UI:
```sql
SELECT ts_highlight('the quick brown fox', [4, 9], 'StartSel=<mark>, StopSel=</mark>') AS snippet;
```

```
 snippet
----------------------------------
 the <mark>quick</mark> brown fox
```

How it works.
- Passage selection vs. whole text. With `HighlightAll = false` the function returns the single best-scoring window of up to `MaxWords` tokens around the matches, which is ideal for a long body field where you want a focused snippet. Set `HighlightAll = true` to wrap every match in the full, unclipped text instead:

  ```sql
  SELECT ts_highlight(
    'First sentence here. The quick brown fox is in this sentence. Third one.',
    [25, 30],
    'HighlightAll=true') AS snippet;
  ```

  ```
   snippet
  ---------------------------------------------------------------------------------
   First sentence here. The <b>quick</b> brown fox is in this sentence. Third one.
  ```

- Multiple fragments. Raising `MaxFragments` returns several top-scored passages joined by `FragmentDelimiter` (`" ... "` by default), which is useful when matches are scattered through a document. Adjacent matched tokens merge into one wrapped span, and trailing sentence punctuation is omitted (the analyzer does not tokenize it):

  ```sql
  SELECT ts_highlight(
    'A quick fox runs. Slow turtle naps. Another quick fox.',
    [2, 7, 8, 11, 44, 49, 50, 53],
    'MaxFragments=2') AS snippet;
  ```

  ```
   snippet
  ------------------------------------------------------
   A <b>quick fox</b> runs ... Another <b>quick fox</b>
  ```

- Validation. `offsets` must have an even length and be sorted ascending by `start`, and each `start` must be ≤ `end` and within the document; malformed pairs are rejected. Unknown option keys raise an error with a "did you mean" suggestion, and `options` must be a constant expression.
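To make the offset format and the validation rules concrete, here is a small Python sketch of the whole-text case (`HighlightAll = true`). It is an illustration under stated assumptions, not SereneDB's implementation: the function names are hypothetical, offsets are assumed to count UTF-8 bytes, pairs are assumed not to overlap, and passage selection and the merging of adjacent tokens are not modeled.

```python
def validate_offsets(data, offsets):
    """Apply the checks listed above to an interleaved [start, end, ...] list."""
    if len(offsets) % 2 != 0:
        raise ValueError("offsets must have an even length")
    pairs = list(zip(offsets[0::2], offsets[1::2]))
    previous_start = -1
    for start, end in pairs:
        if start < previous_start:
            raise ValueError("pairs must be sorted ascending by start")
        if not 0 <= start <= end <= len(data):
            raise ValueError(f"pair ({start}, {end}) is outside the document")
        previous_start = start
    return pairs


def highlight_all(text, offsets, start_sel="<b>", stop_sel="</b>"):
    """Wrap every byte range in markup, leaving the rest of the text intact."""
    data = text.encode("utf-8")  # assumption: offsets count UTF-8 bytes
    out = []
    cursor = 0
    for start, end in validate_offsets(data, offsets):
        out.append(data[cursor:start])       # unmatched text before the match
        out.append(start_sel.encode("utf-8"))
        out.append(data[start:end])          # the matched token
        out.append(stop_sel.encode("utf-8"))
        cursor = end
    out.append(data[cursor:])                # remainder after the last match
    return b"".join(out).decode("utf-8")


print(highlight_all("the quick brown fox", [4, 9]))
```

Working on the encoded bytes rather than on the string keeps the sketch correct for non-ASCII text, where byte positions and character positions diverge.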
ts_highlight(column [, options]) — virtual-column form
In the common case — highlighting the same column you searched — pass the indexed column directly and SereneDB synthesizes the offsets for you from the query predicate. This is exactly equivalent to ts_highlight(column, ts_offsets(column) [, options]) but spares you spelling out the offset call:
```sql
SELECT id, ts_highlight(body) AS snippet
FROM passages_idx
WHERE body @@ ts_phrase('quick')
ORDER BY id;
```

```
 id | snippet
----+----------------------------------------------------
  1 | the <b>quick</b> brown fox jumps over the lazy dog
  2 | a <b>quick</b> red fox runs fast
  3 | The <b>quick</b> brown fox is in this sentence
```

The sugar form takes the same options string. It requires an inverted-index scan with an FTS predicate on the column (WHERE body @@ ...); calling it on a plain scan or on a literal raises an error.
ts_offsets(column [, limit])
Returns the byte offsets of the matched tokens as interleaved start, end pairs — the building block for ts_highlight. The optional limit caps the number of pairs returned per row.
| Parameter | Type | Description |
|---|---|---|
| `column` | indexed column | The FTS-indexed column whose matches to locate (column-reference form, inside an index scan). |
| `limit` | `INTEGER` | Optional cap on pairs returned per row. |
| `dictionary`, `text`, `query` | `VARCHAR`, `VARCHAR`, `TSQUERY` | Standalone form: compute offsets for arbitrary `text` against `query` using the analyzer of `dictionary`. |
```sql
SELECT id, ts_offsets(docs_idx.body) AS offsets
FROM docs_idx
WHERE body @@ ts_phrase('fox')
ORDER BY id;
```

```
 id | offsets
----+---------
  1 | {12,15}
  2 | {10,13}
```

The standalone three-argument form computes offsets for arbitrary text, with no index scan required, by analyzing text with a named dictionary and matching it against query:
```sql
SELECT ts_offsets('passages_en', 'the quick brown fox', 'quick'::TSQUERY) AS offsets;
```

```
 offsets
---------
 {4,9}
```

offset feature flag

ts_offsets can source offsets two ways. When the column's text search dictionary is created with offset = true (which in turn requires position = true), the byte offsets are stored in the inverted index and read back directly: fast at query time, at the cost of a larger index. Otherwise SereneDB derives the offsets on the fly, re-analyzing the matched text for each result: no extra storage, but more work per query. The standalone form ts_offsets(dictionary, text, query) always computes offsets on the fly.
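Because ts_offsets reports byte offsets, a client that slices the text itself has to translate them into string indices whenever the text contains multi-byte characters. The sketch below shows one way to do that in Python. It assumes the offsets count UTF-8 bytes, which this page does not state explicitly, and `byte_to_char_offsets` is a hypothetical helper name.

```python
def byte_to_char_offsets(text, byte_offsets):
    """Translate UTF-8 byte offsets into Python string indices."""
    # Map the starting byte position of each character to its index.
    byte_to_char = {}
    position = 0
    for index, char in enumerate(text):
        byte_to_char[position] = index
        position += len(char.encode("utf-8"))
    byte_to_char[position] = len(text)  # boundary at the end of the text
    # A KeyError here means an offset fell inside a multi-byte character.
    return [byte_to_char[offset] for offset in byte_offsets]


text = "na\u00efve quick fox"  # the second 'i' is U+00EF, two bytes in UTF-8
# "quick" spans bytes 7..12 but characters 6..11.
start, end = byte_to_char_offsets(text, [7, 12])
print(text[start:end])
```

For pure ASCII text the two numberings coincide, so the `{4,9}` pair from the example above maps to itself.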
Coming from Elasticsearch
Elasticsearch / OpenSearch highlighting options map onto SereneDB's ts_highlight as follows:
| Elasticsearch / OpenSearch | SereneDB |
|---|---|
| `pre_tags` / `post_tags` | `StartSel` / `StopSel` |
| `number_of_fragments` (1) | `ts_highlight` (default, `MaxFragments = 0`) |
| `number_of_fragments` (> 1) | `MaxFragments` > 0 (joined by `FragmentDelimiter`) |
| `fragment_size` (characters) | `MaxWords` (tokens, not characters) |
| `number_of_fragments = 0` (whole field) | `HighlightAll = true` |
| term offsets | `ts_offsets` |
| `index_options: offsets` / term vectors | dictionary `offset = true` |
Notable differences. SereneDB caps fragments by token count (MaxWords), whereas Elasticsearch caps by characters (fragment_size). It returns the top-scoring fragments only, with no equivalent of:
| Elasticsearch / OpenSearch | SereneDB |
|---|---|
| per-fragment order / pagination | none (top fragments only) |
| `highlight_query` (highlight a different query) | none (highlights the search predicate) |
| highlighter types `unified` / `fvh` / `plain` | single highlighter |
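When porting an existing application, the mapping above can be applied mechanically. The following Python sketch translates a subset of an Elasticsearch field-level highlight configuration into a ts_highlight options string. It is a hedged illustration only: `es_highlight_to_options` is a hypothetical helper, only the first entry of `pre_tags` / `post_tags` is used, and `fragment_size` is deliberately not converted because characters and tokens are different units; a token budget has to be supplied explicitly through the `max_words` argument.

```python
def es_highlight_to_options(es_field_config, max_words=None):
    """Translate a subset of Elasticsearch highlight settings to an options string."""
    options = {}
    if "pre_tags" in es_field_config:
        options["StartSel"] = es_field_config["pre_tags"][0]
    if "post_tags" in es_field_config:
        options["StopSel"] = es_field_config["post_tags"][0]

    fragments = es_field_config.get("number_of_fragments")
    if fragments == 0:
        # Whole-field highlighting.
        options["HighlightAll"] = "true"
    elif fragments is not None and fragments > 1:
        options["MaxFragments"] = fragments
    # number_of_fragments == 1 (or absent) keeps the default single fragment.

    if max_words is not None:
        # Chosen by the caller; not derived from fragment_size.
        options["MaxWords"] = max_words
    return ", ".join(f"{key}={value}" for key, value in options.items())


print(es_highlight_to_options(
    {"pre_tags": ["<mark>"], "post_tags": ["</mark>"], "number_of_fragments": 3}
))
```

An empty result means every setting already matches the defaults, in which case the options argument can simply be omitted.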
See also
- Full-Text Search Functions — operators, constructors, parsers
- Inverted Index · Highlighting guide
- CREATE TEXT SEARCH DICTIONARY (the `offset` and `position` flags)