stem

The stem template reduces each input token to its morphological root with the Snowball stemmer for the configured LOCALE, and does nothing else — it does not split text into words. Stemming lets different inflections of a word match one another: connection, connected and connecting all collapse to connect, so a query for one form retrieves documents written with another.

On its own, stem treats its whole input as a single token, so it is meant to receive pre-tokenized input. Place it as a stage inside a pipeline, after a word splitter such as text or delimiter. Reach for it when you want to control where stemming happens in a custom pipeline, rather than relying on the all-in-one behavior of text, which already stems.

Options

Option	Type	Default	Description
`LOCALE`	string	—	ICU locale (e.g., `'en'`, `'de'`) — selects the Snowball language
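
As an illustration of how LOCALE picks the language, a German dictionary could be declared the same way as the English examples on this page. This is only a sketch: the dictionary name `stem_dict_de` and the sample word are made up for the example, and it assumes a German Snowball stemmer is available for `'de'`.

```sql
-- Hypothetical dictionary name; 'de' is one of the example locales from the Options table.
CREATE TEXT SEARCH DICTIONARY stem_dict_de (
    template = 'stem',
    locale = 'de'
);

-- Stem a single German word (the input is treated as one token).
SELECT ts_lexize('stem_dict_de', 'laufen');
```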

Tokenization

The template stems whatever token it receives. Applied to a single word it returns that word's root. When the input is several words separated by spaces, stem sees them as one token and leaves it unchanged — which is why it is normally fed pre-tokenized input from a pipeline.

Input	LOCALE	Output tokens
`running`	`en`	`run`
`connections`	`en`	`connect`
`running runners ran`	`en`	`running runners ran` (single untouched token)

Query

CREATE TEXT SEARCH DICTIONARY stem_dict (
    template = 'stem',
    locale = 'en'
);
SELECT ts_lexize('stem_dict', 'running');

Result

 ts_lexize
-----------
 {run}
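
To check the behavior described in this section, the same dictionary can be queried with several inflections at once and with an unsplit phrase. This is a sketch that assumes `stem_dict` was created as in the query above; the column aliases `word` and `lexemes` are arbitrary.

```sql
-- Each inflection is passed as its own token, so each is stemmed independently.
SELECT word, ts_lexize('stem_dict', word) AS lexemes
FROM (VALUES ('connection'), ('connected'), ('connecting')) AS t(word);

-- A phrase passed as one string is treated by stem as a single token.
SELECT ts_lexize('stem_dict', 'running runners ran') AS lexemes;
```

Going by the description on this page, the three inflections should share the root connect, while the phrase should come back as one unchanged token.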

Stemming each word of a phrase

To stem every word in a phrase, split first and stem second. A pipeline that runs delimiter then stem reduces each token independently:

Input	Pipeline	Output tokens
`running runners ran`	`delimiter` (space) → `stem` (`en`)	`run`, `runner`, `ran`

Query

CREATE TEXT SEARCH DICTIONARY stem_dict_words (
    template = 'pipeline',
    step1_template = 'delimiter',
    step1_delimiter = ' ',
    step2_template = 'stem',
    step2_locale = 'en'
);
SELECT ts_lexize('stem_dict_words', 'running runners ran');

Result

 ts_lexize
------------------
 {run,runner,ran}
