
stem

The stem template reduces each input token to its morphological root with the Snowball stemmer for the configured LOCALE, and does nothing else — it does not split text into words. Stemming lets different inflections of a word match one another: connection, connected and connecting all collapse to connect, so a query for one form retrieves documents written with another.
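
As an illustrative sketch of that collapse, the statements below define a dictionary and lexize the three forms one at a time. The dictionary name stem_demo is a placeholder chosen for this example; the template and locale settings are the ones this page documents. Going by the description above, each call is expected to return connect.

```sql
-- stem_demo is a hypothetical name used only for this illustration.
CREATE TEXT SEARCH DICTIONARY stem_demo (
    template = 'stem',
    locale = 'en'
);

-- Three inflections of the same word, each passed as a single token.
SELECT ts_lexize('stem_demo', 'connection');
SELECT ts_lexize('stem_demo', 'connected');
SELECT ts_lexize('stem_demo', 'connecting');
```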

On its own, stem treats its whole input as a single token, so it is meant to receive pre-tokenized input. Place it as a stage inside a pipeline, after a word splitter such as text or delimiter. Reach for it when you want to control where stemming happens in a custom pipeline rather than rely on the all-in-one behavior of text, which already stems.

Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| LOCALE | string | | ICU locale (e.g., 'en', 'de'); selects the Snowball language |
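
A minimal sketch of selecting a different language through LOCALE, assuming a German Snowball stemmer is available in your installation. The dictionary name stem_de and the sample word are illustrative only, and no result is shown because it depends on the German stemming rules.

```sql
-- stem_de is a hypothetical name; 'de' is one of the locales listed above.
CREATE TEXT SEARCH DICTIONARY stem_de (
    template = 'stem',
    locale = 'de'
);

-- A single German token, stemmed with the language chosen by locale.
SELECT ts_lexize('stem_de', 'häuser');
```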

Tokenization

The template stems whatever token it receives. Applied to a single word it returns that word's root. When the input is several words separated by spaces, stem sees them as one token and leaves it unchanged — which is why it is normally fed pre-tokenized input from a pipeline.

| Input | LOCALE | Output tokens |
|-------|--------|---------------|
| running | en | run |
| connections | en | connect |
| running runners ran | en | running runners ran (single untouched token) |

Query

CREATE TEXT SEARCH DICTIONARY stem_dict (
    template = 'stem',
    locale = 'en'
);
SELECT ts_lexize('stem_dict', 'running');

Result

 ts_lexize
-----------
 {run}
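
To observe the single-token behavior from the last row of the table, the same dictionary can be given the whole phrase. This sketch assumes stem_dict from the query above still exists. Going by the table, the phrase should come back as one unstemmed token rather than three roots.

```sql
-- The phrase is not split first, so stem receives it as one token.
SELECT ts_lexize('stem_dict', 'running runners ran');
```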

Stemming each word of a phrase

To stem every word in a phrase, split first and stem second. A pipeline that runs delimiter then stem reduces each token independently:

| Input | Pipeline | Output tokens |
|-------|----------|---------------|
| running runners ran | delimiter (space) → stem (en) | run, runner, ran |

Query

CREATE TEXT SEARCH DICTIONARY stem_dict_words (
    template = 'pipeline',
    step1_template = 'delimiter',
    step1_delimiter = ' ',
    step2_template = 'stem',
    step2_locale = 'en'
);
SELECT ts_lexize('stem_dict_words', 'running runners ran');

Result

    ts_lexize
------------------
 {run,runner,ran}

See also