
pipeline

The pipeline template composes several analyzers into one dictionary, feeding the output of each step as the input to the next. This produces behavior that no single template offers: for example, splitting a field on a delimiter and then applying full text analysis (case folding, stemming, stopwords) to each resulting piece.

Each step names its own template and options, with every option prefixed by the step's position, numbered from 1: STEP1_TEMPLATE and its STEP1_… options, then STEP2_TEMPLATE and so on. Steps run strictly in order, so a tokenizer that splits text must come before filters like stopwords or stem that refine the tokens it produces. Where union runs members in parallel and merges their output, pipeline chains them so each step transforms the previous step's tokens.
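The ordering rule, and the contrast with union, can be modeled in a few lines of Python. This is a minimal sketch, not the extension's implementation. It assumes each step maps one token to zero or more tokens, and the helper names (split_on, drop_stopwords, pipeline, union) are hypothetical. When the stopword filter runs before the splitter, it sees the whole unsplit string, so the stopword slips through. The union model simply concatenates member outputs; how the real union merges them is not modeled.

```python
from typing import Callable, List, Set

# Hypothetical model: a step maps one token to zero or more tokens.
Step = Callable[[str], List[str]]


def split_on(delim: str) -> Step:
    # Tokenizer stand-in: split on a delimiter (this model drops empty pieces).
    return lambda tok: [piece for piece in tok.split(delim) if piece]


def drop_stopwords(words: Set[str]) -> Step:
    # Filter stand-in: keep a token only if it is not a stopword.
    return lambda tok: [] if tok in words else [tok]


def pipeline(steps: List[Step], text: str) -> List[str]:
    """Chain: each step runs on every token the previous step emitted."""
    tokens = [text]
    for step in steps:
        tokens = [out for tok in tokens for out in step(tok)]
    return tokens


def union(members: List[Step], text: str) -> List[str]:
    """Parallel: every member sees the original input; outputs are concatenated."""
    return [out for member in members for out in member(text)]


split = split_on(",")
stop = drop_stopwords({"the"})
print(pipeline([split, stop], "the,cat"))  # ['cat']
print(pipeline([stop, split], "the,cat"))  # ['the', 'cat']
print(union([split, stop], "the,cat"))     # ['the', 'cat', 'the,cat']
```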

Options

Option	Type	Default	Description
`STEP⟨N⟩_TEMPLATE`	string	required	Template of the Nth step (numbered from 1)
`STEP⟨N⟩_*`	—	—	Options for the Nth step, prefixed with `STEP⟨N⟩_`

A step may itself be a pipeline; nest by chaining the prefixes, e.g. STEP2_STEP1_TEMPLATE.
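The mapping from numbered and nested prefixes to steps can be sketched as a small parser. This is a hypothetical illustration, not part of the extension. It assumes option names are matched case-insensitively and that a nested pipeline's options are found by stripping one STEP⟨N⟩_ prefix. The function name parse_steps is invented for this sketch.

```python
import re
from typing import Any, Dict, List

# STEP<N>_<rest>; <rest> may itself start with STEP<M>_ for a nested pipeline.
STEP_KEY = re.compile(r"STEP(\d+)_(.+)")


def parse_steps(options: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Group STEP<N>_ options into step configs, ordered by N (hypothetical)."""
    grouped: Dict[int, Dict[str, Any]] = {}
    for key, value in options.items():
        match = STEP_KEY.fullmatch(key.upper())
        if match is None:
            continue  # not a step option, e.g. the top-level template
        n, rest = int(match.group(1)), match.group(2)
        grouped.setdefault(n, {})[rest] = value

    steps = []
    for n in sorted(grouped):  # numeric order, so STEP10 follows STEP9
        cfg = grouped[n]
        if "TEMPLATE" not in cfg:
            raise ValueError(f"STEP{n}_TEMPLATE is required")
        if cfg["TEMPLATE"] == "pipeline":
            # Strip one prefix level and parse the remainder recursively.
            inner = {k: v for k, v in cfg.items() if k != "TEMPLATE"}
            cfg = {"TEMPLATE": "pipeline", "STEPS": parse_steps(inner)}
        steps.append(cfg)
    return steps


# Options of the nested example on this page (LOCALE options omitted).
options = {
    "template": "pipeline",
    "STEP1_TEMPLATE": "norm",
    "STEP1_CASE": "lower",
    "STEP2_TEMPLATE": "pipeline",
    "STEP2_STEP1_TEMPLATE": "delimiter",
    "STEP2_STEP1_DELIMITER": "|",
    "STEP2_STEP2_TEMPLATE": "norm",
    "STEP2_STEP2_CASE": "upper",
    "STEP3_TEMPLATE": "stopwords",
    "STEP3_STOPWORDS": '"A"',
}
for step in parse_steps(options):
    print(step)
```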

Tokenization

Each step consumes the tokens emitted by the one before it. A delimiter first step that splits on commas turns RED,Green,BLUE into three tokens, and a norm second step then lowercases each one, giving {red,green,blue}. Swap the second step for text with stemming enabled and the same split feeds a stemmer, so Cats,RUNNING becomes {cat,run}: a split-then-analyze behavior that no single template provides.

Input	Steps	Tokens
`RED,Green,BLUE`	`delimiter` (`,`) → `norm` (`CASE = 'lower'`)	`{red,green,blue}`
`Cats,RUNNING`	`delimiter` (`,`) → `text` (`CASE = 'lower'`, `STEMMING = true`)	`{cat,run}`
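The token flow in the table can be traced step by step with a small Python model. This is a sketch under stated assumptions, not the extension's code. The names delimiter, norm_lower, text_lower_stem and trace are hypothetical. toy_stem is a two-rule suffix stripper that only stands in for the real stemmer on these two inputs.

```python
from typing import Callable, List

Step = Callable[[str], List[str]]  # hypothetical: one token in, zero or more out


def delimiter(sep: str) -> Step:
    return lambda tok: tok.split(sep)


def norm_lower(tok: str) -> List[str]:
    return [tok.lower()]


def toy_stem(tok: str) -> str:
    # Toy suffix stripper, NOT a real stemmer: handles "-ing" and plural "-s".
    for suffix in ("ing", "s"):
        if tok.endswith(suffix) and len(tok) > len(suffix) + 2:
            stem = tok[: -len(suffix)]
            if stem[-1] == stem[-2]:  # undouble: "runn" -> "run"
                stem = stem[:-1]
            return stem
    return tok


def text_lower_stem(tok: str) -> List[str]:
    return [toy_stem(tok.lower())]


def trace(steps: List[Step], text: str) -> List[List[str]]:
    """Return the token list after each step, starting from the raw input."""
    states = [[text]]
    for step in steps:
        states.append([out for tok in states[-1] for out in step(tok)])
    return states


for stage in trace([delimiter(","), norm_lower], "RED,Green,BLUE"):
    print(stage)
for stage in trace([delimiter(","), text_lower_stem], "Cats,RUNNING"):
    print(stage)
```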

Split on commas, then lowercase each piece:

Query

CREATE TEXT SEARCH DICTIONARY pipe_delim_norm (
    template = 'pipeline',
    -- step 1 splits on commas, step 2 lowercases each piece
    STEP1_TEMPLATE = 'delimiter',
    STEP1_DELIMITER = ',',
    STEP2_TEMPLATE = 'norm',
    STEP2_LOCALE = 'en_US.UTF-8',
    STEP2_CASE = 'lower'
);
SELECT ts_lexize('pipe_delim_norm', 'RED,Green,BLUE');

Result

 ts_lexize
------------------
 {red,green,blue}

Replace the second step with text analysis so each piece is also stemmed:

Query

CREATE TEXT SEARCH DICTIONARY pipe_delim_stem (
    template = 'pipeline',
    -- step 1 splits on commas, step 2 lowercases and stems each piece
    STEP1_TEMPLATE = 'delimiter',
    STEP1_DELIMITER = ',',
    STEP2_TEMPLATE = 'text',
    STEP2_LOCALE = 'en_US.UTF-8',
    STEP2_CASE = 'lower',
    STEP2_STEMMING = true
);
SELECT ts_lexize('pipe_delim_stem', 'Cats,RUNNING');

Result

 ts_lexize
-----------
 {cat,run}

Examples

Delimiter then text analysis

Query

CREATE TEXT SEARCH DICTIONARY pipe_dict (
    template = 'pipeline',
    STEP1_TEMPLATE = 'delimiter',
    STEP1_DELIMITER = ',',
    STEP2_TEMPLATE = 'text',
    STEP2_LOCALE = 'en_US.UTF-8',
    STEP2_CASE = 'lower',
    STEP2_STEMMING = true
);

Three-step pipeline with stopwords

Query

CREATE TEXT SEARCH DICTIONARY advanced_pipe (
    template = 'pipeline',
    -- step 1 splits, step 2 runs text analysis with the stopword fox,
    -- step 3 uppercases the surviving tokens
    STEP1_TEMPLATE = 'delimiter',
    STEP1_DELIMITER = 'A',
    STEP2_TEMPLATE = 'text',
    STEP2_LOCALE = 'en_US.UTF-8',
    STEP2_CASE = 'lower',
    STEP2_ACCENT = false,
    STEP2_STEMMING = true,
    STEP2_STOPWORDS = '"fox"',
    STEP3_TEMPLATE = 'norm',
    STEP3_LOCALE = 'en_US.UTF-8',
    STEP3_CASE = 'upper'
);
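A rough Python model of advanced_pipe illustrates how the three steps might interact. In this sketch step 2 lowercases before it checks the stopword list, and step 3 only ever sees the tokens that survive. It assumes case-sensitive splitting on 'A' and lowercase stopword matching, and it skips stemming entirely. All function names are hypothetical.

```python
from typing import Callable, List, Set

Step = Callable[[str], List[str]]  # hypothetical: one token in, zero or more out


def delimiter(sep: str) -> Step:
    # Case-sensitive split (assumption); empty pieces are dropped in this model.
    return lambda tok: [p for p in tok.split(sep) if p]


def text_lower_stopwords(stopwords: Set[str]) -> Step:
    # Stand-in for step 2: lowercase, then drop stopwords (stemming omitted).
    def step(tok: str) -> List[str]:
        low = tok.lower()
        return [] if low in stopwords else [low]
    return step


def norm_upper(tok: str) -> List[str]:
    return [tok.upper()]


def run(steps: List[Step], text: str) -> List[str]:
    tokens = [text]
    for step in steps:
        tokens = [out for tok in tokens for out in step(tok)]
    return tokens


steps = [delimiter("A"), text_lower_stopwords({"fox"}), norm_upper]
print(run(steps, "RedAFoxABlue"))  # ['RED', 'BLUE']
```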

N-grams then normalization

Query

CREATE TEXT SEARCH DICTIONARY ngram_norm (
    template = 'pipeline',
    STEP1_TEMPLATE = 'ngram',
    STEP1_MINGRAM = 2,
    STEP1_MAXGRAM = 3,
    STEP2_TEMPLATE = 'norm',
    STEP2_LOCALE = 'en_US.UTF-8',
    STEP2_CASE = 'lower'
);
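ngram_norm can be modeled as below, assuming the ngram step emits every character n-gram of each token between MINGRAM and MAXGRAM characters long. The gram ordering (shortest first) and the absence of deduplication are assumptions of this sketch. ngrams and ngram_then_lower are hypothetical helpers.

```python
from typing import List


def ngrams(tok: str, min_gram: int, max_gram: int) -> List[str]:
    # All substrings of length min_gram..max_gram, shortest first (assumed order).
    return [
        tok[i : i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(tok) - n + 1)
    ]


def ngram_then_lower(text: str) -> List[str]:
    # Step 1: character n-grams (MINGRAM = 2, MAXGRAM = 3); step 2: lowercase each.
    return [gram.lower() for gram in ngrams(text, 2, 3)]


print(ngram_then_lower("Fox"))  # ['fo', 'ox', 'fox']
```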

Nested pipeline

A pipeline step can itself be a pipeline. Nest with STEP⟨N⟩_STEP⟨M⟩_:

Query

CREATE TEXT SEARCH DICTIONARY nested_pipe (
    template = 'pipeline',
    -- step 1 lowercases the input
    STEP1_TEMPLATE = 'norm',
    STEP1_LOCALE = 'en_US.UTF-8',
    STEP1_CASE = 'lower',
    -- step 2 is itself a pipeline: split on '|', then uppercase each piece
    STEP2_TEMPLATE = 'pipeline',
    STEP2_STEP1_TEMPLATE = 'delimiter',
    STEP2_STEP1_DELIMITER = '|',
    STEP2_STEP2_TEMPLATE = 'norm',
    STEP2_STEP2_LOCALE = 'en_US.UTF-8',
    STEP2_STEP2_CASE = 'upper',
    -- step 3 drops the stopword A
    STEP3_TEMPLATE = 'stopwords',
    STEP3_STOPWORDS = '"A"'
);
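Because a pipeline consumes and emits tokens like any other step, nesting amounts to treating a whole pipeline as a single step. The Python sketch below models nested_pipe that way. The helpers are hypothetical, and exact, case-sensitive stopword matching is an assumption. In this model, the nested uppercase step undoes step 1's lowercasing, which is why the uppercase stopword "A" matches in step 3.

```python
from typing import Callable, List, Set

Step = Callable[[str], List[str]]  # hypothetical: one token in, zero or more out


def pipeline(*steps: Step) -> Step:
    # A pipeline is itself a Step, which is what lets one nest inside another.
    def run(tok: str) -> List[str]:
        tokens = [tok]
        for step in steps:
            tokens = [out for t in tokens for out in step(t)]
        return tokens
    return run


def delimiter(sep: str) -> Step:
    return lambda tok: tok.split(sep)


def lower(tok: str) -> List[str]:
    return [tok.lower()]


def upper(tok: str) -> List[str]:
    return [tok.upper()]


def stopwords(words: Set[str]) -> Step:
    return lambda tok: [] if tok in words else [tok]  # exact-match assumption


nested_pipe = pipeline(
    lower,                            # STEP1: norm, CASE = 'lower'
    pipeline(delimiter("|"), upper),  # STEP2: nested delimiter, then norm upper
    stopwords({"A"}),                 # STEP3: stopwords '"A"'
)
print(nested_pipe("a|Cat|B"))  # ['CAT', 'B']
```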
