
pipeline

The pipeline template composes several analyzers into one dictionary. It feeds the output of each step into the next as input. This produces behavior that no single template offers. For example, you can split a field on a delimiter and then apply full text analysis (case folding, stemming, stopwords) to each resulting piece.

Each step names its own template and options. Every option is prefixed with the step's position, numbered from 1. STEP1_TEMPLATE and the other STEP1_ options configure the first step, STEP2_TEMPLATE and its options configure the second, and so on. Steps run strictly in order. A tokenizer that splits text must therefore come before filters such as stopwords or stem, which refine the tokens it produces. A union runs its members in parallel and merges their output. A pipeline instead chains its steps, so each step transforms the previous step's tokens.
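The Python model below makes the ordering concrete by comparing the two composition styles. It is not the extension's implementation. It assumes each step is simply a function from a list of tokens to a list of tokens. It also assumes union merges by concatenating its members' outputs in order. The helper names (split_on, lower, pipeline, union) are hypothetical.

```python
from typing import Callable, List

# Simplified model: a step maps a list of tokens to a new list of tokens.
Step = Callable[[List[str]], List[str]]


def split_on(delim: str) -> Step:
    """Split every incoming token on delim, dropping empty pieces."""
    def step(tokens: List[str]) -> List[str]:
        return [piece for tok in tokens for piece in tok.split(delim) if piece]
    return step


def lower() -> Step:
    """Lowercase every token."""
    return lambda tokens: [tok.lower() for tok in tokens]


def pipeline(*steps: Step) -> Step:
    """Run steps in order, feeding each step's output to the next."""
    def run(tokens: List[str]) -> List[str]:
        for step in steps:
            tokens = step(tokens)
        return tokens
    return run


def union(*members: Step) -> Step:
    """Run every member on the same input and concatenate the outputs (assumed merge)."""
    def run(tokens: List[str]) -> List[str]:
        merged: List[str] = []
        for member in members:
            merged.extend(member(tokens))
        return merged
    return run


print(pipeline(split_on(","), lower())(["RED,Green,BLUE"]))  # ['red', 'green', 'blue']
print(union(split_on(","), lower())(["RED,Green,BLUE"]))     # ['RED', 'Green', 'BLUE', 'red,green,blue']
```

The same two steps give different results. The pipeline lowercases the pieces that the split produced. The union lowercases the original, unsplit input and places the result alongside the split pieces.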

Options

| Option | Type | Default | Description |
|---|---|---|---|
| STEP⟨N⟩_TEMPLATE | string | required | Template of the Nth step (numbered from 1) |
| STEP⟨N⟩_* | per step template | per step template | Options for the Nth step, prefixed with STEP⟨N⟩_ |

A step may itself be a pipeline; nest by chaining the prefixes, e.g. STEP2_STEP1_TEMPLATE.
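The prefix scheme can be read as a simple grouping rule. The Python sketch below shows this with a hypothetical parse_steps helper that is not part of the extension. It groups STEP⟨N⟩_ options by position and recurses into any step whose template is pipeline. It assumes option names are matched case-insensitively. It treats a missing STEP⟨N⟩_TEMPLATE as an error, since the Options table marks it required.

```python
import re
from typing import Any, Dict, List

# STEP<N>_<rest>, where <rest> may itself start with another STEP<M>_ prefix.
STEP_KEY = re.compile(r"STEP(\d+)_(.+)")


def parse_steps(options: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Group STEP<N>_ options into an ordered list of per-step configs."""
    grouped: Dict[int, Dict[str, Any]] = {}
    for key, value in options.items():
        match = STEP_KEY.fullmatch(key.upper())
        if match is None:
            continue  # not a step option, e.g. the outer template
        position, rest = int(match.group(1)), match.group(2)
        grouped.setdefault(position, {})[rest] = value

    steps: List[Dict[str, Any]] = []
    for position in sorted(grouped):
        cfg = grouped[position]
        if "TEMPLATE" not in cfg:
            raise ValueError(f"STEP{position}_TEMPLATE is required")
        if cfg["TEMPLATE"] == "pipeline":
            # Nested pipeline: drop its own TEMPLATE and parse the remaining prefixes.
            inner = {k: v for k, v in cfg.items() if k != "TEMPLATE"}
            cfg = {"TEMPLATE": "pipeline", "STEPS": parse_steps(inner)}
        steps.append(cfg)
    return steps


nested_pipe_options = {
    "template": "pipeline",
    "STEP1_TEMPLATE": "norm",
    "STEP1_CASE": "lower",
    "STEP2_TEMPLATE": "pipeline",
    "STEP2_STEP1_TEMPLATE": "delimiter",
    "STEP2_STEP1_DELIMITER": "|",
    "STEP2_STEP2_TEMPLATE": "norm",
    "STEP2_STEP2_CASE": "upper",
    "STEP3_TEMPLATE": "stopwords",
    "STEP3_STOPWORDS": '"A"',
}
for step in parse_steps(nested_pipe_options):
    print(step)
```

Each STEP⟨N⟩_ group becomes one entry. STEP2_STEP1_TEMPLATE is read as option STEP1_TEMPLATE of step 2, which the parser then parses again one level down.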

Tokenization

Each step consumes the tokens emitted by the one before it. A delimiter first step with ',' as its delimiter splits RED,Green,BLUE into three tokens. A norm second step then lowercases each token, giving {red,green,blue}. If you swap the second step for text with stemming enabled, the same split feeds a stemmer instead, so Cats,RUNNING becomes {cat,run}. No single template provides this split-then-analyze behavior.

| Input | Steps | Tokens |
|---|---|---|
| RED,Green,BLUE | delimiter (,) → norm (CASE = 'lower') | {red,green,blue} |
| Cats,RUNNING | delimiter (,) → text (CASE = 'lower', STEMMING = true) | {cat,run} |
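The trace below models the first table row in Python and prints the token list after each step. It is a simplified sketch. The delimiter step is modeled with str.split and assumed to drop empty pieces. The norm step is reduced to plain lowercasing with no locale handling. The trace helper is hypothetical.

```python
from typing import Callable, List, Tuple

# Each step is a (label, function from tokens to tokens) pair; a simplified model.
Step = Tuple[str, Callable[[List[str]], List[str]]]


def trace(text: str, steps: List[Step]) -> List[str]:
    """Run steps in order on one input, printing the tokens after each step."""
    tokens = [text]
    print(f"{'input':<10} {tokens}")
    for label, fn in steps:
        tokens = fn(tokens)
        print(f"{label:<10} {tokens}")
    return tokens


steps: List[Step] = [
    ("delimiter", lambda toks: [p for t in toks for p in t.split(",") if p]),
    ("norm", lambda toks: [t.lower() for t in toks]),
]
result = trace("RED,Green,BLUE", steps)
# input      ['RED,Green,BLUE']
# delimiter  ['RED', 'Green', 'BLUE']
# norm       ['red', 'green', 'blue']
```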

Split on commas, then lowercase each piece:

Query
```sql
CREATE TEXT SEARCH DICTIONARY pipe_delim_norm (
    template = 'pipeline',
    -- step 1 splits on commas, step 2 lowercases each piece
    STEP1_TEMPLATE = 'delimiter',
    STEP1_DELIMITER = ',',
    STEP2_TEMPLATE = 'norm',
    STEP2_LOCALE = 'en_US.UTF-8',
    STEP2_CASE = 'lower'
);
SELECT ts_lexize('pipe_delim_norm', 'RED,Green,BLUE');
```
Result
```
    ts_lexize
------------------
 {red,green,blue}
```

Replace the second step with text analysis so each piece is also stemmed:

Query
```sql
CREATE TEXT SEARCH DICTIONARY pipe_delim_stem (
    template = 'pipeline',
    -- step 1 splits on commas, step 2 lowercases and stems each piece
    STEP1_TEMPLATE = 'delimiter',
    STEP1_DELIMITER = ',',
    STEP2_TEMPLATE = 'text',
    STEP2_LOCALE = 'en_US.UTF-8',
    STEP2_CASE = 'lower',
    STEP2_STEMMING = true
);
SELECT ts_lexize('pipe_delim_stem', 'Cats,RUNNING');
```
Result
```
 ts_lexize
-----------
 {cat,run}
```

Examples

Delimiter then text analysis

Query
```sql
CREATE TEXT SEARCH DICTIONARY pipe_dict (
    template = 'pipeline',
    STEP1_TEMPLATE = 'delimiter',
    STEP1_DELIMITER = ',',
    STEP2_TEMPLATE = 'text',
    STEP2_LOCALE = 'en_US.UTF-8',
    STEP2_CASE = 'lower',
    STEP2_STEMMING = true
);
```

Three-step pipeline with stopwords

Query
```sql
CREATE TEXT SEARCH DICTIONARY advanced_pipe (
    template = 'pipeline',
    STEP1_TEMPLATE = 'delimiter',
    STEP1_DELIMITER = 'A',
    STEP2_TEMPLATE = 'text',
    STEP2_LOCALE = 'en_US.UTF-8',
    STEP2_CASE = 'lower',
    STEP2_ACCENT = false,
    STEP2_STEMMING = true,
    STEP2_STOPWORDS = '"fox"',
    STEP3_TEMPLATE = 'norm',
    STEP3_LOCALE = 'en_US.UTF-8',
    STEP3_CASE = 'upper'
);
```
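Order matters in this example. One reading of the option values is that the stopword appears in lowercase because the text step lowercases tokens before filtering them. The final norm step then uppercases whatever tokens survive. The Python model below traces that reading. It assumes the delimiter split is case-sensitive, so only an uppercase A separates pieces. It also assumes stopword matching happens after case folding. It omits stemming and accent handling entirely, so its output is not a prediction of what ts_lexize returns for advanced_pipe. The function names are hypothetical.

```python
from typing import Iterable, List


def delimiter(tokens: List[str], delim: str) -> List[str]:
    # Case-sensitive split (assumption): a lowercase 'a' does not separate pieces.
    return [p for t in tokens for p in t.split(delim) if p]


def text_step(tokens: List[str], stopwords: Iterable[str]) -> List[str]:
    # Lowercase first, then drop stopwords; stemming is deliberately omitted.
    stop = set(stopwords)
    lowered = [t.lower() for t in tokens]
    return [t for t in lowered if t not in stop]


def norm_upper(tokens: List[str]) -> List[str]:
    return [t.upper() for t in tokens]


def advanced_pipe_model(text: str) -> List[str]:
    tokens = delimiter([text], "A")                # STEP1: delimiter 'A'
    tokens = text_step(tokens, stopwords=["fox"])  # STEP2: lower + stopwords (no stemming)
    return norm_upper(tokens)                      # STEP3: norm, CASE = 'upper'


print(advanced_pipe_model("lazyAbrownAFox"))  # ['LAZY', 'BROWN']
```

In this model, "Fox" is removed because it has already been lowercased to "fox" when the stopword filter runs. The lowercase a inside "lazy" does not trigger a split.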

N-grams then normalization

Query
```sql
CREATE TEXT SEARCH DICTIONARY ngram_norm (
    template = 'pipeline',
    STEP1_TEMPLATE = 'ngram',
    STEP1_MINGRAM = 2,
    STEP1_MAXGRAM = 3,
    STEP2_TEMPLATE = 'norm',
    STEP2_LOCALE = 'en_US.UTF-8',
    STEP2_CASE = 'lower'
);
```
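Here the ngram step expands each token into several grams, and norm then lowercases every gram. Below is a Python sketch of that flow. It assumes character n-grams of every length from MINGRAM to MAXGRAM, emitted shortest first. The real ordering may differ, as may the handling of tokens shorter than MINGRAM. The helper names are hypothetical.

```python
from typing import List


def ngrams(tokens: List[str], min_gram: int, max_gram: int) -> List[str]:
    """Character n-grams of every token, shortest grams first (assumed order)."""
    out: List[str] = []
    for tok in tokens:
        for n in range(min_gram, max_gram + 1):
            out.extend(tok[i:i + n] for i in range(len(tok) - n + 1))
    return out


def lower(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]


# STEP1: ngram (MINGRAM = 2, MAXGRAM = 3), then STEP2: norm (CASE = 'lower')
print(lower(ngrams(["Cats"], 2, 3)))  # ['ca', 'at', 'ts', 'cat', 'ats']
```

In this model, a token shorter than MINGRAM produces no grams at all, so it disappears before the norm step runs.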

Nested pipeline

A pipeline step can itself be a pipeline. Nest with STEP⟨N⟩_STEP⟨M⟩_:

Query
```sql
CREATE TEXT SEARCH DICTIONARY nested_pipe (
    template = 'pipeline',
    STEP1_TEMPLATE = 'norm',
    STEP1_LOCALE = 'en_US.UTF-8',
    STEP1_CASE = 'lower',
    STEP2_TEMPLATE = 'pipeline',
    STEP2_STEP1_TEMPLATE = 'delimiter',
    STEP2_STEP1_DELIMITER = '|',
    STEP2_STEP2_TEMPLATE = 'norm',
    STEP2_STEP2_LOCALE = 'en_US.UTF-8',
    STEP2_STEP2_CASE = 'upper',
    STEP3_TEMPLATE = 'stopwords',
    STEP3_STOPWORDS = '"A"'
);
```
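Tracing nested_pipe by hand shows why its stopword is an uppercase A. By the time step 3 runs, the nested pipeline has already split on | and uppercased every piece. The Python model below reproduces that trace. It assumes stopword matching is exact and case-sensitive. It also shows that, in this model, a nested pipeline behaves the same as its steps inlined into the outer one. The helper names are hypothetical, and locale handling is ignored.

```python
from typing import Callable, List

# Simplified model: a step maps a list of tokens to a new list of tokens.
Step = Callable[[List[str]], List[str]]


def pipeline(*steps: Step) -> Step:
    """Run steps in order; a pipeline is itself a Step, so it can be nested."""
    def run(tokens: List[str]) -> List[str]:
        for step in steps:
            tokens = step(tokens)
        return tokens
    return run


def lower(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]


def upper(tokens: List[str]) -> List[str]:
    return [t.upper() for t in tokens]


def split_on(delim: str) -> Step:
    return lambda tokens: [p for t in tokens for p in t.split(delim) if p]


def stopwords(words: List[str]) -> Step:
    stop = set(words)  # exact, case-sensitive matching (assumption)
    return lambda tokens: [t for t in tokens if t not in stop]


nested_pipe = pipeline(
    lower,                           # STEP1: norm, CASE = 'lower'
    pipeline(split_on("|"), upper),  # STEP2: nested delimiter '|' then norm upper
    stopwords(["A"]),                # STEP3: stopwords "A"
)

# The same steps with the nested pipeline inlined.
flat_pipe = pipeline(lower, split_on("|"), upper, stopwords(["A"]))

print(nested_pipe(["A|Big|Dog"]))  # ['BIG', 'DOG']
print(flat_pipe(["A|Big|Dog"]))    # ['BIG', 'DOG']
```

If the stopword were written as a lowercase "a", this model would keep the A token, because step 2 has already uppercased it.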

See also

Syntax