pipeline
The pipeline template composes several analyzers into one dictionary, feeding the output of each step as the input to the next. This builds behavior no single template offers — for example, split a field on a delimiter and then apply full text analysis (case folding, stemming, stopwords) to each resulting piece.
Each step names its own template and options, with every option prefixed by the step's position, numbered from 1: STEP1_TEMPLATE and its STEP1_… options, then STEP2_TEMPLATE and so on. Steps run strictly in order, so a tokenizer that splits text must come before filters like stopwords or stem that refine the tokens it produces. Where union runs members in parallel and merges their output, pipeline chains them so each step transforms the previous step's tokens.
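The difference between chaining and merging can be sketched in a few lines of Python. This is only a conceptual model, not the extension's implementation: the helper names (`split_on`, `lowercase`, `drop`, `pipeline`, `union`) are hypothetical, and the merge policy shown for union (plain concatenation) is an assumption.

```python
from typing import Callable, Iterable, List

# A step maps one input string to a list of output tokens.
Step = Callable[[str], List[str]]


def split_on(delim: str) -> Step:
    """Tokenizer-like step: split on a delimiter, dropping empty pieces."""
    return lambda text: [piece for piece in text.split(delim) if piece]


def lowercase(text: str) -> List[str]:
    """Filter-like step: emit the lowercased input as a single token."""
    return [text.lower()]


def drop(stopwords: Iterable[str]) -> Step:
    """Filter-like step: emit nothing for a stopword, else pass the token on."""
    blocked = set(stopwords)
    return lambda text: [] if text in blocked else [text]


def pipeline(*steps: Step) -> Step:
    """Chain steps: every token from one step is fed to the next step."""
    def run(text: str) -> List[str]:
        tokens = [text]
        for step in steps:
            tokens = [out for tok in tokens for out in step(tok)]
        return tokens
    return run


def union(*members: Step) -> Step:
    """Run every member on the same input and concatenate the outputs (assumed policy)."""
    def run(text: str) -> List[str]:
        return [tok for member in members for tok in member(text)]
    return run


chained = pipeline(split_on(","), lowercase)("RED,Green,BLUE")
merged = union(split_on(","), lowercase)("RED,Green,BLUE")

# Order matters: a stopword filter placed before the split only sees the whole string.
filter_last = pipeline(split_on(","), drop({"the"}))("the,cat")
filter_first = pipeline(drop({"the"}), split_on(","))("the,cat")

print(chained)       # ['red', 'green', 'blue']
print(merged)        # ['RED', 'Green', 'BLUE', 'red,green,blue']
print(filter_last)   # ['cat']
print(filter_first)  # ['the', 'cat']
```

In this model the chained result lowercases each piece, while the merged result keeps both members' raw outputs side by side. The last two lines show why a splitting step should precede filters: placed first, the stopword filter never sees the individual token `the`.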
Options
| Option | Type | Default | Description |
|---|---|---|---|
| STEP⟨N⟩_TEMPLATE | string | required | Template of the Nth step (numbered from 1) |
| STEP⟨N⟩_* | — | — | Options for the Nth step, prefixed with STEP⟨N⟩_ |
A step may itself be a pipeline; nest by chaining the prefixes, e.g. STEP2_STEP1_TEMPLATE.
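One way to picture the prefix scheme is as a flat option list that groups into per-step settings, with nested pipelines grouped recursively. The Python sketch below illustrates that reading only. The function `group_steps` and the `STEPS` key are hypothetical, and how the extension itself parses or validates options is not specified here.

```python
import re
from typing import Any, Dict

# STEP<N>_<rest>: N selects the step, <rest> is the option name inside that step.
STEP_KEY = re.compile(r"^STEP(\d+)_(.+)$")


def group_steps(options: Dict[str, Any]) -> Dict[int, Dict[str, Any]]:
    """Group flat STEP<N>_* options by step number, recursing into nested pipelines."""
    steps: Dict[int, Dict[str, Any]] = {}
    for key, value in options.items():
        match = STEP_KEY.match(key)
        if match is None:
            raise ValueError(f"not a step option: {key}")
        steps.setdefault(int(match.group(1)), {})[match.group(2)] = value

    grouped: Dict[int, Dict[str, Any]] = {}
    for number in sorted(steps):
        step = steps[number]
        if "TEMPLATE" not in step:
            raise ValueError(f"STEP{number}_TEMPLATE is required")
        if step["TEMPLATE"] == "pipeline":
            # The remaining options are the inner pipeline's own STEP<M>_* options.
            inner = {k: v for k, v in step.items() if k != "TEMPLATE"}
            step = {"TEMPLATE": "pipeline", "STEPS": group_steps(inner)}
        grouped[number] = step
    return grouped


options = {
    "STEP1_TEMPLATE": "norm",
    "STEP1_CASE": "lower",
    "STEP2_TEMPLATE": "pipeline",
    "STEP2_STEP1_TEMPLATE": "delimiter",
    "STEP2_STEP1_DELIMITER": "|",
    "STEP2_STEP2_TEMPLATE": "norm",
    "STEP2_STEP2_CASE": "upper",
}
grouped = group_steps(options)
print(grouped[2]["STEPS"][1])  # {'TEMPLATE': 'delimiter', 'DELIMITER': '|'}
```

Stripping one `STEP⟨N⟩_` prefix per nesting level is what makes `STEP2_STEP1_TEMPLATE` read as "the template of step 1 inside step 2".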
Tokenization
Each step consumes the tokens emitted by the one before it. With delimiter as the first step and DELIMITER = ',', the input RED,Green,BLUE splits into three tokens; a norm second step then lowercases each one, giving {red,green,blue}. Swap the second step for text with stemming and the same split feeds a stemmer, so Cats,RUNNING becomes {cat,run}.
| Input | Steps | Tokens |
|---|---|---|
| RED,Green,BLUE | delimiter (,) → norm (CASE = 'lower') | {red,green,blue} |
| Cats,RUNNING | delimiter (,) → text (CASE = 'lower', STEMMING = true) | {cat,run} |
Split on commas, then lowercase each piece:
CREATE TEXT SEARCH DICTIONARY pipe_delim_norm (
    template = 'pipeline',
    -- step 1 splits on commas, step 2 lowercases each piece
    STEP1_TEMPLATE = 'delimiter',
    STEP1_DELIMITER = ',',
    STEP2_TEMPLATE = 'norm',
    STEP2_LOCALE = 'en_US.UTF-8',
    STEP2_CASE = 'lower'
);

SELECT ts_lexize('pipe_delim_norm', 'RED,Green,BLUE');

    ts_lexize
------------------
 {red,green,blue}

Replace the second step with text analysis so each piece is also stemmed:
CREATE TEXT SEARCH DICTIONARY pipe_delim_stem (
    template = 'pipeline',
    -- step 1 splits on commas, step 2 lowercases and stems each piece
    STEP1_TEMPLATE = 'delimiter',
    STEP1_DELIMITER = ',',
    STEP2_TEMPLATE = 'text',
    STEP2_LOCALE = 'en_US.UTF-8',
    STEP2_CASE = 'lower',
    STEP2_STEMMING = true
);

SELECT ts_lexize('pipe_delim_stem', 'Cats,RUNNING');

 ts_lexize
-----------
 {cat,run}

Examples
Delimiter then text analysis
CREATE TEXT SEARCH DICTIONARY pipe_dict (
    template = 'pipeline',
    STEP1_TEMPLATE = 'delimiter',
    STEP1_DELIMITER = ',',
    STEP2_TEMPLATE = 'text',
    STEP2_LOCALE = 'en_US.UTF-8',
    STEP2_CASE = 'lower',
    STEP2_STEMMING = true
);

Three-step pipeline with stopwords
CREATE TEXT SEARCH DICTIONARY advanced_pipe (
    template = 'pipeline',
    STEP1_TEMPLATE = 'delimiter',
    STEP1_DELIMITER = 'A',
    STEP2_TEMPLATE = 'text',
    STEP2_LOCALE = 'en_US.UTF-8',
    STEP2_CASE = 'lower',
    STEP2_ACCENT = false,
    STEP2_STEMMING = true,
    STEP2_STOPWORDS = '"fox"',
    STEP3_TEMPLATE = 'norm',
    STEP3_LOCALE = 'en_US.UTF-8',
    STEP3_CASE = 'upper'
);

N-grams then normalization
CREATE TEXT SEARCH DICTIONARY ngram_norm (
    template = 'pipeline',
    STEP1_TEMPLATE = 'ngram',
    STEP1_MINGRAM = 2,
    STEP1_MAXGRAM = 3,
    STEP2_TEMPLATE = 'norm',
    STEP2_LOCALE = 'en_US.UTF-8',
    STEP2_CASE = 'lower'
);

Nested pipeline
A pipeline step can itself be a pipeline. Nest with STEP⟨N⟩_STEP⟨M⟩_:
CREATE TEXT SEARCH DICTIONARY nested_pipe (
    template = 'pipeline',
    STEP1_TEMPLATE = 'norm',
    STEP1_LOCALE = 'en_US.UTF-8',
    STEP1_CASE = 'lower',
    STEP2_TEMPLATE = 'pipeline',
    STEP2_STEP1_TEMPLATE = 'delimiter',
    STEP2_STEP1_DELIMITER = '|',
    STEP2_STEP2_TEMPLATE = 'norm',
    STEP2_STEP2_LOCALE = 'en_US.UTF-8',
    STEP2_STEP2_CASE = 'upper',
    STEP3_TEMPLATE = 'stopwords',
    STEP3_STOPWORDS = '"A"'
);

See also
- union: run analyzers in parallel and merge their tokens
- minhash: another composition template
- CREATE TEXT SEARCH DICTIONARY