union
The union template runs several independent sub-tokenizers over the same input and merges their tokens into one stream. Use it when a column needs to be searchable in more than one way at once — for example as a whole keyword and as character n-grams — without maintaining separate indexes.
Each member is configured with a TOKENIZER⟨N⟩_ prefix, numbered densely from 1: TOKENIZER1_TEMPLATE selects the first sub-tokenizer and its TOKENIZER1_* options configure it, TOKENIZER2_TEMPLATE the second, and so on. At least one member is required. Where pipeline feeds one analyzer's output into the next, union runs them in parallel over the original input and combines the results.
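As a rough sketch of the dense numbering, the definition below declares three members with prefixes 1, 2, and 3. It only combines templates and options that already appear on this page. The dictionary name `union_three` is hypothetical, and the usefulness of this particular combination is an assumption rather than a recommendation.

```sql
-- Hypothetical dictionary name; the member templates and options are the
-- ones documented on this page. Prefixes run 1, 2, 3 with no gaps.
CREATE TEXT SEARCH DICTIONARY union_three (
    template = 'union',
    -- member 1: keep the whole value as a single token
    TOKENIZER1_TEMPLATE = 'keyword',
    -- member 2: split the value on spaces
    TOKENIZER2_TEMPLATE = 'delimiter',
    TOKENIZER2_DELIMITER = ' ',
    -- member 3: emit 2-grams
    TOKENIZER3_TEMPLATE = 'ngram',
    TOKENIZER3_MINGRAM = 2,
    TOKENIZER3_MAXGRAM = 2
);
```

Each member's options carry that member's own prefix, so `TOKENIZER2_DELIMITER` applies only to the delimiter member and `TOKENIZER3_MINGRAM` / `TOKENIZER3_MAXGRAM` only to the ngram member.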
Options
| Option | Type | Default | Description |
|---|---|---|---|
| TOKENIZER⟨N⟩_TEMPLATE | string | required | Template of the Nth sub-tokenizer (numbered densely from 1) |
| TOKENIZER⟨N⟩_* | — | — | Options for the Nth sub-tokenizer, prefixed with TOKENIZER⟨N⟩_ |
Tokenization
Every member analyzes the original input, and their outputs are pooled into a single token set. Pairing keyword (which keeps the value verbatim) with a 2-gram ngram member makes abcd searchable both as the exact term and by any of its bigrams. Pairing a delimiter member with keyword indexes hello world both as its individual words and as the whole phrase, so exact-phrase and per-word queries both hit.
| Input | Members | Tokens |
|---|---|---|
| abcd | keyword + ngram (MINGRAM = MAXGRAM = 2) | {abcd,ab,bc,cd} |
| hello world | delimiter (' ') + keyword | {hello,"hello world",world} |
Index each value both verbatim and as 2-grams:
```sql
CREATE TEXT SEARCH DICTIONARY union_dict (
    template = 'union',
    -- member 1 keeps the value verbatim, member 2 emits 2-grams
    TOKENIZER1_TEMPLATE = 'keyword',
    TOKENIZER2_TEMPLATE = 'ngram',
    TOKENIZER2_MINGRAM = 2,
    TOKENIZER2_MAXGRAM = 2
);

SELECT ts_lexize('union_dict', 'abcd');
```

```
    ts_lexize
-----------------
 {abcd,ab,bc,cd}
```

Index text both as individual words and as the whole phrase:

```sql
CREATE TEXT SEARCH DICTIONARY union_word_phrase (
    template = 'union',
    -- member 1 splits into words, member 2 keeps the whole phrase
    TOKENIZER1_TEMPLATE = 'delimiter',
    TOKENIZER1_DELIMITER = ' ',
    TOKENIZER2_TEMPLATE = 'keyword'
);

SELECT ts_lexize('union_word_phrase', 'hello world');
```

```
          ts_lexize
-----------------------------
 {hello,"hello world",world}
```

See also
- pipeline: chain analyzers in sequence (vs. union's parallel merge)
- minhash: composition template that emits similarity signatures
- CREATE TEXT SEARCH DICTIONARY