text
The text template is the general-purpose word tokenizer and the one to reach for first for natural-language search. It splits input into words on Unicode boundaries and then, under the control of an ICU LOCALE, optionally folds case, strips accent marks, applies Snowball stemming and removes stop words.
Stemming maps inflected forms to a common root (running and runs both index as run), so a query matches a document even when the surface forms differ. Snowball stemmers work by stripping suffixes, so irregular forms such as ran are not reduced to run. Because the same dictionary analyzes both the indexed text and the query, the search term is reduced the same way as the indexed words. Stop words can be supplied inline with STOPWORDS or loaded from a file with STOPWORDSPATH, and accent folding (ACCENT = false) lets café match cafe.
Enable the FREQUENCY and POSITION feature flags on the indexed column when you need relevance ranking or phrase and proximity search, respectively.
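To illustrate the point above that documents and queries pass through the same dictionary, the sketch below runs a document phrase and a query phrase through one dictionary with stemming and accent folding enabled. The name demo_text_fold is made up for this example, and the options are assumed to behave as described on this page. If they do, both results should contain run and cafe.

```sql
-- Hypothetical dictionary name; only options documented on this page are used.
CREATE TEXT SEARCH DICTIONARY demo_text_fold (
    template = 'text',
    locale = 'en_US.UTF-8',
    case = 'lower',
    stemming = true,
    accent = false   -- do not preserve accent marks, i.e. fold them
);

-- The document side and the query side are analyzed by the same dictionary,
-- so differing surface forms should reduce to the same tokens.
SELECT ts_lexize('demo_text_fold', 'Running to the Café') AS indexed_tokens,
       ts_lexize('demo_text_fold', 'runs cafe')           AS query_tokens;
```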
Options
| Option | Type | Default | Description |
|---|---|---|---|
| LOCALE | string | — | ICU locale (e.g., 'en_US.UTF-8', 'fr', 'de') |
| CASE | string | 'none' | Case conversion: 'none', 'lower', 'upper' |
| STEMMING | boolean | true | Apply word stemming |
| ACCENT | boolean | true | Preserve accent marks; set to false to fold accents (café matches cafe) |
| STOPWORDS | string list | — | Inline stop words (e.g., '"the","a","an"') |
| STOPWORDSPATH | string | — | Path to a stopwords file |
| MINGRAM | integer | 2 | Edge n-gram minimum length |
| MAXGRAM | integer | 3 | Edge n-gram maximum length |
| PRESERVEORIGINAL | boolean | false | Emit original token alongside n-grams |
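STOPWORDSPATH is the file-based counterpart of STOPWORDS. The following is a minimal sketch only: the dictionary name demo_text_stopfile and the file path are placeholders, and the expected file format is not described on this page, so confirm it before relying on this.

```sql
-- Hypothetical name and path; replace both with your own values.
CREATE TEXT SEARCH DICTIONARY demo_text_stopfile (
    template = 'text',
    locale = 'en_US.UTF-8',
    case = 'lower',
    stemming = true,
    stopwordspath = '/path/to/stopwords.txt'  -- placeholder path
);

-- Preview the token stream to check that the listed words are dropped.
SELECT ts_lexize('demo_text_stopfile', 'The cat is a hunter');
```

Previewing with ts_lexize right after creating the dictionary is a quick way to check that the file was read and the intended words were removed.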
Tokenization
The text template splits input on Unicode word boundaries, then applies the normalization steps you enable: case folding (CASE), accent folding (ACCENT = false), Snowball stemming (STEMMING) and stop-word removal (STOPWORDS). With stemming on, inflected forms collapse to a shared root so a query meets a document even when the surface forms differ. Setting MINGRAM/MAXGRAM adds edge n-grams — prefix-anchored fragments of each word — which is what powers as-you-type autocomplete.
| Input | Options | Tokens |
|---|---|---|
| The runners were running quickly | CASE = 'lower', STEMMING = true | {the,runner,were,run,quick} |
| The Runners Café | CASE = 'none', STEMMING = false, ACCENT = true | {The,Runners,Café} |
| The cat is a hunter | CASE = 'lower', STEMMING = true, STOPWORDS = '"the","a","an","is"' | {cat,hunter} |
| Search | CASE = 'lower', MINGRAM = 2, MAXGRAM = 4, PRESERVEORIGINAL = true | {se,sea,sear,search} |
Stemming reduces runners to runner and running to run, so both index under a shared root; quickly becomes quick. Note that stop words are only removed when STOPWORDS is set — by default common words like the are kept. Use ts_lexize to preview the exact token stream for any configuration:
    CREATE TEXT SEARCH DICTIONARY tok_text_stem (
        template = 'text',
        locale = 'en_US.UTF-8',
        case = 'lower',
        stemming = true,
        accent = true
    );

    SELECT ts_lexize('tok_text_stem', 'The runners were running quickly');

              ts_lexize
    -----------------------------
     {the,runner,were,run,quick}

With CASE = 'none' and STEMMING = false the words keep their original form and casing, and accent marks survive because ACCENT = true:

    CREATE TEXT SEARCH DICTIONARY tok_text_exact (
        template = 'text',
        locale = 'en_US.UTF-8',
        case = 'none',
        stemming = false,
        accent = true
    );

    SELECT ts_lexize('tok_text_exact', 'The Runners Café');

         ts_lexize
    --------------------
     {The,Runners,Café}

Supplying STOPWORDS drops the listed words from the stream:

    CREATE TEXT SEARCH DICTIONARY tok_text_stop (
        template = 'text',
        locale = 'en_US.UTF-8',
        case = 'lower',
        stemming = true,
        stopwords = '"the","a","an","is"'
    );

    SELECT ts_lexize('tok_text_stop', 'The cat is a hunter');

      ts_lexize
    --------------
     {cat,hunter}

Setting MINGRAM/MAXGRAM emits prefix-anchored edge n-grams of each word, so a partial query like sea matches Search, which is the basis for autocomplete:

    CREATE TEXT SEARCH DICTIONARY tok_text_edge (
        template = 'text',
        locale = 'en_US.UTF-8',
        case = 'lower',
        mingram = 2,
        maxgram = 4,
        preserveoriginal = true
    );

    SELECT ts_lexize('tok_text_edge', 'Search');

          ts_lexize
    ----------------------
     {se,sea,sear,search}

Examples
Basic English dictionary

    CREATE TEXT SEARCH DICTIONARY english_dict (
        template = 'text',
        locale = 'en_US.UTF-8',
        case = 'lower',
        stemming = true,
        accent = true,
        frequency = true,
        position = true
    );

No stemming, case-sensitive

    CREATE TEXT SEARCH DICTIONARY exact_dict (
        template = 'text',
        locale = 'en_US.UTF-8',
        case = 'none',
        stemming = false,
        accent = false,
        frequency = true,
        position = true
    );

With edge n-grams for autocomplete

    CREATE TEXT SEARCH DICTIONARY autocomplete_dict (
        template = 'text',
        locale = 'en_US.UTF-8',
        case = 'lower',
        mingram = 2,
        maxgram = 5,
        preserveoriginal = true
    );

With inline stopwords

    CREATE TEXT SEARCH DICTIONARY filtered_dict (
        template = 'text',
        locale = 'en_US.UTF-8',
        case = 'lower',
        stemming = true,
        stopwords = '"the","a","an","is","at"'
    );

See also
- keyword: emit the whole input as one verbatim token
- ngram: character n-grams for fuzzy and substring matching
- stem: stemming only, without word splitting
- CREATE TEXT SEARCH DICTIONARY
- CREATE INDEX