text
Tokenizes text into words, with optional stemming, stop-word filtering, and accent handling. This is the most commonly used template.
Options
| Option | Type | Default | Description |
|---|---|---|---|
| LOCALE | string | — | ICU locale (e.g., 'en_US.UTF-8', 'fr', 'de') |
| CASE | string | 'none' | Case conversion: 'none', 'lower', 'upper' |
| STEMMING | boolean | true | Apply word stemming |
| ACCENT | boolean | true | Preserve accent marks |
| STOPWORDS | string list | — | Inline stop words (e.g., '"the","a","an"') |
| STOPWORDSPATH | string | — | Path to a stopwords file |
| MINGRAM | integer | 2 | Edge n-gram minimum length |
| MAXGRAM | integer | 3 | Edge n-gram maximum length |
| PRESERVEORIGINAL | boolean | false | Emit original token alongside n-grams |
Examples
Basic English dictionary
CREATE TEXT SEARCH DICTIONARY english_dict (
    TEMPLATE = 'text',
    LOCALE = 'en_US.UTF-8',
    CASE = 'lower',
    STEMMING = true,
    ACCENT = true,
    FREQUENCY = true,
    POSITION = true
);
No stemming, case-sensitive
CREATE TEXT SEARCH DICTIONARY exact_dict (
    TEMPLATE = 'text',
    LOCALE = 'en_US.UTF-8',
    CASE = 'none',
    STEMMING = false,
    ACCENT = false,
    FREQUENCY = true,
    POSITION = true
);
With edge n-grams for autocomplete
CREATE TEXT SEARCH DICTIONARY autocomplete_dict (
    TEMPLATE = 'text',
    LOCALE = 'en_US.UTF-8',
    CASE = 'lower',
    MINGRAM = 2,
    MAXGRAM = 5,
    PRESERVEORIGINAL = true
);
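To make the effect of MINGRAM, MAXGRAM, and PRESERVEORIGINAL more concrete, here is a small Python sketch of how edge n-grams are commonly generated. It assumes the usual definition (every prefix of a token whose length lies between the minimum and maximum); the `edge_ngrams` function is hypothetical and not part of the template, and this page does not specify how tokens shorter than MINGRAM are handled, so the behavior shown for them is a guess.

```python
def edge_ngrams(token, min_gram=2, max_gram=3, preserve_original=False):
    """Return the prefixes of `token` with lengths from min_gram to max_gram.

    Illustration only: the defaults mirror the MINGRAM/MAXGRAM defaults in the
    options table, but the template's real behavior may differ.
    """
    if min_gram < 1 or max_gram < min_gram:
        raise ValueError("require 1 <= min_gram <= max_gram")

    # Never ask for a prefix longer than the token itself.
    upper = min(max_gram, len(token))
    grams = [token[:n] for n in range(min_gram, upper + 1)]

    # Assumed meaning of PRESERVEORIGINAL: also emit the full token,
    # unless it is already present as one of the prefixes.
    if preserve_original and token not in grams:
        grams.append(token)
    return grams


if __name__ == "__main__":
    # Same settings as autocomplete_dict: MINGRAM = 2, MAXGRAM = 5.
    print(edge_ngrams("search", min_gram=2, max_gram=5, preserve_original=True))
    # ['se', 'sea', 'sear', 'searc', 'search']
```

Under these assumptions, a query prefix such as "sea" matches the indexed token "search" because "sea" is one of the emitted n-grams, which is what makes this configuration suitable for autocomplete.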
With inline stopwords
CREATE TEXT SEARCH DICTIONARY filtered_dict (
    TEMPLATE = 'text',
    LOCALE = 'en_US.UTF-8',
    CASE = 'lower',
    STEMMING = true,
    STOPWORDS = '"the","a","an","is","at"'
);
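As a rough illustration of what an inline STOPWORDS list does, the Python sketch below parses a value in the quoted, comma-separated form shown above and drops matching tokens after case conversion. The helper names `parse_stopwords` and `filter_tokens` are hypothetical, whitespace splitting stands in for the real tokenizer, stemming is left out, and applying case conversion before stop-word matching is an assumption rather than documented behavior.

```python
import csv


def parse_stopwords(option_value):
    """Parse a value such as '"the","a","an"' into a set of words."""
    # csv.reader handles the double-quoted, comma-separated format.
    row = next(csv.reader([option_value]))
    return {word.strip() for word in row if word.strip()}


def filter_tokens(text, stopwords, case="lower"):
    """Split on whitespace, apply the CASE setting, then drop stop words."""
    tokens = text.split()
    if case == "lower":
        tokens = [t.lower() for t in tokens]
    elif case == "upper":
        tokens = [t.upper() for t in tokens]
    # Assumption: matching is exact and happens after case conversion.
    return [t for t in tokens if t not in stopwords]


if __name__ == "__main__":
    stop = parse_stopwords('"the","a","an","is","at"')
    print(filter_tokens("The cat is at home", stop))
    # ['cat', 'home']
```

Note that with CASE = 'none', this sketch would keep "The" because it does not match the lowercase entry "the"; whether the template behaves the same way is not stated on this page.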