text

Tokenizes text into words, with optional stemming, stop-word removal, accent handling, case conversion, and edge n-gram generation. This is the most commonly used template.

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `LOCALE` | string | — | ICU locale (e.g., `'en_US.UTF-8'`, `'fr'`, `'de'`) |
| `CASE` | string | `'none'` | Case conversion: `'none'`, `'lower'`, or `'upper'` |
| `STEMMING` | boolean | `true` | Apply word stemming |
| `ACCENT` | boolean | `true` | Preserve accent marks |
| `STOPWORDS` | string list | — | Inline stop words, given as one string of comma-separated, double-quoted words (e.g., `'"the","a","an"'`) |
| `STOPWORDSPATH` | string | — | Path to a stopwords file |
| `MINGRAM` | integer | `2` | Minimum edge n-gram length |
| `MAXGRAM` | integer | `3` | Maximum edge n-gram length |
| `PRESERVEORIGINAL` | boolean | `false` | Emit the original token alongside its n-grams |

Examples

Basic English dictionary

```sql
CREATE TEXT SEARCH DICTIONARY english_dict (
    TEMPLATE = 'text',
    LOCALE = 'en_US.UTF-8',
    CASE = 'lower',
    STEMMING = true,
    ACCENT = true,
    FREQUENCY = true,
    POSITION = true
);
```
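
French dictionary

A variant of the basic dictionary for French text, using the `'fr'` locale from the options table. The dictionary name `french_dict` is illustrative, and the sketch assumes that `ACCENT = false` removes accent marks (the inverse of "Preserve accent marks") so that accented and unaccented spellings can match; verify that behavior before relying on it.

```sql
-- Illustrative sketch: french_dict is a hypothetical name.
-- Assumes ACCENT = false strips accent marks instead of preserving them.
CREATE TEXT SEARCH DICTIONARY french_dict (
    TEMPLATE = 'text',
    LOCALE = 'fr',
    CASE = 'lower',
    STEMMING = true,
    ACCENT = false
);
```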

No stemming, case-sensitive

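```sql
CREATE TEXT SEARCH DICTIONARY exact_dict (
    TEMPLATE = 'text',
    LOCALE = 'en_US.UTF-8',
    CASE = 'none',      -- keep the original casing, so matching is case-sensitive
    STEMMING = false,   -- index words as written, without reducing them to stems
    ACCENT = false,
    FREQUENCY = true,
    POSITION = true
);
```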

With edge n-grams for autocomplete

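```sql
CREATE TEXT SEARCH DICTIONARY autocomplete_dict (
    TEMPLATE = 'text',
    LOCALE = 'en_US.UTF-8',
    CASE = 'lower',
    MINGRAM = 2,              -- shortest edge n-gram: 2 characters
    MAXGRAM = 5,              -- longest edge n-gram: 5 characters
    PRESERVEORIGINAL = true   -- also emit the full original token
);
```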

With inline stopwords

```sql
CREATE TEXT SEARCH DICTIONARY filtered_dict (
    TEMPLATE = 'text',
    LOCALE = 'en_US.UTF-8',
    CASE = 'lower',
    STEMMING = true,
    STOPWORDS = '"the","a","an","is","at"'
);
```
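
With a stopwords file

Stop words can also be loaded from a file with `STOPWORDSPATH` instead of being listed inline. In this sketch the dictionary name and the path are placeholders, and the file layout the template expects (for example, one word per line) is not described on this page, so treat it as an assumption.

```sql
-- Sketch only: file_filtered_dict and the path below are placeholders.
-- The expected stopwords file format is an assumption; check it first.
CREATE TEXT SEARCH DICTIONARY file_filtered_dict (
    TEMPLATE = 'text',
    LOCALE = 'en_US.UTF-8',
    CASE = 'lower',
    STEMMING = true,
    STOPWORDSPATH = '/path/to/stopwords.txt'
);
```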