
text

Tokenizes text into words, with stemming, stop-word filtering, case conversion, and accent handling. This is the most commonly used template.

Options

| Option | Type | Default | Description |
|---|---|---|---|
| LOCALE | string | | ICU locale (e.g., 'en_US.UTF-8', 'fr', 'de') |
| CASE | string | 'none' | Case conversion: 'none', 'lower', 'upper' |
| STEMMING | boolean | true | Apply word stemming |
| ACCENT | boolean | true | Preserve accent marks |
| STOPWORDS | string list | | Inline stop words (e.g., '"the","a","an"') |
| STOPWORDSPATH | string | | Path to a stopwords file |
| MINGRAM | integer | 2 | Edge n-gram minimum length |
| MAXGRAM | integer | 3 | Edge n-gram maximum length |
| PRESERVEORIGINAL | boolean | false | Emit the original token alongside its n-grams |
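The options above describe a chain of token filters. The Python sketch below is a rough mental model of how CASE, ACCENT, and STOPWORDS might interact; it is not the template's implementation. The helper names (`analyze`, `strip_accents`), the `\w+` word split, the filter order (case, then accents, then stop words), and the NFD-based accent stripping are all assumptions, and stemming is left out entirely.

```python
import re
import unicodedata


def strip_accents(token: str) -> str:
    # Decompose to NFD, drop combining marks, then recompose: "café" -> "cafe".
    decomposed = unicodedata.normalize("NFD", token)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def analyze(text: str, case: str = "none", accent: bool = True,
            stopwords: frozenset[str] = frozenset()) -> list[str]:
    """Hypothetical model of the text template's filters (no stemming)."""
    tokens = re.findall(r"\w+", text)  # assumed word split
    if case == "lower":
        tokens = [t.lower() for t in tokens]
    elif case == "upper":
        tokens = [t.upper() for t in tokens]
    if not accent:  # ACCENT = false: accent marks are removed
        tokens = [strip_accents(t) for t in tokens]
    # Assumed: stop words are matched after case and accent folding.
    return [t for t in tokens if t not in stopwords]


print(analyze("The Café is at the corner", case="lower", accent=False,
              stopwords=frozenset({"the", "a", "an", "is", "at"})))
# ['cafe', 'corner']
```

Under these assumptions, stop words are compared after case conversion, so with CASE = 'none' a capitalized "The" would not match the stop word "the".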

Examples

Basic English dictionary

```sql
CREATE TEXT SEARCH DICTIONARY english_dict (
    TEMPLATE = 'text',
    LOCALE = 'en_US.UTF-8',
    CASE = 'lower',
    STEMMING = true,
    ACCENT = true,
    FREQUENCY = true,
    POSITION = true
);
```

No stemming, case-sensitive

```sql
CREATE TEXT SEARCH DICTIONARY exact_dict (
    TEMPLATE = 'text',
    LOCALE = 'en_US.UTF-8',
    CASE = 'none',
    STEMMING = false,
    ACCENT = false,
    FREQUENCY = true,
    POSITION = true
);
```

With edge n-grams for autocomplete

```sql
CREATE TEXT SEARCH DICTIONARY autocomplete_dict (
    TEMPLATE = 'text',
    LOCALE = 'en_US.UTF-8',
    CASE = 'lower',
    MINGRAM = 2,
    MAXGRAM = 5,
    PRESERVEORIGINAL = true
);
```
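Edge n-grams are the leading prefixes of each token, which is what makes prefix-style autocomplete work. The sketch below, assuming the template follows the conventional edge n-gram definition, shows which terms `autocomplete_dict` would emit for one word. `edge_ngrams` is a hypothetical helper, and its handling of tokens shorter than MINGRAM (no grams emitted) is an assumption.

```python
def edge_ngrams(token: str, min_gram: int = 2, max_gram: int = 3,
                preserve_original: bool = False) -> list[str]:
    """Leading-edge prefixes of `token` with lengths min_gram..max_gram."""
    # Cap the upper bound at the token length so prefixes never repeat.
    grams = [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]
    # PRESERVEORIGINAL: also keep the full token if it is not already a gram.
    if preserve_original and token not in grams:
        grams.append(token)
    return grams


# Settings from autocomplete_dict: MINGRAM = 2, MAXGRAM = 5, PRESERVEORIGINAL = true
print(edge_ngrams("search", min_gram=2, max_gram=5, preserve_original=True))
# ['se', 'sea', 'sear', 'searc', 'search']
```

With MAXGRAM = 5, "search" would otherwise be indexed only up to "searc"; keeping the original token is presumably why the example enables PRESERVEORIGINAL.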

With inline stopwords

```sql
CREATE TEXT SEARCH DICTIONARY filtered_dict (
    TEMPLATE = 'text',
    LOCALE = 'en_US.UTF-8',
    CASE = 'lower',
    STEMMING = true,
    STOPWORDS = '"the","a","an","is","at"'
);
```
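The STOPWORDS value is a single SQL string containing a comma-separated list of double-quoted words. When the list comes from application code, a small helper can render it in that format. `stopwords_option` is a hypothetical helper; because no escaping rules for the inline format are documented, it rejects words containing quotes or commas rather than guessing.

```python
def stopwords_option(words: list[str]) -> str:
    """Render words in the inline STOPWORDS format, e.g. '"the","a","an"'."""
    for w in words:
        # No escaping rules are documented, so refuse characters that
        # would collide with the quoting instead of guessing.
        if any(c in w for c in "\"',"):
            raise ValueError(f"unsupported character in stop word: {w!r}")
    quoted = ",".join(f'"{w}"' for w in words)
    return f"'{quoted}'"  # wrap the list in a SQL string literal


words = ["the", "a", "an", "is", "at"]
print(f"STOPWORDS = {stopwords_option(words)}")
# STOPWORDS = '"the","a","an","is","at"'
```

For long lists, STOPWORDSPATH avoids building the string at all.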

See also