ngram
The ngram template breaks each token into overlapping fixed-length character sequences (n-grams) so searches can match on fragments rather than whole words. With the default MINGRAM of 2 and MAXGRAM of 3, the word search yields the 2-grams se, ea, ar, rc, ch and the 3-grams sea, ear, arc, rch, letting a query find it from a partial or slightly misspelled input. This makes the template a good fit for fuzzy matching, autocomplete and typo-tolerant search.
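To see why a partial or misspelled input can still find the word, it may help to compare gram sets directly. The Python sketch below is only a model of the idea, not the template's implementation: `char_ngrams` is a hypothetical helper, and the sketch says nothing about how the engine actually ranks matches.

```python
def char_ngrams(token: str, mingram: int = 2, maxgram: int = 3) -> set[str]:
    """Return every character window of length mingram..maxgram (hypothetical helper)."""
    return {
        token[i:i + n]
        for n in range(mingram, maxgram + 1)
        for i in range(len(token) - n + 1)
    }


indexed = char_ngrams("search")   # grams stored for the indexed word
typo = char_ngrams("serch")       # grams of a misspelled query
partial = char_ngrams("arc")      # grams of a fragment from the middle of the word

# Grams the query has in common with the indexed word
shared_with_typo = sorted(indexed & typo)
shared_with_partial = sorted(indexed & partial)

print(shared_with_typo)      # ['ch', 'rc', 'rch', 'se']
print(shared_with_partial)   # ['ar', 'arc', 'rc']
```

The misspelled query still shares four grams with the indexed word, and every gram of the fragment is present, which is what gives a gram-based index something to match on in both cases.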
PRESERVEORIGINAL additionally keeps the whole token alongside its grams, and STARTMARKER/ENDMARKER tag the start and end of the source token so prefixes and suffixes can be distinguished from interior matches. The index grows with the width of the MINGRAM–MAXGRAM range, so keep it as narrow as your matching needs allow.
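As a rough illustration of how the gram range affects volume, the number of grams emitted for one token can be counted directly: a token of length L has L - n + 1 windows of length n. The sketch below assumes plain sliding-window grams with no markers and no preserved original; `gram_count` is a hypothetical helper, and emitted grams are only a proxy for index size, since identical grams are shared across tokens.

```python
def gram_count(token_len: int, mingram: int, maxgram: int) -> int:
    """Count the sliding-window grams emitted for one token (hypothetical helper)."""
    # A token of length L contains L - n + 1 windows of length n, and none if n > L.
    return sum(max(0, token_len - n + 1) for n in range(mingram, maxgram + 1))


# Grams emitted for a 10-character token as the range widens
for lo, hi in [(2, 3), (2, 5), (1, 6)]:
    print(f"MINGRAM={lo} MAXGRAM={hi}: {gram_count(10, lo, hi)} grams")
# MINGRAM=2 MAXGRAM=3: 17 grams
# MINGRAM=2 MAXGRAM=5: 30 grams
# MINGRAM=1 MAXGRAM=6: 45 grams
```

For the six-letter word search with the default range, the same formula gives 5 + 4 = 9 grams.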
For substring search over code, logs or identifiers, prefer sparse_ngram, which answers the same fragment queries while keeping the index far more compact.
Options
| Option | Type | Default | Description |
|---|---|---|---|
| MINGRAM | integer | 2 | Minimum n-gram length |
| MAXGRAM | integer | 3 | Maximum n-gram length |
| PRESERVEORIGINAL | boolean | false | Emit original token alongside n-grams |
| INPUTTYPE | string | 'utf8' | Input encoding: 'binary', 'utf8' |
| STARTMARKER | string | (none) | Prefix marker at n-gram boundary |
| ENDMARKER | string | (none) | Suffix marker at n-gram boundary |
Tokenization
For each input token the template emits every contiguous character window whose length falls between MINGRAM and MAXGRAM, sliding one character at a time across the whole word. With MINGRAM = 2 and MAXGRAM = 3, search produces every 2- and 3-character window, so a query for any of those fragments finds the word. This is the basis for fuzzy and typo-tolerant matching. Unlike the edge n-grams of the text template, these grams are not anchored to the start of the word.
| Input | Options | Tokens |
|---|---|---|
| search | MINGRAM = 2, MAXGRAM = 3 | {se,sea,ea,ear,ar,arc,rc,rch,ch} |
| search | MINGRAM = 2, MAXGRAM = 3, PRESERVEORIGINAL = true | {se,sea,search,ea,ear,ar,arc,rc,rch,ch} |
| cat | MINGRAM = 2, MAXGRAM = 3, STARTMARKER = '^', ENDMARKER = '$' | {^ca,^cat,cat$,at$} |
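The rows above can be reproduced with a small Python model of the sliding window. This is a sketch under stated assumptions, not the template's actual code: the function and parameter names are hypothetical, and both the marker rule and the position of the preserved original are inferred solely from the rows in the table.

```python
def ngram_stream(token, mingram=2, maxgram=3, preserve_original=False,
                 start_marker="", end_marker=""):
    """Model the gram stream for one token (hypothetical; behavior inferred from the table)."""
    out = []
    for i in range(len(token)):                 # slide one character at a time
        for n in range(mingram, maxgram + 1):   # shorter windows first
            if i + n > len(token):
                break                           # window would run past the end
            gram = token[i:i + n]
            at_start = bool(start_marker) and i == 0
            at_end = bool(end_marker) and i + n == len(token)
            if at_start:
                out.append(start_marker + gram)  # gram begins the token
            if at_end:
                out.append(gram + end_marker)    # gram ends the token
            if not (at_start or at_end):
                out.append(gram)                 # interior gram, left unmarked
        if i == 0 and preserve_original:
            # Assumption: the whole token follows the grams at position 0,
            # matching the order shown in the table; no de-duplication is attempted.
            out.append(token)
    return out


print(ngram_stream("search"))
# ['se', 'sea', 'ea', 'ear', 'ar', 'arc', 'rc', 'rch', 'ch']
print(ngram_stream("search", preserve_original=True))
# ['se', 'sea', 'search', 'ea', 'ear', 'ar', 'arc', 'rc', 'rch', 'ch']
print(ngram_stream("cat", start_marker="^", end_marker="$"))
# ['^ca', '^cat', 'cat$', 'at$']
```

In this model a gram that spans the whole token, such as cat, is emitted once with each marker, which is how the third row ends up with both ^cat and cat$.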
Preview the gram stream with ts_lexize:
CREATE TEXT SEARCH DICTIONARY tok_ngram (
    template = 'ngram',
    mingram = 2,
    maxgram = 3
);

SELECT ts_lexize('tok_ngram', 'search');
            ts_lexize
----------------------------------
 {se,sea,ea,ear,ar,arc,rc,rch,ch}

PRESERVEORIGINAL = true keeps the whole word in the stream alongside its grams, so an exact match still scores:
CREATE TEXT SEARCH DICTIONARY tok_ngram_orig (
    template = 'ngram',
    mingram = 2,
    maxgram = 3,
    preserveoriginal = true
);

SELECT ts_lexize('tok_ngram_orig', 'search');
                ts_lexize
-----------------------------------------
 {se,sea,search,ea,ear,ar,arc,rc,rch,ch}

STARTMARKER and ENDMARKER tag only the boundary grams. Grams at the start of the word carry the start marker and grams at the end carry the end marker, so a prefix or suffix query can be distinguished from an interior match:
CREATE TEXT SEARCH DICTIONARY tok_ngram_mark (
    template = 'ngram',
    mingram = 2,
    maxgram = 3,
    startmarker = '^',
    endmarker = '$'
);

SELECT ts_lexize('tok_ngram_mark', 'cat');
      ts_lexize
---------------------
 {^ca,^cat,cat$,at$}

Examples
CREATE TEXT SEARCH DICTIONARY ngram_dict (
    template = 'ngram',
    mingram = 2,
    maxgram = 3
);

Unigrams and bigrams
CREATE TEXT SEARCH DICTIONARY unigram_dict (
    template = 'ngram',
    mingram = 1,
    maxgram = 2
);

See also
- sparse_ngram: variable-length grams for compact substring search
- text: word tokenizer with optional prefix-anchored edge n-grams
- wildcard: boundary-marked n-grams for wildcard and prefix matching
- CREATE TEXT SEARCH DICTIONARY