Skip to main content

Fuzzy Search

Fuzzy search finds approximate matches despite typos, spelling variations or alternative forms. SereneDB supports two approaches, each suited to different use cases.

Similarity measures

Levenshtein distance

Measures the minimum number of single-character edits (insertions, deletions, substitutions) to transform one string into another.

FromToDistanceEdits
galaxygalxy11 deletion
searchserach22 substitutions

Damerau-Levenshtein distance

Extends Levenshtein by also counting transpositions (adjacent character swaps) as a single edit. This is more forgiving for common typos:

FromToLevenshteinDamerau-Levenshtein
galaxyglaaxy21 (transposition)
searchsaerch21 (transposition)

N-gram similarity

Breaks strings into substrings of fixed length (bigrams, trigrams, etc.) and measures how many substrings are shared. Works better for longer strings and partial matches.

Example with bigrams (n=2):

StringBigrams
hellohe, el, ll, lo
helphe, el, lp

Shared bigrams: he, el → similarity = 2/5 = 0.4

When to use which

ApproachBest forTypical use case
LevenshteinShort strings, exact typo correctionUser name search, product codes, tags
N-gramLonger strings, partial matchingAutocomplete, "did you mean?", document titles

Levenshtein matching with ts_levenshtein

Finds terms within a given edit distance. Uses Damerau-Levenshtein by default (transpositions count as one edit).

Setup

Any text dictionary works — stemming should typically be disabled for fuzzy matching:

Query
CREATE TEXT SEARCH DICTIONARY fuzzy_dict (    template = 'text',    locale = 'en_US.UTF-8',    case = 'lower',    stemming = false,    accent = false);
CREATE TABLE products (    id INTEGER PRIMARY KEY,    name VARCHAR);
CREATE INDEX idx_products ON products    USING inverted (id, name fuzzy_dict);
INSERT INTO products VALUES    (1, 'cat'), (2, 'bat'), (3, 'car'),    (4, 'dog'), (5, 'cats'), (6, 'act');
VACUUM (REFRESH_TABLE) products;

Basic usage

Query
-- Exact match only (distance 0)SELECT id, name FROM idx_productsWHERE name @@ ts_levenshtein('cat', 0)ORDER BY id;
-- Distance 1: one edit awaySELECT id, name FROM idx_productsWHERE name @@ ts_levenshtein('cat', 1)ORDER BY id;
-- Distance 2: two edits awaySELECT id, name FROM idx_productsWHERE name @@ ts_levenshtein('cat', 2)ORDER BY id;
Result
 id | name----+------  1 | cat
 id | name----+------  1 | cat  2 | bat  3 | car  5 | cats  6 | act
 id | name----+------  1 | cat  2 | bat  3 | car  5 | cats  6 | act

Disable transpositions

Use strict Levenshtein (no transposition counting):

Query
-- Disable transpositions: 'act' no longer matches 'cat'SELECT id, name FROM idx_productsWHERE name @@ ts_levenshtein('cat', 1, false)ORDER BY id;
Result
 id | name----+------  1 | cat  2 | bat  3 | car  5 | cats

Prefix matching

Require a prefix before applying fuzzy matching — useful for autocomplete:

Query
-- Must start with 'ca', then fuzzy match the restSELECT id, name FROM idx_productsWHERE name @@ ts_levenshtein('t', 1, true, 'ca')ORDER BY id;
Result
 id | name----+------  1 | cat  3 | car  5 | cats

Parameters

ParameterTypeDefaultDescription
columncolumnIndexed text column
termstringSearch term
distanceintegerMax edit distance (0–4)
transpositionsbooleantrueCount transpositions as single edit
max_termsinteger64Max candidate terms to evaluate
prefixstring''Required prefix before fuzzy matching

N-gram matching with ts_ngram

Finds terms by n-gram similarity. Requires an index built with an ngram dictionary.

Setup

Query
CREATE TEXT SEARCH DICTIONARY bigram_dict (    template = 'ngram',    mingram = 2,    maxgram = 2,    frequency = true,    position = true);
CREATE TABLE articles (    id INTEGER PRIMARY KEY,    title VARCHAR);
CREATE INDEX idx_articles_ngram ON articles    USING inverted (id, title bigram_dict);
INSERT INTO articles VALUES    (1, 'hello'), (2, 'help'), (3, 'world'),    (4, 'held'), (5, 'hero');
VACUUM (REFRESH_TABLE) articles;

Basic usage

Query
-- Default threshold — strict matchingSELECT id, title FROM idx_articles_ngramWHERE title @@ ts_ngram('hello')ORDER BY id;
-- Lower threshold (0.3) — matches more looselySELECT id, title FROM idx_articles_ngramWHERE title @@ ts_ngram('hello', 0.3)ORDER BY id;
-- Threshold 0.0 — matches anything with at least one shared n-gramSELECT id, title FROM idx_articles_ngramWHERE title @@ ts_ngram('hello', 0.0)ORDER BY id;
Result
 id | title----+-------  1 | hello
 id | title----+-------  1 | hello  2 | help  4 | held
 id | title----+-------  1 | hello  2 | help  4 | held  5 | hero

Tuning n-gram size

  • Bigrams (mingram=2, maxgram=2): More matches, less precision. Good for short terms.
  • Trigrams (mingram=3, maxgram=3): Fewer matches, more precision. Better for longer terms.

Parameters

ParameterTypeDefaultDescription
columncolumnIndexed text column (must use ngram dictionary)
termstringSearch term
thresholdfloat0.7Minimum n-gram similarity (0.0–1.0)

Combining with other filters

Both fuzzy functions work with AND, OR and other search predicates:

Query
-- Fuzzy match + exact filterSELECT id, name FROM idx_productsWHERE name @@ ts_levenshtein('cat', 1) AND id < 4ORDER BY id;
Result
 id | name----+------  1 | cat  2 | bat  3 | car

See also