Skip to main content

Fuzzy Search

Fuzzy search finds approximate matches despite typos, spelling variations or alternative forms. SereneDB supports two approaches, each suited to different use cases.

Similarity measures

Levenshtein distance

Measures the minimum number of single-character edits (insertions, deletions, substitutions) to transform one string into another.

FromToDistanceEdits
galaxygalxy11 deletion
searchserach22 substitutions

Damerau-Levenshtein distance

Extends Levenshtein by also counting transpositions (adjacent character swaps) as a single edit. This is more forgiving for common typos:

FromToLevenshteinDamerau-Levenshtein
galaxyglaaxy21 (transposition)
searchsaerch21 (transposition)

N-gram similarity

Breaks strings into substrings of fixed length (bigrams, trigrams, etc.) and measures how many substrings are shared. Works better for longer strings and partial matches.

Example with bigrams (n=2):

StringBigrams
hellohe, el, ll, lo
helphe, el, lp

Shared bigrams: he, el → similarity = 2/5 = 0.4

When to use which

ApproachBest forTypical use case
LevenshteinShort strings, exact typo correctionUser name search, product codes, tags
N-gramLonger strings, partial matchingAutocomplete, "did you mean?", document titles

LEVENSHTEIN_MATCH

Finds terms within a given edit distance. Uses Damerau-Levenshtein by default (transpositions count as one edit).

Setup

Any text dictionary works — stemming should typically be disabled for fuzzy matching:

CREATE TEXT SEARCH DICTIONARY fuzzy_dict (
TEMPLATE = 'text',
LOCALE = 'en_US.UTF-8',
CASE = 'lower',
STEMMING = false,
ACCENT = false
);

CREATE TABLE products (
id INTEGER PRIMARY KEY,
name TEXT
);

CREATE INDEX idx_products ON products
USING inverted (id, name fuzzy_dict);

INSERT INTO products VALUES
(1, 'cat'), (2, 'bat'), (3, 'car'),
(4, 'dog'), (5, 'cats'), (6, 'act');

VACUUM (UPDATE_INDEXES) products;

Basic usage

-- Exact match only (distance 0)
SELECT id, name FROM idx_products
WHERE LEVENSHTEIN_MATCH(name, 'cat', 0)
ORDER BY id;
-- Returns: cat

-- Distance 1: one edit away
SELECT id, name FROM idx_products
WHERE LEVENSHTEIN_MATCH(name, 'cat', 1)
ORDER BY id;
-- Returns: cat, bat, car, cats, act

-- Distance 2: two edits away
SELECT id, name FROM idx_products
WHERE LEVENSHTEIN_MATCH(name, 'cat', 2)
ORDER BY id;
-- Returns: cat, bat, car, dog, cats, act

Disable transpositions

Use strict Levenshtein (no transposition counting):

SELECT id, name FROM idx_products
WHERE LEVENSHTEIN_MATCH(name, 'cat', 1, false)
ORDER BY id;

Prefix matching

Require a prefix before applying fuzzy matching — useful for autocomplete:

-- Must start with 'ca', then fuzzy match the rest
SELECT id, name FROM idx_products
WHERE LEVENSHTEIN_MATCH(name, 't', 1, true, 64, 'ca')
ORDER BY id;

Parameters

ParameterTypeDefaultDescription
columncolumnIndexed text column
termstringSearch term
distanceintegerMax edit distance (0–4)
transpositionsbooleantrueCount transpositions as single edit
max_termsinteger64Max candidate terms to evaluate
prefixstring''Required prefix before fuzzy matching

NGRAM_MATCH

Finds terms by n-gram similarity. Requires an index built with an ngram dictionary.

Setup

CREATE TEXT SEARCH DICTIONARY bigram_dict (
TEMPLATE = 'ngram',
MINGRAM = 2,
MAXGRAM = 2,
FREQUENCY = true,
POSITION = true
);

CREATE TABLE articles (
id INTEGER PRIMARY KEY,
title TEXT
);

CREATE INDEX idx_articles_ngram ON articles
USING inverted (id, title bigram_dict);

INSERT INTO articles VALUES
(1, 'hello'), (2, 'help'), (3, 'world'),
(4, 'held'), (5, 'hero');

VACUUM (UPDATE_INDEXES) articles;

Basic usage

-- Default threshold (0.7) — strict matching
SELECT id, title FROM idx_articles_ngram
WHERE NGRAM_MATCH(title, 'hello')
ORDER BY id;
-- Returns: hello

-- Lower threshold (0.3) — matches more loosely
SELECT id, title FROM idx_articles_ngram
WHERE NGRAM_MATCH(title, 'hello', 0.3)
ORDER BY id;
-- Returns: hello, help, held

-- Threshold 0.0 — matches anything with at least one shared n-gram
SELECT id, title FROM idx_articles_ngram
WHERE NGRAM_MATCH(title, 'hello', 0.0)
ORDER BY id;
-- Returns: hello, help, held, hero

Tuning n-gram size

  • Bigrams (mingram=2, maxgram=2): More matches, less precision. Good for short terms.
  • Trigrams (mingram=3, maxgram=3): Fewer matches, more precision. Better for longer terms.

Parameters

ParameterTypeDefaultDescription
columncolumnIndexed text column (must use ngram dictionary)
termstringSearch term
thresholdfloat0.7Minimum n-gram similarity (0.0–1.0)

Combining with other filters

Both fuzzy functions work with AND, OR and other search predicates:

-- Fuzzy match + exact filter
SELECT id, name FROM idx_products
WHERE LEVENSHTEIN_MATCH(name, 'cat', 1) AND id < 4
ORDER BY id;

-- Fuzzy match + phrase search
SELECT id, title FROM idx_articles_ngram
WHERE NGRAM_MATCH(title, 'hello', 0.3) AND PHRASE(title, 'world')
ORDER BY id;

See also