Fuzzy Search
Fuzzy search finds approximate matches despite typos, spelling variations or alternative forms. SereneDB supports two approaches, each suited to different use cases.
Similarity measures
Levenshtein distance
Measures the minimum number of single-character edits (insertions, deletions, substitutions) to transform one string into another.
| From | To | Distance | Edits |
|---|---|---|---|
galaxy | galxy | 1 | 1 deletion |
search | serach | 2 | 2 substitutions |
Damerau-Levenshtein distance
Extends Levenshtein by also counting transpositions (adjacent character swaps) as a single edit. This is more forgiving for common typos:
| From | To | Levenshtein | Damerau-Levenshtein |
|---|---|---|---|
galaxy | glaaxy | 2 | 1 (transposition) |
search | saerch | 2 | 1 (transposition) |
N-gram similarity
Breaks strings into substrings of fixed length (bigrams, trigrams, etc.) and measures how many substrings are shared. Works better for longer strings and partial matches.
Example with bigrams (n=2):
| String | Bigrams |
|---|---|
hello | he, el, ll, lo |
help | he, el, lp |
Shared bigrams: he, el → similarity = 2/5 = 0.4
When to use which
| Approach | Best for | Typical use case |
|---|---|---|
| Levenshtein | Short strings, exact typo correction | User name search, product codes, tags |
| N-gram | Longer strings, partial matching | Autocomplete, "did you mean?", document titles |
LEVENSHTEIN_MATCH
Finds terms within a given edit distance. Uses Damerau-Levenshtein by default (transpositions count as one edit).
Setup
Any text dictionary works — stemming should typically be disabled for fuzzy matching:
CREATE TEXT SEARCH DICTIONARY fuzzy_dict (
TEMPLATE = 'text',
LOCALE = 'en_US.UTF-8',
CASE = 'lower',
STEMMING = false,
ACCENT = false
);
CREATE TABLE products (
id INTEGER PRIMARY KEY,
name TEXT
);
CREATE INDEX idx_products ON products
USING inverted (id, name fuzzy_dict);
INSERT INTO products VALUES
(1, 'cat'), (2, 'bat'), (3, 'car'),
(4, 'dog'), (5, 'cats'), (6, 'act');
VACUUM (UPDATE_INDEXES) products;
Basic usage
-- Exact match only (distance 0)
SELECT id, name FROM idx_products
WHERE LEVENSHTEIN_MATCH(name, 'cat', 0)
ORDER BY id;
-- Returns: cat
-- Distance 1: one edit away
SELECT id, name FROM idx_products
WHERE LEVENSHTEIN_MATCH(name, 'cat', 1)
ORDER BY id;
-- Returns: cat, bat, car, cats, act
-- Distance 2: two edits away
SELECT id, name FROM idx_products
WHERE LEVENSHTEIN_MATCH(name, 'cat', 2)
ORDER BY id;
-- Returns: cat, bat, car, dog, cats, act
Disable transpositions
Use strict Levenshtein (no transposition counting):
SELECT id, name FROM idx_products
WHERE LEVENSHTEIN_MATCH(name, 'cat', 1, false)
ORDER BY id;
Prefix matching
Require a prefix before applying fuzzy matching — useful for autocomplete:
-- Must start with 'ca', then fuzzy match the rest
SELECT id, name FROM idx_products
WHERE LEVENSHTEIN_MATCH(name, 't', 1, true, 64, 'ca')
ORDER BY id;
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| column | column | Indexed text column | |
| term | string | Search term | |
| distance | integer | Max edit distance (0–4) | |
| transpositions | boolean | true | Count transpositions as single edit |
| max_terms | integer | 64 | Max candidate terms to evaluate |
| prefix | string | '' | Required prefix before fuzzy matching |
NGRAM_MATCH
Finds terms by n-gram similarity. Requires an index built with an ngram dictionary.
Setup
CREATE TEXT SEARCH DICTIONARY bigram_dict (
TEMPLATE = 'ngram',
MINGRAM = 2,
MAXGRAM = 2,
FREQUENCY = true,
POSITION = true
);
CREATE TABLE articles (
id INTEGER PRIMARY KEY,
title TEXT
);
CREATE INDEX idx_articles_ngram ON articles
USING inverted (id, title bigram_dict);
INSERT INTO articles VALUES
(1, 'hello'), (2, 'help'), (3, 'world'),
(4, 'held'), (5, 'hero');
VACUUM (UPDATE_INDEXES) articles;
Basic usage
-- Default threshold (0.7) — strict matching
SELECT id, title FROM idx_articles_ngram
WHERE NGRAM_MATCH(title, 'hello')
ORDER BY id;
-- Returns: hello
-- Lower threshold (0.3) — matches more loosely
SELECT id, title FROM idx_articles_ngram
WHERE NGRAM_MATCH(title, 'hello', 0.3)
ORDER BY id;
-- Returns: hello, help, held
-- Threshold 0.0 — matches anything with at least one shared n-gram
SELECT id, title FROM idx_articles_ngram
WHERE NGRAM_MATCH(title, 'hello', 0.0)
ORDER BY id;
-- Returns: hello, help, held, hero
Tuning n-gram size
- Bigrams (mingram=2, maxgram=2): More matches, less precision. Good for short terms.
- Trigrams (mingram=3, maxgram=3): Fewer matches, more precision. Better for longer terms.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| column | column | Indexed text column (must use ngram dictionary) | |
| term | string | Search term | |
| threshold | float | 0.7 | Minimum n-gram similarity (0.0–1.0) |
Combining with other filters
Both fuzzy functions work with AND, OR and other search predicates:
-- Fuzzy match + exact filter
SELECT id, name FROM idx_products
WHERE LEVENSHTEIN_MATCH(name, 'cat', 1) AND id < 4
ORDER BY id;
-- Fuzzy match + phrase search
SELECT id, title FROM idx_articles_ngram
WHERE NGRAM_MATCH(title, 'hello', 0.3) AND PHRASE(title, 'world')
ORDER BY id;