Fuzzy Search
Fuzzy search finds approximate matches despite typos, spelling variations or alternative forms. SereneDB supports two approaches, each suited to different use cases.
Similarity measures
Levenshtein distance
Measures the minimum number of single-character edits (insertions, deletions, substitutions) to transform one string into another.
| From | To | Distance | Edits |
|---|---|---|---|
galaxy | galxy | 1 | 1 deletion |
search | serach | 2 | 2 substitutions |
Damerau-Levenshtein distance
Extends Levenshtein by also counting transpositions (adjacent character swaps) as a single edit. This is more forgiving for common typos:
| From | To | Levenshtein | Damerau-Levenshtein |
|---|---|---|---|
galaxy | glaaxy | 2 | 1 (transposition) |
search | saerch | 2 | 1 (transposition) |
N-gram similarity
Breaks strings into substrings of fixed length (bigrams, trigrams, etc.) and measures how many substrings are shared. Works better for longer strings and partial matches.
Example with bigrams (n=2):
| String | Bigrams |
|---|---|
hello | he, el, ll, lo |
help | he, el, lp |
Shared bigrams: he, el → similarity = 2/5 = 0.4
When to use which
| Approach | Best for | Typical use case |
|---|---|---|
| Levenshtein | Short strings, exact typo correction | User name search, product codes, tags |
| N-gram | Longer strings, partial matching | Autocomplete, "did you mean?", document titles |
Levenshtein matching with ts_levenshtein
Finds terms within a given edit distance. Uses Damerau-Levenshtein by default (transpositions count as one edit).
Setup
Any text dictionary works — stemming should typically be disabled for fuzzy matching:
CREATE TEXT SEARCH DICTIONARY fuzzy_dict ( template = 'text', locale = 'en_US.UTF-8', case = 'lower', stemming = false, accent = false);
CREATE TABLE products ( id INTEGER PRIMARY KEY, name VARCHAR);
CREATE INDEX idx_products ON products USING inverted (id, name fuzzy_dict);
INSERT INTO products VALUES (1, 'cat'), (2, 'bat'), (3, 'car'), (4, 'dog'), (5, 'cats'), (6, 'act');
VACUUM (REFRESH_TABLE) products;Basic usage
-- Exact match only (distance 0)SELECT id, name FROM idx_productsWHERE name @@ ts_levenshtein('cat', 0)ORDER BY id;
-- Distance 1: one edit awaySELECT id, name FROM idx_productsWHERE name @@ ts_levenshtein('cat', 1)ORDER BY id;
-- Distance 2: two edits awaySELECT id, name FROM idx_productsWHERE name @@ ts_levenshtein('cat', 2)ORDER BY id; id | name----+------ 1 | cat
id | name----+------ 1 | cat 2 | bat 3 | car 5 | cats 6 | act
id | name----+------ 1 | cat 2 | bat 3 | car 5 | cats 6 | actDisable transpositions
Use strict Levenshtein (no transposition counting):
-- Disable transpositions: 'act' no longer matches 'cat'SELECT id, name FROM idx_productsWHERE name @@ ts_levenshtein('cat', 1, false)ORDER BY id; id | name----+------ 1 | cat 2 | bat 3 | car 5 | catsPrefix matching
Require a prefix before applying fuzzy matching — useful for autocomplete:
-- Must start with 'ca', then fuzzy match the restSELECT id, name FROM idx_productsWHERE name @@ ts_levenshtein('t', 1, true, 'ca')ORDER BY id; id | name----+------ 1 | cat 3 | car 5 | catsParameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| column | column | Indexed text column | |
| term | string | Search term | |
| distance | integer | Max edit distance (0–4) | |
| transpositions | boolean | true | Count transpositions as single edit |
| max_terms | integer | 64 | Max candidate terms to evaluate |
| prefix | string | '' | Required prefix before fuzzy matching |
N-gram matching with ts_ngram
Finds terms by n-gram similarity. Requires an index built with an ngram dictionary.
Setup
CREATE TEXT SEARCH DICTIONARY bigram_dict ( template = 'ngram', mingram = 2, maxgram = 2, frequency = true, position = true);
CREATE TABLE articles ( id INTEGER PRIMARY KEY, title VARCHAR);
CREATE INDEX idx_articles_ngram ON articles USING inverted (id, title bigram_dict);
INSERT INTO articles VALUES (1, 'hello'), (2, 'help'), (3, 'world'), (4, 'held'), (5, 'hero');
VACUUM (REFRESH_TABLE) articles;Basic usage
-- Default threshold — strict matchingSELECT id, title FROM idx_articles_ngramWHERE title @@ ts_ngram('hello')ORDER BY id;
-- Lower threshold (0.3) — matches more looselySELECT id, title FROM idx_articles_ngramWHERE title @@ ts_ngram('hello', 0.3)ORDER BY id;
-- Threshold 0.0 — matches anything with at least one shared n-gramSELECT id, title FROM idx_articles_ngramWHERE title @@ ts_ngram('hello', 0.0)ORDER BY id; id | title----+------- 1 | hello
id | title----+------- 1 | hello 2 | help 4 | held
id | title----+------- 1 | hello 2 | help 4 | held 5 | heroTuning n-gram size
- Bigrams (mingram=2, maxgram=2): More matches, less precision. Good for short terms.
- Trigrams (mingram=3, maxgram=3): Fewer matches, more precision. Better for longer terms.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| column | column | Indexed text column (must use ngram dictionary) | |
| term | string | Search term | |
| threshold | float | 0.7 | Minimum n-gram similarity (0.0–1.0) |
Combining with other filters
Both fuzzy functions work with AND, OR and other search predicates:
-- Fuzzy match + exact filterSELECT id, name FROM idx_productsWHERE name @@ ts_levenshtein('cat', 1) AND id < 4ORDER BY id; id | name----+------ 1 | cat 2 | bat 3 | car