Edit this page

Fuzzy Search

Fuzzy search finds approximate matches despite typos, spelling variations or alternative forms. SereneDB supports two approaches, each suited to different use cases.

Similarity measures

Levenshtein distance

Measures the minimum number of single-character edits (insertions, deletions, substitutions) to transform one string into another.

From	To	Distance	Edits
`galaxy`	`galxy`	1	1 deletion
`search`	`serach`	2	2 substitutions

Damerau-Levenshtein distance

Extends Levenshtein by also counting transpositions (adjacent character swaps) as a single edit. This is more forgiving for common typos:

From	To	Levenshtein	Damerau-Levenshtein
`galaxy`	`glaaxy`	2	1 (transposition)
`search`	`saerch`	2	1 (transposition)

N-gram similarity

Breaks strings into substrings of fixed length (bigrams, trigrams, etc.) and measures how many substrings are shared. Works better for longer strings and partial matches.

Example with bigrams (n=2):

String	Bigrams
`hello`	`he`, `el`, `ll`, `lo`
`help`	`he`, `el`, `lp`

Shared bigrams: he, el → similarity = 2/5 = 0.4

When to use which

Approach	Best for	Typical use case
Levenshtein	Short strings, exact typo correction	User name search, product codes, tags
N-gram	Longer strings, partial matching	Autocomplete, "did you mean?", document titles

Levenshtein matching with `ts_levenshtein`

Finds terms within a given edit distance. Uses Damerau-Levenshtein by default (transpositions count as one edit).

Setup

Any text dictionary works — stemming should typically be disabled for fuzzy matching:

Query

CREATE TEXT SEARCH DICTIONARY fuzzy_dict (    template = 'text',    locale = 'en_US.UTF-8',    case = 'lower',    stemming = false,    accent = false);
CREATE TABLE products (    id INTEGER PRIMARY KEY,    name VARCHAR);
CREATE INDEX idx_products ON products    USING inverted (id, name fuzzy_dict);
INSERT INTO products VALUES    (1, 'cat'), (2, 'bat'), (3, 'car'),    (4, 'dog'), (5, 'cats'), (6, 'act');
VACUUM (REFRESH_TABLE) products;

Basic usage

Query

-- Exact match only (distance 0)SELECT id, name FROM idx_productsWHERE name @@ ts_levenshtein('cat', 0)ORDER BY id;
-- Distance 1: one edit awaySELECT id, name FROM idx_productsWHERE name @@ ts_levenshtein('cat', 1)ORDER BY id;
-- Distance 2: two edits awaySELECT id, name FROM idx_productsWHERE name @@ ts_levenshtein('cat', 2)ORDER BY id;

Result

 id | name----+------  1 | cat
 id | name----+------  1 | cat  2 | bat  3 | car  5 | cats  6 | act
 id | name----+------  1 | cat  2 | bat  3 | car  5 | cats  6 | act

Disable transpositions

Use strict Levenshtein (no transposition counting):

Query

-- Disable transpositions: 'act' no longer matches 'cat'SELECT id, name FROM idx_productsWHERE name @@ ts_levenshtein('cat', 1, false)ORDER BY id;

Result

 id | name----+------  1 | cat  2 | bat  3 | car  5 | cats

Prefix matching

Require a prefix before applying fuzzy matching — useful for autocomplete:

Query

-- Must start with 'ca', then fuzzy match the restSELECT id, name FROM idx_productsWHERE name @@ ts_levenshtein('t', 1, true, 'ca')ORDER BY id;

Result

 id | name----+------  1 | cat  3 | car  5 | cats

Parameters

Parameter	Type	Default	Description
column	column		Indexed text column
term	string		Search term
distance	integer		Max edit distance (0–4)
transpositions	boolean	`true`	Count transpositions as single edit
max_terms	integer	`64`	Max candidate terms to evaluate
prefix	string	`''`	Required prefix before fuzzy matching

N-gram matching with `ts_ngram`

Finds terms by n-gram similarity. Requires an index built with an ngram dictionary.

Setup

Query

CREATE TEXT SEARCH DICTIONARY bigram_dict (    template = 'ngram',    mingram = 2,    maxgram = 2,    frequency = true,    position = true);
CREATE TABLE articles (    id INTEGER PRIMARY KEY,    title VARCHAR);
CREATE INDEX idx_articles_ngram ON articles    USING inverted (id, title bigram_dict);
INSERT INTO articles VALUES    (1, 'hello'), (2, 'help'), (3, 'world'),    (4, 'held'), (5, 'hero');
VACUUM (REFRESH_TABLE) articles;

Basic usage

Query

-- Default threshold — strict matchingSELECT id, title FROM idx_articles_ngramWHERE title @@ ts_ngram('hello')ORDER BY id;
-- Lower threshold (0.3) — matches more looselySELECT id, title FROM idx_articles_ngramWHERE title @@ ts_ngram('hello', 0.3)ORDER BY id;
-- Threshold 0.0 — matches anything with at least one shared n-gramSELECT id, title FROM idx_articles_ngramWHERE title @@ ts_ngram('hello', 0.0)ORDER BY id;

Result

 id | title----+-------  1 | hello
 id | title----+-------  1 | hello  2 | help  4 | held
 id | title----+-------  1 | hello  2 | help  4 | held  5 | hero

Tuning n-gram size

Bigrams (mingram=2, maxgram=2): More matches, less precision. Good for short terms.
Trigrams (mingram=3, maxgram=3): Fewer matches, more precision. Better for longer terms.

Parameters

Parameter	Type	Default	Description
column	column		Indexed text column (must use `ngram` dictionary)
term	string		Search term
threshold	float	`0.7`	Minimum n-gram similarity (0.0–1.0)

Combining with other filters

Both fuzzy functions work with AND, OR and other search predicates:

Query

-- Fuzzy match + exact filterSELECT id, name FROM idx_productsWHERE name @@ ts_levenshtein('cat', 1) AND id < 4ORDER BY id;

Result

 id | name----+------  1 | cat  2 | bat  3 | car

Similarity measures​

Levenshtein distance​

Damerau-Levenshtein distance​

N-gram similarity​

When to use which​

Levenshtein matching with ts_levenshtein​

Setup​

Basic usage​

Disable transpositions​

Prefix matching​

Parameters​

N-gram matching with ts_ngram​

Setup​

Basic usage​

Tuning n-gram size​

Parameters​

Combining with other filters​

See also​

Similarity measures

Levenshtein distance

Damerau-Levenshtein distance

N-gram similarity

When to use which

Levenshtein matching with `ts_levenshtein`

Setup

Basic usage

Disable transpositions

Prefix matching

Parameters

N-gram matching with `ts_ngram`

Setup

Basic usage

Tuning n-gram size

Parameters

Combining with other filters

See also