Case-Sensitivity and Diacritics

Control how text is normalized before indexing and searching. Dictionary configuration determines whether searches are case-sensitive and whether accented characters match their base forms.

See Setup for the shared dataset used in all examples.

How it works

When text is indexed, the dictionary's CASE and ACCENT options normalize tokens:

 Option | Value   | Effect
--------+---------+----------------------------------
 CASE   | 'lower' | Convert all tokens to lowercase
 CASE   | 'upper' | Convert all tokens to uppercase
 CASE   | 'none'  | Preserve original case
 ACCENT | false   | Strip diacritics (é → e, ü → u)
 ACCENT | true    | Preserve diacritics
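
To make the effect of these two options concrete, here is a minimal Python sketch that models them outside the database. It is an illustration only, not the engine's implementation: the helper names (strip_accents, normalize_token, lexize) are hypothetical, tokens are assumed to be split on whitespace, and accent stripping is approximated with Unicode NFD decomposition.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Drop combining marks after canonical decomposition (é -> e, ü -> u)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_token(token: str, case: str = "lower", accent: bool = False) -> str:
    """Model the CASE and ACCENT options for one token (hypothetical helper)."""
    if not accent:  # ACCENT = false strips diacritics
        token = strip_accents(token)
    if case == "lower":
        token = token.lower()
    elif case == "upper":
        token = token.upper()
    elif case != "none":
        raise ValueError(f"unsupported CASE value: {case!r}")
    return token


def lexize(text: str, case: str = "lower", accent: bool = False) -> list[str]:
    """Assumption: tokens are whitespace-separated; a real tokenizer may differ."""
    return [normalize_token(tok, case, accent) for tok in text.split()]


print(lexize("Café Résumé"))                            # CASE 'lower', ACCENT false
print(lexize("Café Résumé", case="none", accent=True))  # CASE 'none', ACCENT true
print(lexize("Café Résumé", case="upper"))              # CASE 'upper', ACCENT false
```

The three calls correspond to the lowercase accent-insensitive, exact, and uppercase configurations described on this page.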

Use ts_lexize to see exactly how a dictionary transforms text:

SELECT ts_lexize('basic_dict', 'Café Résumé');
-- {cafe,resume}

SELECT ts_lexize('exact_dict', 'Café Résumé');
-- {Café,Résumé}

The basic_dict uses CASE = 'lower', so searches are case-insensitive:

-- All of these match "The Matrix"
SELECT id, title FROM movies_idx WHERE PHRASE(title, 'the matrix');
SELECT id, title FROM movies_idx WHERE PHRASE(title, 'THE MATRIX');
SELECT id, title FROM movies_idx WHERE PHRASE(title, 'The Matrix');
 id | title
----+------------------------
  1 | The Matrix
  2 | The Matrix Reloaded
  3 | The Matrix Revolutions

The exact_dict uses CASE = 'none', preserving original case:

-- Only matches the exact capitalization
SELECT id, title FROM movies_exact_idx WHERE PHRASE(title, 'The Matrix');
 id | title
----+------------------------
  1 | The Matrix
  2 | The Matrix Reloaded
  3 | The Matrix Revolutions
-- No results — wrong case
SELECT id, title FROM movies_exact_idx WHERE PHRASE(title, 'the matrix');
(0 rows)

The basic_dict uses ACCENT = false, stripping diacritics:

-- "cafe" matches "Café" because accents are stripped and case is lowered
SELECT id, title FROM movies_idx WHERE PHRASE(title, 'cafe society');
 id | title
----+---------------
  9 | Café Society
-- Also works with the accented form
SELECT id, title FROM movies_idx WHERE PHRASE(title, 'café society');
 id | title
----+---------------
  9 | Café Society

The exact_dict uses ACCENT = true, preserving diacritics:

-- Must use the exact accent to match
SELECT id, title FROM movies_exact_idx WHERE PHRASE(title, 'Café');
 id | title
----+---------------
  9 | Café Society
-- No match — accent is missing
SELECT id, title FROM movies_exact_idx WHERE PHRASE(title, 'Cafe');
(0 rows)
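
The query results above are consistent with the search phrase being normalized by the same rules as the indexed text before the two are compared. The following Python sketch models that behaviour under stated assumptions: whitespace tokenization, NFD-based accent stripping, and a hypothetical phrase_matches helper that stands in for PHRASE. It is not the engine's actual matching code.

```python
import unicodedata


def normalize(text: str, case: str, accent: bool) -> list[str]:
    """Apply the CASE / ACCENT rules, then split on whitespace (assumed tokenizer)."""
    if not accent:  # ACCENT = false: remove combining marks
        text = "".join(
            ch for ch in unicodedata.normalize("NFD", text)
            if not unicodedata.combining(ch)
        )
    if case == "lower":
        text = text.lower()
    elif case == "upper":
        text = text.upper()
    return text.split()


def phrase_matches(title: str, query: str,
                   case: str = "lower", accent: bool = False) -> bool:
    """True if the normalized query tokens appear contiguously in the title."""
    doc = normalize(title, case, accent)
    q = normalize(query, case, accent)
    if not q:
        return False
    return any(doc[i:i + len(q)] == q for i in range(len(doc) - len(q) + 1))


# Lowercase, accent-insensitive settings: case and accents are ignored.
print(phrase_matches("The Matrix Reloaded", "THE MATRIX"))
print(phrase_matches("Café Society", "cafe society"))

# Exact settings: the query must match case and accents.
print(phrase_matches("The Matrix", "the matrix", case="none", accent=True))
print(phrase_matches("Café Society", "Cafe", case="none", accent=True))
```

Because both sides pass through the same normalize function, the choice of dictionary settings alone decides whether 'cafe' and 'Café' compare equal.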

Creating custom dictionaries

Different use cases call for different normalization. Here are common patterns:

Search-friendly (case + accent insensitive)

CREATE TEXT SEARCH DICTIONARY search_friendly (
    TEMPLATE = 'text',
    LOCALE = 'en_US.UTF-8',
    CASE = 'lower',
    STEMMING = true,
    ACCENT = false,
    FREQUENCY = true,
    POSITION = true
);

Identifier matching (case + accent sensitive)

CREATE TEXT SEARCH DICTIONARY identifier_dict (
    TEMPLATE = 'text',
    LOCALE = 'en_US.UTF-8',
    CASE = 'none',
    STEMMING = false,
    ACCENT = true
);

Uppercase normalization

CREATE TEXT SEARCH DICTIONARY upper_dict (
    TEMPLATE = 'text',
    LOCALE = 'en_US.UTF-8',
    CASE = 'upper',
    STEMMING = false,
    ACCENT = false
);

SELECT ts_lexize('upper_dict', 'Café Résumé');
-- {CAFE,RESUME}