Case-Sensitivity and Diacritics

Control how text is normalized before indexing and searching. Dictionary configuration determines whether searches are case-sensitive and whether accented characters match their base forms.

See Setup for the shared dataset used in all examples.

How it works

When text is indexed, the dictionary's CASE and ACCENT options normalize tokens:

Option | Value   | Effect
-------+---------+---------------------------------
CASE   | 'lower' | Convert all tokens to lowercase
CASE   | 'upper' | Convert all tokens to uppercase
CASE   | 'none'  | Preserve original case
ACCENT | false   | Strip diacritics (é → e, ü → u)
ACCENT | true    | Preserve diacritics
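
As a rough mental model of these two options, the following Python sketch normalizes a single token. It is an illustration only, not the engine's implementation: the names `strip_diacritics` and `normalize_token` are hypothetical, and it assumes diacritic stripping behaves like Unicode decomposition followed by removal of combining marks.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose characters (NFD), then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_token(token: str, case: str = "lower", accent: bool = False) -> str:
    """Apply ACCENT first, then CASE, mirroring the option table above."""
    if not accent:  # ACCENT = false strips diacritics
        token = strip_diacritics(token)
    if case == "lower":
        return token.lower()
    if case == "upper":
        return token.upper()
    if case == "none":
        return token
    raise ValueError(f"unknown case option: {case!r}")


tokens = "Café Résumé".split()
print([normalize_token(t, case="lower", accent=False) for t in tokens])
# ['cafe', 'resume']
print([normalize_token(t, case="none", accent=True) for t in tokens])
# ['Café', 'Résumé']
```

The two calls correspond to the settings that this page attributes to basic_dict and exact_dict.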

Use ts_lexize to see exactly how a dictionary transforms text:

Query
SELECT ts_lexize('basic_dict', 'Café Résumé');
-- {cafe,resume}
SELECT ts_lexize('exact_dict', 'Café Résumé');
Result
 ts_lexize
---------------
 {cafe,resume}

 ts_lexize
---------------
 {Café,Résumé}

The basic_dict uses CASE = 'lower', so searches are case-insensitive:

Query
-- All of these match "The Matrix"
SELECT id, title FROM movies_idx WHERE title @@ ts_phrase('the matrix');
SELECT id, title FROM movies_idx WHERE title @@ ts_phrase('THE MATRIX');
SELECT id, title FROM movies_idx WHERE title @@ ts_phrase('The Matrix');
Result
 id | title
----+------------------------
  1 | The Matrix
  2 | The Matrix Reloaded
  3 | The Matrix Revolutions

 id | title
----+------------------------
  1 | The Matrix
  2 | The Matrix Reloaded
  3 | The Matrix Revolutions

 id | title
----+------------------------
  1 | The Matrix
  2 | The Matrix Reloaded
  3 | The Matrix Revolutions

The exact_dict uses CASE = 'none', preserving original case:

Query
-- Only matches the exact capitalization
SELECT id, title FROM movies_exact_idx WHERE title @@ ts_phrase('The Matrix');
Result
 id | title
----+------------------------
  1 | The Matrix
  2 | The Matrix Reloaded
  3 | The Matrix Revolutions
Query
-- No results: wrong case
SELECT id, title FROM movies_exact_idx WHERE title @@ ts_phrase('the matrix');
Result
 id | title
----+-------

The basic_dict uses ACCENT = false, stripping diacritics:

Query
SELECT id, title FROM movies_idx WHERE title @@ ts_phrase('cafe society');
Result
 id | title
----+--------------
  9 | Café Society
Query
-- Also works with the accented form
SELECT id, title FROM movies_idx WHERE title @@ ts_phrase('café society');
Result
 id | title
----+--------------
  9 | Café Society

The exact_dict uses ACCENT = true, preserving diacritics:

Query
-- Must use the exact accent to match
SELECT id, title FROM movies_exact_idx WHERE title @@ ts_phrase('Café');
Result
 id | title
----+--------------
  9 | Café Society
Query
-- No match: accent is missing
SELECT id, title FROM movies_exact_idx WHERE title @@ ts_phrase('Cafe');
Result
 id | title
----+-------
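
The matching behavior in the examples above follows from applying the same normalization to both the indexed text and the query. The Python sketch below models that idea under simplifying assumptions: whitespace tokenization, a hypothetical `phrase_matches` helper, and two option sets (`BASIC`, `EXACT`) that mirror the settings described for basic_dict and exact_dict. It is not how the engine evaluates ts_phrase internally.

```python
import unicodedata


def normalize(text, case, accent):
    """Normalize text, then split on whitespace (assumed tokenization)."""
    if not accent:
        text = "".join(
            ch for ch in unicodedata.normalize("NFD", text)
            if not unicodedata.combining(ch)
        )
    if case == "lower":
        text = text.lower()
    elif case == "upper":
        text = text.upper()
    return text.split()


def phrase_matches(title, phrase, case, accent):
    """True if the normalized phrase tokens appear contiguously in the title."""
    doc = normalize(title, case, accent)
    query = normalize(phrase, case, accent)
    n = len(query)
    return n > 0 and any(doc[i:i + n] == query for i in range(len(doc) - n + 1))


BASIC = {"case": "lower", "accent": False}  # mirrors basic_dict
EXACT = {"case": "none", "accent": True}    # mirrors exact_dict

print(phrase_matches("The Matrix Reloaded", "THE MATRIX", **BASIC))  # True
print(phrase_matches("The Matrix Reloaded", "the matrix", **EXACT))  # False
print(phrase_matches("Café Society", "cafe society", **BASIC))       # True
print(phrase_matches("Café Society", "Cafe", **EXACT))               # False
```

Because both sides pass through the same function, a query can only miss when the dictionary preserves a distinction (case or accent) that the query text does not reproduce.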

Creating custom dictionaries

Different use cases call for different normalization. Here are common patterns:

Search-friendly (case + accent insensitive)

Query
CREATE TEXT SEARCH DICTIONARY search_friendly (
    template = 'text',
    locale = 'en_US.UTF-8',
    case = 'lower',
    stemming = true,
    accent = false,
    frequency = true,
    position = true
);

Identifier matching (case + accent sensitive)

Query
CREATE TEXT SEARCH DICTIONARY identifier_dict (
    template = 'text',
    locale = 'en_US.UTF-8',
    case = 'none',
    stemming = false,
    accent = true
);

Uppercase normalization

Query
CREATE TEXT SEARCH DICTIONARY upper_dict (
    template = 'text',
    locale = 'en_US.UTF-8',
    case = 'upper',
    stemming = false,
    accent = false
);
SELECT ts_lexize('upper_dict', 'Café Résumé');
Result
 ts_lexize
---------------
 {CAFE,RESUME}
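
To compare the three patterns side by side, the sketch below mirrors their case and accent settings in a small Python table and applies each to the same input. The `PROFILES` mapping and the `lexize` helper are hypothetical stand-ins, and stemming, frequency, and position are deliberately not modeled, so the search_friendly line shows only the case and accent effect.

```python
import unicodedata

# Hypothetical mirror of the dictionary definitions above;
# only the case and accent options are modeled.
PROFILES = {
    "search_friendly": {"case": "lower", "accent": False},
    "identifier_dict": {"case": "none", "accent": True},
    "upper_dict": {"case": "upper", "accent": False},
}

CASE_FUNCS = {"lower": str.lower, "upper": str.upper, "none": str}


def lexize(profile, text):
    """Return whitespace-separated tokens normalized per the named profile."""
    opts = PROFILES[profile]
    if not opts["accent"]:
        text = "".join(
            ch for ch in unicodedata.normalize("NFD", text)
            if not unicodedata.combining(ch)
        )
    return CASE_FUNCS[opts["case"]](text).split()


for name in PROFILES:
    print(name, lexize(name, "Café Résumé"))
```

For upper_dict the sketch yields CAFE and RESUME, which agrees with the ts_lexize result shown above.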

See also