Case-Sensitivity and Diacritics
Control how text is normalized before indexing and searching. Dictionary configuration determines whether searches are case-sensitive and whether accented characters match their base forms.
See Setup for the shared dataset used in all examples.
How it works
When text is indexed, the dictionary's CASE and ACCENT options normalize tokens:
| Option | Value | Effect |
|---|---|---|
| CASE | 'lower' | Convert all tokens to lowercase |
| CASE | 'upper' | Convert all tokens to uppercase |
| CASE | 'none' | Preserve original case |
| ACCENT | false | Strip diacritics (é → e, ü → u) |
| ACCENT | true | Preserve diacritics |
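The effect of the two options can be modeled outside the database. The following Python sketch is only an approximation for building intuition, not the engine's implementation: the `normalize_token` and `lexize` helpers are hypothetical names, tokens are split on whitespace, and stemming is ignored. It assumes that stripping diacritics is equivalent to removing Unicode combining marks after canonical decomposition.

```python
import unicodedata


def normalize_token(token: str, case: str = "lower", accent: bool = False) -> str:
    """Approximate the CASE and ACCENT options for one token (hypothetical helper)."""
    if not accent:
        # NFD splits "é" into "e" plus a combining acute accent; dropping
        # the combining marks leaves only the base letters.
        decomposed = unicodedata.normalize("NFD", token)
        token = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    if case == "lower":
        token = token.lower()
    elif case == "upper":
        token = token.upper()
    elif case != "none":
        raise ValueError(f"unsupported case option: {case!r}")
    return token


def lexize(text: str, case: str = "lower", accent: bool = False) -> list[str]:
    """Split on whitespace and normalize each token (simplified tokenizer)."""
    return [normalize_token(tok, case, accent) for tok in text.split()]


# CASE = 'lower', ACCENT = false: case- and accent-insensitive
print(lexize("Café Résumé"))                            # ['cafe', 'resume']
# CASE = 'none', ACCENT = true: tokens are kept as written
print(lexize("Café Résumé", case="none", accent=True))  # ['Café', 'Résumé']
# CASE = 'upper', ACCENT = false
print(lexize("Café Résumé", case="upper"))              # ['CAFE', 'RESUME']
```

Because the same normalization applies to both the indexed text and the query, `lexize("cafe society")` and `lexize("Café Society")` produce identical tokens under the first configuration, which is why either spelling finds the same rows.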
Use ts_lexize to see exactly how a dictionary transforms text:
Query
SELECT ts_lexize('basic_dict', 'Café Résumé');
-- {cafe,resume}
SELECT ts_lexize('exact_dict', 'Café Résumé');
Result
 ts_lexize
---------------
 {cafe,resume}

 ts_lexize
---------------
 {Café,Résumé}
Case-insensitive search
The basic_dict uses CASE = 'lower', so searches are case-insensitive:
Query
-- All of these match "The Matrix"
SELECT id, title FROM movies_idx WHERE title @@ ts_phrase('the matrix');
SELECT id, title FROM movies_idx WHERE title @@ ts_phrase('THE MATRIX');
SELECT id, title FROM movies_idx WHERE title @@ ts_phrase('The Matrix');
Result
 id | title
----+------------------------
  1 | The Matrix
  2 | The Matrix Reloaded
  3 | The Matrix Revolutions

 id | title
----+------------------------
  1 | The Matrix
  2 | The Matrix Reloaded
  3 | The Matrix Revolutions

 id | title
----+------------------------
  1 | The Matrix
  2 | The Matrix Reloaded
  3 | The Matrix Revolutions
Case-sensitive search
The exact_dict uses CASE = 'none', preserving original case:
Query
-- Only matches the exact capitalization
SELECT id, title FROM movies_exact_idx WHERE title @@ ts_phrase('The Matrix');
Result
 id | title
----+------------------------
  1 | The Matrix
  2 | The Matrix Reloaded
  3 | The Matrix Revolutions
Query
-- No results: wrong case
SELECT id, title FROM movies_exact_idx WHERE title @@ ts_phrase('the matrix');
Result
 id | title
----+-------
Accent-insensitive search
The basic_dict uses ACCENT = false, stripping diacritics:
Query
SELECT id, title FROM movies_idx WHERE title @@ ts_phrase('cafe society');
Result
 id | title
----+--------------
  9 | Café Society
Query
-- Also works with the accented form
SELECT id, title FROM movies_idx WHERE title @@ ts_phrase('café society');
Result
 id | title
----+--------------
  9 | Café Society
Accent-sensitive search
The exact_dict uses ACCENT = true, preserving diacritics:
Query
-- Must use the exact accent to match
SELECT id, title FROM movies_exact_idx WHERE title @@ ts_phrase('Café');
Result
 id | title
----+--------------
  9 | Café Society
Query
-- No match: accent is missing
SELECT id, title FROM movies_exact_idx WHERE title @@ ts_phrase('Cafe');
Result
 id | title
----+-------
Creating custom dictionaries
Different use cases call for different normalization. Here are common patterns:
Search-friendly (case + accent insensitive)
Query
CREATE TEXT SEARCH DICTIONARY search_friendly (
    template = 'text',
    locale = 'en_US.UTF-8',
    case = 'lower',
    stemming = true,
    accent = false,
    frequency = true,
    position = true
);
Identifier matching (case + accent sensitive)
Query
CREATE TEXT SEARCH DICTIONARY identifier_dict (
    template = 'text',
    locale = 'en_US.UTF-8',
    case = 'none',
    stemming = false,
    accent = true
);
Uppercase normalization
Query
CREATE TEXT SEARCH DICTIONARY upper_dict (
    template = 'text',
    locale = 'en_US.UTF-8',
    case = 'upper',
    stemming = false,
    accent = false
);
SELECT ts_lexize('upper_dict', 'Café Résumé');
Result
 ts_lexize
---------------
 {CAFE,RESUME}
See also
- CREATE TEXT SEARCH DICTIONARY — all dictionary templates and options
- Phrase and Proximity Search — using phrase search with dictionaries