Case-Sensitivity and Diacritics
Control how text is normalized before indexing and searching. Dictionary configuration determines whether searches are case-sensitive and whether accented characters match their base forms.
See Setup for the shared dataset used in all examples.
How it works
The dictionary's CASE and ACCENT options normalize each token, both when text is indexed and when search terms are matched against the index:
| Option | Value | Effect |
|---|---|---|
| CASE | 'lower' | Convert all tokens to lowercase |
| CASE | 'upper' | Convert all tokens to uppercase |
| CASE | 'none' | Preserve original case |
| ACCENT | false | Strip diacritics (é → e, ü → u) |
| ACCENT | true | Preserve diacritics |
Use ts_lexize to see exactly how a dictionary transforms text:
SELECT ts_lexize('basic_dict', 'Café Résumé');
-- {cafe,resume}
SELECT ts_lexize('exact_dict', 'Café Résumé');
-- {Café,Résumé}
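The CASE and ACCENT rules in the table above can be approximated with a short Python model. This is a sketch under stated assumptions, not the engine's implementation: tokens are split on whitespace, diacritics are stripped by Unicode NFD decomposition followed by removal of combining marks, and stemming is ignored. The names strip_accents, normalize_token, and lexize are hypothetical.

```python
import unicodedata


def strip_accents(token: str) -> str:
    # Assumed technique: NFD splits "é" into "e" + a combining accent,
    # then the combining marks are dropped and the rest is recomposed.
    decomposed = unicodedata.normalize("NFD", token)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", base)


def normalize_token(token: str, case: str = "lower", accent: bool = False) -> str:
    # ACCENT = false strips diacritics; ACCENT = true keeps them.
    if not accent:
        token = strip_accents(token)
    # CASE folds the token; 'none' leaves it untouched.
    if case == "lower":
        return token.lower()
    if case == "upper":
        return token.upper()
    if case == "none":
        return token
    raise ValueError(f"unknown CASE value: {case!r}")


def lexize(text: str, case: str = "lower", accent: bool = False) -> list[str]:
    # Hypothetical stand-in for ts_lexize: split on whitespace, normalize each token.
    return [normalize_token(tok, case, accent) for tok in text.split()]


print(lexize("Café Résumé", case="lower", accent=False))  # ['cafe', 'resume']
print(lexize("Café Résumé", case="none", accent=True))    # ['Café', 'Résumé']
print(lexize("Café Résumé", case="upper", accent=False))  # ['CAFE', 'RESUME']
```

Under this model, the ts_lexize results on this page for basic_dict, exact_dict, and upper_dict correspond to the option pairs ('lower', false), ('none', true), and ('upper', false).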
Case-insensitive search
The basic_dict uses CASE = 'lower', so searches are case-insensitive:
-- All of these match "The Matrix"
SELECT id, title FROM movies_idx WHERE PHRASE(title, 'the matrix');
SELECT id, title FROM movies_idx WHERE PHRASE(title, 'THE MATRIX');
SELECT id, title FROM movies_idx WHERE PHRASE(title, 'The Matrix');
id | title
----+------------------------
1 | The Matrix
2 | The Matrix Reloaded
3 | The Matrix Revolutions
Case-sensitive search
The exact_dict uses CASE = 'none', preserving original case:
-- Only matches the exact capitalization
SELECT id, title FROM movies_exact_idx WHERE PHRASE(title, 'The Matrix');
id | title
----+------------------------
1 | The Matrix
2 | The Matrix Reloaded
3 | The Matrix Revolutions
-- No results — wrong case
SELECT id, title FROM movies_exact_idx WHERE PHRASE(title, 'the matrix');
(0 rows)
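Both results are consistent with a simple model: the indexed title and the query are folded with the same CASE rule, and a phrase matches when the folded query tokens appear consecutively in the folded title. The Python sketch below models only the CASE option. MOVIES is a hypothetical four-row subset built from the result tables on this page, not the full Setup dataset, and phrase_search is an illustrative name rather than the engine's PHRASE implementation.

```python
# Hypothetical subset of the Setup dataset, taken from the result tables on this page.
MOVIES = {
    1: "The Matrix",
    2: "The Matrix Reloaded",
    3: "The Matrix Revolutions",
    9: "Café Society",
}


def fold(token: str, case: str) -> str:
    # Model of the CASE option only; accents are left untouched.
    if case == "lower":
        return token.lower()
    if case == "none":
        return token
    raise ValueError(f"unsupported CASE value: {case!r}")


def phrase_search(query: str, case: str) -> list[int]:
    # Fold the query and every title with the same rule.
    q = [fold(t, case) for t in query.split()]
    if not q:
        return []
    hits = []
    for movie_id, title in sorted(MOVIES.items()):
        tokens = [fold(t, case) for t in title.split()]
        # The phrase matches if the query tokens occur consecutively in the title.
        if any(tokens[i:i + len(q)] == q for i in range(len(tokens) - len(q) + 1)):
            hits.append(movie_id)
    return hits


# basic_dict model (CASE = 'lower'): every spelling finds the same rows.
for query in ("the matrix", "THE MATRIX", "The Matrix"):
    print(query, phrase_search(query, case="lower"))  # prints the query, then [1, 2, 3]

# exact_dict model (CASE = 'none'): only the exact capitalization matches.
print(phrase_search("The Matrix", case="none"))  # [1, 2, 3]
print(phrase_search("the matrix", case="none"))  # []
```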
Accent-insensitive search
The basic_dict uses ACCENT = false, stripping diacritics:
-- "cafe" matches "Café" because accents are stripped
SELECT id, title FROM movies_idx WHERE PHRASE(title, 'cafe society');
id | title
----+---------------
9 | Café Society
-- Also works with the accented form
SELECT id, title FROM movies_idx WHERE PHRASE(title, 'café society');
id | title
----+---------------
9 | Café Society
Accent-sensitive search
The exact_dict uses ACCENT = true, preserving diacritics:
-- Must use the exact accent to match
SELECT id, title FROM movies_exact_idx WHERE PHRASE(title, 'Café');
id | title
----+---------------
9 | Café Society
-- No match — accent is missing
SELECT id, title FROM movies_exact_idx WHERE PHRASE(title, 'Cafe');
(0 rows)
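Accent handling fits the same model of folding both sides before comparing. In this Python sketch, diacritics are stripped by decomposing each token to Unicode NFD and discarding combining marks, which turns é into e. That is an assumed technique; how the engine itself strips accents is not documented here. MOVIES is a hypothetical subset taken from the result tables on this page, and normalize and phrase_search are illustrative names.

```python
import unicodedata

# Hypothetical subset of the Setup dataset, taken from the result tables on this page.
MOVIES = {1: "The Matrix", 9: "Café Society"}


def strip_accents(token: str) -> str:
    # NFD splits "é" into "e" plus a combining acute accent; dropping the mark leaves "e".
    decomposed = unicodedata.normalize("NFD", token)
    return unicodedata.normalize(
        "NFC", "".join(c for c in decomposed if not unicodedata.combining(c))
    )


def normalize(token: str, case: str, accent: bool) -> str:
    if not accent:  # ACCENT = false
        token = strip_accents(token)
    if case == "lower":
        return token.lower()
    if case == "none":
        return token
    raise ValueError(f"unsupported CASE value: {case!r}")


def phrase_search(query: str, case: str, accent: bool) -> list[int]:
    q = [normalize(t, case, accent) for t in query.split()]
    if not q:
        return []
    hits = []
    for movie_id, title in sorted(MOVIES.items()):
        tokens = [normalize(t, case, accent) for t in title.split()]
        # The phrase matches if the query tokens occur consecutively in the title.
        if any(tokens[i:i + len(q)] == q for i in range(len(tokens) - len(q) + 1)):
            hits.append(movie_id)
    return hits


# basic_dict model (CASE = 'lower', ACCENT = false): both spellings match.
print(phrase_search("cafe society", case="lower", accent=False))  # [9]
print(phrase_search("café society", case="lower", accent=False))  # [9]

# exact_dict model (CASE = 'none', ACCENT = true): the accent must be present.
print(phrase_search("Café", case="none", accent=True))  # [9]
print(phrase_search("Cafe", case="none", accent=True))  # []
```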
Creating custom dictionaries
Different use cases call for different normalization. Here are common patterns:
Search-friendly (case + accent insensitive)
CREATE TEXT SEARCH DICTIONARY search_friendly (
TEMPLATE = 'text',
LOCALE = 'en_US.UTF-8',
CASE = 'lower',
STEMMING = true,
ACCENT = false,
FREQUENCY = true,
POSITION = true
);
Identifier matching (case + accent sensitive)
CREATE TEXT SEARCH DICTIONARY identifier_dict (
TEMPLATE = 'text',
LOCALE = 'en_US.UTF-8',
CASE = 'none',
STEMMING = false,
ACCENT = true
);
Uppercase normalization
CREATE TEXT SEARCH DICTIONARY upper_dict (
TEMPLATE = 'text',
LOCALE = 'en_US.UTF-8',
CASE = 'upper',
STEMMING = false,
ACCENT = false
);
SELECT ts_lexize('upper_dict', 'Café Résumé');
-- {CAFE,RESUME}
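To compare the three patterns, the Python sketch below records only their CASE and ACCENT settings and checks whether a query token folds to the same string as an indexed token. STEMMING, FREQUENCY, POSITION, and LOCALE are not modeled, so a real search_friendly dictionary may reduce tokens further than shown. The PRESETS mapping and the fold and matches helpers are hypothetical, and accent stripping uses an assumed NFD-based technique.

```python
import unicodedata

# CASE / ACCENT settings copied from the three example dictionaries above.
# STEMMING, FREQUENCY, POSITION, and LOCALE are intentionally not modeled.
PRESETS = {
    "search_friendly": {"case": "lower", "accent": False},
    "identifier_dict": {"case": "none", "accent": True},
    "upper_dict": {"case": "upper", "accent": False},
}


def fold(preset: str, token: str) -> str:
    opts = PRESETS[preset]
    if not opts["accent"]:
        # ACCENT = false: decompose, then drop combining marks (é -> e).
        decomposed = unicodedata.normalize("NFD", token)
        token = unicodedata.normalize(
            "NFC", "".join(c for c in decomposed if not unicodedata.combining(c))
        )
    if opts["case"] == "lower":
        return token.lower()
    if opts["case"] == "upper":
        return token.upper()
    return token  # CASE = 'none'


def matches(preset: str, indexed: str, query: str) -> bool:
    # A single-token search hits when both sides fold to the same string.
    return fold(preset, indexed) == fold(preset, query)


for name in PRESETS:
    print(name, fold(name, "Café"), matches(name, "Café", "cafe"), matches(name, "Café", "CAFÉ"))
# search_friendly cafe True True
# identifier_dict Café False False
# upper_dict CAFE True True
```

For this input, search_friendly and upper_dict accept the same queries and differ only in the stored form, while identifier_dict requires an exact match.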
See also
- CREATE TEXT SEARCH DICTIONARY — all dictionary templates and options
- Phrase and Proximity Search — using phrase search with dictionaries