
Case-Sensitivity and Diacritics

Control how text is normalized before indexing and searching. Dictionary configuration determines whether searches are case-sensitive and whether accented characters match their base forms.

See Setup for the shared dataset used in all examples.

How it works

When text is indexed, and again when a query is processed, the dictionary's CASE and ACCENT options normalize tokens:

Option	Value	Effect
`CASE`	`'lower'`	Convert all tokens to lowercase
`CASE`	`'upper'`	Convert all tokens to uppercase
`CASE`	`'none'`	Preserve original case
`ACCENT`	`false`	Strip diacritics (é → e, ü → u)
`ACCENT`	`true`	Preserve diacritics

Use ts_lexize to see exactly how a dictionary transforms text:

SELECT ts_lexize('basic_dict', 'Café Résumé');
-- {cafe,resume}

SELECT ts_lexize('exact_dict', 'Café Résumé');
-- {Café,Résumé}
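
As a rough mental model of what the two options do to each token, the following Python sketch mimics the normalization outside the database. It is an illustration only: the names `strip_diacritics` and `normalize_tokens`, the whitespace tokenization, and the use of Unicode decomposition to strip diacritics are assumptions made for this example, not the engine's actual implementation.

```python
import unicodedata


def strip_diacritics(token: str) -> str:
    """Remove combining marks, e.g. 'é' -> 'e' (assumed model of ACCENT = false)."""
    decomposed = unicodedata.normalize("NFD", token)
    kept = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", kept)


def normalize_tokens(text: str, case: str = "lower", accent: bool = False) -> list[str]:
    """Split on whitespace, then apply the CASE and ACCENT options to each token."""
    tokens = text.split()
    if not accent:
        tokens = [strip_diacritics(t) for t in tokens]
    if case == "lower":
        tokens = [t.lower() for t in tokens]
    elif case == "upper":
        tokens = [t.upper() for t in tokens]
    elif case != "none":
        raise ValueError(f"unknown CASE value: {case!r}")
    return tokens


if __name__ == "__main__":
    # Settings that correspond to basic_dict in the examples above.
    print(normalize_tokens("Café Résumé", case="lower", accent=False))
    # ['cafe', 'resume']

    # Settings that correspond to exact_dict.
    print(normalize_tokens("Café Résumé", case="none", accent=True))
    # ['Café', 'Résumé']
```

The sketch ignores stemming and any other dictionary options, so it only approximates the output of ts_lexize for dictionaries that do more than case and accent folding.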

Case-insensitive search

The basic_dict uses CASE = 'lower', so searches are case-insensitive:

-- Each of these queries returns the same three rows
SELECT id, title FROM movies_idx WHERE PHRASE(title, 'the matrix');
SELECT id, title FROM movies_idx WHERE PHRASE(title, 'THE MATRIX');
SELECT id, title FROM movies_idx WHERE PHRASE(title, 'The Matrix');

 id | title
----+------------------------
  1 | The Matrix
  2 | The Matrix Reloaded
  3 | The Matrix Revolutions

Case-sensitive search

The exact_dict uses CASE = 'none', preserving original case:

-- Matches only titles that contain the exact capitalization
SELECT id, title FROM movies_exact_idx WHERE PHRASE(title, 'The Matrix');

 id | title
----+------------------------
  1 | The Matrix
  2 | The Matrix Reloaded
  3 | The Matrix Revolutions

-- No results: the case does not match
SELECT id, title FROM movies_exact_idx WHERE PHRASE(title, 'the matrix');

(0 rows)

Accent-insensitive search

The basic_dict uses ACCENT = false, stripping diacritics:

-- "cafe" matches "Café" because accents are stripped
SELECT id, title FROM movies_idx WHERE PHRASE(title, 'cafe society');

 id | title
----+---------------
  9 | Café Society

-- Also works with the accented form
SELECT id, title FROM movies_idx WHERE PHRASE(title, 'café society');

 id | title
----+---------------
  9 | Café Society

Accent-sensitive search

The exact_dict uses ACCENT = true, preserving diacritics:

-- Must use the exact accent to match
SELECT id, title FROM movies_exact_idx WHERE PHRASE(title, 'Café');

 id | title
----+---------------
  9 | Café Society

-- No match: the accent is missing
SELECT id, title FROM movies_exact_idx WHERE PHRASE(title, 'Cafe');

(0 rows)
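
The four behaviours above follow from one rule: the title and the query are normalized with the same options and then compared token by token. The Python sketch below models that rule on a few titles from the examples. The helpers `normalize`, `phrase_match`, and `search`, and the `BASIC` / `EXACT` option sets, are hypothetical stand-ins for basic_dict and exact_dict; they are not how PHRASE is implemented.

```python
import unicodedata


def normalize(text: str, case: str, accent: bool) -> list[str]:
    """Tokenize on whitespace and apply CASE / ACCENT (hypothetical model)."""
    if accent:
        # Keep diacritics, but use one canonical form so equal text compares equal.
        text = unicodedata.normalize("NFC", text)
    else:
        # Decompose, then drop combining marks so 'é' becomes 'e'.
        text = "".join(
            ch for ch in unicodedata.normalize("NFD", text)
            if not unicodedata.combining(ch)
        )
    if case == "lower":
        text = text.lower()
    elif case == "upper":
        text = text.upper()
    return text.split()


def phrase_match(title: str, query: str, case: str, accent: bool) -> bool:
    """True if the normalized query tokens occur contiguously in the title."""
    doc = normalize(title, case, accent)
    q = normalize(query, case, accent)
    if not q:
        return False
    return any(doc[i:i + len(q)] == q for i in range(len(doc) - len(q) + 1))


TITLES = ["The Matrix", "The Matrix Reloaded", "Café Society"]
BASIC = {"case": "lower", "accent": False}  # stands in for basic_dict
EXACT = {"case": "none", "accent": True}    # stands in for exact_dict


def search(query: str, options: dict) -> list[str]:
    """Return the titles whose tokens contain the query phrase."""
    return [t for t in TITLES if phrase_match(t, query, **options)]


if __name__ == "__main__":
    print(search("the matrix", BASIC))  # both Matrix titles
    print(search("the matrix", EXACT))  # [] because the case differs
    print(search("Cafe", EXACT))        # [] because the accent is missing
```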

Creating custom dictionaries

Different use cases call for different normalization. Here are common patterns:

Search-friendly (case + accent insensitive)

CREATE TEXT SEARCH DICTIONARY search_friendly (
    TEMPLATE = 'text',
    LOCALE = 'en_US.UTF-8',
    CASE = 'lower',
    STEMMING = true,
    ACCENT = false,
    FREQUENCY = true,
    POSITION = true
);

Identifier matching (case + accent sensitive)

CREATE TEXT SEARCH DICTIONARY identifier_dict (
    TEMPLATE = 'text',
    LOCALE = 'en_US.UTF-8',
    CASE = 'none',
    STEMMING = false,
    ACCENT = true
);

Uppercase normalization

CREATE TEXT SEARCH DICTIONARY upper_dict (
    TEMPLATE = 'text',
    LOCALE = 'en_US.UTF-8',
    CASE = 'upper',
    STEMMING = false,
    ACCENT = false
);

SELECT ts_lexize('upper_dict', 'Café Résumé');
-- {CAFE,RESUME}
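
Uppercase normalization is still a form of case-insensitive matching, provided the index and the query go through the same dictionary: every spelling folds to the same uppercase token. The short Python sketch below illustrates this under the assumption that CASE = 'upper' with ACCENT = false behaves like "strip combining marks, then uppercase". The function `upper_normalize` is a hypothetical model, and stemming is not considered.

```python
import unicodedata


def upper_normalize(text: str) -> list[str]:
    """Model of CASE = 'upper' with ACCENT = false: strip marks, then uppercase."""
    stripped = "".join(
        ch for ch in unicodedata.normalize("NFD", text)
        if not unicodedata.combining(ch)
    )
    return stripped.upper().split()


# Different spellings of the same two words.
variants = ["Café Résumé", "café résumé", "CAFÉ RÉSUMÉ", "cafe resume"]

for v in variants:
    print(v, "->", upper_normalize(v))
# Every variant prints ['CAFE', 'RESUME']
```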
