Skip to main content

Phrase and Proximity Search

Search for tokens appearing in a specific order. This allows matching partial or full sentences within indexed text. Requires POSITION = true in the dictionary.

See Setup for the shared dataset used in all examples.

Use PHRASE to find documents containing tokens in sequence:

SELECT id, title
FROM movies_idx
WHERE PHRASE(description, 'biggest blockbuster')
ORDER BY id;
 id | title
----+---------------
4 | Jurassic Park
6 | Scary Movie

Both documents contain the phrase "biggest blockbuster" in their descriptions.

Multi-word phrases

Search for longer sequences:

SELECT id, title
FROM movies_idx
WHERE PHRASE(description, 'the war against its controllers');
 id | title
----+------------
1 | The Matrix

Combining phrase conditions with AND

Find documents matching multiple phrases:

SELECT id, title
FROM movies_idx
WHERE PHRASE(description, 'alien') AND PHRASE(description, 'galaxy')
ORDER BY id;
 id | title
----+---------------------------------
7 | Star Trek: The Motion Picture
8 | Alien

Combining phrase conditions with OR

Find documents matching any of several phrases:

SELECT id, title
FROM movies_idx
WHERE PHRASE(description, 'computer hacker') OR PHRASE(description, 'serial killer')
ORDER BY id;
 id | title
----+--------------
1 | The Matrix
6 | Scary Movie

Phrase search across columns

Search different columns in the same query:

SELECT id, title
FROM movies_idx
WHERE PHRASE(title, 'the matrix') AND PHRASE(description, 'machine')
ORDER BY id;
 id | title
----+----------------------------
2 | The Matrix Reloaded
3 | The Matrix Revolutions

Combine with exact matching

Use phrase search together with term operations:

SELECT id, title, genre
FROM movies_idx
WHERE TERM_EQ(genre, 'sci-fi') AND PHRASE(description, 'galaxy')
ORDER BY id;
 id | title                           | genre
----+---------------------------------+--------
7 | Star Trek: The Motion Picture | sci-fi
8 | Alien | sci-fi

Combine with analytics

The power of SereneDB: search and aggregate in a single query:

SELECT genre, COUNT(*) AS matches, AVG(runtime) AS avg_runtime
FROM movies_idx
WHERE PHRASE(description, 'film')
GROUP BY genre
ORDER BY matches DESC;
SELECT genre,
COUNT(*) AS count,
MIN(year) AS earliest,
MAX(year) AS latest
FROM movies_idx
WHERE PHRASE(description, 'biggest blockbuster')
GROUP BY genre;
 genre     | count | earliest | latest
-----------+-------+----------+--------
adventure | 1 | 1993 | 1993
comedy | 1 | 2000 | 2000

How phrase search works

  1. The query text goes through the same dictionary as the indexed data
  2. The resulting tokens must appear in the same order and at consecutive positions in the document
  3. Because both sides use the same normalization (case, stemming, accents), matching is consistent

For example, with basic_dict (CASE = 'lower', ACCENT = false):

-- The query "Biggest Blockbuster" becomes tokens: {biggest, blockbuster}
SELECT ts_lexize('basic_dict', 'Biggest Blockbuster');
-- {biggest,blockbuster}

-- These tokens are searched at consecutive positions in the index

See also