path_hierarchy

The path_hierarchy template tokenizes a hierarchical path into every prefix along the way, so a value indexed at /usr/local/bin is also found by a search for /usr or /usr/local.

This is ideal for file paths, category trees and URL paths where you want a query on any ancestor to match the descendants stored beneath it. By default it splits on /; set DELIMITER to use another separator. With REVERSE = true it builds the hierarchy from the right instead — the natural choice for domain names, where docs.serenedb.com should also match serenedb.com and com. Unlike a plain delimiter split, which would emit the individual components, path_hierarchy emits the cumulative prefixes.

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| DELIMITER | string | '/' | Path separator character or string |
| REPLACEMENT | string | same as DELIMITER | String that replaces the delimiter in the emitted tokens |
| REVERSE | boolean | false | Build the hierarchy from the right (for domain-like values) |
| SKIP | integer | 0 | Number of leading components to drop before building prefixes |
| BUFFERSIZE | integer | 1024 | Term buffer size hint (characters per pass) |

Tokenization

Each token is a cumulative prefix of the path. The input is cut on DELIMITER and one token is emitted for the first component, then for the first two, and so on up to the whole value. REVERSE builds the prefixes from the trailing end instead; SKIP discards a number of leading components before prefixes are formed; REPLACEMENT rewrites the delimiter character in the output.

The table shows the tokens emitted for a few option combinations:

| Options | Input | Tokens |
| --- | --- | --- |
| DELIMITER = '/' | /usr/local/bin | /usr, /usr/local, /usr/local/bin |
| DELIMITER = '/', SKIP = 1 | /usr/local/bin | /local, /local/bin |
| DELIMITER = '.', REVERSE = true | docs.serenedb.com | docs.serenedb.com, serenedb.com, com |
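
The tokenization rules described above can be modelled in a few lines of Python. This is only an illustrative sketch of the documented behaviour, not SereneDB's implementation: the function name `path_hierarchy_tokens` and its parameter names are hypothetical, BUFFERSIZE is ignored, and combining SKIP with REVERSE is deliberately left unmodelled because the text does not describe it.

```python
def path_hierarchy_tokens(value, delimiter="/", replacement=None,
                          reverse=False, skip=0):
    """Toy model of the documented options (hypothetical helper)."""
    if replacement is None:
        replacement = delimiter  # documented default: same as DELIMITER
    parts = value.split(delimiter)

    if reverse:
        if skip:
            # Assumption: the text does not say how SKIP interacts with REVERSE.
            raise NotImplementedError("SKIP with REVERSE is not modelled here")
        # Each segment keeps its trailing delimiter: "docs.", "serenedb.", "com".
        segments = [p + replacement for p in parts[:-1]]
        if parts[-1]:
            segments.append(parts[-1])
        # Suffixes, longest first.
        return ["".join(segments[i:]) for i in range(len(segments))]

    # Each segment keeps its leading delimiter: "/usr", "/local", "/bin".
    segments = [parts[0]] if parts[0] else []
    segments += [replacement + p for p in parts[1:]]
    segments = segments[skip:]  # drop leading components
    # Cumulative prefixes, shortest first.
    return ["".join(segments[: i + 1]) for i in range(len(segments))]


print(path_hierarchy_tokens("/usr/local/bin"))
# ['/usr', '/usr/local', '/usr/local/bin']
print(path_hierarchy_tokens("/usr/local/bin", skip=1))
# ['/local', '/local/bin']
print(path_hierarchy_tokens("docs.serenedb.com", delimiter=".", reverse=True))
# ['docs.serenedb.com', 'serenedb.com', 'com']
```

The three calls reproduce the rows of the table. Treating each component as carrying its own delimiter is a modelling choice that makes the SKIP = 1 row come out as /local rather than local.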

Index a filesystem path into its ancestors

A search for any ancestor prefix matches every path stored beneath it:

Query
CREATE TEXT SEARCH DICTIONARY path_dict (
    template = 'path_hierarchy',
    delimiter = '/'
);
SELECT ts_lexize('path_dict', '/usr/local/bin');
Result
 ts_lexize
----------------------------------
 {/usr,/usr/local,/usr/local/bin}
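
To see why indexing every prefix makes ancestor searches work, the idea can be sketched with a toy inverted index in Python. Everything here is hypothetical illustration (the helper names `prefixes` and `search`, the sample paths, and the in-memory dictionary); it assumes absolute paths and an exact-match lookup on the query term, and says nothing about how SereneDB stores or queries its index.

```python
from collections import defaultdict


def prefixes(path, delimiter="/"):
    # Cumulative prefixes of an absolute path, e.g. /usr, /usr/local, ...
    parts = [p for p in path.split(delimiter) if p]
    return [delimiter + delimiter.join(parts[: i + 1]) for i in range(len(parts))]


# Toy inverted index: token -> set of stored paths that emitted it.
index = defaultdict(set)
stored = ["/usr/local/bin", "/usr/local/lib", "/usr/share", "/etc/hosts"]
for path in stored:
    for token in prefixes(path):
        index[token].add(path)


def search(ancestor):
    # Exact lookup of the query term; the query itself is not tokenized.
    return sorted(index.get(ancestor, set()))


print(search("/usr/local"))  # ['/usr/local/bin', '/usr/local/lib']
print(search("/usr"))        # ['/usr/local/bin', '/usr/local/lib', '/usr/share']
print(search("/local"))      # []
```

The last lookup returns nothing because /local is never emitted as a token for these paths: only prefixes anchored at the root are indexed.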

Reverse mode for a domain name

With REVERSE = true and DELIMITER = '.' the prefixes grow from the right, so a subdomain matches its parent domains:

Query
CREATE TEXT SEARCH DICTIONARY domain_dict (
    template = 'path_hierarchy',
    delimiter = '.',
    reverse = true
);
SELECT ts_lexize('domain_dict', 'docs.serenedb.com');
Result
 ts_lexize
--------------------------------------
 {docs.serenedb.com,serenedb.com,com}
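
Because the tokens are whole-label suffixes, a parent domain matches but an arbitrary substring does not. The following Python sketch illustrates that distinction; the helper names `domain_suffixes` and `matches` and the host list are made up for the example, and it models only the documented REVERSE output, not the engine itself.

```python
def domain_suffixes(host, delimiter="."):
    # Suffixes built from the right: docs.serenedb.com, serenedb.com, com.
    labels = host.split(delimiter)
    return [delimiter.join(labels[i:]) for i in range(len(labels))]


hosts = ["docs.serenedb.com", "serenedb.com", "example.org"]


def matches(query):
    # A host matches when the query equals one of its emitted tokens.
    return [h for h in hosts if query in domain_suffixes(h)]


print(matches("serenedb.com"))  # ['docs.serenedb.com', 'serenedb.com']
print(matches("com"))           # ['docs.serenedb.com', 'serenedb.com']
print(matches("db.com"))        # []
```

The query db.com is a textual suffix of serenedb.com, but it does not start on a label boundary, so it is not one of the emitted tokens.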

Skip leading components

SKIP = 1 drops the first component before building the prefixes — useful for stripping a common root such as a mount point or a leading category:

Query
CREATE TEXT SEARCH DICTIONARY path_skip (
    template = 'path_hierarchy',
    delimiter = '/',
    skip = 1
);
SELECT ts_lexize('path_skip', '/usr/local/bin');
Result
 ts_lexize
---------------------
 {/local,/local/bin}

See also