delimiter

The delimiter template splits the input on one delimiter character and emits the pieces as tokens, with no further analysis. It is the simplest tokenizer and suits structured values whose parts are separated by a known character — comma-separated tags, slash-separated paths, dotted identifiers.

For example, with DELIMITER = ',' the value red,green,blue produces the tokens red, green and blue. To split on more than one separator, use multi_delimiter; to further process each piece — lower-case it, stem it, drop stop words — chain this template into a pipeline.

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| DELIMITER | string | required | Delimiter character |

Tokenization

The template cuts the input at every occurrence of DELIMITER and emits the pieces between the cuts verbatim — no case folding, stemming or trimming. Adjacent or leading delimiters therefore yield empty tokens, since the piece between two cuts is itself empty.
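
As a rough cross-check of the cutting rule, PostgreSQL's built-in string_to_array splits a string on a separator in the same verbatim way. This is only an analogy, on the assumption that the delimiter template follows the rule exactly as described here; the queries below do not call the template itself.

```sql
-- Stand-in for the cutting rule using the built-in string_to_array;
-- this does not involve the delimiter template.

-- Plain case: one piece between each pair of cuts.
SELECT string_to_array('red,green,blue', ',');
-- {red,green,blue}

-- Adjacent delimiters: the piece between the two cuts is empty.
SELECT string_to_array('red,,blue', ',');
-- {red,"",blue}

-- Leading delimiter: the piece before the first cut is empty.
SELECT string_to_array(',red', ',');
-- {"",red}

-- Pieces are kept verbatim: no case folding, no trimming.
SELECT string_to_array('Red, Green', ',');
-- {Red," Green"}
```

The quoted "" in the array output is an empty string, which corresponds to an empty token in the description above.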

| Input | Delimiter | Tokens |
| --- | --- | --- |
| red,green,blue | , | {red,green,blue} |
| com.example.app | . | {com,example,app} |
| /usr/local/bin | / | {"",usr,local,bin} |

The third row shows the leading / producing an empty first token. Preview the split with ts_lexize:

Query
CREATE TEXT SEARCH DICTIONARY tok_delim_comma (
    template = 'delimiter',
    delimiter = ','
);
SELECT ts_lexize('tok_delim_comma', 'red,green,blue');
Result
    ts_lexize
------------------
 {red,green,blue}

Any single character works as the delimiter — here a dot splits a reverse-DNS identifier into its components:

Query
CREATE TEXT SEARCH DICTIONARY tok_delim_dot (
    template = 'delimiter',
    delimiter = '.'
);
SELECT ts_lexize('tok_delim_dot', 'com.example.app');
Result
     ts_lexize
-------------------
 {com,example,app}
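
The leading-delimiter case from the table can be previewed the same way. The sketch below assumes the same template options and ts_lexize call pattern as the queries above; the dictionary name tok_delim_slash is made up for illustration. The table lists {"",usr,local,bin} as the tokens for this input.

```sql
-- Hypothetical dictionary name; the options mirror the examples above.
CREATE TEXT SEARCH DICTIONARY tok_delim_slash (
    template = 'delimiter',
    delimiter = '/'
);

-- The leading '/' should yield an empty first token.
SELECT ts_lexize('tok_delim_slash', '/usr/local/bin');
```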

Examples

Query
CREATE TEXT SEARCH DICTIONARY pipe_delim (
    template = 'delimiter',
    delimiter = '|'
);
Query
CREATE TEXT SEARCH DICTIONARY comma_delim (
    template = 'delimiter',
    delimiter = ','
);

See also