pipeline

Chains multiple analyzers in sequence: each step processes the output of the previous step. Options for step N are prefixed with STEP<N>_, with steps numbered from 1; for example, STEP2_CASE sets the CASE option of the second step.

Syntax
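
The general shape can be sketched as follows. This is inferred from the examples on this page rather than taken from a formal grammar: the name my_pipeline is hypothetical, and the delimiter/norm combination is an illustrative assumption that uses only templates and options appearing in the examples.

```sql
-- Sketch only: my_pipeline is a hypothetical dictionary name.
CREATE TEXT SEARCH DICTIONARY my_pipeline (
    TEMPLATE = 'pipeline',          -- selects the pipeline analyzer
    STEP1_TEMPLATE = 'delimiter',   -- analyzer used for step 1
    STEP1_DELIMITER = ',',          -- option belonging to step 1
    STEP2_TEMPLATE = 'norm',        -- analyzer used for step 2, fed by step 1's output
    STEP2_LOCALE = 'en_US.UTF-8',   -- options belonging to step 2
    STEP2_CASE = 'lower'
);
```

In this sketch, each STEP<N>_TEMPLATE names the analyzer for step N, and every other STEP<N>_ option is presumably handed to that analyzer.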

Examples

Delimiter then text analysis

Split on commas, then apply text analysis to each token:

CREATE TEXT SEARCH DICTIONARY pipe_dict (
    TEMPLATE = 'pipeline',
    STEP1_TEMPLATE = 'delimiter',
    STEP1_DELIMITER = ',',
    STEP2_TEMPLATE = 'text',
    STEP2_LOCALE = 'en_US.UTF-8',
    STEP2_CASE = 'lower',
    STEP2_STEMMING = true
);

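Three-step pipeline with stopwords

Split on a delimiter, apply text analysis with a stopword list, then convert the result to upper case with norm:

CREATE TEXT SEARCH DICTIONARY advanced_pipe (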
    TEMPLATE = 'pipeline',
    STEP1_TEMPLATE = 'delimiter',
    STEP1_DELIMITER = 'A',
    STEP2_TEMPLATE = 'text',
    STEP2_LOCALE = 'en_US.UTF-8',
    STEP2_CASE = 'lower',
    STEP2_ACCENT = false,
    STEP2_STEMMING = true,
    STEP2_STOPWORDS = '"fox"',
    STEP3_TEMPLATE = 'norm',
    STEP3_LOCALE = 'en_US.UTF-8',
    STEP3_CASE = 'upper'
);

N-grams then normalization

Generate n-grams of length 2 to 3, then lower-case each one:

CREATE TEXT SEARCH DICTIONARY ngram_norm (
    TEMPLATE = 'pipeline',
    STEP1_TEMPLATE = 'ngram',
    STEP1_MINGRAM = 2,
    STEP1_MAXGRAM = 3,
    STEP2_TEMPLATE = 'norm',
    STEP2_LOCALE = 'en_US.UTF-8',
    STEP2_CASE = 'lower'
);

Nested pipeline

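A pipeline step can itself be a pipeline. Prefix the inner pipeline's options with STEP<N>_STEP<M>_, where N is the outer step and M is the step inside the nested pipeline: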

CREATE TEXT SEARCH DICTIONARY nested_pipe (
    TEMPLATE = 'pipeline',
    STEP1_TEMPLATE = 'norm',
    STEP1_LOCALE = 'en_US.UTF-8',
    STEP1_CASE = 'lower',
    STEP2_TEMPLATE = 'pipeline',
    STEP2_STEP1_TEMPLATE = 'delimiter',
    STEP2_STEP1_DELIMITER = '|',
    STEP2_STEP2_TEMPLATE = 'norm',
    STEP2_STEP2_LOCALE = 'en_US.UTF-8',
    STEP2_STEP2_CASE = 'upper',
    STEP3_TEMPLATE = 'stopwords',
    STEP3_STOPWORDS = '"A"'
);
