pipeline
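Chains multiple analyzers in sequence: each step processes the output of the previous step. Each step's options are prefixed with STEP<N>_, where N is the step's position in the chain, starting at 1 (for example, STEP1_TEMPLATE selects the template for the first step).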
Syntax
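The following is a schematic of the general form, inferred from the examples in this section rather than taken from an official synopsis. The placeholders (name, template, OPTION, value, and the step index N) are illustrative, not literal syntax:

```sql
-- Schematic only: angle-bracket placeholders and "..." must be replaced before running.
CREATE TEXT SEARCH DICTIONARY name (
    TEMPLATE = 'pipeline',
    STEP1_TEMPLATE = 'template',   -- template used by the first step
    STEP1_<OPTION> = value,        -- options of that template, prefixed with STEP1_
    ...
    STEP<N>_TEMPLATE = 'template', -- template used by step N
    STEP<N>_<OPTION> = value       -- options of that template, prefixed with STEP<N>_
);
```

As the examples suggest, the options a step accepts appear to depend on the template chosen for that step.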
Examples
Delimiter then text analysis
Split on commas, then apply text analysis to each token:
CREATE TEXT SEARCH DICTIONARY pipe_dict (
    TEMPLATE = 'pipeline',
    STEP1_TEMPLATE = 'delimiter',
    STEP1_DELIMITER = ',',
    STEP2_TEMPLATE = 'text',
    STEP2_LOCALE = 'en_US.UTF-8',
    STEP2_CASE = 'lower',
    STEP2_STEMMING = true
);
Three-step pipeline with stopwords
Split on the character A, apply text analysis that drops the stopword "fox", then convert the result to upper case:
CREATE TEXT SEARCH DICTIONARY advanced_pipe (
    TEMPLATE = 'pipeline',
    STEP1_TEMPLATE = 'delimiter',
    STEP1_DELIMITER = 'A',
    STEP2_TEMPLATE = 'text',
    STEP2_LOCALE = 'en_US.UTF-8',
    STEP2_CASE = 'lower',
    STEP2_ACCENT = false,
    STEP2_STEMMING = true,
    STEP2_STOPWORDS = '"fox"',
    STEP3_TEMPLATE = 'norm',
    STEP3_LOCALE = 'en_US.UTF-8',
    STEP3_CASE = 'upper'
);
N-grams then normalization
Generate n-grams of length 2 to 3, then lowercase each one:
CREATE TEXT SEARCH DICTIONARY ngram_norm (
    TEMPLATE = 'pipeline',
    STEP1_TEMPLATE = 'ngram',
    STEP1_MINGRAM = 2,
    STEP1_MAXGRAM = 3,
    STEP2_TEMPLATE = 'norm',
    STEP2_LOCALE = 'en_US.UTF-8',
    STEP2_CASE = 'lower'
);
Nested pipeline
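A pipeline step can itself be a pipeline. Options for the inner steps take a combined prefix, STEP<N>_STEP<M>_, where N is the position of the nested pipeline in the outer chain and M is the position of the step inside it: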
CREATE TEXT SEARCH DICTIONARY nested_pipe (
    TEMPLATE = 'pipeline',
    -- Outer step 1: lowercase the input
    STEP1_TEMPLATE = 'norm',
    STEP1_LOCALE = 'en_US.UTF-8',
    STEP1_CASE = 'lower',
    -- Outer step 2: a nested pipeline (split on '|', then uppercase)
    STEP2_TEMPLATE = 'pipeline',
    STEP2_STEP1_TEMPLATE = 'delimiter',
    STEP2_STEP1_DELIMITER = '|',
    STEP2_STEP2_TEMPLATE = 'norm',
    STEP2_STEP2_LOCALE = 'en_US.UTF-8',
    STEP2_STEP2_CASE = 'upper',
    -- Outer step 3: stopword list containing "A"
    STEP3_TEMPLATE = 'stopwords',
    STEP3_STOPWORDS = '"A"'
);
See also
- CREATE TEXT SEARCH DICTIONARY
- minhash — another composition template