segmentation

Segments text by Unicode word boundaries. Useful for languages without whitespace word delimiters (e.g., Chinese, Japanese).

Options

Option  Type    Default  Description
CASE    string  'none'   Case conversion: 'none', 'lower', 'upper'
BREAK   string  'alpha'  Boundary mode: 'all' (all boundaries), 'graphic' (visible characters), 'alpha' (alphabetic only)

Examples

CREATE TEXT SEARCH DICTIONARY seg_dict (
    TEMPLATE = 'segmentation',
    CASE = 'lower',
    BREAK = 'alpha'
);
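
A dictionary only takes effect once a text search configuration maps token types to it. The following is a minimal sketch using standard PostgreSQL commands, assuming seg_dict has been created as above. The configuration name seg_cfg is hypothetical, and mapping the asciiword and word token types is an assumption about which tokens should reach the dictionary. The resulting lexemes depend on the template's implementation, so no output is shown.

```sql
-- Hypothetical configuration name; copies the built-in 'simple' configuration.
CREATE TEXT SEARCH CONFIGURATION seg_cfg (COPY = simple);

-- Route word tokens from the default parser to the segmentation dictionary.
ALTER TEXT SEARCH CONFIGURATION seg_cfg
    ALTER MAPPING FOR asciiword, word
    WITH seg_dict;

-- Inspect the lexemes produced for a sample string (illustrative input).
SELECT to_tsvector('seg_cfg', 'Hello World');
```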

All boundaries

CREATE TEXT SEARCH DICTIONARY seg_all (
    TEMPLATE = 'segmentation',
    CASE = 'upper',
    BREAK = 'all'
);
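
To see how the BREAK and CASE settings change the result, the same token can be passed to both dictionaries with PostgreSQL's built-in ts_lexize function, which calls a dictionary directly and bypasses the parser. This is a sketch, assuming both seg_dict and seg_all exist as defined above. The sample strings are illustrative, and the exact lexemes returned depend on the template, so they are not shown here.

```sql
-- Alphabetic boundaries only, lowercased.
SELECT ts_lexize('seg_dict', 'Hello, World!');

-- All boundaries, uppercased.
SELECT ts_lexize('seg_all', 'Hello, World!');

-- A Japanese sample with no whitespace between words (illustrative input).
SELECT ts_lexize('seg_dict', '東京都に住んでいます');
```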

See also