segmentation
Segments text by Unicode word boundaries. Useful for languages without whitespace word delimiters (e.g., Chinese, Japanese).
Options
| Option | Type | Default | Description |
|---|---|---|---|
| CASE | string | 'none' | Case conversion: 'none', 'lower', 'upper' |
| BREAK | string | 'alpha' | Boundary mode: 'all' (all boundaries), 'graphic' (visible characters), 'alpha' (alphabetic only) |
Examples
CREATE TEXT SEARCH DICTIONARY seg_dict (
TEMPLATE = 'segmentation',
CASE = 'lower',
BREAK = 'alpha'
);
All boundaries
CREATE TEXT SEARCH DICTIONARY seg_all (
TEMPLATE = 'segmentation',
CASE = 'upper',
BREAK = 'all'
);
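
Trying the dictionaries

Once created, the dictionaries can be inspected and wired into a search configuration using PostgreSQL's built-in text search commands. The following is a minimal sketch, assuming the `segmentation` template behaves like any other text search dictionary. The configuration name `seg_config` and the choice of token types are illustrative assumptions, not part of this template's documentation.

```sql
-- Inspect the lexemes a dictionary produces for one token.
-- ts_lexize is a built-in PostgreSQL function; the result depends on
-- the CASE and BREAK options chosen above.
SELECT ts_lexize('seg_dict', 'Hello World');

-- Hypothetical configuration that routes word tokens through seg_dict.
-- 'seg_config' is an illustrative name; adjust the token types as needed.
CREATE TEXT SEARCH CONFIGURATION seg_config (PARSER = default);

ALTER TEXT SEARCH CONFIGURATION seg_config
    ADD MAPPING FOR asciiword, word
    WITH seg_dict;

-- Build a tsvector with the new configuration.
SELECT to_tsvector('seg_config', 'Hello World');
```

Comparing the `ts_lexize` output of `seg_dict` and `seg_all` on the same input is a quick way to see how the BREAK setting changes which segments are kept.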