wordnet_synonyms

The wordnet_synonyms template expands tokens using a WordNet Prolog synonyms database supplied inline via the required SYNONYMS option. Where solr_synonyms rewrites a word to its sibling words, this template rewrites each word to the id of every synset it belongs to. A synset id is a numeric concept identifier shared by all words of the same sense.

Each record has the form `s(synset_id, w_num, 'word', ss_type, sense_number, tag_count).`, including the trailing period, and assigns one word to one synset. Words that appear under the same synset_id are synonyms, so they all map to that id and meet in the index even though the surface words differ. A word that appears in several synsets maps to all of their ids. A word in no record produces no tokens.

Like solr_synonyms, it is typically used inside a pipeline to broaden recall to related words.

Options

Option	Type	Default	Description
`SYNONYMS`	string	required	Inline WordNet Prolog database: one `s(...)` record per line

Tokenization

Given records that place fast, quick and swift under synset 100000001, each of those words is rewritten to {100000001}. Because the indexed text and the query are analyzed the same way, a search for quick reduces to 100000001 and so matches a document that contained fast. Words placed under a different synset map to that synset's id, and a word the database never mentions yields an empty token set.

Input	Records	Tokens
`fast`	`s(100000001,1,'fast',v,1,0).`	`{100000001}`
`quick`	`s(100000001,2,'quick',v,1,0).`	`{100000001}`
`keyboard`	(no record)	`{}`

The database below defines two synsets — a verb sense and a noun sense:

Query

CREATE TEXT SEARCH DICTIONARY wordnet_syn (
    template = 'wordnet_synonyms',
    -- words sharing a synset id are synonyms; one s(...) record per line
    synonyms = 's(100000001,1,''fast'',v,1,0).
s(100000001,2,''quick'',v,1,0).
s(100000001,3,''swift'',v,1,0).
s(100000002,1,''car'',n,1,0).
s(100000002,2,''automobile'',n,1,0).'
);

Words sharing a synset map to its id, so synonyms meet under the same token:

Query

SELECT ts_lexize('wordnet_syn', 'fast');

Result

 ts_lexize
-------------
 {100000001}

Query

SELECT ts_lexize('wordnet_syn', 'quick');

Result

 ts_lexize
-------------
 {100000001}
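
The record description at the top of this page also states that a word listed in several synsets maps to all of their ids. The example database has no such word, so the following is only a minimal sketch of that case. The dictionary name wordnet_multi, the adjective synset 100000003 and the word rapid are illustrative assumptions, not part of the example database:

```sql
-- Hypothetical dictionary: 'fast' is recorded under a verb synset (100000001)
-- and under an invented adjective synset (100000003).
CREATE TEXT SEARCH DICTIONARY wordnet_multi (
    template = 'wordnet_synonyms',
    synonyms = 's(100000001,1,''fast'',v,1,0).
s(100000001,2,''quick'',v,1,0).
s(100000003,1,''fast'',a,1,0).
s(100000003,2,''rapid'',a,1,0).'
);

-- 'fast' belongs to both synsets, so both ids are expected in the result
SELECT ts_lexize('wordnet_multi', 'fast');
```

Going by the description above, the result should contain both 100000001 and 100000003. The order in which the ids are returned is not stated on this page.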

A word the database never mentions produces no tokens:

Query

SELECT ts_lexize('wordnet_syn', 'keyboard');

Result

 ts_lexize
-----------
 {}
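
The Tokenization section says that a search for quick matches a document that contained fast, because both sides are analyzed the same way. One way to check this end to end is sketched below. It assumes that wordnet_syn can be mapped in a standard PostgreSQL text search configuration like any other dictionary. The configuration name wordnet_cfg is hypothetical, and a plain configuration is used here rather than a pipeline:

```sql
-- Assumption: wordnet_syn (created earlier on this page) can be used in a
-- regular text search configuration. wordnet_cfg is a hypothetical name.
CREATE TEXT SEARCH CONFIGURATION wordnet_cfg (COPY = pg_catalog.simple);

-- Route plain ASCII words through the WordNet dictionary
ALTER TEXT SEARCH CONFIGURATION wordnet_cfg
    ALTER MAPPING FOR asciiword WITH wordnet_syn;

-- Document text and query text go through the same analysis,
-- so both sides should reduce to the shared synset id
SELECT to_tsvector('wordnet_cfg', 'fast')
       @@ to_tsquery('wordnet_cfg', 'quick') AS matches;
```

Under those assumptions both fast and quick reduce to 100000001, so the comparison would be expected to report a match.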
