The approach
The MultiWordNet project aims
to build a large-scale multilingual
computational lexicon based on WordNet.
WordNet is a lexical database,
created at Princeton University, in which nouns,
verbs, adjectives and adverbs are organized
into sets of synonyms (synsets), representing
lexical concepts. Synsets are linked by means
of various relations, both semantic and lexical.
Semantic relations, such as hyponymy/hypernymy and
meronymy, hold between synsets, while lexical
relations, such as antonymy, connect individual words.
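The distinction between the two kinds of relation can be sketched as a small data structure. This is only an illustration, not WordNet's actual storage format: the names Synset and Lexicon, the identifiers such as "n-dog", and the relation labels are assumptions made for the example, and lexical relations are simplified to pairs of word strings.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A set of synonymous words standing for one lexical concept."""
    synset_id: str   # hypothetical identifier, not a real WordNet offset
    pos: str         # part of speech: "n", "v", "a" or "r"
    words: list[str]


@dataclass
class Lexicon:
    synsets: dict[str, Synset] = field(default_factory=dict)
    # Semantic relations hold between synsets: (relation, source_id, target_id)
    semantic: set[tuple[str, str, str]] = field(default_factory=set)
    # Lexical relations connect words: (relation, source_word, target_word)
    lexical: set[tuple[str, str, str]] = field(default_factory=set)

    def add(self, synset: Synset) -> None:
        self.synsets[synset.synset_id] = synset

    def hypernyms(self, synset_id: str) -> list[Synset]:
        """Return the synsets that are direct hypernyms of synset_id."""
        return [self.synsets[target]
                for relation, source, target in sorted(self.semantic)
                if relation == "hypernym" and source == synset_id]


lexicon = Lexicon()
lexicon.add(Synset("n-dog", "n", ["dog", "domestic dog"]))
lexicon.add(Synset("n-canine", "n", ["canine", "canid"]))
# A semantic relation links two synsets as wholes.
lexicon.semantic.add(("hypernym", "n-dog", "n-canine"))
# A lexical relation links two specific words.
lexicon.lexical.add(("antonym", "good", "bad"))
```

Keeping the two relation sets separate mirrors the text: a hypernym link applies to every word in a synset, whereas an antonym link applies only to the word pair it names.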
The model adopted within the
MultiWordNet project stresses the usefulness
of a strict alignment between lexical databases,
i.e. wordnets, of different languages, while
retaining the ability to represent true lexical
idiosyncrasies between languages. It consists
of building language-specific wordnets that keep
as many as possible of the semantic relations
available in the Princeton WordNet (PWN). This
is done by building the new synsets in correspondence
with the PWN synsets whenever possible and
importing semantic relations from the corresponding
English synsets; that is, we assume that if a
relation holds between two synsets in PWN,
the same relation holds between
the corresponding synsets in the new language.
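The import assumption can be illustrated with a minimal sketch. The function import_relations, the identifiers such as "en-dog" and "it-cane", and the representation of the alignment as a dictionary are all hypothetical choices for the example; they are not taken from the MultiWordNet implementation.

```python
def import_relations(pwn_relations, alignment):
    """Copy each PWN relation whose two synsets both have an aligned
    synset in the new language.

    pwn_relations: set of (relation, source_id, target_id) triples
    alignment:     dict mapping a PWN synset id to the new-language synset id
    """
    imported = set()
    for relation, source, target in pwn_relations:
        # A relation is carried over only when both ends are aligned.
        if source in alignment and target in alignment:
            imported.add((relation, alignment[source], alignment[target]))
    return imported


# Hypothetical identifiers, for illustration only.
pwn_relations = {
    ("hypernym", "en-dog", "en-canine"),
    ("hypernym", "en-canine", "en-carnivore"),
}
alignment = {"en-dog": "it-cane", "en-canine": "it-canide"}

italian_relations = import_relations(pwn_relations, alignment)
```

In this sketch the second relation is not imported because "en-carnivore" has no aligned synset yet; it would be picked up once that synset is added to the alignment.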
A possible risk of the MultiWordNet
approach is that the new wordnets are forced
to depend on the lexical
and conceptual structures of
English. However, this risk can be avoided
by allowing the new wordnet to diverge
from PWN when necessary.
Two major idiosyncrasies
can occur: lexical gaps (one language expresses
with a single lexical unit what the other
expresses with a free combination of words)
and denotation differences (a translation
equivalent exists in the target language
but is more general or more specific).
In both cases, a lexical concept of one
language has no synonymous counterpart
in the other language. The MultiWordNet
architecture handles these cases by
creating a special empty node wherever such
a counterpart is missing.
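A minimal sketch of how empty nodes might be represented follows. The Node class, the build_aligned_nodes helper, and the placeholder identifiers and words ("c1", "w1", and so on) are invented for illustration and do not reflect the actual MultiWordNet data format. The sketch covers only one direction, a PWN concept with no counterpart in the new language.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A position in the aligned structure; it may hold no words."""
    synset_id: str
    words: list[str] = field(default_factory=list)

    @property
    def is_empty(self) -> bool:
        # An empty node marks a concept with no synonymous counterpart.
        return not self.words


def build_aligned_nodes(pwn_ids, translations):
    """Create one node per PWN synset id; ids without a translation
    become empty nodes instead of being dropped."""
    return {sid: Node(sid, list(translations.get(sid, [])))
            for sid in pwn_ids}


# Placeholder data: "c2" stands for a concept lexicalised in English
# but not in the new language.
pwn_ids = ["c1", "c2", "c3"]
translations = {"c1": ["w1"], "c3": ["w3a", "w3b"]}

nodes = build_aligned_nodes(pwn_ids, translations)
gaps = [sid for sid, node in nodes.items() if node.is_empty]
```

Because the empty node keeps its place in the structure, relations that pass through it remain representable even though the new language has no synset there.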