What's in MultiWordNet
MultiWordNet is a multilingual
lexical database including information about
English and Italian words. It is an extension
of WordNet 1.6, a lexical database for English
developed at Princeton University. MultiWordNet
contains information about the following
aspects of the English and Italian lexica:
- lexical relations between words;
- semantic relations between lexical concepts
(synsets);
- correspondences between Italian and
English lexical concepts;
- semantic fields (domains).
The basic lexical relationship
in MultiWordNet is lexical synonymy.
Groups of synonyms are used to identify lexical
concepts, which are called synsets. Here
is an example of an Italian synset:
{elaboratore, computer, cervello_elettronico,
calcolatore}
Synsets are the most important
units in MultiWordNet. Different types of semantic
relationships can be attached to them. For example,
the above synset has three different semantic
relationships:
has_hypernym {macchina}
has_hyponym {calcolatore_analogico}, {calcolatore_digitale}, etc.
has_part {microchip, chip}, etc.
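The structure described above can be sketched as a small data type. This is only an illustration: the `Synset` class and its `link` and `related` methods are hypothetical names chosen for this example, not part of any MultiWordNet distribution.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A lexical concept: a group of synonymous lemmas plus typed links."""
    lemmas: tuple
    # Maps a relation name (e.g. "has_hypernym") to the list of target synsets.
    relations: dict = field(default_factory=dict)

    def link(self, relation, target):
        """Attach `target` to this synset under the given relation name."""
        self.relations.setdefault(relation, []).append(target)

    def related(self, relation):
        """Return the synsets reachable through `relation` (possibly empty)."""
        return self.relations.get(relation, [])


# The Italian example synset and the three relations listed above.
computer = Synset(("elaboratore", "computer", "cervello_elettronico", "calcolatore"))
computer.link("has_hypernym", Synset(("macchina",)))
computer.link("has_hyponym", Synset(("calcolatore_analogico",)))
computer.link("has_hyponym", Synset(("calcolatore_digitale",)))
computer.link("has_part", Synset(("microchip", "chip")))

hyponyms = [s.lemmas[0] for s in computer.related("has_hyponym")]
print(hyponyms)
```

Keeping relations in a dictionary keyed by relation name means a new relation type can be attached without changing the class.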
As a result of the approach followed
in the construction of MultiWordNet, cross-language
correspondence is defined between synsets as
well:
{elaboratore, computer, cervello_elettronico,
calcolatore}
corresponds_to
{computer, data_processor, electronic_computer,
information_processing_system}
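One way to picture this correspondence is a table in which each concept has one entry holding the lemmas of both languages. The sketch below assumes such a layout; the key `"n#computer"` and the function `translate` are invented for the example and do not reproduce real MultiWordNet identifiers or tools.

```python
# Hypothetical concept identifier; the lemma lists are taken from the example above.
ALIGNED_SYNSETS = {
    "n#computer": {
        "italian": ["elaboratore", "computer", "cervello_elettronico", "calcolatore"],
        "english": ["computer", "data_processor", "electronic_computer",
                    "information_processing_system"],
    },
}


def translate(lemma, source, target):
    """Return the target-language lemmas of every synset containing `lemma`."""
    results = []
    for entry in ALIGNED_SYNSETS.values():
        if lemma in entry.get(source, []):
            results.extend(entry.get(target, []))
    return results


english_equivalents = translate("calcolatore", "italian", "english")
print(english_equivalents)
```

Because the correspondence is defined between synsets rather than between words, every Italian lemma in the synset maps to the same set of English lemmas.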
MultiWordNet also contains
domain information. Each synset has been
annotated with at least one domain label,
selected from a hierarchically organized set of
about two hundred labels (see WordNet
Domains for further information). In our
example, the synset is labeled with the
"Computer Science" semantic field.
The latest version of MultiWordNet
(1.39) contains around 58,000 Italian word senses
and 41,500 lemmas, organized into 32,700 synsets
that are aligned, whenever possible, with Princeton
WordNet English synsets. The following table reports
the details.
| | Nouns | Verbs | Adj | Adverbs | Total |
|---|---|---|---|---|---|
| Word Senses | 43,449 | 8,271 | 4,425 | 1,789 | 57,934 |
| Lemmas | 31,525 | 4,431 | 4,130 | 1,405 | 41,491 |
| Total number of synsets | 25,043 | 4,170 | 2,454 | 1,006 | 32,673 |
| New Italian synsets (with no correspondent in PWN) | 2,768 | 31 | 26 | 0 | 2,825 |
| English-to-Italian gaps | 370 | 142 | 232 | 26 | 770 |
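As a quick consistency check, the per-category figures in the table above can be added up and compared with the stated totals. The snippet only re-uses numbers from the table; the variable names are arbitrary.

```python
# Figures copied from the table: (nouns, verbs, adjectives, adverbs), stated total.
rows = {
    "Word senses": ((43449, 8271, 4425, 1789), 57934),
    "Lemmas": ((31525, 4431, 4130, 1405), 41491),
    "Synsets": ((25043, 4170, 2454, 1006), 32673),
    "New Italian synsets": ((2768, 31, 26, 0), 2825),
    "English-to-Italian gaps": ((370, 142, 232, 26), 770),
}

# Collect every row whose per-category counts do not add up to the stated total.
mismatches = [name for name, (per_pos, total) in rows.items() if sum(per_pos) != total]
print(mismatches or "all row totals are consistent")

# Average number of senses per Italian lemma, derived from the two totals.
senses_per_lemma = rows["Word senses"][1] / rows["Lemmas"][1]
print(f"{senses_per_lemma:.2f} senses per lemma")
```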
As regards MultiWordNet relations,
all Princeton WordNet relations are represented.
Moreover, the new NEAREST relation has been
added. The NEAREST relation is an intralinguistic
semantic relation connecting a synset which
is a gap to its semantically nearest synset
(usually a hyponym or a hypernym). The NEAREST
relation is typically used to manage denotation
differences, i.e. cases in which a lexical concept
in one language has no synonymous correspondent
in the other language: a translation equivalent
exists but it is more general or more specific.
As an example, the Italian word "abbronzante"
has no single synonymous translation equivalent
in English, but two more specific translation
equivalents, "suntan cream" and "suntan oil".
The English synset corresponding to the Italian
"abbronzante" is empty because it is a gap, but
it is connected through a NEAREST relation to its
semantically nearest synsets, "suntan cream" and
"suntan oil".
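The role of NEAREST when looking up a translation can be sketched as follows. The dictionary layout, the synset identifiers and the function `translation_candidates` are assumptions made for this illustration; only the "abbronzante" data comes from the example above.

```python
# English side: a gap synset has no lemmas but points to its nearest synsets.
# All identifiers are hypothetical.
ENGLISH_SYNSETS = {
    "gap_abbronzante": {"lemmas": [], "nearest": ["en_suntan_cream", "en_suntan_oil"]},
    "en_suntan_cream": {"lemmas": ["suntan cream"], "nearest": []},
    "en_suntan_oil": {"lemmas": ["suntan oil"], "nearest": []},
}

# Cross-language correspondence: Italian lemma -> identifier of the English synset.
ITALIAN_TO_ENGLISH = {"abbronzante": "gap_abbronzante"}


def translation_candidates(italian_lemma):
    """Return (lemmas, exact); `exact` is False when NEAREST had to be followed."""
    synset = ENGLISH_SYNSETS[ITALIAN_TO_ENGLISH[italian_lemma]]
    if synset["lemmas"]:
        return synset["lemmas"], True
    # The corresponding synset is a gap: collect lemmas of its nearest synsets.
    candidates = []
    for synset_id in synset["nearest"]:
        candidates.extend(ENGLISH_SYNSETS[synset_id]["lemmas"])
    return candidates, False


candidates, exact = translation_candidates("abbronzante")
print(candidates, exact)
```

Returning the `exact` flag alongside the lemmas lets a caller distinguish true synonyms from the more general or more specific equivalents reached through NEAREST.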
A total of 53,002 relations
are represented in MultiWordNet. Semantic
relations hold for both languages while
only English lexical relations are represented
in the currently available version of MultiWordNet.
All the details are given in the following
table.
| Semantic Relations | Count | Lexical Relations | Count |
|---|---|---|---|
| HAS_HYPERONYM | *33,195 | ANTONYM | 3,266 |
| HAS-MEMBER | 1,115 | PARTICIPLE | 11 |
| HAS-PART | 4,925 | PERTAINS-TO | 2,335 |
| HAS-SUBSTANCE | 358 | ALSO-SEE | 1,592 |
| SIMILAR-TO | 4,887 | VERB-GROUP | 205 |
| ENTAILMENT | 213 | | |
| ATTRIBUTE | 743 | | |
| CAUSES | 117 | | |
| NEAREST | 40 | | |
| Total | 45,593 | Total | 7,409 |
* 30,323 relations refer to already existing synsets
from Princeton WordNet; 2,872 relations refer
to new synsets
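The relation counts can be cross-checked against the overall figure of 53,002 relations and against the footnote. The snippet below re-uses only the per-relation numbers listed above; the variable names are arbitrary.

```python
# Per-relation counts copied from the table.
semantic = {
    "HAS_HYPERONYM": 33195, "HAS-MEMBER": 1115, "HAS-PART": 4925,
    "HAS-SUBSTANCE": 358, "SIMILAR-TO": 4887, "ENTAILMENT": 213,
    "ATTRIBUTE": 743, "CAUSES": 117, "NEAREST": 40,
}
lexical = {
    "ANTONYM": 3266, "PARTICIPLE": 11, "PERTAINS-TO": 2335,
    "ALSO-SEE": 1592, "VERB-GROUP": 205,
}

semantic_total = sum(semantic.values())
grand_total = semantic_total + sum(lexical.values())

# The semantic subtotal and the overall total quoted in the text.
assert semantic_total == 45593
assert grand_total == 53002
# Footnote: hypernymy links to existing PWN synsets plus links to new synsets.
assert 30323 + 2872 == semantic["HAS_HYPERONYM"]

share = semantic["HAS_HYPERONYM"] / grand_total
print(f"hypernymy accounts for {share:.1%} of all relations")
```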
MultiWordNet is available
for both research and commercial purposes.
See the "Obtain a
licence" page for details.