Projects related to MultiWordNet at TCC group
- MultiSemCor
The MultiSemCor project aims at creating a semantically
annotated corpus by exploiting information contained
in the English SemCor corpus. SemCor is a subset
of the English Brown corpus containing almost
700,000 running words. In SemCor all the words
are tagged by PoS, and more than 200,000 content
words are also lemmatized and sense-tagged according
to WordNet. To build MultiSemCor, the English
texts are translated into Italian, then the
parallel texts are aligned at word level and
the word sense annotations from the SemCor tagged
texts are transferred to the Italian translations.
The final result of the project will be the
MultiSemCor corpus, an Italian corpus annotated
with PoS, lemma and word sense, but also an
aligned English/Italian parallel corpus lexically
annotated with a shared inventory of word senses
taken from MultiWordNet. So far, MultiSemCor
consists of 116 Italian texts aligned to
the corresponding English texts and linguistically
annotated.
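The transfer step described above can be illustrated with a minimal sketch. It assumes the word alignment is already available as a list of (English index, Italian index) pairs. The `Token` class, the `transfer_annotations` function and the sense identifiers are hypothetical names chosen for illustration, not the project's actual tools or data format.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    """One running word; the field names are illustrative assumptions."""
    form: str
    sense: Optional[str] = None  # shared sense identifier (hypothetical format)


def transfer_annotations(
    english: List[Token],
    italian: List[Token],
    alignment: List[Tuple[int, int]],
) -> List[Token]:
    """Copy the sense tag of each sense-tagged English token to the
    Italian token it is aligned with. Unaligned or untagged tokens
    are left unchanged."""
    for en_index, it_index in alignment:
        source = english[en_index]
        if source.sense is not None:
            italian[it_index].sense = source.sense
    return italian


# Toy example with invented sense identifiers.
english = [Token("the"), Token("dog", sense="dog#n#1"), Token("barks", sense="bark#v#1")]
italian = [Token("il"), Token("cane"), Token("abbaia")]
alignment = [(0, 0), (1, 1), (2, 2)]

annotated = transfer_annotations(english, italian, alignment)
```

Only the sense tag is copied in this sketch, since the sense inventory is shared across the two languages, whereas a lemma is language-specific and would have to come from Italian-side processing.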
- WordNet Domains
WordNet Domains is an extension
of WordNet 1.6 where each synset has been
annotated with at least one domain label
(e.g. medicine, sports, law), selected
from a hierarchically organized set of
about two hundred labels.
- Domain-specific Lexical Databases
Some work has been done
on creating wordnets for specialized domains
and integrating them into MultiWordNet.
A set of procedures has been defined
to allow integrated access to the two
resources, such that overlapping senses
are merged and conflicting cases
are properly handled. Two past projects
dealt with the economic and philosophical
domains, while an ongoing project deals
with the architectural domain.
- The Economic-WordNet project led to the creation
of a specialized wordnet for the economic domain
consisting of 5,130 lemmas distributed in 4,687
synsets.
- In the Philonet project we studied methodologies
to develop a specialized wordnet for the
philosophical domain and to create tools
for computer-aided analysis of philosophical
texts.
- The ArchiWordNet project aims
at creating a specialized wordnet for
the domain of architecture integrated
into the MultiWordNet lexical database.
This resource will be used as a thesaurus
in an architecture image archive, for both
indexing and retrieval of the images.
Moreover, it will be used for educational
purposes.
- Extension of MultiWordNet to other languages:
Hebrew WordNet
Hebrew WordNet is a research project carried
out in conjunction with the University
of Haifa aiming at creating a WordNet
for the Hebrew language within the framework
of the MultiWordNet approach.