Human Language Technology Group

Projects related to MultiWordNet at TCC group

  • MultiSemCor
    The MultiSemCor project aims at creating a semantically annotated corpus by exploiting information contained in the English SemCor corpus. SemCor is a subset of the English Brown corpus containing almost 700,000 running words. In SemCor all the words are tagged with PoS, and more than 200,000 content words are also lemmatized and sense-tagged according to WordNet. To build MultiSemCor, the English texts are translated into Italian, then the parallel texts are aligned at word level and the word sense annotations from the SemCor tagged texts are transferred to the Italian translations. The final result of the project will be the MultiSemCor corpus, which is both an Italian corpus annotated with PoS, lemma and word sense, and an aligned English/Italian parallel corpus lexically annotated with a shared inventory of word senses taken from MultiWordNet. So far, MultiSemCor is composed of 116 Italian texts aligned to the corresponding English texts and linguistically annotated.

  • WordNet Domains
    WordNet Domains is an extension of WordNet 1.6 in which each synset has been annotated with at least one domain label (e.g. medicine, sports, law), selected from a hierarchically organized set of about two hundred labels.

  • Domain-specific Lexical Databases
    Some work has been done on creating wordnets for specialized domains and integrating them into MultiWordNet. A set of procedures has been defined to allow integrated access to the two resources, such that overlapping senses are merged and conflicts are properly managed. Two past projects dealt with the economic and philosophical domains, while an ongoing project deals with the architectural domain.
    • The Economic-WordNet project led to the creation of a specialized wordnet for the economic domain consisting of 5,130 lemmas distributed in 4,687 synsets.
    • In the Philonet project we studied methodologies to develop a specialized wordnet for the philosophical domain and to create tools for computer-aided analysis of philosophical texts.
    • The ArchiWordNet project aims at creating a specialized wordnet for the domain of architecture integrated in the MultiWordNet lexical database. This resource will be used as a thesaurus in an architecture image archive, both for indexing and retrieval of the images. Moreover, it will be used for educational purposes.

  • Extension of MultiWordNet to other languages: Hebrew WordNet
    Hebrew WordNet is a research project carried out in collaboration with the University of Haifa, aiming at creating a WordNet for the Hebrew language within the framework of the MultiWordNet approach.
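
The annotation transfer step described for MultiSemCor above (word-level alignment followed by copying sense tags from English to Italian) can be illustrated with a minimal Python sketch. This is not the project's actual tooling: the Token class, the transfer_senses function, the alignment format (pairs of token indices) and the sense identifiers are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    """One word of a text. All fields except `form` are optional annotations."""
    form: str
    pos: Optional[str] = None
    lemma: Optional[str] = None
    sense: Optional[str] = None  # placeholder sense identifier, not a real WordNet key


def transfer_senses(
    source: List[Token],
    target: List[Token],
    alignment: List[Tuple[int, int]],
) -> List[Token]:
    """Return a copy of `target` with sense tags projected from `source`.

    `alignment` is a list of (source_index, target_index) pairs, assumed to
    come from a separate word aligner. If several source tokens align to the
    same target token, the first sense-tagged one wins.
    """
    projected = [Token(t.form, t.pos, t.lemma, t.sense) for t in target]
    for src_i, tgt_i in alignment:
        sense = source[src_i].sense
        if sense is not None and projected[tgt_i].sense is None:
            projected[tgt_i].sense = sense
    return projected


# Toy example with made-up sense identifiers.
english = [
    Token("the", "DT", "the"),
    Token("dog", "NN", "dog", sense="dog.n.01"),
    Token("barks", "VB", "bark", sense="bark.v.01"),
]
italian = [
    Token("il", "DT", "il"),
    Token("cane", "NN", "cane"),
    Token("abbaia", "VB", "abbaiare"),
]
alignment = [(0, 0), (1, 1), (2, 2)]

projected = transfer_senses(english, italian, alignment)
for tok in projected:
    print(tok.form, tok.sense)
```

Only the sense tag is copied across the alignment; the Italian PoS and lemma stay as they were, since those belong to the target language. Because both languages share one sense inventory, the projected identifiers can be used on the Italian side unchanged.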
MultiWordNet ® - All rights reserved. Maintainer: Girardi C.