Human Language Technology Group

The MultiWordNet project

  • The approach
  • The semi-automatic creation of MultiWordNet
  • What's in MultiWordNet
  • How MultiWordNet is used in Natural Language Processing applications

    The semi-automatic creation of MultiWordNet

    In constructing the Italian component of MultiWordNet, we developed techniques for the semi-automatic acquisition of lexical information, in order to speed up both the construction of the Italian synsets and the detection of lexical divergences between English and Italian. These techniques rely on various sources, including Princeton WordNet (PWN) and the Collins English/Italian bilingual dictionary.

    Two main procedures have been developed: the Assign-procedure and the Lexical Gaps-procedure.
    The Assign-procedure exploits the information on translation equivalents contained in the Collins dictionary to build Italian synsets in correspondence with the synsets that already exist in Princeton WordNet (PWN). A mapping algorithm takes as input an Italian word sense, with all the related information, and tries to assign the sense to a synset of the English WordNet. The algorithm is based on the activation of a number of rules, each of which considers a particular kind of information, for example the presence of a semantic code in the Italian sense, such as the label CULINARY for one of the three senses of the word "pizza". Each rule contributes a partial score to the assignment. The global score is the sum of the partial scores contributed by the individual rules. If the global score reaches a fixed threshold, the algorithm outputs an assignment of the Italian sense to a certain English synset; otherwise it reports a failure to assign the sense. The set of assignments produced by examining all the senses of the Italian words in the dictionary constitutes an automatically created Italian WordNet aligned with PWN. The data of the automatic version are then tested against manually acquired data, with the aim of incrementally improving the precision of the algorithm.
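
    The scoring scheme described above can be illustrated with a minimal sketch. It is not the MultiWordNet implementation: the two rules, their partial scores, the threshold value, the synset identifiers and the choice of the best-scoring candidate are all assumptions made for illustration. Only the general pattern (rules contribute partial scores, the sum is compared with a fixed threshold, and the automatic output is checked against manual data) comes from the description.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass(frozen=True)
class ItalianSense:
    lemma: str
    semantic_code: Optional[str] = None      # e.g. "CULINARY"
    translations: frozenset = frozenset()    # English translation equivalents


@dataclass(frozen=True)
class Synset:
    synset_id: str                           # hypothetical identifier
    lemmas: frozenset
    domain: Optional[str] = None             # hypothetical domain label


Rule = Callable[[ItalianSense, Synset], float]


def translation_rule(sense: ItalianSense, synset: Synset) -> float:
    # Hypothetical rule: the synset contains a translation equivalent.
    return 1.0 if sense.translations & synset.lemmas else 0.0


def semantic_code_rule(sense: ItalianSense, synset: Synset) -> float:
    # Hypothetical rule: the semantic code matches the synset's domain.
    if sense.semantic_code is not None and sense.semantic_code == synset.domain:
        return 0.5
    return 0.0


RULES: List[Rule] = [translation_rule, semantic_code_rule]
THRESHOLD = 1.5  # illustrative value, not taken from the project


def global_score(sense: ItalianSense, synset: Synset,
                 rules: Iterable[Rule] = RULES) -> float:
    # Global score = sum of the partial scores of the single rules.
    return sum(rule(sense, synset) for rule in rules)


def assign(sense: ItalianSense, candidates: List[Synset],
           threshold: float = THRESHOLD) -> Optional[Synset]:
    # Assumption: the best-scoring candidate is the one considered.
    best = max(candidates, key=lambda s: global_score(sense, s), default=None)
    if best is not None and global_score(sense, best) >= threshold:
        return best
    return None  # failure to assign


def precision(automatic: Dict[str, str], manual: Dict[str, str]) -> float:
    # Share of automatic assignments that agree with the manual data,
    # computed over the senses present in both mappings.
    checked = [key for key in automatic if key in manual]
    if not checked:
        return 0.0
    correct = sum(1 for key in checked if automatic[key] == manual[key])
    return correct / len(checked)


candidates = [
    Synset("food-pizza", frozenset({"pizza", "pizza_pie"}), "CULINARY"),
    Synset("film-reel", frozenset({"reel"}), "CINEMA"),
]
coded = ItalianSense("pizza", "CULINARY", frozenset({"pizza"}))
uncoded = ItalianSense("pizza", None, frozenset({"pizza"}))

result_coded = assign(coded, candidates)      # both rules fire
result_uncoded = assign(uncoded, candidates)  # only one rule fires
```

    In this toy setup the sense carrying the CULINARY code reaches the threshold because both rules fire, while the same sense without the code stays below it and is left unassigned.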

    The Lexical Gaps-procedure identifies lexical gaps in a semi-automatic way. The procedure classifies translation equivalents into two main groups: idioms and restricted collocations on the one hand, and free combinations of words (which imply gaps) on the other. Knowledge contained in dictionaries and structural regularities exhibited by idioms, restricted collocations and gaps can be exploited to distinguish them from each other automatically with a certain degree of confidence. The procedure is able to detect lexical gaps both from English to Italian and from Italian to English.
    For details about the procedures see [Pianta et al. 2002].

    A final manual check is performed on all automatically acquired data, in order to guarantee the reliability of the resource.

  • MultiWordNet ® - All rights reserved. Maintainer: Girardi C.