The JIGSAW Algorithm for Word Sense Disambiguation and Semantic Indexing of Documents

Basile, P.; Degemmis, Marco; Gentile, A. L.; Lops, Pasquale; Semeraro, Giovanni

doi:10.1007/978-3-540-74782-6_28

Word Sense Disambiguation (WSD) is traditionally considered an AI-hard problem. A breakthrough in this field would have a significant impact on many related areas, such as information retrieval and information extraction. This paper describes JIGSAW, a knowledge-based WSD algorithm that attempts to disambiguate all words in a text by exploiting WordNet(1) senses. The main assumption is that a Part-Of-Speech (POS)-dependent strategy for WSD can be more effective than a single strategy applied uniformly to every part of speech. The semantics provided by WSD add value to applications whose end users are humans. Two empirical evaluations are described in the paper. First, we evaluated the accuracy of JIGSAW on Task 1 of the SEMEVAL-1 competition(2). This task measures the effectiveness of a WSD algorithm within an Information Retrieval system. For the second evaluation, we used documents semantically indexed through a WSD process to train a naïve Bayes learner that infers sense-based user profiles as binary text classifiers. The goal of the second evaluation was to measure the accuracy of these user profiles in selecting relevant documents to recommend from a document collection.
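The abstract does not give the internals of the profile learner. As a minimal sketch, assuming each document has already been disambiguated into a bag of WordNet-style sense identifiers and that a standard multinomial naïve Bayes with additive (Laplace) smoothing is used, a sense-based user profile acting as a binary text classifier could look like the following. The class name `SenseNaiveBayes`, the smoothing parameter `alpha`, and the synset-style labels in the example are hypothetical and are not taken from the paper; the labels are illustrative and not checked against WordNet.

```python
import math
from collections import Counter


class SenseNaiveBayes:
    """Hypothetical binary multinomial naive Bayes over bag-of-senses documents.

    Each document is a list of sense identifiers produced by a prior WSD step.
    Labels: True = relevant to the user, False = not relevant.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing; an assumption, not from the paper

    def fit(self, docs, labels):
        self.vocab = {s for d in docs for s in d}
        self.class_counts = Counter(labels)
        # Per-class frequency of each sense across all training documents.
        self.sense_counts = {c: Counter() for c in (True, False)}
        for doc, label in zip(docs, labels):
            self.sense_counts[label].update(doc)
        self.totals = {c: sum(self.sense_counts[c].values()) for c in (True, False)}
        return self

    def log_posterior(self, doc, c):
        # Smoothed class prior.
        n_docs = sum(self.class_counts.values())
        score = math.log((self.class_counts[c] + self.alpha) / (n_docs + 2 * self.alpha))
        # Smoothed per-sense likelihoods, accumulated in log space.
        denom = self.totals[c] + self.alpha * len(self.vocab)
        for s in doc:
            if s in self.vocab:  # senses never seen in training are ignored
                score += math.log((self.sense_counts[c][s] + self.alpha) / denom)
        return score

    def predict(self, doc):
        # True if the document is classified as relevant for this profile.
        return self.log_posterior(doc, True) > self.log_posterior(doc, False)


if __name__ == "__main__":
    # Illustrative sense labels: "bank.n.02" stands for the financial sense,
    # "bank.n.01" for the river-bank sense.
    train_docs = [
        ["bank.n.02", "loan.n.01"],
        ["bank.n.02", "deposit.n.02"],
        ["bank.n.01", "river.n.01"],
        ["river.n.01", "fishing.n.01"],
    ]
    labels = [True, True, False, False]
    profile = SenseNaiveBayes().fit(train_docs, labels)
    print(profile.predict(["bank.n.02"]))  # True
    print(profile.predict(["bank.n.01"]))  # False
```

Because the features are senses rather than surface words, the two meanings of *bank* are distinct features, so a profile learned from documents about finance does not automatically favour documents about river banks.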