Supervised Learning and Distributional Semantic Models for Super-Sense Tagging

Pierpaolo Basile; Annalina Caputo; Giovanni Semeraro
2013-01-01

Abstract

Super-sense tagging is the task of annotating each word in a text with a super-sense, i.e. a general concept such as animal, food or person, drawn from the general semantic taxonomy defined by the WordNet lexicographer classes. Because the set of concepts is small, the task is simpler than Word Sense Disambiguation, which identifies a specific meaning for each word, and machine learning algorithms can achieve good tagging performance. However, machine learning algorithms suffer from data sparseness. The problem is most evident when lexical features are involved, because test data can contain words that are rare in, or completely absent from, the training data. To overcome the sparseness problem, this paper proposes a supervised method for super-sense tagging that incorporates information from a distributional space of words built on a large corpus. Results obtained on two standard datasets, SemCor and SensEval-3, show the effectiveness of our approach.
Year: 2013
ISBN: 978-3-319-03523-9

Use this identifier to cite or link to this document: https://hdl.handle.net/11586/63409