Discovering User Profiles from Semantically Indexed Scientific Papers

Semeraro, Giovanni; Basile, Pierpaolo; Degemmis, Marco; Lops, Pasquale
2007

Abstract

Typically, personalized information recommendation services automatically infer the user profile, a structured model of the user's interests, from documents that the user has already deemed relevant. We present an approach based on Word Sense Disambiguation (WSD) for extracting user profiles from documents. The approach relies on a knowledge-based WSD algorithm, called JIGSAW, for the semantic indexing of documents: JIGSAW exploits the WordNet lexical database to select the correct meaning (sense) of a polysemous word among all of its possible senses. Semantically indexed documents are used to train a naive Bayes learner that infers "semantic", sense-based user profiles as binary text classifiers (user-likes and user-dislikes). The paper describes two empirical evaluations. In the first, JIGSAW was evaluated according to the parameters of the SENSEVAL-3 initiative, which provides a forum where WSD systems are assessed against disambiguated datasets. The second measured the accuracy of the user profiles in selecting relevant documents to be recommended: the performance of classical keyword-based profiles was compared with that of sense-based profiles in the task of recommending scientific papers. The results show that sense-based profiles outperform keyword-based ones.
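To illustrate the general idea of learning a binary user-likes/user-dislikes profile from sense-indexed documents, here is a minimal Python sketch of a plain multinomial naive Bayes classifier over bags of sense labels. It is not the authors' implementation. The "word#sense" labels, the Laplace smoothing constant `alpha`, the class names, the helper `to_keywords`, and the toy training data are all hypothetical. The WSD step itself (JIGSAW over WordNet) is not reproduced: the sense labels are supplied by hand.

```python
import math
from collections import Counter

# A document is a bag of tokens. For sense-based profiles each token is a
# sense label; here they are HYPOTHETICAL "word#sense" strings standing in for
# the WordNet senses that a WSD step such as JIGSAW would assign.
Doc = list[str]


class NaiveBayesProfile:
    """Multinomial naive Bayes with two classes: user-likes and user-dislikes."""

    CLASSES = ("likes", "dislikes")

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # Laplace smoothing constant (an assumption)
        self.priors = {}
        self.counts = {}
        self.totals = {}
        self.vocab = set()

    def fit(self, docs: list[Doc], labels: list[str]) -> "NaiveBayesProfile":
        # Both classes must occur in the training data, otherwise log(0) fails.
        for c in self.CLASSES:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.priors[c] = len(class_docs) / len(docs)
            self.counts[c] = Counter(t for d in class_docs for t in d)
            self.totals[c] = sum(self.counts[c].values())
            self.vocab.update(self.counts[c])
        return self

    def log_posterior(self, doc: Doc, c: str) -> float:
        """Unnormalized log P(c | doc) = log P(c) + sum over t of log P(t | c)."""
        v = len(self.vocab)
        score = math.log(self.priors[c])
        for t in doc:
            if t in self.vocab:  # tokens never seen in training are ignored
                num = self.counts[c][t] + self.alpha
                score += math.log(num / (self.totals[c] + self.alpha * v))
        return score

    def predict(self, doc: Doc) -> str:
        return max(self.CLASSES, key=lambda c: self.log_posterior(doc, c))


def to_keywords(doc: Doc) -> Doc:
    """Drop the sense tag, turning a sense-based document into a keyword one."""
    return [t.split("#")[0] for t in doc]


# Invented toy data: the user likes papers on the financial sense of "bank"
# and dislikes papers on river banks.
train = [
    (["bank#finance", "loan#finance"], "likes"),
    (["bank#finance", "interest#finance"], "likes"),
    (["bank#river", "water#liquid"], "dislikes"),
    (["bank#river", "fishing#sport"], "dislikes"),
]
docs = [d for d, _ in train]
labels = [y for _, y in train]

sense_profile = NaiveBayesProfile().fit(docs, labels)
keyword_profile = NaiveBayesProfile().fit([to_keywords(d) for d in docs], labels)

query = ["bank#finance"]
print(sense_profile.predict(query))  # likes
kw = [keyword_profile.log_posterior(to_keywords(query), c)
      for c in NaiveBayesProfile.CLASSES]
print(kw[0] == kw[1])  # True: the keyword "bank" alone cannot separate the classes
```

On this invented data, the keyword profile sees only "bank" and assigns the query identical scores for both classes, while the sense-based profile separates the two meanings. This only mirrors the motivation for sense-based profiles; it says nothing about the paper's actual experimental results.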
Use this identifier to cite or link to this document: http://hdl.handle.net/11586/112608