Discovering User Profiles from Semantically Indexed Scientific Papers
Semeraro, Giovanni; Basile, Pierpaolo; Degemmis, Marco; Lops, Pasquale
2007-01-01
Abstract
Personalized information recommendation services typically infer the user profile, a structured model of the user's interests, automatically from documents that the user has already deemed relevant. We present an approach based on Word Sense Disambiguation (WSD) for extracting user profiles from documents. The approach relies on a knowledge-based WSD algorithm, called JIGSAW, for the semantic indexing of documents: JIGSAW exploits the WordNet lexical database to select, among all the possible meanings (senses) of a polysemous word, the correct one. Semantically indexed documents are used to train a naive Bayes learner that infers "semantic", sense-based user profiles as binary text classifiers (user-likes and user-dislikes). The paper describes two empirical evaluations. In the first, JIGSAW was evaluated according to the parameters of the SENSEVAL-3 initiative, which provides a forum where WSD systems are assessed against disambiguated datasets. The second measured the accuracy of the user profiles in selecting relevant documents to recommend, comparing classical keyword-based profiles with sense-based profiles in the task of recommending scientific papers. The results show that sense-based profiles outperform keyword-based ones.
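The profile-learning step named in the abstract, a naive Bayes learner trained on semantically indexed documents to produce a binary user-likes / user-dislikes classifier, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a multinomial naive Bayes model with Laplace smoothing, it does not reproduce JIGSAW or query WordNet, and the sense identifiers (`s:learning`, `s:protein`, and so on), the function names, and the toy training set are all hypothetical placeholders. Replacing the sense identifiers with plain keywords would give the keyword-based baseline under the same assumptions.

```python
import math
from collections import Counter


def train(docs):
    """Fit a multinomial naive Bayes model.

    docs: list of (features, label) pairs, where features is a list of
    sense identifiers (hypothetical placeholders here) and label is
    'likes' or 'dislikes'.
    """
    class_docs = Counter()   # number of training documents per class
    feature_counts = {}      # per-class feature frequencies
    vocab = set()            # all features seen in training
    for features, label in docs:
        class_docs[label] += 1
        feature_counts.setdefault(label, Counter()).update(features)
        vocab.update(features)
    return class_docs, feature_counts, vocab


def log_scores(model, features):
    """Return the log prior plus smoothed log likelihood for each class."""
    class_docs, feature_counts, vocab = model
    total_docs = sum(class_docs.values())
    scores = {}
    for label, n_docs in class_docs.items():
        counts = feature_counts[label]
        denom = sum(counts.values()) + len(vocab)  # Laplace smoothing
        score = math.log(n_docs / total_docs)
        for feat in features:
            if feat in vocab:  # features never seen in training are skipped
                score += math.log((counts[feat] + 1) / denom)
        scores[label] = score
    return scores


def classify(model, features):
    """Pick the class with the highest score."""
    scores = log_scores(model, features)
    return max(scores, key=scores.get)


# Toy, made-up training set: each document is already "sense-indexed",
# i.e. represented by placeholder sense identifiers instead of raw words.
training = [
    (["s:classifier", "s:learning", "s:profile"], "likes"),
    (["s:learning", "s:recommendation", "s:profile"], "likes"),
    (["s:protein", "s:cell", "s:membrane"], "dislikes"),
    (["s:cell", "s:gene", "s:protein"], "dislikes"),
]
model = train(training)

print(classify(model, ["s:learning", "s:profile"]))  # likes
print(classify(model, ["s:protein", "s:gene"]))      # dislikes
```

In this sketch the learned "profile" is simply the pair of per-class feature distributions; a recommender built on it would rank unseen papers by the score of the `likes` class.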