Similarity-based clustering for user profiling
Castellano, Giovanna; Fanelli, Anna Maria; Mencar, Corrado
2007-01-01
Abstract
User profiling is a fundamental task in Web personalization. Fuzzy clustering is a valid approach to deriving user profiles, since it captures similar user interests from the Web usage data available in log files. Fuzzy clustering is often based on the assumption that the data lie in a Euclidean space; however, clustering based on Euclidean distance can lead to user representations that do not capture the semantic information contained in the original Web usage data. In this paper, we propose a different way of expressing similarity between Web users: a measure based on the evaluation of similarity between fuzzy sets. The proposed measure is employed in a relational fuzzy clustering algorithm to discover the clusters embedded in the Web usage data and to derive profiles that model actual user preferences. An application example on usage data extracted from the log files of a sample Web site is reported, and a comparison with the results obtained using the cosine measure is shown to demonstrate the effectiveness of the proposed similarity measure.
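The abstract does not spell out the measure itself, so the following is only a minimal sketch under stated assumptions. It assumes each user is encoded as a fuzzy set over the same list of Web pages (a vector of interest degrees in [0, 1]) and uses the min/max "fuzzy Jaccard" ratio, a standard similarity between fuzzy sets, which is not necessarily the measure the paper defines. The function names `fuzzy_jaccard`, `cosine` and `relation_matrix` are hypothetical. The sketch contrasts this with the cosine measure and builds the pairwise matrix that a relational clustering algorithm takes as input.

```python
import math
from typing import Callable, List, Sequence

# Assumed encoding: a user is a fuzzy set over a fixed list of Web pages,
# stored as a vector of membership (interest) degrees in [0, 1].
FuzzySet = Sequence[float]


def fuzzy_jaccard(a: FuzzySet, b: FuzzySet) -> float:
    """Sigma-count ratio |A ∩ B| / |A ∪ B|, using min as intersection and max as union.

    Illustrative fuzzy-set similarity; not necessarily the measure used in the paper.
    """
    if len(a) != len(b):
        raise ValueError("both fuzzy sets must be defined on the same pages")
    inter = sum(min(x, y) for x, y in zip(a, b))
    union = sum(max(x, y) for x, y in zip(a, b))
    # Two empty fuzzy sets are treated as identical.
    return 1.0 if union == 0 else inter / union


def cosine(a: FuzzySet, b: FuzzySet) -> float:
    """Cosine of the angle between two membership vectors (the baseline measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 0.0 if norm_a == 0 or norm_b == 0 else dot / (norm_a * norm_b)


def relation_matrix(
    users: List[FuzzySet], sim: Callable[[FuzzySet, FuzzySet], float]
) -> List[List[float]]:
    """Pairwise similarity matrix: the relational input for a relational clustering algorithm."""
    return [[sim(u, v) for v in users] for u in users]


if __name__ == "__main__":
    u1 = [1.0, 0.5, 0.0]   # strong interest in page 1, moderate in page 2
    u2 = [0.5, 0.25, 0.0]  # same pattern, half the intensity
    print(fuzzy_jaccard(u1, u2))  # 0.5
    print(cosine(u1, u2))         # 1.0 up to floating-point rounding
```

In this toy example the cosine measure treats the two users as identical because it ignores the magnitude of interest, whereas the fuzzy-set ratio separates them. A relational fuzzy clustering algorithm would then work on such a matrix, typically after converting it to a dissimilarity such as 1 - S; that conversion is likewise an assumption of this sketch.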