Clustering Related Structured Objects: Issues and Solutions
MALERBA, Donato; APPICE, Annalisa; LANZA, Antonietta
2007-01-01
Abstract
Data clustering is a common technique for statistical data analysis. The task is to group objects so that the data inside each cluster model the continuity of some environment, while separate clusters model variation over it. CORSO is a method for discovering clusters of structured objects that may be related to each other according to some relation defining a discrete data structure. Clusters are built by merging partially overlapping neighborhoods of objects that prove homogeneous with respect to the cluster description. The quality of the clusters depends on how cluster homogeneity is evaluated and on how the seeds of the neighborhoods are selected. To address these issues, we present some innovations in CORSO, whose validity is confirmed by experimental results.
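The neighborhood-merging idea described in the abstract can be illustrated with a minimal sketch. This is not the CORSO implementation: the abstract does not specify how homogeneity is evaluated or how seeds are chosen, so the sketch assumes a single numeric attribute per object, a hypothetical homogeneity test (value spread within a threshold `max_spread`), and seeds taken in sorted identifier order. The names `neighborhood`, `is_homogeneous`, and `cluster` are illustrative only.

```python
from collections import deque


def neighborhood(graph, seed):
    """Return the seed together with the objects directly related to it."""
    return {seed} | set(graph[seed])


def is_homogeneous(objects, values, max_spread):
    """Assumed stand-in test: attribute values differ by at most max_spread.

    CORSO evaluates homogeneity against a cluster description; this numeric
    check is only a placeholder for that step.
    """
    vals = [values[o] for o in objects]
    return max(vals) - min(vals) <= max_spread


def cluster(graph, values, max_spread):
    """Grow clusters by merging overlapping neighborhoods that stay homogeneous."""
    unassigned = set(graph)
    clusters = []
    for seed in sorted(graph):  # assumed seed selection: identifier order
        if seed not in unassigned:
            continue
        # Start from the seed's neighborhood, restricted to free objects.
        current = neighborhood(graph, seed) & unassigned
        if not is_homogeneous(current, values, max_spread):
            current = {seed}  # fall back to a singleton cluster
        frontier = deque(sorted(current - {seed}))
        while frontier:
            next_seed = frontier.popleft()
            # This neighborhood overlaps the cluster at least in next_seed.
            candidate = neighborhood(graph, next_seed) & unassigned
            merged = current | candidate
            if is_homogeneous(merged, values, max_spread):
                frontier.extend(sorted(candidate - current))
                current = merged
        unassigned -= current
        clusters.append(current)
    return clusters


# Toy data: a chain of related objects a - b - c - d - e.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
values = {"a": 1.0, "b": 1.2, "c": 1.1, "d": 5.0, "e": 5.3}
clusters = cluster(graph, values, max_spread=0.5)
```

On this toy chain the sketch yields two clusters, {a, b, c} and {d, e}: the neighborhood of c overlaps the first cluster but is rejected because adding d would break the assumed homogeneity threshold. Changing the seed order or the homogeneity test can change the result, which is the sensitivity the abstract points to.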