Comparing dissimilarity measures for probabilistic symbolic objects

MALERBA, Donato; ESPOSITO, Floriana
2002-01-01

Abstract

Symbolic data analysis generalizes some standard statistical data mining methods, such as those developed for classification and clustering tasks, to the case of symbolic objects (SOs). These objects, informally defined as “aggregated data” because they synthesize information concerning a group of individuals of a population, ensure confidentiality of the original data; nevertheless, they pose new problems, which find a solution in symbolic data analysis. A by-product of working with aggregated data is the possibility of dealing with data from complex questionnaires, where multiple answers are possible or constraints among different answers exist. Comparing SOs is an important step of symbolic data analysis. It can be useful to cluster SOs, to discriminate between them, or to order them according to their degree of generalization. This paper presents a comparative study of measures that evaluate the degree of dissimilarity between the objects of a restricted class of symbolic data, namely Probabilistic Symbolic Objects. To define a ground truth for the empirical evaluation, a data set with understandable and explainable properties has been selected. In the experiment, only two of the seven dissimilarity measures we studied seem to have a more stable behaviour.
Year: 2002
ISBN: 1-85312-925-9

Use this identifier to cite or link to this document: https://hdl.handle.net/11586/6736
Citations
  • Scopus 26