Induction of terminological cluster trees: Preliminaries, model, method and perspectives
Rizzo, Giuseppe; D'Amato, Claudia; Fanizzi, Nicola; Esposito, Floriana
2016-01-01
Abstract
In this paper, we tackle the problem of clustering individual resources in the context of the Web of Data, which is characterized by a huge amount of data published in a standard data model with well-defined semantics based on Web ontologies. Clustering methods offer an effective way to support many complex related activities, such as ontology construction, debugging and evolution, while taking into account the inherent incompleteness of the representation. Web ontologies already encode a hierarchical organization of the resources through the subsumption hierarchy of the classes, which may be expressed explicitly, with proper subsumption axioms, or must be detected indirectly, by reasoning on the available axioms that define the classes (classification). However, such classes are frequently sparsely populated, as the hierarchy often reflects the view of the knowledge engineer prior to the actual introduction of assertions involving the individual resources. As a result, very general classes are often loosely populated, and the same may happen to specific subclasses, making it more difficult to check the types of a resource (instance checking), even through reasoning services. Among the large number of algorithms proposed in the Machine Learning literature, we propose a clustering method that is able to organize groups of resources hierarchically. Specifically, in this work, we introduce a conceptual clustering approach that embeds a distance measure between individuals in a knowledge base in a divide-and-conquer solution intended to elicit ex post the underlying hierarchy based on the actual distribution of the instances.
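
To make the idea of a distance-driven, divide-and-conquer clustering more concrete, the following is a minimal sketch of a generic divisive procedure. It is not the algorithm of the paper and omits the conceptual part entirely (no concept descriptions are attached to the nodes). Everything in it is an assumption made for illustration: the individuals and their three-valued membership vectors (1 = known member of a class, 0 = known non-member, 0.5 = unknown, standing in for incomplete knowledge), the mean absolute difference used as distance, the farthest-pair splitting rule, and the `min_size` and `min_spread` stopping parameters.

```python
from itertools import combinations


def distance(a, b):
    """Mean absolute difference between two membership vectors (assumed measure)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def split(names, features):
    """Partition the individuals around the two most distant ones."""
    seed_a, seed_b = max(
        combinations(names, 2),
        key=lambda pair: distance(features[pair[0]], features[pair[1]]),
    )
    left, right = [], []
    for name in names:
        d_a = distance(features[name], features[seed_a])
        d_b = distance(features[name], features[seed_b])
        # Ties go to the first seed; each seed always lands in its own group.
        (left if d_a <= d_b else right).append(name)
    return left, right


def build_tree(names, features, min_size=2, min_spread=0.25):
    """Recursively split a group until it is small or homogeneous."""
    if len(names) < 2 or len(names) <= min_size:
        return {"members": sorted(names)}
    spread = max(
        distance(features[a], features[b]) for a, b in combinations(names, 2)
    )
    if spread < min_spread:
        return {"members": sorted(names)}
    left, right = split(names, features)
    return {
        "left": build_tree(left, features, min_size, min_spread),
        "right": build_tree(right, features, min_size, min_spread),
    }


# Hypothetical individuals described by membership in three hypothetical classes.
features = {
    "anna": [1, 1, 0],
    "bob": [1, 1, 0.5],
    "carl": [0, 0, 1],
    "dora": [0, 0.5, 1],
}

tree = build_tree(list(features), features)
```

On this toy input the first split separates `anna` and `bob` from `carl` and `dora`, and both groups then become leaves because they reach `min_size`. The farthest-pair rule is used here only because it guarantees two non-empty groups at each step; other splitting criteria would fit the same recursive scheme.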