Tree-based models for inductive classification on the Web Of Data


The Web of Data, one of the dimensions of the Semantic Web (SW), is a tremendous source of information, which has motivated increasing attention to the formalization and application of machine learning methods for tasks such as concept learning, link prediction, and inductive instance retrieval. However, the Web of Data is also characterized by various forms of uncertainty, owing to its inherent incompleteness (missing information, uneven data distributions) and to the noise that may affect open and distributed architectures. In this paper, we focus on the inductive instance retrieval task, regarded as a classification problem. The proposed solution is a framework for learning Terminological Decision Trees from examples described in an ontological knowledge base, to be used for classifying instances. To this end, suitable pruning strategies and a new prediction procedure are proposed. Furthermore, to tackle the class-imbalance problem, the framework is extended to ensembles of Terminological Decision Trees called Terminological Random Forests. The framework has been evaluated in comparative experiments against the main state-of-the-art solutions grounded on a similar approach, showing that: (1) the formalized pruning strategies can improve the predictiveness of the model; (2) Terminological Random Forests outperform a single Terminological Decision Tree, particularly when the knowledge base is endowed with a large number of concepts and roles; (3) the framework can be exploited for solving related problems, such as predicting the values of given properties with finite ranges.


Rizzo, Giuseppe; d'Amato, Claudia; Fanizzi, Nicola; Esposito, Floriana

2017-01-01



Year

2017

Appears in types:

1.1 Journal article

Files in this item:

File	Access	Type	License	Size	Format
Tree-Based Models.pdf	not available	Publisher's version	NOT PUBLIC - Private/restricted access	705.25 kB	Adobe PDF
jws17.pdf	open access	Pre-print	NOT PUBLIC - Private/restricted access	481.76 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11586/187332
