Semi-supervised trees for multi-target regression

Michelangelo Ceci;
2018-01-01

Abstract

The predictive performance of traditional supervised methods heavily depends on the amount of labeled data. However, obtaining labels is a difficult process in many real-life tasks, and only a small amount of labeled data is typically available for model learning. As an answer to this problem, the concept of semi-supervised learning has emerged. Semi-supervised methods use unlabeled data in addition to labeled data to improve the performance of supervised methods. Obtaining labeled data is even more difficult for data mining problems with structured outputs, since several labels need to be determined for each example. Multi-target regression (MTR) is one type of structured output prediction problem, where multiple continuous variables need to be predicted simultaneously. Despite the apparent need for semi-supervised methods able to deal with MTR, only a few such methods are available, and even those are difficult to use in practice and/or their advantages over supervised methods for MTR are not clear. This paper presents an extension of predictive clustering trees for MTR, and ensembles thereof, towards semi-supervised learning. The proposed method preserves the appealing characteristics of decision trees while enabling the use of unlabeled examples. In particular, the proposed semi-supervised trees for MTR are interpretable, easy to understand, fast to learn, and can handle both numeric and nominal descriptive features. We perform an extensive empirical evaluation in both an inductive and a transductive semi-supervised setting. The results show that the proposed method improves the performance of supervised predictive clustering trees and enhances their interpretability (due to reduced tree size), whereas, in the ensemble learning scenario, it outperforms its supervised counterpart in the transductive setting. The proposed methods have a mechanism for controlling the influence of unlabeled examples, which makes them highly useful in practice: this mechanism can protect them against performing worse than their supervised counterparts, an inherent risk of semi-supervised learning. The proposed methods also outperform two existing semi-supervised methods for MTR.
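The abstract mentions a mechanism that controls the influence of unlabeled examples on tree construction. The snippet below is a minimal, illustrative sketch of that general idea, not the paper's actual implementation: the split heuristic combines target-space variance (computed on labeled examples only) with descriptive-space variance (computed on all examples, which is how unlabeled data contributes), traded off by a weight parameter. The function names, the parameter name w, and the exact weighting form are assumptions made for illustration.

```python
import numpy as np

def ss_impurity(X, Y, w=0.5):
    """Weighted impurity of a set of examples for semi-supervised MTR.

    X : (n, d) descriptive features, observed for all examples.
    Y : (n, t) targets; rows of unlabeled examples are all NaN.
    w : trade-off in [0, 1]; w = 1 uses only the (labeled) target variance,
        i.e. a purely supervised heuristic, w = 0 ignores the targets.
    """
    labeled = ~np.isnan(Y).any(axis=1)
    # Target-space variance: labeled examples only.
    target_var = float(np.mean(np.var(Y[labeled], axis=0))) if labeled.any() else 0.0
    # Descriptive-space variance: all examples, labeled and unlabeled.
    descr_var = float(np.mean(np.var(X, axis=0)))
    return w * target_var + (1.0 - w) * descr_var

def split_gain(X, Y, feature, threshold, w=0.5):
    """Impurity reduction of the binary split X[:, feature] <= threshold."""
    left = X[:, feature] <= threshold
    right = ~left
    if not left.any() or not right.any():
        return 0.0  # degenerate split, no gain
    n = len(X)
    parent = ss_impurity(X, Y, w)
    children = (left.sum() / n) * ss_impurity(X[left], Y[left], w) \
             + (right.sum() / n) * ss_impurity(X[right], Y[right], w)
    return parent - children

# Toy usage: 100 examples, 5 features, 2 targets, only the first 30 labeled.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Y = np.column_stack([X[:, 0] + rng.normal(scale=0.1, size=100),
                     X[:, 1] - X[:, 2]])
Y[30:] = np.nan
print(split_gain(X, Y, feature=0, threshold=0.0, w=0.8))
```

In a full predictive clustering tree, the per-attribute variances would additionally be normalized (e.g. by their variances on the whole training set) so that targets and descriptive features measured on different scales remain comparable.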
Files in this record:

Semi-supervised.pdf
  Type: Published version (Documento in Versione Editoriale)
  License: Not public - private/restricted access (copy available on request)
  Size: 896.02 kB
  Format: Adobe PDF

Levatić et al 2017c_ Semi-supervised Trees for Multi-target Regression.pdf
  Type: Pre-print (Documento in Pre-print)
  License: Creative Commons (open access)
  Size: 552.74 kB
  Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11586/214582
Citations
  • Scopus: 38
  • Web of Science: 31