Heterogeneous networks are networks consisting of different types of objects and links. They can be found in several fields, ranging from the Internet to social sciences, biology, epidemiology, geography, finance, and many others. In the literature, several methods have been proposed for the analysis of network data, but they usually focus on homogeneous networks, where all the objects are of the same type, and links among them describe a single type of relationship. More recently, the complexity of real scenarios has impelled researchers to design methods for the analysis of heterogeneous networks, especially focused on classification and clustering tasks. However, they often make assumptions on the structure of the network that are too restrictive or do not fully exploit different forms of network correlation and autocorrelation. Moreover, when nodes which are the main subject of the classification task are linked to several nodes of the network having missing values, standard methods can lead to either building incomplete classification models or to discarding possibly relevant dependencies (correlation or autocorrelation). In this paper, we propose an ensemble learning approach for multi-type classification. We adopt the system Mr-SBC, which is originally able to analyze heterogeneous networks of arbitrary structure, within an ensemble learning approach. The ensemble allows us to improve the classification accuracy of Mr-SBC by exploiting i) the possible presence of correlation and autocorrelation phenomena, and ii) the classification of instances (which contain missing values) of other node types in the network. As a beneficial side effect, we have also that the models are more stable in terms of standard deviation of the accuracy, over different samples used for training. Experiments performed on real-world datasets show that the proposed method is able to significantly outperform the standard implementation of Mr-SBC. Moreover, it gives Mr-SBC the advantage of outperforming four other well-known algorithms for the classification of data organized in a network.
Ensemble Learning for Multi-Type Classification in Heterogeneous Networks
Serafino, Francesco;Pio, Gianvito;Ceci, Michelangelo
2018-01-01
Abstract
Heterogeneous networks are networks consisting of different types of objects and links. They can be found in several fields, ranging from the Internet to social sciences, biology, epidemiology, geography, finance, and many others. In the literature, several methods have been proposed for the analysis of network data, but they usually focus on homogeneous networks, where all the objects are of the same type, and links among them describe a single type of relationship. More recently, the complexity of real scenarios has impelled researchers to design methods for the analysis of heterogeneous networks, especially focused on classification and clustering tasks. However, they often make assumptions on the structure of the network that are too restrictive or do not fully exploit different forms of network correlation and autocorrelation. Moreover, when nodes which are the main subject of the classification task are linked to several nodes of the network having missing values, standard methods can lead to either building incomplete classification models or to discarding possibly relevant dependencies (correlation or autocorrelation). In this paper, we propose an ensemble learning approach for multi-type classification. We adopt the system Mr-SBC, which is originally able to analyze heterogeneous networks of arbitrary structure, within an ensemble learning approach. The ensemble allows us to improve the classification accuracy of Mr-SBC by exploiting i) the possible presence of correlation and autocorrelation phenomena, and ii) the classification of instances (which contain missing values) of other node types in the network. As a beneficial side effect, we have also that the models are more stable in terms of standard deviation of the accuracy, over different samples used for training. Experiments performed on real-world datasets show that the proposed method is able to significantly outperform the standard implementation of Mr-SBC. Moreover, it gives Mr-SBC the advantage of outperforming four other well-known algorithms for the classification of data organized in a network.File | Dimensione | Formato | |
---|---|---|---|
Ensemble Learning.pdf
non disponibili
Tipologia:
Documento in Versione Editoriale
Licenza:
NON PUBBLICO - Accesso privato/ristretto
Dimensione
808.59 kB
Formato
Adobe PDF
|
808.59 kB | Adobe PDF | Visualizza/Apri Richiedi una copia |
TKDE_MT-MRSBC.pdf
accesso aperto
Tipologia:
Documento in Post-print
Licenza:
Creative commons
Dimensione
1.55 MB
Formato
Adobe PDF
|
1.55 MB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.