Learning to Combine miRNA Target Predictions: a Semi-supervised Ensemble Learning Approach (Discussion Paper)

Pio, Gianvito; Ceci, Michelangelo; Malerba, Donato
2014-01-01

Abstract

Link prediction in network data is a data mining task that is receiving significant attention because of its applicability in various domains. One example is social network analysis, where the goal is to identify connections between users. Another is computational biology, where the goal is to identify previously unknown relationships among biological entities. For example, identifying regulatory activities (links) among genes would allow biologists to discover possible gene regulatory networks. Several approaches to link prediction can be found in the literature, but they often fail to consider all the possible criteria simultaneously (e.g. network topology, node properties, autocorrelation among nodes). In this paper we present a semi-supervised data mining approach that learns to combine the scores returned by several link prediction algorithms. The proposed solution exploits both a small set of validated examples of links and a huge set of unlabeled links. The application we consider is the identification of links between genes and miRNAs, which can contribute to the understanding of their roles in many biological processes. This application requires learning from positively labeled examples of links only and coping with the strong imbalance between labeled and unlabeled examples. Results show a significant improvement over single prediction algorithms and over a baseline combination.
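
The setting described in the abstract (several predictors scoring each candidate link, a few validated positive links, many unlabeled ones) can be illustrated with a minimal Python sketch. This is not the authors' method: the function names, the toy data, and the choice of a class-weighted logistic regression that treats unlabeled links as provisional negatives are all assumptions made only for illustration.

```python
import math


def average_scores(scores):
    """Baseline combination: unweighted mean of the predictors' scores."""
    return sum(scores) / len(scores)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_pu_combiner(positives, unlabeled, epochs=500, lr=1.0):
    """Learn one weight per predictor from positive and unlabeled links.

    Assumption: unlabeled links are treated as provisional negatives, and
    each class receives the same total weight (0.5) so that the large
    unlabeled set does not dominate the few validated positives.
    """
    n_features = len(positives[0])
    weights = [0.0] * n_features
    bias = 0.0
    data = [(x, 1.0, 0.5 / len(positives)) for x in positives]
    data += [(x, 0.0, 0.5 / len(unlabeled)) for x in unlabeled]

    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y, sample_weight in data:
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            err = (sigmoid(z) - y) * sample_weight
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        # Plain batch gradient descent on the weighted log-loss.
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


def combined_score(weights, bias, scores):
    """Score a candidate link from the individual predictors' scores."""
    return sigmoid(bias + sum(w * s for w, s in zip(weights, scores)))


# Toy data (hypothetical): each row holds the scores of two predictors.
# Predictor 0 is informative; predictor 1 is deliberately uninformative.
positives = [[0.90, 0.2], [0.80, 0.5], [0.85, 0.8]]
unlabeled = [[0.10 + 0.01 * i, [0.2, 0.5, 0.8][i % 3]] for i in range(30)]

weights, bias = train_pu_combiner(positives, unlabeled)
```

In this toy setting the learned combiner can down-weight the uninformative predictor, whereas the unweighted average gives both predictors the same influence.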


Use this identifier to cite or link to this document: https://hdl.handle.net/11586/65662