Learning to Combine miRNA Target Predictions: a Semi-supervised Ensemble Learning Approach

CECI, MICHELANGELO
2014-01-01

Abstract

Link prediction in network data is a data mining task that is receiving significant attention because of its applicability in various domains. One example is social network analysis, where the goal is to identify connections between users. Another is computational biology, where the goal is to identify previously unknown relationships among biological entities. For example, identifying regulatory activities (links) among genes would allow biologists to discover possible gene regulatory networks. The literature offers several approaches to link prediction, but they often fail to consider all the possible criteria simultaneously (e.g. network topology, node properties, autocorrelation among nodes). In this paper we present a semi-supervised data mining approach which learns to combine the scores returned by several link prediction algorithms. The proposed solution exploits both a small set of validated examples of links and a huge set of unlabeled links. The application we consider is the identification of links between genes and miRNAs, which can contribute to the understanding of their roles in many biological processes. This application requires learning from only positively labeled examples of links and coping with the strong imbalance between labeled and unlabeled examples. Results show a significant improvement over single prediction algorithms and over a baseline combination.
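
The idea of learning to combine predictor scores from a few validated links and many unlabeled ones can be illustrated with a toy sketch. It is not the algorithm presented in the paper, whose details the abstract does not give. It assumes that every candidate link has one score per predictor on a comparable scale, and it weights each predictor by how often it ranks a validated link above an unlabeled one. The function names `pairwise_rank_rate`, `learn_weights`, and `combine` are hypothetical.

```python
from typing import List, Sequence


def pairwise_rank_rate(pos: Sequence[float], unl: Sequence[float]) -> float:
    """Fraction of (positive, unlabeled) pairs where the positive scores higher.

    Ties count as half a win. Unlabeled links are only a proxy for negatives,
    since some of them may be true links that were never validated.
    """
    wins = 0.0
    for p in pos:
        for u in unl:
            if p > u:
                wins += 1.0
            elif p == u:
                wins += 0.5
    return wins / (len(pos) * len(unl))


def learn_weights(positives: List[Sequence[float]],
                  unlabeled: List[Sequence[float]]) -> List[float]:
    """Return one non-negative weight per predictor; weights sum to 1."""
    n_predictors = len(positives[0])
    raw = []
    for j in range(n_predictors):
        rate = pairwise_rank_rate([row[j] for row in positives],
                                  [row[j] for row in unlabeled])
        # Predictors no better than chance (rate <= 0.5) get zero weight.
        raw.append(max(rate - 0.5, 0.0))
    total = sum(raw)
    if total == 0.0:
        # No predictor beats chance: fall back to a plain average.
        return [1.0 / n_predictors] * n_predictors
    return [w / total for w in raw]


def combine(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of one candidate link's predictor scores."""
    return sum(s * w for s, w in zip(scores, weights))


# Made-up scores from two hypothetical predictors.
positives = [[0.9, 0.2], [0.8, 0.6]]              # validated links
unlabeled = [[0.1, 0.5], [0.3, 0.4], [0.2, 0.7]]  # unlabeled candidate links
weights = learn_weights(positives, unlabeled)
combined = combine([0.5, 0.9], weights)
```

In this made-up data the first predictor ranks every validated link above every unlabeled one, while the second does worse than chance, so the combination relies on the first predictor alone. A realistic setting would also have to handle the strong imbalance between labeled and unlabeled examples, which this sketch ignores.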

Use this identifier to cite or link to this document: https://hdl.handle.net/11586/39144