Using PPI Networks in Hierarchical Multi-label Classification Trees for Gene Function Prediction

CECI, MICHELANGELO
2012-01-01

Abstract

Motivation: Catalogs such as Gene Ontology (GO) and MIPS-FUN assume that functional classes are organized hierarchically (general functions include more specific functions). This has recently motivated the development of several machine learning algorithms under the assumption that instances may belong to multiple hierarchically organized classes. Besides relationships among classes, it is also possible to identify relationships among examples. Although such relationships have been identified and extensively studied in the area of protein-protein interaction (PPI) networks, they have not received much attention in hierarchical protein function prediction. The use of such relationships between genes introduces autocorrelation and violates the assumption that instances are independently and identically distributed, which underlies most machine learning algorithms. While this consideration introduces additional complexity to the learning process, we expect it to also carry substantial benefits.

Results: This article demonstrates the benefits (in terms of predictive accuracy) of considering autocorrelation in multi-class gene function prediction. We develop a tree-based algorithm for considering network autocorrelation in the setting of Hierarchical Multi-label Classification (HMC). The empirical evaluation of the proposed algorithm, called NHMC, on 24 yeast datasets using MIPS-FUN and GO annotations and exploiting three different PPI networks, clearly shows that taking autocorrelation into account improves performance.

Conclusions: Our results suggest that explicitly taking network autocorrelation into account increases the predictive capability of the models, especially when the underlying PPI network is dense. Furthermore, NHMC can be used as a tool to assess network data and the information it provides with respect to gene function.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11586/35572