Big Data analytics for knowledge transfer among organisms while reconstructing Gene Regulatory Networks
Mignone, Paolo; Pio, Gianvito; Ceci, Michelangelo
2021-01-01
Abstract
The reconstruction of gene regulatory networks (GRNs) from gene expression data is pivotal for understanding gene regulatory mechanisms and processes. In this context, machine learning and big data analytics tools can be considered fundamental. However, most existing methods (i) produce poor results when the number of labelled examples is limited or when no negative example is available and (ii) cannot exploit information extracted from the GRNs of other (better studied) related organisms. We overcome these limitations by proposing an innovative transfer learning method, called BioSfer (Mignone et al., 2020), which can exploit the knowledge about the GRN of a source organism for the reconstruction of the GRN of the target organism. In the first stages, we identify two predictive models to discover unknown links for both the considered GRNs. In the final stage, we build a new geometrically-combined model, which is better able to identify unknown links. Moreover, the proposed method is natively able to work in the positive-unlabeled setting, where no negative example is available, by fruitfully exploiting a set of unlabeled examples. In our experiments, we reconstructed the human GRN by exploiting the knowledge of the GRN of M. musculus. The qualitative analysis showed that the proposed method is able to identify biologically plausible gene regulations that are not identified by other tools. Results showed that the proposed method outperforms state-of-the-art approaches (Zhang et al., 2017; Wang et al., 2017; Long et al., 2014; Huynh-Thu et al., 2010; Aibar et al., 2017; Mignone et al., 2018) and identifies previously unknown functional relationships among the analysed genes.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.