Data-driven analysis of collections of big datasets by the Bi-CoPaM method yields field-specific novel insights

IRIS

Massive amounts of data have recently been, and are increasingly being, generated from various fields, such as bioinformatics, neuroscience and social networks. Many of these big datasets were generated to answer specific research questions, and were analysed accordingly. However, the scope of information contained in these datasets can usually answer much broader questions than what was originally intended. Moreover, many existing big datasets are related to each other but have different detailed specifications, and the mutual information that can be extracted from them collectively has been not commonly considered. To bridge this gap between the fast pace of data generation and the slower pace of data analysis, and to exploit the massive amounts of existing data, we suggest employing data-driven explorations to analyse collections of related big datasets. This approach aims at extracting field-specific novel findings which can be revealed from the data without being driven by specific questions or hypotheses. To realise this paradigm, we introduced the binarisation of consensus partition matrices (Bi- CoPaM) method, with the ability of analysing collections of heterogeneous big datasets to identify clusters of consistently correlated objects. We demonstrate the power of data-driven explorations by applying the Bi-CoPaM to two collections of big datasets from two distinct fields, namely bioinformatics and neuroscience. In the first application, the collective analysis of forty yeast gene expression datasets identified a novel cluster of genes and some new biological hypotheses regarding their function and regulation. In the other application, the analysis of 1,856 big fMRI datasets identified three functionally connected neural networks related to visual, reward and auditory systems during affective processing. These experiments reveal the broad applicability of this paradigm to various fields, and thus encourage exploring the large amounts of partially exploited existing datasets, preferably as collections of related datasets, with a similar approach.

Data-driven analysis of collections of big datasets by the Bi-CoPaM method yields field-specific novel insights

Abu-Jamous B.;Liu C.;Roberts D. J.;Brattico E.;Nandi A. K.

2017-01-01

Abstract

Massive amounts of data have recently been, and are increasingly being, generated from various fields, such as bioinformatics, neuroscience and social networks. Many of these big datasets were generated to answer specific research questions, and were analysed accordingly. However, the scope of information contained in these datasets can usually answer much broader questions than what was originally intended. Moreover, many existing big datasets are related to each other but have different detailed specifications, and the mutual information that can be extracted from them collectively has been not commonly considered. To bridge this gap between the fast pace of data generation and the slower pace of data analysis, and to exploit the massive amounts of existing data, we suggest employing data-driven explorations to analyse collections of related big datasets. This approach aims at extracting field-specific novel findings which can be revealed from the data without being driven by specific questions or hypotheses. To realise this paradigm, we introduced the binarisation of consensus partition matrices (Bi- CoPaM) method, with the ability of analysing collections of heterogeneous big datasets to identify clusters of consistently correlated objects. We demonstrate the power of data-driven explorations by applying the Bi-CoPaM to two collections of big datasets from two distinct fields, namely bioinformatics and neuroscience. In the first application, the collective analysis of forty yeast gene expression datasets identified a novel cluster of genes and some new biological hypotheses regarding their function and regulation. In the other application, the analysis of 1,856 big fMRI datasets identified three functionally connected neural networks related to visual, reward and auditory systems during affective processing. These experiments reveal the broad applicability of this paradigm to various fields, and thus encourage exploring the large amounts of partially exploited existing datasets, preferably as collections of related datasets, with a similar approach.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
			2017
		
	Codice ISBN
	
			978-981-10-4234-8
978-981-10-4235-5
		
	Appare nelle tipologie:
	
			2.1 Contributo in volume (Capitolo o Saggio)

File in questo prodotto:

File	Dimensione	Formato
Abu-Jamous17_Chapter_proofs.pdf non disponibili Tipologia: Documento in Post-print Licenza: NON PUBBLICO - Accesso privato/ristretto Dimensione 822.42 kB Formato Adobe PDF Visualizza/Apri Richiedi una copia	822.42 kB	Adobe PDF	Visualizza/Apri Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11586/280201

Citazioni

ND

0

0

social impact