A Predictive Model for MicroRNA Expressions in Pediatric Multiple Sclerosis Detection

Casalino, Gabriella; Castellano, Giovanna; Nuzziello, Nicoletta
2019

Abstract

MicroRNAs (miRNAs) are short non-coding RNAs that play significant regulatory roles in cells. The study of miRNA data can provide valuable support for the early diagnosis of multifactorial diseases such as pediatric Multiple Sclerosis. However, the analysis of miRNA expression data poses several challenges due to the high dimensionality and class imbalance of the data. In this paper we present a data science workflow for developing a predictive model intended to support clinicians in the diagnosis of Multiple Sclerosis, starting from miRNA data produced by Next-Generation Sequencing. The goal is to create an effective model able to predict the pathological condition of a patient from that patient's miRNA expression profile. In the proposed workflow, the miRNA dataset is first preprocessed to reduce its high dimensionality (from 1287 features to 40 features) and to mitigate class imbalance. A classification model is then learned from the data via neural network training. Results show that the model built on the 40 data-driven selected features achieves an overall classification accuracy of 94% on test data and outperforms the model based on 42 expert-selected features, which achieves an overall accuracy of only 83%.
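The two preprocessing steps named in the abstract (reducing the feature set and mitigating class imbalance) can be illustrated with a minimal sketch. The abstract does not state which selection criterion or resampling method the authors used, so the choices here are assumptions made only for illustration: features are ranked by a simple signal-to-noise score, and the minority class is balanced by random oversampling. All function names are hypothetical, the data are synthetic, and the neural network training step is omitted.

```python
import random
import statistics


def separation_score(values, labels):
    """Signal-to-noise score: |mean_pos - mean_neg| / (std_pos + std_neg).

    Assumption: this ranking criterion is a stand-in, not the paper's method.
    Both classes (0 and 1) must be present in `labels`.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    diff = abs(statistics.fmean(pos) - statistics.fmean(neg))
    spread = statistics.pstdev(pos) + statistics.pstdev(neg)
    if spread == 0:
        # No within-class spread: perfect separator if the means differ,
        # otherwise a constant feature with no class information.
        return float("inf") if diff > 0 else 0.0
    return diff / spread


def select_top_k(X, y, k):
    """Return the sorted column indices of the k highest-scoring features."""
    n_features = len(X[0])
    scores = [separation_score([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def oversample_minority(X, y, rng):
    """Duplicate random minority-class rows until both classes have equal size.

    Assumption: random oversampling is a stand-in for the paper's technique.
    """
    by_class = {0: [], 1: []}
    for row, label in zip(X, y):
        by_class[label].append(row)
    minority = min(by_class, key=lambda c: len(by_class[c]))
    extra = len(by_class[1 - minority]) - len(by_class[minority])
    X_bal, y_bal = list(X), list(y)
    for _ in range(extra):
        X_bal.append(rng.choice(by_class[minority]))
        y_bal.append(minority)
    return X_bal, y_bal


def make_toy_data(rng, n_pos, n_neg, n_features, informative):
    """Synthetic stand-in for an expression matrix (not real miRNA data)."""
    X, y = [], []
    for label, count in ((1, n_pos), (0, n_neg)):
        for _ in range(count):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            if label == 1:
                for j in informative:
                    row[j] += 5.0  # shift informative features for positives
            X.append(row)
            y.append(label)
    return X, y


if __name__ == "__main__":
    rng = random.Random(0)
    X, y = make_toy_data(rng, n_pos=8, n_neg=32, n_features=1287, informative=(3, 17))
    kept = select_top_k(X, y, k=40)  # 1287 -> 40 features, as in the abstract
    X_small = [[row[j] for j in kept] for row in X]
    X_bal, y_bal = oversample_minority(X_small, y, rng)
    print(len(X_small[0]), "features kept;", y_bal.count(0), "vs", y_bal.count(1), "samples")
```

In practice, both steps would normally be fitted on the training split only, so that duplicated samples and selection decisions do not leak into the test data used to report accuracy.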
ISBN: 978-3-030-26772-8
ISBN: 978-3-030-26773-5
Use this identifier to cite or link to this document: http://hdl.handle.net/11586/241771