MicroRNAs (miRNAs) are a set of short non coding RNAs that play significant regulatory roles in cells. The study of miRNA data can be of valuable support for the early diagnosis of multifactorial diseases such as pediatric Multiple Sclerosis. However the analysis of miRNA expressions poses several challenges due to high dimensionality and imbalance of data. In this paper we present a data science workflow to develop a predictive model that is intended to support the clinicians in the diagnosis of Multiple Sclerosis starting from miRNA data produced by Next-Generation Sequencing. The goal is to create an effective model able to predict the pathological condition of a patient starting from his miRNA expression profile. Based on the proposed workflow, the miRNA dataset is firstly preprocessed in order to reduce its high dimensionality (from 1287 features to 40 features) and to mitigate class imbalance. Then a classification model is learnt from data via neural network training. Results show that the model defined by using the 40 data-driven selected features achieves an overall classification accuracy of 94% on test data and overcomes the model based on 42 features selected by the experts that achieves only 83% of overall accuracy.
Scheda prodotto non validato
Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo
|Titolo:||A Predictive Model for MicroRNA Expressions in Pediatric Multiple Sclerosis Detection|
|Data di pubblicazione:||2019|
|Appare nelle tipologie:||2.1 Contributo in volume (Capitolo o Saggio)|