Managing NGS differential expression uncertainty with fuzzy sets

MENCAR, CORRADO;
2016-01-01

Abstract

High-performance Next-Generation Sequencing (NGS) has become a widely used technology for case-control comparison studies of RNA transcripts, such as mRNAs and small non-coding RNAs. The first step of the analysis is mapping NGS reads against a reference database, and a critical issue emerges in this phase: the problem of multireads, that is, reads that map to more than one reference location. In this paper we present a novel approach to represent and quantify read mapping ambiguities through the use of fuzzy sets and possibility theory. The aim of this work is to obtain a list of candidate differential expression events, together with a description of the uncertainty that multireads introduce into the results. In a preliminary experiment on HeLa cells, the method correctly detected the possibility of false positives, while in a case-control study of human endobronchial biopsies it identified 11 genes with possible differential expression, four of them with an uncertain fold change. This last result was confirmed by an FDR-adjusted Fisher’s test, while DESeq2 did not detect significant differences between case and control.
2016
9783319443317
High-performance Next-Generation Sequencing (NGS) technologies are widely used in case-control comparison studies of RNA transcripts, such as mRNAs and small non-coding RNAs. The first step of the analysis is mapping NGS reads against a reference database, and a critical issue is choosing how to deal with the multiread problem. In this paper we present a novel approach to represent and quantify read mapping ambiguities through the use of fuzzy sets and possibility theory. The aim of this work is to obtain a list of candidate differential expression events, ordered by significance, together with a description of the uncertainty that the multiread issue introduces into the results. A preliminary experiment on a case-control study of human endobronchial biopsies identified 9 genes with possible differential expression, four of them with an uncertain fold change. This result was confirmed by an FDR-adjusted Fisher’s test, while the same data processed with DESeq2 did not show significant differences between case and control.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11586/180403