Motivation: A major challenge in current biomedical research is the identification of cellular processes deregulated in a given pathology through the analysis of gene expression profiles. To this end, predefined lists of genes, coding specific functions, are compared with a list of genes ordered according to their values of differential expression measured by suitable univariate statistics. Results: We propose a statistically well-founded method for measuring the relevance of predefined lists of genes and for assessing their statistical significance starting from their raw expression levels as recorded on the microarray. We use prediction accuracy as a measure of relevance of the list. The rationale is that a functional category, coded through a list of genes, is perturbed in a given pathology if it is possible to correctly predict the occurrence of the disease in new subjects on the basis of the expression levels of the genes belonging to the list only. The accuracy is estimated with multiple random validation strategy and its statistical significance is assessed against a couple of null hypothesis, by using two independent permutation tests. The utility of the proposed methodology is illustrated by analyzing the relevance of Gene Ontology terms belonging to biological process category in colon and prostate cancer, by using three different microarray data sets and by comparing it with current approaches. Availability: Source code for the algorithms is available from author upon request. Contact: ancona@ba.issia.cnr.it Supplementary information: Colon cancer data set and a complete description of experimental results are available at: ftp://bioftp:76bioftpxxx@marx.ba.issia.cnr.it/supp-info.htm
Statistical assessment of functional categories of genes deregulated in pathological conditions by using microarray data
PESOLE, Graziano;
2007-01-01
Abstract
Motivation: A major challenge in current biomedical research is the identification of cellular processes deregulated in a given pathology through the analysis of gene expression profiles. To this end, predefined lists of genes, coding specific functions, are compared with a list of genes ordered according to their values of differential expression measured by suitable univariate statistics. Results: We propose a statistically well-founded method for measuring the relevance of predefined lists of genes and for assessing their statistical significance starting from their raw expression levels as recorded on the microarray. We use prediction accuracy as a measure of relevance of the list. The rationale is that a functional category, coded through a list of genes, is perturbed in a given pathology if it is possible to correctly predict the occurrence of the disease in new subjects on the basis of the expression levels of the genes belonging to the list only. The accuracy is estimated with multiple random validation strategy and its statistical significance is assessed against a couple of null hypothesis, by using two independent permutation tests. The utility of the proposed methodology is illustrated by analyzing the relevance of Gene Ontology terms belonging to biological process category in colon and prostate cancer, by using three different microarray data sets and by comparing it with current approaches. Availability: Source code for the algorithms is available from author upon request. Contact: ancona@ba.issia.cnr.it Supplementary information: Colon cancer data set and a complete description of experimental results are available at: ftp://bioftp:76bioftpxxx@marx.ba.issia.cnr.it/supp-info.htmI documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.