In recent years, the involvement of the gut microbiota in disease and health has been investigated by sequencing the 16S gene from fecal samples. Dysbiotic gut microbiota was also observed in Autism Spectrum Disorder (ASD), a neurodevelopmental disorder characterized by gastrointestinal symptoms. However, despite the relevant number of studies, it is still difficult to identify a typical dysbiotic profile in ASD patients. The discrepancies among these studies are due to technical factors (i.e., experimental procedures) and external parameters (i.e., dietary habits). In this paper, we collected 959 samples from eight available projects (540 ASD and 419 Healthy Controls, HC) and reduced the observed bias among studies. Then, we applied a Machine Learning (ML) approach to create a predictor able to discriminate between ASD and HC. We tested and optimized three algorithms: Random Forest, Support Vector Machine and Gradient Boosting Machine. All three algorithms confirmed the importance of five different genera, including Parasutterella and Alloprevotella. Furthermore, our results show that ML algorithms could identify common taxonomic features by comparing datasets obtained from countries characterized by latent confounding variables.

Machine Learning Data Analysis Highlights the Role of Parasutterella and Alloprevotella in Autism Spectrum Disorders

Fosso B.
Writing – Original Draft Preparation
;
Messina F.;Pesole G.
Supervision
;
2022-01-01

Abstract

In recent years, the involvement of the gut microbiota in disease and health has been investigated by sequencing the 16S gene from fecal samples. Dysbiotic gut microbiota was also observed in Autism Spectrum Disorder (ASD), a neurodevelopmental disorder characterized by gastrointestinal symptoms. However, despite the relevant number of studies, it is still difficult to identify a typical dysbiotic profile in ASD patients. The discrepancies among these studies are due to technical factors (i.e., experimental procedures) and external parameters (i.e., dietary habits). In this paper, we collected 959 samples from eight available projects (540 ASD and 419 Healthy Controls, HC) and reduced the observed bias among studies. Then, we applied a Machine Learning (ML) approach to create a predictor able to discriminate between ASD and HC. We tested and optimized three algorithms: Random Forest, Support Vector Machine and Gradient Boosting Machine. All three algorithms confirmed the importance of five different genera, including Parasutterella and Alloprevotella. Furthermore, our results show that ML algorithms could identify common taxonomic features by comparing datasets obtained from countries characterized by latent confounding variables.
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11586/456634
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? 4
  • Scopus 20
  • ???jsp.display-item.citation.isi??? 15
social impact