FEEDS, the Food wastE biopEptiDe claSsifier: From microbial genomes and substrates to biopeptides function

Centurion, Victor Borin; Bizzotto, Edoardo; Tonini, Stefano; Filannino, Pasquale; Di Cagno, Raffaella; Zampieri, Guido; Campanaro, Stefano

doi:10.1016/j.crbiot.2024.100186

The production of biopeptides from food waste through microbial fermentation faces challenges arising from the diverse proteolytic abilities of microorganisms and from substrate variability, both of which affect the quality and yield of the generated biopeptides. To address these challenges, preliminary in silico bioinformatics analyses play a crucial role in evaluating suitable substrates and proteases for the fermentation process. However, existing tools lack comprehensive predictive capabilities for relevant proteases, substrate performance assessment, and final biopeptide family characterization. To overcome these limitations, we developed FEEDS (Food wastE biopEptiDe claSsifier), a novel biopeptide prediction and classification tool. FEEDS predicts biopeptide families based on microbial genome protease profiles and substrate composition during proteolysis. The tool also employs a machine learning approach for functional biopeptide classification. Results from testing on 1000 microbial genomes demonstrate the effectiveness of biopeptide classification, particularly in categorizing peptides derived from substrates such as Hordeum vulgare and Vitis vinifera seed storage proteins. In addition to biopeptide classification, our study examines the distinctive protease profiles of bacterial and yeast genomes. Bacterial genomes exhibited 60 to 100 proteases across 40 to 55 families. In contrast, yeast genomes displayed a more evenly distributed pattern, with 150 to 160 protease-encoding genes across 60 to 67 families, surpassing bacterial counts. Remarkably, a substantial portion of yeast proteases (~66 %) was secreted. Moreover, the machine learning methodology integrated within the FEEDS pipeline proved highly effective, achieving over 80 % accuracy in predicting the function of peptides derived from seed storage proteins. Notably, peptide sequences longer than 20 amino acids consistently displayed a higher probability of correct assignment than shorter ones.
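
To make the idea of in silico proteolysis concrete, the following is a minimal sketch of how a substrate sequence might be cut into peptides and then grouped by the 20-residue length mentioned above. It is not the FEEDS implementation: the function names `digest` and `split_by_length` are hypothetical, the input sequence is made up, and the cleavage rule (cut after K or R unless the next residue is P, a commonly cited trypsin-like rule) is an assumption chosen only for illustration.

```python
def digest(protein, cleave_after="KR", block_before="P"):
    """Toy in silico digestion (illustrative assumption, not the FEEDS method).

    Cuts after any residue in `cleave_after`, unless the following
    residue is in `block_before`.
    """
    peptides, start = [], 0
    for i in range(len(protein) - 1):
        if protein[i] in cleave_after and protein[i + 1] not in block_before:
            peptides.append(protein[start:i + 1])
            start = i + 1
    # Whatever remains after the last cut is the C-terminal peptide.
    peptides.append(protein[start:])
    return peptides


def split_by_length(peptides, threshold=20):
    """Separate peptides longer than `threshold` residues from the rest."""
    long_peptides = [p for p in peptides if len(p) > threshold]
    short_peptides = [p for p in peptides if len(p) <= threshold]
    return long_peptides, short_peptides


if __name__ == "__main__":
    # Made-up sequence, not a real seed storage protein.
    fragments = digest("AAKBBRPCCKD")
    print(fragments)
    print(split_by_length(fragments, threshold=5))
```

A real pipeline would draw cleavage rules from the protease families predicted for each genome and pass the resulting peptides to a trained classifier; both steps are outside the scope of this sketch.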