Unlocking extra virgin olive oil identification: predicting ethyl esters with explainable AI on IR spectra
Magarelli, Michele;Squeo, Giacomo;Novielli, Pierfrancesco;Bellotti, Roberto;Caponio, Francesco;Tangaro, Sabina
2026-01-01
Abstract
Extra virgin olive oil (EVOO) is susceptible to adulteration and degradation, making the assessment of its authenticity and quality essential. Fatty acid ethyl esters (FAEE), formed through fermentative processes, are regulated by EU legislation as key markers of EVOO quality, with acceptable levels up to 35 mg/kg. In this study, a rapid, non-destructive, and cost-effective alternative based on infrared spectroscopy combined with traditional statistical methods (i.e., Partial Least Squares, PLS), machine learning (ML), and explainable artificial intelligence (XAI) is proposed. A dataset of 170 olive oil samples with FAEE concentrations ranging from 1.81 mg/kg to 109.00 mg/kg was analyzed using Fourier Transform Infrared spectroscopy. Spectral data were preprocessed and used to train various regression models. The best performance was obtained with an XGBoost model (R² = 0.90, RMSE = 9.41 mg/kg). XAI techniques enabled interpretation of the model and identification of the spectral regions most strongly associated with FAEE content.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


