High-Resolution NO2, O3, and PMs Estimation in Puglia: Leveraging AI and Explainability Techniques

Fania, Alessandro; Lorusso, Giovanni; Cilli, Roberto; Amoroso, Nicola; Adamo, Maria; Aquilino, Mariella; Bellantuono, Loredana; De Lucia, Marica; Lacalamita, Antonio; La Rocca, Marianna; Maggipinto, Tommaso; Pantaleo, Ester; Primerano, Roberto; Tangaro, Sabina; Bellotti, Roberto; Monaco, Alfonso
2026-01-01

Abstract

Air pollution remains a major environmental challenge, with severe impacts on human health and ecosystems. Recent advances in satellite technology have transformed air quality monitoring by enabling global, continuous observations of atmospheric pollutants. However, satellite data often lack the precision of ground-based stations. This study aims to develop a machine learning model to predict daily surface concentrations of key air pollutants (NO2, O3, PM10, and PM2.5) at high spatial resolution (300 m) in the Apulia region. Using Regional Environmental Protection Agency (ARPA) station data from 2019 to 2022 and meteorological, geographic, land-use, and temporal variables, we trained an XGBoost model on a 300 m grid. Model performance, assessed by repeated cross-validation, showed an average (Formula presented.) of 0.71, with values of 0.77 for NO2, 0.78 for O3, 0.67 for PM2.5, and 0.64 for PM10. eXplainable AI (XAI) methods confirmed strong alignment with established scientific knowledge, enhancing model reliability and offering insights into pollutant distribution drivers.
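The evaluation summarized above, a score averaged over repeated cross-validation, can be illustrated with a minimal sketch. It rests on several assumptions that are not in this record. The metric is taken to be R² (the record elides it). A one-feature least-squares line stands in for the XGBoost regressor. Folds are drawn by plain random shuffling. The function names (`repeated_kfold_r2`, `fit_linear`, `r2_score`) are hypothetical. Treat it as an illustration of the repeated k-fold idea, not as the authors' pipeline.

```python
import random
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_linear(xs, ys):
    """Least-squares line y = a + b*x.

    Hypothetical stand-in for the XGBoost model used in the paper.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return lambda x: a + b * x


def repeated_kfold_r2(xs, ys, k=5, repeats=10, seed=0):
    """Mean R^2 over `repeats` independent shuffles of k-fold CV.

    Assumes plain random folds; the record does not state the split scheme.
    """
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    scores = []
    for _ in range(repeats):
        rng.shuffle(indices)  # a fresh random partition for each repetition
        folds = [indices[i::k] for i in range(k)]
        for test_idx in folds:
            test_set = set(test_idx)
            train_idx = [i for i in indices if i not in test_set]
            model = fit_linear([xs[i] for i in train_idx],
                               [ys[i] for i in train_idx])
            preds = [model(xs[i]) for i in test_idx]
            scores.append(r2_score([ys[i] for i in test_idx], preds))
    return mean(scores)


if __name__ == "__main__":
    # Synthetic data only; no values from the study are reproduced here.
    rng = random.Random(1)
    xs = [i / 10 for i in range(200)]
    ys = [3 + 2 * x + rng.gauss(0, 1) for x in xs]
    print(repeated_kfold_r2(xs, ys, k=5, repeats=3, seed=42))
```

Daily observations from the same ARPA station are correlated, so a grouped split that holds out whole stations would usually give a stricter estimate than random folds. This record does not say which scheme the authors used.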

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11586/585681
