Deep Learning for NO2 Forecasting in Italy: An LSTM-Based Approach
Fania A.;Amoroso N.;Lacalamita A.;Maggipinto T.;Bellotti R.
2025-01-01
Abstract
Accurate prediction of nitrogen dioxide (NO2) concentrations is essential for air quality management and public health protection, particularly in urban environments. In this study, we developed and evaluated a deep learning framework based on Long Short-Term Memory (LSTM) networks to forecast daily NO2 concentrations across Italy on a high-resolution (5 km) hexagonal grid. The model integrates a heterogeneous set of predictors, including satellite-derived tropospheric NO2 (Sentinel-5P), meteorological reanalysis data (ERA5), land cover and elevation data (CORINE, Google Earth Engine), and demographic and infrastructure features (WorldPop, OpenStreetMap). Ground-truth data from ARPA monitoring stations between 2019 and 2022 were used for training and validation. We compared the performance of the LSTM with a linear regression baseline and XGBoost, employing both Random 5-Fold and Leave-One-Year-Out cross-validation strategies. Model performance was assessed using RMSE, MAE, and R2, showing that the LSTM consistently outperformed the other models in both temporal and spatial prediction accuracy. This framework demonstrates the effectiveness of AI-based approaches in air pollution modeling and highlights their potential for supporting One Health applications by providing high-resolution exposure estimates that can inform public health interventions and environmental policy.


