Respiratory system cancer, encompassing lung, trachea and bronchus cancer, constitute a substantial and evolving public health challenge. Since pollution plays a prominent cause in the development of this disease, identifying which substances are most harmful is fundamental for implementing policies aimed at reducing exposure to these substances. We propose an approach based on explainable artificial intelligence (XAI) based on remote sensing data to identify the factors that most influence the prediction of the standard mortality ratio (SMR) for respiratory system cancer in the Italian provinces using environment and socio-economic data. First of all, we identified 10 clusters of provinces through the study of the SMR variogram. Then, a Random Forest regressor is used for learning a compact representation of data. Finally, we used XAI to identify which features were most important in predicting SMR values. Our machine learning analysis shows that NO, income and O3 are the first three relevant features for the mortality of this type of cancer, and provides a guideline on intervention priorities in reducing risk factors.
Air pollution and mortality for cancer of the respiratory system in Italy: an explainable artificial intelligence approach
Romano, Donato;Novielli, Pierfrancesco;Cilli, Roberto;Amoroso, Nicola;Monaco, Alfonso;Bellotti, Roberto;Tangaro, Sabina
2024-01-01
Abstract
Respiratory system cancer, encompassing lung, trachea and bronchus cancer, constitute a substantial and evolving public health challenge. Since pollution plays a prominent cause in the development of this disease, identifying which substances are most harmful is fundamental for implementing policies aimed at reducing exposure to these substances. We propose an approach based on explainable artificial intelligence (XAI) based on remote sensing data to identify the factors that most influence the prediction of the standard mortality ratio (SMR) for respiratory system cancer in the Italian provinces using environment and socio-economic data. First of all, we identified 10 clusters of provinces through the study of the SMR variogram. Then, a Random Forest regressor is used for learning a compact representation of data. Finally, we used XAI to identify which features were most important in predicting SMR values. Our machine learning analysis shows that NO, income and O3 are the first three relevant features for the mortality of this type of cancer, and provides a guideline on intervention priorities in reducing risk factors.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.