
The determinants of health expenditure: a machine learning approach

Raffaele Lagravinese (Writing – Original Draft Preparation); Giuliano Resce
2026-01-01

Abstract

Accurate prediction of healthcare costs is essential for decision-making, policy design, financial planning, and effective resource management, yet traditional econometric models do not address this policy challenge adequately. This paper uses machine learning (ML) to predict healthcare expenditure in systems with heterogeneous regional needs. The Italian NHS serves as a case study, using administrative data spanning 1996 to 2019. The empirical analysis implements four ML algorithms (Elastic-Net, Gradient Boosting, Random Forest, and Support Vector Regression) alongside a multivariate regression as a baseline. Gradient Boosting emerges as the best-performing algorithm in out-of-sample prediction; models trained on data up to 2018 also forecast 2019 expenditure robustly. Important predictors of expenditure include temporal factors and technological progress, average family size and the share of public expenditure in the total, regional area, population and the share of foreign residents, GDP per capita and labour activity, and the share of the elderly population (75 years old and over). The model's effectiveness indicates that ML can be used to predict expenditure and then allocate national healthcare funds across areas with heterogeneous needs.
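
The evaluation design mentioned in the abstract (fit on data up to 2018, then forecast 2019) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the panel is synthetic, the function names `fit_linear_trend` and `temporal_holdout_rmse` are hypothetical, and a per-region linear trend stands in for the paper's actual models, which are not reproduced.

```python
import math
import random


def fit_linear_trend(years, values):
    """Ordinary least squares fit of value = intercept + slope * year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def temporal_holdout_rmse(panel, last_train_year, test_year):
    """Fit each region on years <= last_train_year and score on test_year.

    panel maps region name -> {year: expenditure}. Returns the RMSE of the
    test-year forecasts across regions.
    """
    squared_errors = []
    for series in panel.values():
        train = [(y, v) for y, v in sorted(series.items()) if y <= last_train_year]
        years, values = zip(*train)
        intercept, slope = fit_linear_trend(years, values)
        prediction = intercept + slope * test_year
        squared_errors.append((prediction - series[test_year]) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Synthetic panel (assumption): 5 regions, 1996-2019, linear growth plus noise.
random.seed(0)
panel = {
    f"region_{i}": {
        year: 1000 + 50 * i + 30 * (year - 1996) + random.gauss(0, 10)
        for year in range(1996, 2020)
    }
    for i in range(5)
}

rmse_2019 = temporal_holdout_rmse(panel, last_train_year=2018, test_year=2019)
print(f"Hold-out RMSE for 2019: {rmse_2019:.2f}")
```

Splitting by year rather than at random keeps future observations out of the training set, so the score reflects genuine forecasting rather than interpolation.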

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11586/573176
Note: the data displayed have not been validated by the university.
