A Time Series Decomposition Algorithm Based on Gaussian Processes
Massimo Bilancia, Fabio Manca
2021-01-01
Abstract
In this paper, we present an algorithm for decomposing time series based on Gaussian processes. Gaussian processes can be viewed as infinite-dimensional probability distributions over smooth functions. They also provide a natural basis for the additive decomposition of time series, because the sum of mutually independent Gaussian processes is again a Gaussian process whose covariance kernel is the sum of the component kernels. The component estimation algorithm we propose is general and depends neither on the number of components nor on the correlation structure and interpretation of each component. Specifically, the algorithm is based on nonparametric Bayesian Gaussian process regression, in which the covariance of the likelihood is structured to account for the presence of additive subcomponents. Parameters are estimated numerically by maximizing the unnormalized log-posterior density, which yields maximum a posteriori (MAP) estimates with a considerable advantage in computational cost and efficiency. We apply the method to real data consisting of time series of daily confirmed COVID-19 cases and daily swabs administered in Italy. The curve of daily new cases oscillates: since the ratio between the number of new cases and the number of tested individuals is locally constant, the number of new cases is systematically lower on the days of the week when fewer swabs are performed. This wavelike pattern is a nuisance that does not reflect the true dynamics of contagion, as it results from variability in human activities. We show how this cyclic component can be successfully filtered out using the proposed algorithm.
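The additive decomposition and MAP step described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes two zero-mean components (a smooth trend with a squared-exponential kernel and a weekly component with a periodic kernel of period 7 days) plus Gaussian noise. The kernel choices, the hypothetical log-normal prior on the trend lengthscale, the hyperparameter values, the synthetic data, and the coarse grid search that stands in for a numerical optimizer are all illustrative. Each component's posterior mean is computed as E[f_j | y] = K_j (K + σ²I)^{-1} y, and subtracting the weekly estimate from the series gives the adjusted curve.

```python
import numpy as np

def rbf_kernel(t1, t2, variance, lengthscale):
    """Squared-exponential kernel for a smooth trend component."""
    d = t1[:, None] - t2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def periodic_kernel(t1, t2, variance, lengthscale, period):
    """Exp-sine-squared kernel for a cyclic component (period in days)."""
    d = t1[:, None] - t2[None, :]
    return variance * np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / lengthscale ** 2)

def component_kernels(t, theta):
    """Covariance matrix of each additive component (assumed: trend + weekly)."""
    return [
        rbf_kernel(t, t, theta["trend_var"], theta["trend_ls"]),
        periodic_kernel(t, t, theta["week_var"], theta["week_ls"], 7.0),
    ]

def log_posterior(y, t, theta):
    """Unnormalized log-posterior: GP log marginal likelihood + log prior."""
    K = sum(component_kernels(t, theta)) + theta["noise_var"] * np.eye(len(t))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    loglik = -0.5 * y @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * len(t) * np.log(2 * np.pi)
    # Hypothetical log-normal prior on the trend lengthscale (centred at 20 days).
    logprior = -0.5 * (np.log(theta["trend_ls"]) - np.log(20.0)) ** 2
    return loglik + logprior

def decompose(y, t, theta):
    """Posterior mean of each component: E[f_j | y] = K_j (K + s^2 I)^{-1} y."""
    Ks = component_kernels(t, theta)
    K = sum(Ks) + theta["noise_var"] * np.eye(len(t))
    alpha = np.linalg.solve(K, y)
    return [Kj @ alpha for Kj in Ks]

# Synthetic data only: 12 weeks of a slow wave plus a weekday effect and noise.
rng = np.random.default_rng(0)
t = np.arange(84, dtype=float)
trend = 10.0 * np.sin(2 * np.pi * t / 84)
weekly = 3.0 * np.cos(2 * np.pi * t / 7)
y = trend + weekly + rng.normal(scale=0.5, size=t.size)
y = y - y.mean()  # centre the series to match the zero-mean GP prior

theta = dict(trend_var=50.0, trend_ls=20.0, week_var=5.0, week_ls=1.0, noise_var=0.25)
# Crude MAP search over one hyperparameter (stand-in for a numerical optimizer).
grid = [5.0, 10.0, 20.0, 40.0]
best = max(grid, key=lambda ls: log_posterior(y, t, {**theta, "trend_ls": ls}))
theta["trend_ls"] = best

trend_hat, weekly_hat = decompose(y, t, theta)
adjusted = y - weekly_hat  # series with the cyclic component filtered out
```

Because every component mean reuses the same vector (K + σ²I)^{-1} y, adding a further component in this sketch only requires appending another kernel in `component_kernels`, which mirrors the claim that the procedure does not depend on the number of components.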