# Functional lagged regression with sparse noisy observations

Abstract

A functional (lagged) time series regression model involves the regression of a scalar response time series on a time series of regressors that consists of a sequence of random functions. In practice, the underlying regressor curve time series are not always directly accessible, but are latent processes observed (sampled) only at discrete measurement locations. In this article, we consider the so-called sparse observation scenario, where only a relatively small number of measurement locations have been observed, possibly different for each curve. The measurements can be further contaminated by additive measurement error. A spectral approach to the estimation of the model dynamics is considered. The spectral density of the regressor time series and the cross-spectral density between the regressor and response time series are estimated by kernel smoothing methods from the sparse observations. The impulse response regression coefficients of the lagged regression model are then estimated by means of ridge regression (Tikhonov regularization) or principal component analysis (PCA) regression (spectral truncation). The latent functional time series are then recovered by means of prediction, conditioning on all the observed data. The performance and implementation of our methods are illustrated by means of a simulation study and the analysis of meteorological data.
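
The abstract does not write the model out, so the following is an assumption. Suppose the curves are discretized on a common grid of p points and the model has the standard form Y_t = a + Σ_k ⟨b_k, X_{t-k}⟩ + e_t. Write F(ω) for the regressor spectral density matrix and G(ω) for the cross-spectral density between Y and X. Then G(ω) = B(ω) F(ω), with transfer function B(ω) = Σ_k b_k e^{-ikω}. The filter can therefore be estimated by a regularized inverse of F(ω), followed by an inverse Fourier transform.

The sketch below implements the two regularizations named above under that assumption. Tikhonov replaces F by F + ρI. Spectral truncation inverts only the leading eigen-components. The function names are hypothetical, and the spectral densities are taken as given rather than estimated from sparse data.

```python
import numpy as np


def transfer_function(F, G, method="tikhonov", rho=1e-3, n_comp=None):
    """Regularized estimate of B(w) = G(w) F(w)^{-1} at each frequency.

    F: (M, p, p) Hermitian spectral density matrices of the regressor curves
       (discretized on a common grid of p points; an assumption).
    G: (M, p) cross-spectral density row vectors between response and regressor.
    """
    M, p, _ = F.shape
    B = np.empty((M, p), dtype=complex)
    for j in range(M):
        lam, V = np.linalg.eigh(F[j])
        lam, V = lam[::-1], V[:, ::-1]           # eigenvalues in descending order
        if method == "tikhonov":
            inv_lam = 1.0 / (lam + rho)          # inverse of F + rho * I
        elif method == "truncation":
            k = p if n_comp is None else n_comp
            inv_lam = np.zeros(p)
            inv_lam[:k] = 1.0 / lam[:k]          # invert the k leading components only
        else:
            raise ValueError(f"unknown method: {method}")
        # G F^{-1} = G V diag(inv_lam) V^H
        B[j] = ((G[j] @ V) * inv_lam) @ V.conj().T
    return B


def filter_coefficients(B, lags):
    """b_k = (1/2pi) * integral of B(w) e^{ikw}, as a mean over w_j = 2*pi*j/M."""
    M = B.shape[0]
    w = 2 * np.pi * np.arange(M) / M
    return {k: (B * np.exp(1j * k * w)[:, None]).mean(axis=0).real for k in lags}
```

Here `rho` and `n_comp` are the regularization parameters. The frequency grid ω_j = 2πj/M is an illustrative choice.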

Related concepts (25)

Time series

In mathematics, a time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. …

Simulation

A simulation is the imitation of the operation of a real-world process or system over time. Simulations require the use of models; the model represents the key characteristics or behaviors of the …

Curve

In mathematics, a curve (also called a curved line in older texts) is an object similar to a line, but that does not have to be straight. Intuitively, a curve may be thought of as the trace left by …

Related publications (30)


This work is about time series of functional data (functional time series), and consists of three main parts. In the first part (Chapter 2), we develop a doubly spectral decomposition for functional time series that generalizes the Karhunen–Loève expansion. In the second part (Chapter 3), we develop the theory of estimation for the spectral density operators, which are the main tool involved in the doubly spectral decomposition. The third part (Chapter 4) is concerned with the problem of understanding and comparing the dynamics of DNA. It proposes a methodology for comparing the dynamics of DNA minicircles vibrating in solution, using tools developed in this thesis. In the first part, we develop a doubly spectral representation of a stationary functional time series that generalizes the Karhunen–Loève expansion to the functional time series setting. The representation decomposes the time series into an integral of uncorrelated frequency components (Cramér representation), each of which is in turn expanded in a Karhunen–Loève series, thus yielding a Cramér–Karhunen–Loève decomposition of the series. The construction is based on the spectral density operators, whose Fourier coefficients are the lag-t autocovariance operators and which characterise the second-order dynamics of the process. The spectral density operators are the functional analogues of the spectral density matrices. The eigenvalues and eigenfunctions of the operators at different frequencies provide the building blocks of the representation. By truncating the representation at a finite level, we obtain a harmonic principal component analysis of the time series. This is an optimal finite-dimensional reduction that captures both the temporal dynamics of the process and the within-curve dynamics, and it dominates functional PCA. The proofs rely on a stochastic integral of operator-valued functions, constructed similarly to the Itô integral.

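
Discretize on a grid of p points, so that each spectral density operator becomes a p×p Hermitian matrix F(ω). Harmonic PCA keeps the K leading eigenpairs of F(ω) separately at each frequency. Static functional PCA instead keeps K fixed eigenvectors of the lag-0 covariance R_0 = ∫ F(ω) dω.

The sketch below compares the fraction of variance each approach captures. The function name is hypothetical, and the frequency integrals are replaced by Riemann sums on ω_j = 2πj/M. By Ky Fan's maximum principle, the harmonic fraction is never smaller than the static one. This is one way to see the mean-square sense in which harmonic PCA dominates functional PCA.

```python
import numpy as np


def harmonic_vs_static(F, K):
    """Fractions of total variance captured by K harmonic vs K static components.

    F: (M, p, p) Hermitian spectral density matrices on w_j = 2*pi*j/M, with
       F(w) = (1/2pi) sum_h R_h exp(-ihw), so R_0 is approx (2pi/M) sum_j F_j.
    """
    M = F.shape[0]
    R0 = (2 * np.pi / M) * F.sum(axis=0)          # lag-0 covariance matrix
    total = np.trace(R0).real
    # Harmonic PCA: the K leading eigenvalues at every frequency, integrated.
    lam = np.linalg.eigvalsh(F)                   # shape (M, p), ascending
    harmonic = (2 * np.pi / M) * lam[:, -K:].sum()
    # Static functional PCA: the K leading eigenvalues of R_0.
    static = np.linalg.eigvalsh(R0)[-K:].sum()
    return harmonic / total, static / total
```

A convenient test input is the spectral density of a discretized functional MA(1) process, (1/2π)(I + Θe^{-iω})(I + Θe^{-iω})^H, for some matrix Θ.
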
In practice, the spectral density operators are unknown. In the second part, we therefore develop the basic theory of a frequency-domain framework for drawing statistical inferences on the spectral density operators of a stationary functional time series. Our main tool is the functional Discrete Fourier Transform (fDFT). We derive an asymptotic Gaussian representation of the fDFT, which allows the original collection of dependent random functions to be transformed into a collection of approximately independent complex-valued Gaussian random functions. We then use these results to construct estimators of the spectral density operators based on smoothed versions of the periodogram kernel, the functional generalisation of the periodogram matrix. The consistency and asymptotic law of these estimators are studied in detail. As immediate consequences, we obtain central limit theorems for the mean and the long-run covariance operator of a stationary functional time series. Our results do not depend on structural modeling assumptions, but only on functional versions of classical cumulant mixing conditions. The effect of discrete noisy observations on the consistency of the estimators is studied in a framework general enough to apply to a wide range of smoothing techniques for converting discrete noisy observations into functional data. We also perform a simulation study to assess the finite-sample performance of our estimators. We discuss the technical assumptions of our results and the cost at which our weak dependence assumptions could be changed or weakened. We also provide examples of processes satisfying the technical assumptions of our asymptotic results. As an application, in the third part we consider the problem of comparing the dynamics of the trajectories of two DNA minicircles vibrating in solution, obtained via Molecular Dynamics simulations.

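
The smoothed-periodogram estimator can be sketched for curves recorded on a common grid. This is an assumption: under it, the fDFT reduces to the ordinary DFT applied at each grid point. The periodogram kernel at Fourier frequency ω_j = 2πj/T is I_j = d_j d_j^H / (2πT), where d_j is the DFT of the centred curves. The estimator then averages I over neighbouring frequencies. The uniform weights and the name `half_width` are illustrative choices, not the thesis's.

```python
import numpy as np


def smoothed_periodogram(X, half_width=5):
    """Spectral density estimates at w_j = 2*pi*j/T from curves on a grid.

    X: (T, p) array; row t holds curve X_t evaluated at p grid points
       (full observation on a common grid is an assumption).
    Returns F_hat: (T, p, p) Hermitian matrices.
    """
    T = X.shape[0]
    Xc = X - X.mean(axis=0)                       # centre each grid point
    D = np.fft.fft(Xc, axis=0)                    # d_j = sum_t X_t exp(-i w_j t)
    # Periodogram kernel I_j = d_j d_j^H / (2 pi T)
    I = np.einsum("ja,jb->jab", D, D.conj()) / (2 * np.pi * T)
    # Uniform weights over 2*half_width+1 neighbouring Fourier frequencies,
    # wrapping around circularly; the weights sum to one.
    offsets = range(-half_width, half_width + 1)
    weight = 1.0 / len(offsets)
    return sum(weight * np.roll(I, -k, axis=0) for k in offsets)
```

The parameter `half_width` acts as the bandwidth. In the usual asymptotics it must grow with T while half_width/T tends to zero. Because the weights sum to one, (2π/T) Σ_j F_hat_j reproduces the sample lag-0 covariance exactly (Parseval).
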
The approach we take is to view and compare the dynamics through their spectral density operators, which contain the entire second-order structure of the trajectories. As a first step, we compare the spectral density operators of the two DNA minicircles using a new test that we develop, which compares the spectral density operators at a fixed frequency. Using multiple testing procedures, we localize in frequency the differences between the spectral density operators of the two DNA minicircles while controlling the type-I error. We also conduct numerical simulations to assess the performance of our method. We further investigate the differences between the two minicircles by comparing their spectral density operators within frequencies. This allows us to localize their differences both in frequency and on the minicircles, while controlling the averaged false discovery rate over the selected frequencies. Our methodology is general enough to be applied to the comparison of the dynamics of any pair of stationary functional time series.
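
The abstract does not specify the multiple-testing procedure used to localize the differences. As a generic illustration only, Holm's step-down procedure, which controls the family-wise type-I error, can be applied to per-frequency p-values. The function name is hypothetical. The p-values would come from the fixed-frequency test, which is not reproduced here.

```python
def localize_frequencies(freqs, pvalues, alpha=0.05):
    """Frequencies where 'no difference' is rejected by Holm's step-down procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])   # smallest p-value first
    rejected = []
    for rank, i in enumerate(order):
        # Compare the rank-th smallest p-value with alpha / (m - rank); stop at first failure.
        if pvalues[i] > alpha / (m - rank):
            break
        rejected.append(i)
    return [freqs[i] for i in sorted(rejected)]
```

For example, with p-values 0.001, 0.2, 0.01, 0.04 and 0.005 at five frequencies, the first, third and fifth frequencies are flagged at level 0.05.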

Traditional approaches to analysing functional data typically follow a two-step procedure, consisting of first smoothing and then carrying out a functional principal component analysis. The idea underlying this procedure is that functional data are well approximated by smooth functions, and that rough variations are due to noise. However, it may very well happen that localised features are rough at a global scale but still smooth at some finer scale. In this thesis we put forward a new statistical approach for functional data arising as the sum of two uncorrelated components: one smooth plus one rough. We give non-parametric conditions under which the covariance operators of the smooth and of the rough components are jointly identifiable on the basis of discretely observed data. The covariance operator corresponding to the smooth component must be of finite rank and have real analytic eigenfunctions, while the one corresponding to the rough component must have a banded covariance function. We construct consistent estimators of both covariance operators without assuming knowledge of the true rank or bandwidth. We then use them to estimate the best linear predictors of the smooth and the rough components of each functional datum. In both the identifiability and the inference part, we do not follow the usual strategy in functional data analysis of first smoothing and then working with a continuous estimate of the covariance operator. Instead, we work directly with the covariance matrix of the discretely observed data, which allows us to use results and tools from linear algebra. In fact, we show that the whole problem of uniquely recovering the covariance operator of the smooth component given that of the raw data can be seen as a low-rank matrix completion problem. We make extensive use of a classical relation between the rank and the minors of a matrix to solve this matrix completion problem. The finite-sample performance of our approach is studied by means of a simulation study.
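
The rank-minor relation can be illustrated in its simplest form. Assume, for illustration, that the discretely observed covariance is C = L + B. Here L (the smooth part) has rank r, and B (the rough part) vanishes outside a band |i - j| ≤ δ.

Every (r+1)×(r+1) minor of L vanishes, and a determinant is affine in any single entry. A diagonal entry L[i, i], hidden under the band, can therefore be solved for from a minor whose other entries all lie off the band. The function name and index choices are hypothetical. This is not the thesis's estimator, which must cope with estimated covariances and unknown rank and bandwidth.

```python
import numpy as np


def recover_entry(C, i, rows, cols):
    """Recover L[i, i] from C = L + B using one vanishing minor of L.

    Assumes rank(L) = len(rows) = len(cols) = r, and that every entry of the
    submatrix C[[i]+rows, [i]+cols] other than (i, i) lies outside the band
    of B, so it equals the corresponding entry of L.
    """
    M = C[np.ix_([i, *rows], [i, *cols])].copy()
    M[0, 0] = 0.0                         # the unknown entry, provisionally zero
    cofactor = np.linalg.det(M[1:, 1:])   # coefficient of M[0, 0] in det(M)
    # det is affine in M[0, 0]: det(x) = det(0) + x * cofactor, and det(x) = 0.
    return -np.linalg.det(M) / cofactor
```

With estimated covariances, the minors no longer vanish exactly, which is why a statistical estimator is needed in practice.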

A functional time series is a temporally ordered sequence of not necessarily independent random curves. The statistical analysis of such data has traditionally been carried out under the assumption of completely observed functional data. However, the statistician may well have access to only a relatively small number of sparse measurements for each random curve. These discrete measurements may moreover be irregularly scattered in each curve's domain, missing altogether for some curves, and contaminated by measurement noise. This sparse sampling protocol lies beyond the reach of established estimators in functional time series analysis and therefore requires the development of new methodology.

The core objective of this thesis is the development of a non-parametric statistical toolbox for the analysis of sparsely observed functional time series data. Assuming smoothness of the latent curves, we construct a local-polynomial-smoother-based estimator of the spectral density operator, which yields a consistent estimator of the complete second-order structure of the data. Moreover, the spectral-domain recovery approach allows latent curve data at a given time to be predicted by borrowing strength from the estimated dynamic correlations across the entire time series. Beyond predicting the latent curves from their noisy point samples, the method fills in gaps in the sequence (curves sampled nowhere), denoises the data, and serves as a basis for forecasting.

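
Under a zero-mean Gaussian working assumption, predicting latent curve values from noisy point samples reduces to a conditional expectation. Let Z be the latent values at target (time, location) pairs, and let y = S + ε be the observations at measured pairs, with independent noise of variance σ². Then E[Z | y] = Σ_Zy (Σ_yy + σ²I)^{-1} y, where the covariances come from the lag-autocovariance kernels of the latent series. Borrowing strength across time corresponds to the entries between different curves (lag h ≠ 0).

The source describes spectral-domain recovery. This time-domain sketch only illustrates the conditioning step. The callable `cov` is a hypothetical stand-in for an estimated kernel, and the function name is also hypothetical.

```python
import numpy as np


def predict_latent(cov, obs_points, y, targets, noise_var):
    """Best linear predictor of latent values X_t(u) at the `targets`.

    cov(s, u, t, v): Cov(X_s(u), X_t(v)) of the latent series (hypothetical
        callable standing in for an estimated covariance kernel).
    obs_points: list of (t, u) pairs where the noisy values y were recorded.
    targets: list of (t, u) pairs where the latent curve is to be predicted.
    """
    S_yy = np.array([[cov(s, u, t, v) for (t, v) in obs_points]
                     for (s, u) in obs_points], dtype=float)
    S_zy = np.array([[cov(s, u, t, v) for (t, v) in obs_points]
                     for (s, u) in targets], dtype=float)
    # Measurement errors are independent across observations: add them on the diagonal.
    A = S_yy + noise_var * np.eye(len(obs_points))
    return S_zy @ np.linalg.solve(A, np.asarray(y, dtype=float))
```

Consider a single observation y = 2 at (0, 0.5) with noise variance 1, and covariance 0.8^|s-t| exp(-(u-v)²). The prediction at (1, 0.5), on a curve with no data of its own, is 0.8. The full covariance matrix grows with the number of observations, so this direct form suits only small windows.
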
A classical non-parametric apparatus for encoding the dependence between a pair of functional time series, or among multiple ones, whether sparsely or fully observed, is the functional lagged regression model. It consists of a linear filter between the regressor time series and the response. We show how to tailor the smoother-based estimators to estimate the cross-spectral density operators and the cross-covariance operators. We also show how to estimate the lagged regression filter and predict the response process by means of spectral truncation and Tikhonov regularisation techniques.

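
The following sketch shows how a kernel smoother applies to sparse data for the cross-covariance Cov(Y_{t+h}, X_t(u)). Assume the means have already been removed and the noise is uncorrelated with Y. Then each product of a response value Y_{t+h} with a measurement of X_t taken at location u_{tj} has expectation equal to the cross-covariance at u_{tj}. These products are smoothed in u.

For brevity, the sketch uses a local-constant (Nadaraya–Watson) smoother with an Epanechnikov kernel rather than a local-polynomial one. The function name and the data layout are hypothetical.

```python
import numpy as np


def cross_cov(Y, locs, vals, h, u0, bandwidth):
    """Nadaraya-Watson estimate of Cov(Y_{t+h}, X_t(u0)) from sparse data.

    Y: (T,) centred scalar responses.
    locs, vals: lists of length T; locs[t] and vals[t] hold the measurement
        locations and centred noisy values of curve X_t (possibly empty).
    """
    T = len(Y)
    num = den = 0.0
    for t in range(max(0, -h), min(T, T - h)):       # keep 0 <= t, t + h < T
        z = (np.asarray(locs[t]) - u0) / bandwidth
        w = np.where(np.abs(z) < 1, 0.75 * (1 - z**2), 0.0)   # Epanechnikov
        num += np.sum(w * Y[t + h] * np.asarray(vals[t]))
        den += np.sum(w)
    return num / den
```

Estimates at lags h = 0, ±1, ±2, … can then be combined with lag-window weights into a cross-spectral density estimate.
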
The simulation studies revealed three findings:

- (i) If one is free to design a sampling scheme with a fixed number of measurements, distributing them sparsely over a longer time horizon reduces the spectral density estimation error, compared with concentrating them densely over a shorter one.
- (ii) The developed functional recovery predictor outperforms the static predictor, which does not exploit the temporal dependence.
- (iii) In general, neither of the two regularisation techniques considered dominates the other for estimation in functional lagged regression models.

The new methodologies are illustrated by applications to real data. The first is meteorological data on fair-weather atmospheric electricity measured in Tashkent, Uzbekistan, and at Wank mountain, Germany. The second is a case study analysing the dependence of the US Treasury yield curve on macroeconomic variables.

As a secondary contribution, we present a novel simulation method for general stationary functional time series defined through their spectral properties. A simulation study shows the universality of this approach and the superiority of spectral-domain simulation over time-domain simulation in some situations.
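
The abstract does not describe the thesis's simulation method. The sketch below shows one standard way to simulate a Gaussian stationary (discretized) functional time series from its spectral density matrices:

- draw independent complex Gaussian coefficients at the Fourier frequencies,
- impose Hermitian symmetry so that the output is real,
- apply an inverse FFT.

The result has the circularly wrapped autocovariance (2π/T) Σ_j F(ω_j) e^{iω_j h}. This approximates the target lag-h autocovariance well when the latter decays quickly relative to T. Function names are hypothetical.

```python
import numpy as np


def _psd_sqrt(M):
    """Return R with R @ R^H = M for a Hermitian positive semi-definite M."""
    lam, V = np.linalg.eigh(M)
    return V * np.sqrt(np.clip(lam, 0.0, None))


def simulate_spectral(F, rng):
    """Simulate X_0, ..., X_{T-1} (each a p-vector) with spectral density F.

    F: (T, p, p) Hermitian PSD matrices at w_j = 2*pi*j/T, assumed to satisfy
       F[T-j] = conj(F[j]), as holds for a real-valued process.
    """
    T, p, _ = F.shape
    scale = 2 * np.pi / T
    A = np.zeros((T, p), dtype=complex)
    for j in range(T // 2 + 1):
        if j == 0 or 2 * j == T:
            # Frequencies 0 and pi: F is real there, so draw a real coefficient.
            A[j] = _psd_sqrt(scale * F[j].real) @ rng.standard_normal(p)
        else:
            z = (rng.standard_normal(p) + 1j * rng.standard_normal(p)) / np.sqrt(2)
            A[j] = _psd_sqrt(scale * F[j]) @ z
            A[T - j] = np.conj(A[j])      # Hermitian symmetry makes X real
    # X_t = sum_j A_j exp(i w_j t); numpy's ifft includes a 1/T factor.
    return (T * np.fft.ifft(A, axis=0)).real
```

For example, a scalar MA(1) process with coefficient θ has spectral density (1 + θ² + 2θ cos ω)/(2π). Its simulated series should show lag-0 and lag-1 autocovariances close to 1 + θ² and θ.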