**Êtes-vous un étudiant de l'EPFL à la recherche d'un projet de semestre?**

Travaillez avec nous sur des projets en science des données et en visualisation, et déployez votre projet sous forme d'application sur GraphSearch.

Publication# Functional data analysis by matrix completion

Résumé

Traditional approaches to analysing functional data typically follow a two-step procedure, consisting in first smoothing and then carrying out a functional principal component analysis. The idea underlying this procedure is that functional data are well approximated by smooth functions, and that rough variations are due to noise. However, it may very well happen that localised features are rough at a global scale but still smooth at some finer scale. In this thesis we put forward a new statistical approach for functional data arising as the sum of two uncorrelated components: one smooth plus one rough. We give non-parametric conditions under which the covariance operators of the smooth and of the rough components are jointly identifiable on the basis of discretely observed data: the covariance operator corresponding to the smooth component must be of finite rank and have real analytic eigenfunctions, while the one corresponding to the rough component must have a banded covariance function. We construct consistent estimators of both covariance operators without assuming knowledge of the true rank or bandwidth. We then use them to estimate the best linear predictors of the the smooth and the rough components of each functional datum. In both the identifiability and the inference part, we do not follow the usual strategy used in functional data analysis which is to first employ smoothing and work with continuous estimate of the covariance operator. Instead, we work directly with the covariance matrix of the discretely observed data, which allows us to use results and tools from linear algebra. In fact, we show that the whole problem of uniquely recovering the covariance operator of the smooth component given the one of the raw data can be seen as a low-rank matrix completion problem, and we make great use of a classical relation between the rank and the minors of a matrix to solve this matrix completion problem. The finite-sample performance of our approach is studied by means of simulation study.

Official source

Cette page est générée automatiquement et peut contenir des informations qui ne sont pas correctes, complètes, à jour ou pertinentes par rapport à votre recherche. Il en va de même pour toutes les autres pages de ce site. Veillez à vérifier les informations auprès des sources officielles de l'EPFL.

Concepts associés

Chargement

Publications associées

Chargement

Publications associées (45)

Chargement

Chargement

Chargement

Concepts associés (29)

Consistent estimator

In statistics, a consistent estimator or asymptotically consistent estimator is an estimator—a rule for computing estimates of a parameter θ0—having the property that as the number of data points u

Covariance matrix

In probability theory and statistics, a covariance matrix (also known as auto-covariance matrix, dispersion matrix, variance matrix, or variance–covariance matrix) is a square matrix giving the co

Functional data analysis

Functional data analysis (FDA) is a branch of statistics that analyses data providing information about curves, surfaces or anything else varying over a continuum. In its most general form, under an

Functional time series is a temporally ordered sequence of not necessarily independent random curves. While the statistical analysis of such data has been traditionally carried out under the assumption of completely observed functional data, it may well happen that the statistician only has access to a relatively low number of sparse measurements for each random curve. These discrete measurements may be moreover irregularly scattered in each curve's domain, missing altogether for some curves, and be contaminated by measurement noise. This sparse sampling protocol escapes from the reach of established estimators in functional time series analysis and therefore requires development of a novel methodology.
The core objective of this thesis is development of a non-parametric statistical toolbox for analysis of sparsely observed functional time series data. Assuming smoothness of the latent curves, we construct a local-polynomial-smoother based estimator of the spectral density operator producing a consistent estimator of the complete second order structure of the data. Moreover, the spectral domain recovery approach allows for prediction of latent curve data at a given time by borrowing strength from the estimated dynamic correlations in the entire time series across time. Further to predicting the latent curves from their noisy point samples, the method fills in gaps in the sequence (curves nowhere sampled), denoises the data, and serves as a basis for forecasting.
A classical non-parametric apparatus for encoding the dependence between a pair of or among a multiple functional time series, whether sparsely or fully observed, is the functional lagged regression model. This consists of a linear filter between the regressors time series and the response. We show how to tailor the smoother based estimators for the estimation of the cross-spectral density operators and the cross-covariance operators and, by means of spectral truncation and Tikhonov regularisation techniques, how to estimate the lagged regression filter and predict the response process.
The simulation studies revealed the following findings: (i) if one has freedom to design a sampling scheme with a fixed number of measurements, it is advantageous to sparsely distribute these measurements in a longer time horizon rather than concentrating over a shorter time horizon to achieve dense measurements in order to diminish the spectral density estimation error, (ii) the developed functional recovery predictor surpasses the static predictor not exploiting the temporal dependence, (iii) neither of the two considered regularisation techniques can, in general, dominate the other for the estimation in functional lagged regression models. The new methodologies are illustrated by applications to real data: the meteorological data revolving around the fair-weather atmospheric electricity measured in Tashkent, Uzbekistan, and at Wank mountain, Germany; and a case study analysing the dependence of the US Treasury yield curve on macroeconomic variables.
As a secondary contribution, we present a novel simulation method for general stationary functional time series defined through their spectral properties. A simulation study shows universality of such approach and superiority of the spectral domain simulation over the temporal domain in some situations.

Marie-Hélène Descary, Victor Panaretos

Functional data analyses typically proceed by smoothing, followed by functional PCA. This paradigm implicitly assumes that rough variation is due to nuisance noise. Nevertheless, relevant functional features such as time-localised or short scale fluctuations may indeed be rough relative to the global scale, but still smooth at shorter scales. These may be confounded with the global smooth components of variation by the smoothing and PCA, potentially distorting the parsimony and interpretability of the analysis. The goal of this paper is to investigate how both smooth and rough variations can be recovered on the basis of discretely observed functional data. Assuming that a functional datum arises as the sum of two uncorrelated components, one smooth and one rough, we develop identifiability conditions for the recovery of the two corresponding covariance operators. The key insight is that they should possess complementary forms of parsimony: one smooth and finite rank (large scale), and the other banded and potentially infinite rank (small scale). Our conditions elucidate the precise interplay between rank, bandwidth and grid resolution. Under these conditions, we show that the recovery problem is equivalent to rank-constrained matrix completion, and exploit this to construct estimators of the two covariances, without assuming knowledge of the true bandwidth or rank; we study their asymptotic behaviour, and then use them to recover the smooth and rough components of each functional datum by best linear prediction. As a result, we effectively produce separate functional PCAs for smooth and rough variation.

, ,

Nonparametric inference for functional data over two-dimensional domains entails additional computational and statistical challenges, compared to the one-dimensional case. Separability of the covariance is commonly assumed to address these issues in the densely observed regime. Instead, we consider the sparse regime, where the latent surfaces are observed only at few irregular locations with additive measurement error, and propose an estimator of covariance based on local linear smoothers. Consequently, the assumption of separability reduces the intrinsically four-dimensional smoothing problem into several two-dimensional smoothers and allows the proposed estimator to retain the classical minimax-optimal convergence rate for two-dimensional smoothers. Even when separability fails to hold, imposing it can be still advantageous as a form of regularization. A simulation study reveals a favorable bias-variance tradeoff and massive speed-ups achieved by our approach. Finally, the proposed methodology is used for qualitative analysis of implied volatility surfaces corresponding to call options, and for prediction of the latent surfaces based on information from the entire dataset, allowing for uncertainty quantification. Our cross-validated out-of-sample quantitative results show that the proposed methodology outperforms the common approach of pre-smoothing every implied volatility surface separately. Supplementary materials for this article are available online.