
# Covariance Estimation for Random Surfaces beyond Separability

Abstract

This thesis focuses on non-parametric covariance estimation for random surfaces, i.e., functional data on a two-dimensional domain. Non-parametric covariance estimation lies at the heart of functional data analysis, and considerations of statistical and computational efficiency often compel the use of separability of the covariance when working with random surfaces. We seek to provide efficient alternatives to this ambivalent assumption.

In Chapter 2, we study a setting where the covariance structure may fail to be separable locally, either due to noise contamination or due to the presence of a non-separable short-range dependent signal component. That is, the covariance is an additive perturbation of a separable component by a non-separable but banded component. We introduce non-parametric estimators hinging on shifted partial tracing, a novel concept enjoying strong denoising properties. We illustrate the usefulness of the proposed methodology on a data set of mortality surfaces.

In Chapter 3, we propose a distinctive decomposition of the covariance, which allows us to understand separability as an unconventional form of low-rankness. From this perspective, a separable covariance has rank one. Allowing for a higher rank leads to a structured class in which any covariance can be approximated to arbitrary precision. The key notion of the partial inner product allows us to generalize the power iteration method to general Hilbert spaces and to estimate the aforementioned decomposition from data. Truncating the decomposition and retaining its leading terms automatically induces a non-parametric estimator of the covariance, whose parsimony is dictated by the truncation level.
The advantages of this approach, which allows for estimation beyond separability, are demonstrated on the task of classifying EEG signals.

While Chapters 2 and 3 propose several generalizations of separability in the densely sampled regime, Chapter 4 deals with the sparse regime, where the latent surfaces are observed only at a few irregular locations. Here, we propose a separable covariance estimator based on local linear smoothers, which is the first non-parametric use of separability in the sparse regime. The assumption of separability reduces the intrinsically four-dimensional smoothing problem to several two-dimensional smoothers and allows the proposed estimator to retain the classical minimax-optimal convergence rate for two-dimensional smoothers. The proposed methodology is used for a qualitative analysis of implied volatility surfaces corresponding to call options, and for prediction of the latent surfaces based on information from the entire data set, allowing for uncertainty quantification. Our quantitative results show that the proposed methodology outperforms the common approach of pre-smoothing every implied volatility surface separately.

Throughout the thesis, we put emphasis on computational aspects, since they are the main reason behind the immense popularity of separability. We show that the covariance structures of Chapters 2 and 3 come with no (asymptotic) computational overhead relative to assuming separability: they can be estimated and manipulated at the same asymptotic cost as the separable model. In particular, we develop numerical algorithms for efficient inversion, as required, e.g., for prediction. All the methods are implemented in R and available on GitHub.
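As a finite-dimensional illustration of "separability as rank one", consider a surface observed on a K1 x K2 grid, so that its covariance is a (K1 K2) x (K1 K2) matrix and a separable covariance is a Kronecker product `np.kron(A, B)`. The sketch below contracts the covariance with a candidate factor over one coordinate (our discrete reading of a partial inner product), alternates the two contractions as a power iteration, and deflates to obtain a truncated expansion C ~ sum_r sigma_r kron(A_r, B_r). It is a minimal sketch under these assumptions, not the thesis' R implementation; all function names are hypothetical.

```python
import numpy as np


def partial_inner_product_first(C4, B):
    """Contract C4[i, j, k, l] with B over the second coordinate -> K1 x K1 matrix."""
    return np.einsum("ijkl,jl->ik", C4, B)


def partial_inner_product_second(C4, A):
    """Contract C4[i, j, k, l] with A over the first coordinate -> K2 x K2 matrix."""
    return np.einsum("ijkl,ik->jl", C4, A)


def leading_separable_term(C4, n_iter=200, seed=0):
    """Power iteration for the leading term sigma * kron(A, B) of C4."""
    rng = np.random.default_rng(seed)
    K2 = C4.shape[1]
    B = rng.standard_normal((K2, K2))
    B /= np.linalg.norm(B)
    for _ in range(n_iter):
        A = partial_inner_product_first(C4, B)
        A /= np.linalg.norm(A)          # unit Frobenius norm
        B = partial_inner_product_second(C4, A)
        sigma = np.linalg.norm(B)       # current estimate of the leading weight
        B /= sigma
    return sigma, A, B


def separable_expansion(C, K1, K2, R):
    """Greedy R-term expansion of C by repeated power iteration and deflation.

    R should not exceed the number of terms actually present in C,
    otherwise the deflated residual is zero and the normalisation fails.
    """
    # C4[i, j, k, l] = C[i*K2 + j, k*K2 + l], matching np.kron(A, B)
    C4 = np.asarray(C, dtype=float).reshape(K1, K2, K1, K2).copy()
    terms = []
    for _ in range(R):
        sigma, A, B = leading_separable_term(C4)
        terms.append((sigma, A, B))
        C4 -= sigma * np.einsum("ik,jl->ijkl", A, B)  # remove the fitted term
    return terms


def reconstruct(terms):
    """Covariance estimate implied by the retained terms."""
    return sum(sigma * np.kron(A, B) for sigma, A, B in terms)
```

With R = 1 a separable covariance is recovered exactly; increasing R trades parsimony for flexibility, mirroring the role of the truncation level.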


Related concepts (55)

Estimator (statistics)

In statistics, an estimator is a function used to estimate a moment of a probability distribution (such as its expectation or its variance). It can be used, for example, to estimate certain…

Covariance

In probability theory and statistics, the covariance between two random variables is a number that quantifies their joint deviations from their respective expectations.

Methodology

Methodology is the study of the set of scientific methods. It can be regarded as the science of method, or the "method of methods" (just as there is a metalinguistics or…

Related publications (34)


Traditional approaches to analysing functional data typically follow a two-step procedure, consisting of first smoothing and then carrying out a functional principal component analysis. The idea underlying this procedure is that functional data are well approximated by smooth functions, and that rough variations are due to noise. However, it may very well happen that localised features are rough at a global scale but still smooth at some finer scale. In this thesis we put forward a new statistical approach for functional data arising as the sum of two uncorrelated components: one smooth plus one rough. We give non-parametric conditions under which the covariance operators of the smooth and of the rough components are jointly identifiable on the basis of discretely observed data: the covariance operator corresponding to the smooth component must be of finite rank and have real analytic eigenfunctions, while the one corresponding to the rough component must have a banded covariance function. We construct consistent estimators of both covariance operators without assuming knowledge of the true rank or bandwidth. We then use them to estimate the best linear predictors of the smooth and the rough components of each functional datum. In both the identifiability and the inference part, we do not follow the usual strategy in functional data analysis, which is to first employ smoothing and work with a continuous estimate of the covariance operator. Instead, we work directly with the covariance matrix of the discretely observed data, which allows us to use results and tools from linear algebra. In fact, we show that the whole problem of uniquely recovering the covariance operator of the smooth component from that of the raw data can be seen as a low-rank matrix completion problem, and we make great use of a classical relation between the rank and the minors of a matrix to solve this matrix completion problem.
The finite-sample performance of our approach is studied by means of a simulation study.
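The relation between rank and minors can be illustrated in its simplest case. Suppose the smooth part L has rank one and the rough part D vanishes outside a band |i - m| <= b of the covariance matrix. Then every off-band entry of the observed covariance S = L + D belongs to L, and each in-band entry L[i, m] can be recovered from a vanishing 2 x 2 minor L[i, m] L[k, j] - L[i, j] L[k, m] = 0 whose other three entries are off-band. This toy sketch assumes a known rank (one) and a known bandwidth b, which the abstract explicitly does not assume; for rank r one would use (r + 1) x (r + 1) minors instead. The function names are hypothetical.

```python
import numpy as np


def _find_pivot(S, i, m, is_off_band):
    """Return (j, k) such that entries (i, j), (k, m) and (k, j) are all off-band."""
    p = S.shape[0]
    for j in range(p):
        for k in range(p):
            if (is_off_band(i, j) and is_off_band(k, m)
                    and is_off_band(k, j) and S[k, j] != 0):
                return j, k
    return None


def complete_band_rank_one(S, bandwidth):
    """Recover a rank-one L from S = L + D, where D is zero outside the band.

    Illustrative assumptions: rank(L) = 1 and the bandwidth is known.
    """
    S = np.asarray(S, dtype=float)
    p = S.shape[0]

    def is_off_band(a, b):
        return abs(a - b) > bandwidth

    L = S.copy()
    for i in range(p):
        for m in range(p):
            if is_off_band(i, m):
                continue  # observed entry already belongs to L
            pivot = _find_pivot(S, i, m, is_off_band)
            if pivot is None:
                raise ValueError(f"no off-band minor for entry ({i}, {m})")
            j, k = pivot
            # vanishing 2 x 2 minor: L[i, m] * L[k, j] = L[i, j] * L[k, m]
            L[i, m] = S[i, j] * S[k, m] / S[k, j]
    return L
```

Once L is recovered, the banded part follows as D = S - L.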

The problem of covariance estimation for replicated surface-valued processes is examined from the functional data analysis perspective. Considerations of statistical and computational efficiency often compel the use of separability of the covariance, even though the assumption may fail in practice. We consider a setting where the covariance structure may fail to be separable locally, either due to noise contamination or due to the presence of a nonseparable short-range dependent signal component. That is, the covariance is an additive perturbation of a separable component by a nonseparable but banded component. We introduce nonparametric estimators hinging on the novel concept of shifted partial tracing, enabling computationally efficient estimation of the model under dense observation. Due to the denoising properties of shifted partial tracing, our methods are shown to yield consistent estimators even under noisy discrete observation, without the need for smoothing. In addition to deriving the convergence rates and limit theorems, we also show that the implementation of our estimators, including prediction, comes at no computational overhead relative to a separable model. Finally, we demonstrate the empirical performance and computational feasibility of our methods in an extensive simulation study and on a real dataset. Supplementary materials for this article are available online.
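A toy version of shifted partial tracing, under our reading of the idea: store the covariance of a surface on a K1 x K2 grid as C4[i, j, k, l] and suppose C = kron(A, B) + E, where E vanishes whenever |i - k| >= d or |j - l| >= d. Tracing along a diagonal shifted by d never touches E, so T1 = A * sum_j B[j, j+d], T2 = B * sum_i A[i, i+d], and the doubly shifted trace t is the product of those two sums; hence kron(A, B) = kron(T1, T2) / t whenever t != 0. The sketch uses the exact covariance and a single shift d; the paper's estimators (empirical covariances, choice of shifts, the continuum setting) are not reproduced, and the function names are hypothetical.

```python
import numpy as np


def shifted_partial_traces(C4, d):
    """Shifted partial traces of a covariance stored as C4[i, j, k, l].

    T1[i, k] = sum_j C4[i, j, k, j + d]        (traced over the second coordinate)
    T2[j, l] = sum_i C4[i, j, i + d, l]        (traced over the first coordinate)
    t        = sum_{i, j} C4[i, j, i + d, j + d]
    """
    K1, K2 = C4.shape[0], C4.shape[1]
    T1 = sum(C4[:, j, :, j + d] for j in range(K2 - d))
    T2 = sum(C4[i, :, i + d, :] for i in range(K1 - d))
    t = sum(C4[i, j, i + d, j + d] for i in range(K1 - d) for j in range(K2 - d))
    return T1, T2, t


def separable_plus_banded(C, K1, K2, d):
    """Split C into a separable part and a remainder, assuming the remainder
    vanishes whenever |i - k| >= d or |j - l| >= d."""
    C4 = np.asarray(C, dtype=float).reshape(K1, K2, K1, K2)
    T1, T2, t = shifted_partial_traces(C4, d)
    separable = np.kron(T1, T2) / t
    return separable, C - separable
```

With d = 0 the same formula reduces to ordinary partial tracing, whose output is biased by the noise on the diagonal; the shift is what removes this bias.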

Tomas Masák, Victor Panaretos, Tomas Rubin

Nonparametric inference for functional data over two-dimensional domains entails additional computational and statistical challenges compared to the one-dimensional case. Separability of the covariance is commonly assumed to address these issues in the densely observed regime. Instead, we consider the sparse regime, where the latent surfaces are observed only at a few irregular locations with additive measurement error, and propose an estimator of the covariance based on local linear smoothers. The assumption of separability reduces the intrinsically four-dimensional smoothing problem to several two-dimensional smoothers and allows the proposed estimator to retain the classical minimax-optimal convergence rate for two-dimensional smoothers. Even when separability fails to hold, imposing it can still be advantageous as a form of regularization. A simulation study reveals a favorable bias-variance tradeoff and massive speed-ups achieved by our approach. Finally, the proposed methodology is used for a qualitative analysis of implied volatility surfaces corresponding to call options, and for prediction of the latent surfaces based on information from the entire dataset, allowing for uncertainty quantification. Our cross-validated out-of-sample quantitative results show that the proposed methodology outperforms the common approach of pre-smoothing every implied volatility surface separately. Supplementary materials for this article are available online.
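In the sparse regime, raw covariance values are typically products of centred observations at pairs of scattered locations. Under separability, each factor of the covariance depends on only two arguments, so it can be estimated with two-dimensional smoothers rather than a four-dimensional one. The building block is a two-dimensional local linear smoother, sketched below at a single evaluation point. The Gaussian product kernel, the single bandwidth `h`, and the function name are our assumptions, not necessarily the authors' choices.

```python
import numpy as np


def local_linear_2d(x, y, z, x0, y0, h):
    """Local linear estimate of E[z | (x, y) = (x0, y0)].

    Fits z ~ b0 + b1 (x - x0) + b2 (y - y0) by weighted least squares with
    Gaussian product-kernel weights of bandwidth h; the intercept b0 is the
    fitted value at (x0, y0).
    """
    w = np.exp(-0.5 * (((x - x0) / h) ** 2 + ((y - y0) / h) ** 2))
    X = np.column_stack([np.ones_like(x), x - x0, y - y0])
    XtW = X.T * w                      # scale each observation by its weight
    beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta[0]
```

Because local linear fitting reproduces linear functions exactly, evaluating it on an exactly linear surface returns the true value, which is a convenient sanity check before applying it to noisy raw covariances.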