# Transportation-based functional ANOVA and PCA for covariance operators

Abstract

We consider the problem of comparing several samples of stochastic processes with respect to their second-order structure, and of describing the main modes of variation in this second-order structure, if present. These tasks can be seen as an Analysis of Variance (ANOVA) and a Principal Component Analysis (PCA) of covariance operators, respectively. They arise naturally in functional data analysis, where several populations are to be contrasted relative to the nature of their dispersion around their means, rather than relative to their means themselves. We contribute a novel approach based on optimal (multi)transport, where each covariance can be identified with a centred Gaussian process of corresponding covariance. By constructing the optimal simultaneous coupling of these Gaussian processes, we contrast the (linear) maps that achieve it with the identity, with respect to a norm-induced distance. The resulting test statistic, calibrated by permutation, is seen to distinctly outperform the state-of-the-art, and to furnish considerable power even under local alternatives. This effect is seen to be genuinely functional, and is related to the potential for perfect discrimination in infinite dimensions. In the event of a rejection of the null hypothesis stipulating equality, a geometric interpretation of the transport maps allows us to construct a (tangent space) PCA revealing the main modes of variation. As a necessary step in developing our methodology, we prove results on the existence and boundedness of optimal multitransport maps. These are of independent interest in the theory of transport of Gaussian processes. The transportation ANOVA and PCA are illustrated on a variety of simulated and real examples.
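As a finite-dimensional sketch of the construction described above: for centred Gaussians, the optimal transport map between N(0, Σ₁) and N(0, Σ₂) has the closed form T = Σ₁^{-1/2}(Σ₁^{1/2} Σ₂ Σ₁^{1/2})^{1/2} Σ₁^{-1/2}, and its deviation from the identity can serve as a contrast between covariances. The matrices and the Frobenius-norm statistic below are illustrative only, not the paper's exact (multi)transport statistic.

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_transport_map(S1, S2):
    """Optimal transport map pushing N(0, S1) forward to N(0, S2):
    T = S1^{-1/2} (S1^{1/2} S2 S1^{1/2})^{1/2} S1^{-1/2}."""
    r = sqrtm(S1)                # S1^{1/2}
    r_inv = np.linalg.inv(r)     # S1^{-1/2}
    middle = sqrtm(r @ S2 @ r)   # (S1^{1/2} S2 S1^{1/2})^{1/2}
    return np.real(r_inv @ middle @ r_inv)

# Two illustrative 2x2 covariance matrices (hypothetical data).
S1 = np.array([[2.0, 0.5], [0.5, 1.0]])
S2 = np.array([[1.0, 0.2], [0.2, 3.0]])

T = gaussian_transport_map(S1, S2)
# Deviation of the transport map from the identity, as a two-sample contrast.
stat = np.linalg.norm(T - np.eye(2), ord="fro")
```

By construction, `T @ S1 @ T.T` recovers `S2`, and the statistic vanishes exactly when the two covariances coincide.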


Related concepts (45)

Statistical hypothesis testing

A statistical hypothesis test is a method of statistical inference used to decide whether the data at hand sufficiently support a particular hypothesis. Hypothesis testing allows us to make probabilistic statements about population parameters. While hypothesis testing was popularized early in the 20th century, early forms were used in the 1700s. The first use is credited to John Arbuthnot (1710), followed by Pierre-Simon Laplace (1770s), in analyzing the human sex ratio at birth.

Test statistic

A test statistic is a statistic (a quantity derived from the sample) used in statistical hypothesis testing. A hypothesis test is typically specified in terms of a test statistic, considered as a numerical summary of a data-set that reduces the data to one value that can be used to perform the hypothesis test. In general, a test statistic is selected or defined in such a way as to quantify, within observed data, behaviours that would distinguish the null from the alternative hypothesis, where such an alternative is prescribed, or that would characterize the null hypothesis if there is no explicitly stated alternative hypothesis.
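The paper's test statistic is calibrated by permutation, a generic way to obtain a null distribution for any two-sample statistic. A minimal sketch, in which the `statistic` callable and the variance-difference example are hypothetical choices, not tied to any particular published test:

```python
import numpy as np

def permutation_pvalue(x, y, statistic, n_perm=999, seed=None):
    """Calibrate a two-sample statistic by permutation: pool the samples,
    repeatedly reshuffle the group labels, and compare the observed
    statistic to its permutation distribution."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    n = len(x)
    observed = statistic(x, y)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if statistic(perm[:n], perm[n:]) >= observed:
            count += 1
    # Add-one correction keeps the p-value strictly positive.
    return (count + 1) / (n_perm + 1)

# Example: absolute difference in sample variances as the statistic,
# on synthetic samples whose dispersions genuinely differ.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=50)
y = rng.normal(0.0, 2.0, size=50)
p = permutation_pvalue(x, y, lambda a, b: abs(a.var() - b.var()), seed=1)
```

Because the groups differ only in dispersion, a mean-based statistic would miss the effect while the variance contrast detects it, which mirrors the covariance-focused comparison in the abstract.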

Norm (mathematics)

In mathematics, a norm is a function from a real or complex vector space to the non-negative real numbers that behaves in certain ways like the distance from the origin: it commutes with scaling, obeys a form of the triangle inequality, and is zero only at the origin. In particular, the Euclidean distance in a Euclidean space is defined by a norm on the associated Euclidean vector space, called the Euclidean norm, the 2-norm, or, sometimes, the magnitude of the vector.
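The three defining properties (absolute homogeneity, the triangle inequality, and vanishing only at the origin) can be checked numerically for the Euclidean norm; a small sketch using NumPy:

```python
import numpy as np

v = np.array([3.0, -4.0])
w = np.array([1.0, 2.0])

# ||a v|| = |a| ||v||  (absolute homogeneity)
assert np.isclose(np.linalg.norm(2.5 * v), 2.5 * np.linalg.norm(v))
# ||v + w|| <= ||v|| + ||w||  (triangle inequality)
assert np.linalg.norm(v + w) <= np.linalg.norm(v) + np.linalg.norm(w)
# ||0|| = 0, and the norm is zero only at the origin
assert np.linalg.norm(np.zeros(2)) == 0.0

euclidean = np.linalg.norm(v)  # sqrt(3^2 + (-4)^2) = 5.0
```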

Related MOOCs (10)

Introduction to optimization on smooth manifolds: first order methods

Learn to optimize on smooth, nonlinear spaces: Join us to build your foundations (starting at "what is a manifold?") and confidently implement your first algorithm (Riemannian gradient descent).

Algebra (part 1)

A French-language MOOC on linear algebra, accessible to all, taught rigorously and requiring no prerequisites.


Related publications (56)

How can we discern whether the covariance operator of a stochastic process is of reduced rank, and if so, what its precise rank is? And how can we do so at a given level of confidence? This question is central to a great many methods for functional dat ...

We study the homogenization of the Poisson equation with a reaction term and of the eigenvalue problem associated to the generator of multiscale Langevin dynamics. Our analysis extends the theory of two-scale convergence to the case of weighted Sobolev spa ...

Victor Panaretos, Soham Sarkar

Covariance estimation is ubiquitous in functional data analysis. Yet, the case of functional observations over multidimensional domains introduces computational and statistical challenges, rendering the standard methods effectively inapplicable. To address ...