# Distributional Regression and Autoregression via Optimal Transport

Abstract

We present a framework for performing regression when both covariate and response are probability distributions on a compact and convex subset of $\mathbb{R}^d$. Our regression model is based on the theory of optimal transport and links the conditional Fréchet mean of the response to the covariate via an optimal transport map. We define a Fréchet-least-squares estimator of this regression map, and establish its consistency and rate of convergence to the true map under full observation of the regression pairs.

For the specific case when $d=1$, we obtain additional results: we establish the minimax rate of estimation of such a regression function, by deriving a lower bound that matches the convergence rate attained by the Fréchet-least-squares estimator. Additionally, we find an upper bound for the convergence rate of an estimator when observing only samples from the covariate and response distributions. Also in this case, the computation of the estimator is shown to reduce to a standard convex optimisation problem, and thus our regression model can be implemented with ease. We illustrate our methodology using real and simulated data.

We explore the problem of defining and fitting models of autoregressive time series of probability distributions on a compact interval of $\mathbb{R}$. In this context, an order-$1$ autoregressive model is a Markov chain that specifies a certain structure (regression) for the one-step conditional Fréchet mean with respect to a natural probability metric. We construct and investigate different models based on iterated random function systems of optimal transport maps. While the properties and interpretation of these models depend on how they relate to the iterated transport system, they can all be analysed theoretically in a unified way. We present such a theoretical analysis, including convergence rates, and illustrate our methodology using real and simulated data.

Our models generalise or extend certain existing models of transportation-based regression and autoregression, and in doing so also provide some new insights on those previous models.
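To make the $d=1$ setting more concrete, the following is a minimal sketch of the standard one-dimensional facts that optimal transport on the real line relies on: the 2-Wasserstein distance is the $L^2$ distance between quantile functions, and the optimal map from $\mu$ to $\nu$ is $F_\nu^{-1} \circ F_\mu$. It is not the estimator studied in the publication; the function names `wasserstein2_1d` and `transport_map_1d`, the use of empirical samples, and the grid size are all assumptions made for illustration.

```python
import numpy as np


def wasserstein2_1d(sample_mu, sample_nu, grid_size=1000):
    """Approximate the 2-Wasserstein distance between two 1-D samples.

    Uses the identity W_2^2(mu, nu) = int_0^1 (F_mu^{-1}(u) - F_nu^{-1}(u))^2 du,
    with empirical quantile functions evaluated on a midpoint grid of (0, 1).
    """
    probs = (np.arange(grid_size) + 0.5) / grid_size
    q_mu = np.quantile(np.asarray(sample_mu, dtype=float), probs)
    q_nu = np.quantile(np.asarray(sample_nu, dtype=float), probs)
    return float(np.sqrt(np.mean((q_mu - q_nu) ** 2)))


def transport_map_1d(sample_mu, sample_nu):
    """Return an empirical version of T = F_nu^{-1} o F_mu as a callable."""
    xs = np.sort(np.asarray(sample_mu, dtype=float))
    ys = np.asarray(sample_nu, dtype=float)

    def T(x):
        # Empirical CDF of mu at x, which always lies in [0, 1].
        u = np.searchsorted(xs, x, side="right") / xs.size
        # Empirical quantile of nu at that level.
        return np.quantile(ys, u)

    return T
```

For example, if `nu` is obtained by shifting every point of `mu` by a constant, `wasserstein2_1d(mu, nu)` returns (up to rounding) the absolute value of that constant, and the map returned by `transport_map_1d` is nondecreasing, as any optimal map on the line must be.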

Linear regression

In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables). The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable.

Regression analysis

In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome' or 'response' variable, or a 'label' in machine learning parlance) and one or more independent variables (often called 'predictors', 'covariates', 'explanatory variables' or 'features'). The most common form of regression analysis is linear regression, in which one finds the line (or a more complex linear combination) that most closely fits the data according to a specific mathematical criterion.
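As an illustration of fitting a line "according to a specific mathematical criterion", here is a minimal sketch that uses the ordinary least-squares criterion (minimising the sum of squared residuals). The helper name `fit_line` is hypothetical, and least squares is only one of several possible criteria.

```python
import numpy as np


def fit_line(x, y):
    """Fit y ~ intercept + slope * x by ordinary least squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with a column of ones for the intercept.
    design = np.column_stack([np.ones_like(x), x])
    # lstsq minimises ||design @ coef - y||_2 over coef.
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    intercept, slope = coef
    return float(intercept), float(slope)
```

Calling `fit_line([0, 1, 2, 3], [1, 3, 5, 7])` recovers, up to rounding, an intercept of 1 and a slope of 2, since those points lie exactly on the line $y = 1 + 2x$.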

Nonlinear regression

In statistics, nonlinear regression is a form of regression analysis in which observational data are modeled by a function which is a nonlinear combination of the model parameters and depends on one or more independent variables. The data are fitted by a method of successive approximations. In nonlinear regression, a statistical model of the form $\mathbf{y} \sim f(\mathbf{x}, \boldsymbol{\beta})$ relates a vector of independent variables, $\mathbf{x}$, and its associated observed dependent variables, $\mathbf{y}$. The function $f$ is nonlinear in the components of the vector of parameters $\boldsymbol{\beta}$, but otherwise arbitrary.
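One common "method of successive approximations" is the Gauss-Newton iteration, sketched below for an illustrative model $y = a\,e^{bx}$ with parameters $(a, b)$. The model, the function name `gauss_newton_exp`, and the stopping rule are assumptions chosen for the example; convergence is not guaranteed from a poor starting point.

```python
import numpy as np


def gauss_newton_exp(x, y, beta0, n_iter=50, tol=1e-12):
    """Fit y ~ a * exp(b * x) by Gauss-Newton, starting from beta0 = (a0, b0)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.asarray(beta0, dtype=float).copy()
    for _ in range(n_iter):
        a, b = beta
        expbx = np.exp(b * x)
        residual = y - a * expbx
        # Jacobian of the model with respect to (a, b).
        jacobian = np.column_stack([expbx, a * x * expbx])
        # Linearised least-squares problem for the update step.
        step, *_ = np.linalg.lstsq(jacobian, residual, rcond=None)
        beta = beta + step
        if np.linalg.norm(step) < tol:
            break
    return beta
```

Each iteration replaces the nonlinear model by its first-order expansion around the current parameter estimate and solves the resulting linear least-squares problem, which is why the fit proceeds by successive approximations rather than in closed form.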

Selected Topics on Discrete Choice

Discrete choice models are used extensively in many disciplines where it is important to predict human behavior at a disaggregate level. This course is a follow up of the online course “Introduction t

We propose a novel approach to evaluating the ionic Seebeck coefficient in electrolytes from relatively short equilibrium molecular dynamics simulations, based on the Green-Kubo theory of linear response and Bayesian regression analysis. By exploiting the ...

Victor Panaretos, Laya Ghodrati

We consider the problem of defining and fitting models of autoregressive time series of probability distributions on a compact interval of $\mathbb{R}$. An order-1 autoregressive model in this context is to be understood as a Markov chain, where ...

Florent Gérard Krzakala, Lenka Zdeborová, Hugo Chao Cui

We consider the problem of learning a target function corresponding to a deep, extensive-width, non-linear neural network with random Gaussian weights. We consider the asymptotic limit where the number of samples, the input dimension and the network width ...

2023