Stochastic approximation | EPFL Graph Search

Stochastic approximation methods are a family of iterative methods typically used for root-finding or optimization problems. Their recursive update rules can be used, among other things, to solve linear systems when the collected data is corrupted by noise, or to approximate extreme values of functions that cannot be computed directly but only estimated via noisy observations.

In a nutshell, stochastic approximation algorithms deal with a function of the form f(θ) = E_ξ[F(θ, ξ)], which is the expected value of a function F depending on a random variable ξ. The goal is to recover properties of f without evaluating it directly. Instead, stochastic approximation algorithms use random samples of F(θ, ξ) to efficiently approximate properties of f such as zeros or extrema.

Recently, stochastic approximation has found extensive applications in statistics and machine learning, especially in settings with big data. These applications include stochastic optimization methods and algorithms, online forms of the EM algorithm, reinforcement learning via temporal differences, and deep learning. Stochastic approximation algorithms have also been used in the social sciences to describe collective dynamics: fictitious play in learning theory and consensus algorithms can be studied using their theory.

The earliest, and prototypical, algorithms of this kind are the Robbins–Monro and Kiefer–Wolfowitz algorithms, introduced in 1951 and 1952 respectively. The Robbins–Monro algorithm, introduced by Herbert Robbins and Sutton Monro, presented a methodology for solving a root-finding problem in which the function is represented as an expected value. Assume that we have a function M(θ) and a constant α such that the equation M(θ) = α has a unique root at θ*. It is assumed that while we cannot directly observe the function M(θ), we can instead obtain measurements of the random variable N(θ), where E[N(θ)] = M(θ). The algorithm then generates iterates of the form

θ_{n+1} = θ_n − a_n (N(θ_n) − α)

Here, a_1, a_2, ... is a sequence of positive step sizes.
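The Robbins–Monro iteration can be illustrated with a minimal sketch. It assumes the standard update θ_{n+1} = θ_n − a_n (N(θ_n) − α) with step sizes a_n = a/n, and an increasing target function M. The function name `robbins_monro`, the linear test function M(θ) = 2θ − 1, the Gaussian noise model, and all parameter values are hypothetical choices made for illustration only.

```python
import random


def robbins_monro(noisy_measurement, alpha, theta0, n_steps=10_000, a=1.0):
    """Approximate the root of M(theta) = alpha using only noisy
    measurements N(theta) with E[N(theta)] = M(theta).

    Assumes M is increasing near the root, so subtracting the
    residual moves theta toward the root.
    """
    theta = theta0
    for n in range(1, n_steps + 1):
        # a_n = a / n: the step sizes sum to infinity while their
        # squares sum to a finite value.
        step = a / n
        theta = theta - step * (noisy_measurement(theta) - alpha)
    return theta


# Hypothetical example: M(theta) = 2*theta - 1, observed with unit Gaussian noise.
rng = random.Random(0)  # fixed seed for reproducibility


def noisy_measurement(theta):
    return 2.0 * theta - 1.0 + rng.gauss(0.0, 1.0)


# Solve M(theta) = 3, whose exact root is theta* = 2.
theta_hat = robbins_monro(noisy_measurement, alpha=3.0, theta0=0.0)
print(theta_hat)
```

With these assumed settings the returned estimate should lie close to the exact root θ* = 2. The choice a_n = a/n is only one common schedule; other decaying sequences of positive step sizes can be substituted.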

Accelerated SGD for Non-Strongly-Convex Least Squares

Nicolas Henri Bernard Flammarion, Aditya Vardhan Varre

We consider stochastic approximation for the least squares regression problem in the non-strongly convex setting. We present the first practical algorithm that achieves the optimal prediction error rates in terms of dependence on the noise of the problem, as O(d/t), while accelerating the forgetting of the initial conditions to O(d/t^2). Our new algorithm is based on a simple modification of the accelerated gradient descent. We provide convergence results for both the averaged and the last iterate of the algorithm. In order to describe the tightness of these new bounds, we present a matching lower bound in the noiseless setting and thus show the optimality of our algorithm.

2022

A multi-class framework for a pedestrian cell transmission model accounting for population heterogeneity

Guy Alexander Cooper

This work is an extension of PedCTM, an aggregate and transient cell transmission model for multidirectional pedestrian flows in which pedestrian characteristics are assumed to be homogeneous across the population considered. Critically, one fundamental diagram relating pedestrian speed and flow to local pedestrian density is employed across the entire population. This work extends the model to population heterogeneity with a multi-class approach wherein each sub-population is assigned its own characteristic fundamental diagram. The model presented requires the implementation of an update cycle to minimize numerical dispersion of pedestrians across all classes. In addition, multi-class dynamics introduces an element of competition in determining the flow constraints of the various classes. A priority scheme is implemented that allows for static, dynamic and stochastic determination of flow priorities throughout the network over the course of the simulation. An attempt was made to base the class-specific fundamental diagrams on inference from a dataset related to the PedFlux collaboration. Ultimately, however, the implementation made use of the Kladek formula for the speed-density relation. Preliminary simulations and results are presented to serve as a proof of concept.

2014

A tutorial on adaptive MCMC

Johannes Thoms

We review adaptive Markov chain Monte Carlo (MCMC) algorithms as a means to optimise their performance. Using simple toy examples we review their theoretical underpinnings, and in particular show why adaptive MCMC algorithms might fail when some fundamental properties are not satisfied. This leads to guidelines concerning the design of correct algorithms. We then review criteria and the useful framework of stochastic approximation, which allows one to systematically optimise generally used criteria, but also analyse the properties of adaptive MCMC algorithms. We then propose a series of novel adaptive algorithms which prove to be robust and reliable in practice. These algorithms are applied to artificial and high dimensional scenarios, but also to the classic mine disaster dataset inference problem.

2008
