
Publication

# Universal and adaptive methods for robust stochastic optimization

Abstract

In contemporary machine learning problems, the efficiency of the optimization process depends on the properties of the model and the nature of the available data, which poses a significant challenge as the complexity of either grows without bound. An overarching goal is to design fast, adaptive algorithms that are provably robust to increasing complexity and to unknown properties of the optimization landscape, while ensuring scalable implementation. That said, there are two main perspectives to consider: (i) standard proof techniques require precise knowledge of the model and loss-function parameters, which are usually prohibitively expensive to estimate; (ii) state-of-the-art methods that show superior performance in practice are mostly heuristics lacking a theoretical basis.

This dissertation presents several fundamental problem formulations in machine learning, studied from the aforementioned perspectives. Specifically, the focus is on two fundamental concepts: (i) adaptivity, the ability of an algorithm to converge without knowing the problem-dependent parameters, and (ii) universality, the ability of an algorithm to converge adaptively under multiple problem settings simultaneously, without any modification. In light of this terminology, the goal is to close the gap between the theory of adaptive algorithms and the heuristic approaches employed in practice. To this end, the results are organized into three chapters according to the properties of the optimization problem: convex minimization, non-convex optimization, and monotone variational inequalities.

We begin with a universal and adaptive algorithm for compactly constrained convex minimization, which simultaneously achieves order-optimal convergence rates for smooth and non-smooth problems under deterministic and stochastic oracles. We identify an alternative acceleration scheme, together with an appropriate adaptive step size, that enables optimal convergence rates without knowing a priori either the problem parameters or the problem setting at hand. We then propose the first noise-adaptive second-order algorithm, which adapts individually to the noise in gradient and Hessian estimates.

Moving on to non-convex minimization, we present two sets of results: high-probability convergence of an adaptive gradient method, and adaptive variance-reduction methods under two scenarios. For the former, we analyze the AdaGrad algorithm under two noise models, bounded variance and sub-Gaussian noise, and provide order-optimal high-probability rates while establishing a set of side results on the boundedness of the iterate sequence. For the latter, we study both the more generic setting of streaming data and the more practical sub-setting of finite-sum minimization; under both scenarios, we develop the first parameter-free variance-reduction methods with optimal rates in their respective problem settings.

Finally, we study the problem of solving monotone variational inequalities under two noise models: the standard bounded-variance model and the relative-noise model, in which the error in the operator computation is proportional to the operator's norm at the evaluation point. Under a compatible, mild set of assumptions, we prove that a class of extra-gradient algorithms with a particular adaptive step size universally adapts to both noise models without knowing the setting a priori.
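To make the notion of an adaptive step size concrete, the per-coordinate scheme of AdaGrad (one of the methods analyzed above) can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the dissertation's algorithm; the function and parameter names are ours:

```python
import numpy as np

def adagrad(grad, x0, base_lr=1.0, eps=1e-8, n_steps=500):
    """Minimal AdaGrad sketch: each coordinate's step size is scaled by
    the accumulated squared gradients seen so far, so no smoothness or
    noise parameters need to be known in advance."""
    x = np.asarray(x0, dtype=float)
    g_sq = np.zeros_like(x)  # running sum of squared gradients per coordinate
    for _ in range(n_steps):
        g = grad(x)
        g_sq += g ** 2
        x -= base_lr * g / (np.sqrt(g_sq) + eps)
    return x

# toy usage: minimize f(x) = ||x||^2, whose gradient is 2x
x_star = adagrad(lambda x: 2 * x, x0=[3.0, -2.0])
```

The key point of the sketch is that `base_lr` can stay fixed: the denominator grows with the accumulated gradient magnitudes, so the effective step size shrinks automatically at a rate suited to the problem.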


Related MOOCs (32)

Related concepts (34)

Related publications (172)

Introduction to optimization on smooth manifolds: first order methods

Learn to optimize on smooth, nonlinear spaces: Join us to build your foundations (starting at "what is a manifold?") and confidently implement your first algorithm (Riemannian gradient descent).

Related concepts

Mathematical optimization

Mathematical optimization (alternatively spelled optimisation) or mathematical programming is the selection of a best element, with regard to some criterion, from some set of available alternatives. It is generally divided into two subfields: discrete optimization and continuous optimization. Optimization problems arise in all quantitative disciplines from computer science and engineering to operations research and economics, and the development of solution methods has been of interest in mathematics for centuries.

Stochastic gradient descent

Stochastic gradient descent (often abbreviated SGD) is an iterative method for optimizing an objective function with suitable smoothness properties (e.g. differentiable or subdifferentiable). It can be regarded as a stochastic approximation of gradient descent optimization, since it replaces the actual gradient (calculated from the entire data set) by an estimate thereof (calculated from a randomly selected subset of the data).
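A minimal sketch of this idea on a least-squares problem with synthetic data (illustrative only; all names and constants here are our own):

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic least-squares data: y = X @ w_true + small noise
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.01 * rng.normal(size=n)

def sgd_least_squares(X, y, lr=0.05, batch_size=16, n_steps=2000):
    """Plain SGD: each step uses the gradient of the loss on a random
    minibatch as an unbiased estimate of the full gradient."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_steps):
        idx = rng.integers(0, n, size=batch_size)  # random subset of the data
        Xb, yb = X[idx], y[idx]
        grad = 2 * Xb.T @ (Xb @ w - yb) / batch_size
        w -= lr * grad
    return w

w_hat = sgd_least_squares(X, y)
```

Each minibatch gradient is a noisy but unbiased estimate of the full gradient, which is exactly what makes SGD cheap per iteration while still converging in expectation.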

Convex optimization

Convex optimization is a subfield of mathematical optimization that studies the problem of minimizing convex functions over convex sets (or, equivalently, maximizing concave functions over convex sets). Many classes of convex optimization problems admit polynomial-time algorithms, whereas mathematical optimization is in general NP-hard.
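One standard way to minimize a convex function over a convex set is projected gradient descent: take a gradient step, then project back onto the feasible set. A minimal sketch with a Euclidean-ball constraint (illustrative only; names and constants are our own):

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Euclidean projection onto the convex set {x : ||x|| <= radius}."""
    nrm = np.linalg.norm(x)
    return x if nrm <= radius else x * (radius / nrm)

def projected_gradient(grad, x0, lr=0.1, n_steps=200, radius=1.0):
    """Projected gradient descent: after each gradient step, project
    back onto the feasible set so every iterate stays feasible."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = project_ball(x - lr * grad(x), radius)
    return x

# minimize f(x) = ||x - c||^2 over the unit ball, with c outside the ball;
# the constrained minimizer is c / ||c||
c = np.array([2.0, 0.0])
x_star = projected_gradient(lambda x: 2 * (x - c), x0=np.zeros(2))
```

Because both the objective and the feasible set are convex, this simple scheme converges to the global constrained minimizer, here `c / ||c|| = [1, 0]`.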

Related publications

Distributed learning is the key for enabling training of modern large-scale machine learning models, through parallelising the learning process. Collaborative learning is essential for learning from privacy-sensitive data that is distributed across various ...

Modern optimization is tasked with handling applications of increasingly large scale, chiefly due to the massive amounts of widely available data and the ever-growing reach of Machine Learning. Consequently, this area of research is under steady pressure t ...

Volkan Cevher, Kimon Antonakopoulos

While momentum-based accelerated variants of stochastic gradient descent (SGD) are widely used when training machine learning models, there is little theoretical understanding on the generalization error of such methods. In this work, we first show that th ...