
Publication: An Accelerated First-Order Method for Non-convex Optimization on Manifolds

Abstract

We describe the first gradient methods on Riemannian manifolds to achieve accelerated rates in the non-convex case. Under Lipschitz assumptions on the Riemannian gradient and Hessian of the cost function, these methods find approximate first-order critical points faster than regular gradient descent. A randomized version also finds approximate second-order critical points. Both the algorithms and their analyses build extensively on existing work in the Euclidean case. The basic operation consists in running the Euclidean accelerated gradient descent method (appropriately safe-guarded against non-convexity) in the current tangent space, then moving back to the manifold and repeating. This requires lifting the cost function from the manifold to the tangent space, which can be done for example through the Riemannian exponential map. For this approach to succeed, the lifted cost function (called the pullback) must retain certain Lipschitz properties. As a contribution of independent interest, we prove precise claims to that effect, with explicit constants. Those claims are affected by the Riemannian curvature of the manifold, which in turn affects the worst-case complexity bounds for our optimization algorithms.
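The inner loop is easy to picture on a concrete manifold. Below is a minimal sketch of the lift-optimize-retract cycle on the unit sphere, assuming a simple momentum scheme with a plain gradient step as a crude non-convexity safeguard; the finite-difference pullback gradient, the step sizes, and all function names are illustrative stand-ins, not the paper's algorithm or constants.

```python
import numpy as np

# Minimal sketch of the lift-optimize-retract cycle on the unit sphere.
# All names, step sizes, and the finite-difference pullback gradient are
# illustrative stand-ins, not the paper's algorithm or constants.

def exp_map(x, v):
    """Riemannian exponential map on the sphere: walk from x along tangent v."""
    nv = np.linalg.norm(v)
    return x if nv < 1e-12 else np.cos(nv) * x + np.sin(nv) * (v / nv)

def tangent_proj(x, g):
    """Orthogonal projection of an ambient vector onto the tangent space at x."""
    return g - (g @ x) * x

def pullback_grad(f, x, v, h=1e-6):
    """Finite-difference gradient of the pullback v -> f(Exp_x(v)), projected
    to T_x M (a crude stand-in for the exact differential of the exponential)."""
    g = np.zeros_like(v)
    for i in range(v.size):
        e = np.zeros_like(v)
        e[i] = h
        g[i] = (f(exp_map(x, v + e)) - f(exp_map(x, v - e))) / (2 * h)
    return tangent_proj(x, g)

def accelerated_step(f, x, inner_iters=30, lr=0.05):
    """Momentum (Nesterov-style) on the pullback in T_x M, safeguarded by a
    plain gradient step, then mapped back to the manifold."""
    v = np.zeros_like(x)
    v_prev = v.copy()
    for k in range(inner_iters):
        y = v + (k / (k + 3.0)) * (v - v_prev)     # momentum extrapolation
        v_prev = v
        v = y - lr * pullback_grad(f, x, y)        # gradient step on the pullback
    plain = -lr * pullback_grad(f, x, np.zeros_like(x))  # safeguard candidate
    best = v if f(exp_map(x, v)) <= f(exp_map(x, plain)) else plain
    return exp_map(x, best)                        # retract back to the sphere

# Toy usage: minimize x^T A x on the sphere (optimum = smallest eigenvalue of A).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
A = (A + A.T) / 2
f = lambda x: x @ A @ x
x = rng.standard_normal(5)
x /= np.linalg.norm(x)
for _ in range(50):
    x = accelerated_step(f, x)
print(f(x), np.linalg.eigvalsh(A)[0])  # the two values should be close
```

The paper's method replaces the crude safeguard and the finite-difference pullback with precise Lipschitz control of the lifted cost, which is where the curvature-dependent constants in the complexity bounds enter.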


Related concepts (31)

Convex optimization

Convex optimization is a subfield of mathematical optimization that studies the problem of minimizing convex functions over convex sets (or, equivalently, maximizing concave functions over convex sets).

Algorithm

In mathematics and computer science, an algorithm (ˈælɡərɪðəm) is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation.

Manifold

In mathematics, a manifold is a topological space that locally resembles Euclidean space near each point. More precisely, an n-dimensional manifold, or n-manifold for short, is a topological space with the property that each point has a neighborhood that is homeomorphic to an open subset of n-dimensional Euclidean space.

Related publications (90)

We consider the problem of provably finding a stationary point of a smooth function to be minimized on the variety of bounded-rank matrices. This turns out to be unexpectedly delicate. We trace the difficulty back to a geometric obstacle: On a nonsmooth set, there may be sequences of points along which standard measures of stationarity tend to zero, but whose limit points are not stationary. We name such events apocalypses, as they can cause optimization algorithms to converge to non-stationary points. We illustrate this explicitly for an existing optimization algorithm on bounded-rank matrices. To provably find stationary points, we modify a trust-region method on a standard smooth parameterization of the variety. The method relies on the known fact that second-order stationary points on the parameter space map to stationary points on the variety. Our geometric observations and proposed algorithm generalize beyond bounded-rank matrices. We give a geometric characterization of apocalypses on general constraint sets, which implies that Clarke-regular sets do not admit apocalypses. Such sets include smooth manifolds, manifolds with boundaries, and convex sets. Our trust-region method supports parameterization by any complete Riemannian manifold.
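For intuition, here is a minimal sketch of the lift-through-a-parameterization idea for bounded-rank matrices, using the standard smooth parameterization X = L Rᵀ and an off-the-shelf trust-region solver on the lifted cost; the quadratic cost, the shapes, and the solver choice are illustrative assumptions, not the paper's method.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch: seek a stationary point of a smooth f on {X : rank(X) <= r} by
# lifting through the smooth parameterization X = L @ R.T and running an
# off-the-shelf trust-region solver in (L, R) space. The cost, shapes, and
# solver choice are illustrative assumptions, not the paper's method.

m, n, r = 8, 6, 2
rng = np.random.default_rng(1)
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-r target

def f(X):                        # smooth cost on matrix space
    return 0.5 * np.sum((X - A) ** 2)

def grad_f(X):
    return X - A

def unpack(z):
    L = z[: m * r].reshape(m, r)
    R = z[m * r:].reshape(n, r)
    return L, R

def g(z):                        # pulled-back cost g(L, R) = f(L R^T)
    L, R = unpack(z)
    return f(L @ R.T)

def g_grad(z):                   # chain rule: dL = grad_f(X) R, dR = grad_f(X)^T L
    L, R = unpack(z)
    G = grad_f(L @ R.T)
    return np.concatenate([(G @ R).ravel(), (G.T @ L).ravel()])

z0 = rng.standard_normal(m * r + n * r)
res = minimize(g, z0, jac=g_grad, method="trust-constr",
               options={"gtol": 1e-10, "maxiter": 500})
L, R = unpack(res.x)
print("residual:", f(L @ R.T))   # near 0: L R^T recovers the rank-r target
```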

We have developed a new derivative-free algorithm based on Radial Basis Functions (RBFs). Derivative-free optimization is an active field of research, and several algorithms have been proposed recently. Problems of this nature are quite frequent in industrial settings, because in a number of applications the optimization process contains simulation packages which are treated as black boxes. The development of our own algorithm was originally motivated by an application in biomedical imaging: the medical image registration problem. The particular characteristics of this problem prompted us to develop a new optimization algorithm based on trust-region methods. However, it has been designed to be generic and applicable to a wide range of problems. The main originality of our approach is the use of RBFs to build the models. In particular, we have adapted the existing theory based on quadratic models to our own models and developed new procedures especially designed for RBF-based models.

We have tested our algorithm, called BOOSTERS, against state-of-the-art methods (UOBYQA, NEWUOA, DFO). On the medical image registration problem, BOOSTERS appears to be the method of choice. Tests on problems from the CUTEr collection show that BOOSTERS is comparable to, but not better than, other methods on small problems (dimension 2-20). It performs very well on medium-size problems (dimension 20-80). Moreover, it is able to solve problems of dimension 200, which is considered very large in derivative-free optimization.

We have also developed a new class of algorithms combining the robustness of derivative-free algorithms with the faster rate of convergence characteristic of Newton-like methods. In fact, they define a new class of algorithms lying between derivative-free optimization and quasi-Newton methods. These algorithms are built on the skeleton of our derivative-free algorithm, but they can incorporate the gradient when it is available. They can be interpreted as a way of doping derivative-free algorithms with derivatives. If the derivatives are available at each iteration, our method can be seen as an alternative to quasi-Newton methods; conversely, if the derivatives are never evaluated, the algorithm reduces to BOOSTERS. It is a very interesting alternative to existing methods for problems whose objective function is expensive to evaluate and whose derivatives are not available. In this situation, the gradient can be approximated by finite differences at a cost of n additional function evaluations, assuming the objective function is defined on R^n. We have compared our method with CFSQP and BTRA, two gradient-based algorithms, and the results show that our doped method performs best.

We have also carried out a theoretical analysis of the medical image registration problem based on maximization of mutual information. Most current research in this field concentrates on registration based on nonlinear image transformations, but little attention has been paid to the theoretical properties of the optimization problem. In our analysis, we focus on the continuity and the differentiability of the objective function. We show in particular that performing a registration without extending the reference image may lead to discontinuities in the objective function, but we demonstrate that, under some mild assumptions, the function is differentiable almost everywhere.

This analysis is important from an optimization point of view and conditions the choice of a solver. The usual practice is to use generic optimization packages without worrying about the differentiability of the objective function, but using gradient-based methods on a non-differentiable objective may result in poor performance or even a failure to converge. One of our objectives with this analysis is to make practitioners aware of these problems and to propose new algorithms of potential interest for their applications.
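As a rough illustration of the RBF-model trust-region idea described above, here is a minimal sketch (not the BOOSTERS code): a cubic RBF with a linear polynomial tail interpolates the sampled function values, the surrogate is minimized inside the trust region, and the region grows or shrinks with the usual actual-versus-predicted reduction ratio. All names, constants, and the subproblem solver are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

# Minimal sketch of an RBF-based derivative-free trust-region loop (not the
# BOOSTERS code): a cubic RBF with a linear polynomial tail interpolates the
# sampled values, the surrogate is minimized inside the trust region, and the
# region grows or shrinks with the actual-versus-predicted reduction ratio.

def rbf_model(X, y):
    """Fit s(x) = sum_i lam_i ||x - x_i||^3 + c0 + c^T x interpolating (X, y)."""
    k, n = X.shape
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    P = np.hstack([np.ones((k, 1)), X])                # linear polynomial tail
    A = np.block([[D ** 3, P], [P.T, np.zeros((n + 1, n + 1))]])
    sol = np.linalg.solve(A, np.concatenate([y, np.zeros(n + 1)]))
    lam, c = sol[:k], sol[k:]
    return lambda x: lam @ np.linalg.norm(X - x, axis=1) ** 3 + c[0] + c[1:] @ x

def df_trust_region(f, x0, delta=1.0, iters=50):
    """Derivative-free trust-region iteration driven by the RBF surrogate."""
    n = x0.size
    X = x0 + 0.5 * delta * np.vstack([np.zeros(n), np.eye(n), -np.eye(n)])
    y = np.array([f(x) for x in X])
    xc = X[np.argmin(y)]
    for _ in range(iters):
        s = rbf_model(X, y)
        ball = {"type": "ineq", "fun": lambda z: delta - np.linalg.norm(z - xc)}
        cand = minimize(s, xc, constraints=[ball], method="SLSQP").x
        fc = f(cand)
        pred = s(xc) - s(cand)                          # model-predicted decrease
        rho = (f(xc) - fc) / pred if pred > 1e-12 else 0.0
        if np.min(np.linalg.norm(X - cand, axis=1)) > 1e-10:
            X, y = np.vstack([X, cand]), np.append(y, fc)  # enrich sample set
        if rho > 0.1:                                   # successful step
            xc, delta = cand, min(2 * delta, 10.0)
        else:                                           # shrink the region
            delta = max(delta / 2, 1e-8)
    return xc

# Toy usage on the Rosenbrock function; gradients are never evaluated.
rosen = lambda x: (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2
print(df_trust_region(rosen, np.array([-1.2, 1.0])))    # should approach (1, 1)
```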

Stochastic gradient descent (SGD) and randomized coordinate descent (RCD) are two of the workhorses for training modern automated decision systems. Intriguingly, the convergence properties of these methods are not well established as we move away from the specific case of smooth minimization. In this dissertation, we focus on the related problems of nonsmooth optimization and min-max optimization to improve the theoretical understanding of stochastic algorithms.

First, we study SGD-based adaptive algorithms and propose a regret analysis framework overcoming the limitations of existing ones in the convex case. In the nonconvex case, we prove convergence of an adaptive gradient algorithm for solving constrained weakly convex optimization, generalizing previously known results on unconstrained smooth optimization. We also propose an algorithm combining Nesterov's smoothing with SGD to solve convex problems with infinitely many linear constraints, with optimal rates.

Then, we move on to convex-concave min-max problems with bilinear coupling and analyze primal-dual coordinate descent (PDCD) algorithms. We obtain the first PDCD methods with the optimal $\mathcal{O}(1/k)$ rate on the standard optimality measure, the expected primal-dual gap, which had been an open question since 2014. Our analysis also aims to explain the practical behavior of these algorithms by showing that the last iterate enjoys adaptive linear convergence without altering the parameters, depending on a certain error bound condition. Furthermore, we propose an algorithm combining the favorable properties of two branches of PDCD methods: the new method uses large step sizes with dense data, and its per-iteration cost depends on the number of nonzeros of the data matrix. Thanks to these unique properties, this method enjoys compelling practical performance complementing its rigorous theoretical guarantees.

Next, we consider monotone variational inequalities, which generalize convex-concave min-max problems with nonbilinear coupling. We introduce variance-reduced algorithms that converge under the same set of assumptions as their deterministic counterparts and improve the best-known complexities for solving convex-concave min-max problems with finite-sum structure. Optimality of our algorithms for this problem class is established in a recent work via matching lower bounds.

Finally, we present preliminary results on policy optimization methods for solving two-player zero-sum Markov games in competitive reinforcement learning (RL). Even though this is a nonconvex-nonconcave min-max problem in general, the special structure makes it tractable to find an approximate Nash equilibrium. We introduce an algorithm that improves the best-known sample complexity of policy gradient methods. This development combines tools from RL and stochastic primal-dual optimization, showing the importance of techniques from convex-concave optimization.
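For context, the deterministic template that PDCD methods randomize over coordinates is the primal-dual hybrid gradient (Chambolle-Pock) iteration for the bilinear saddle problem min_x max_y f(x) + ⟨Ax, y⟩ - g*(y). Below is a minimal sketch on an illustrative quadratic instance; the data, prox choices, and step sizes are assumptions for the demo, not from the dissertation.

```python
import numpy as np

# Deterministic primal-dual template (PDHG / Chambolle-Pock) for the bilinear
# saddle problem  min_x max_y  f(x) + <A x, y> - g*(y).  The PDCD methods
# discussed above randomize the dual update over coordinates; this sketch
# keeps the full update for clarity. The quadratic instance is illustrative.

rng = np.random.default_rng(2)
m, n = 20, 30
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
c = rng.standard_normal(n)

# f(x) = 0.5||x - c||^2 and g*(y) = 0.5||y||^2 + <b, y>, so the primal
# problem is min_x 0.5||x - c||^2 + 0.5||A x - b||^2.
prox_f = lambda x, t: (x + t * c) / (1 + t)    # argmin_u f(u) + ||u - x||^2 / (2t)
prox_gs = lambda y, s: (y - s * b) / (1 + s)   # argmin_v g*(v) + ||v - y||^2 / (2s)

L = np.linalg.norm(A, 2)                        # spectral norm of A
tau = sigma = 0.9 / L                           # ensures tau * sigma * L^2 < 1

x = np.zeros(n); y = np.zeros(m); x_bar = x.copy()
for _ in range(2000):
    y = prox_gs(y + sigma * (A @ x_bar), sigma)   # dual ascent step
    x_new = prox_f(x - tau * (A.T @ y), tau)      # primal descent step
    x_bar = 2 * x_new - x                         # extrapolation
    x = x_new

# Check against the closed-form minimizer of the primal problem.
x_star = np.linalg.solve(np.eye(n) + A.T @ A, c + A.T @ b)
print(np.linalg.norm(x - x_star))                 # should be tiny
```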