
Publication

# Accelerated SGD for Non-Strongly-Convex Least Squares

Abstract

We consider stochastic approximation for the least squares regression problem in the non-strongly convex setting. We present the first practical algorithm that achieves the optimal prediction error rates in terms of dependence on the noise of the problem, as $O(d/t)$ while accelerating the forgetting of the initial conditions to $O(d/t^2)$. Our new algorithm is based on a simple modification of the accelerated gradient descent. We provide convergence results for both the averaged and the last iterate of the algorithm. In order to describe the tightness of these new bounds, we present a matching lower bound in the noiseless setting and thus show the optimality of our algorithm.

Official source

This page is generated automatically and may contain information that is not correct, complete, up to date, or relevant to your search. The same applies to all other pages on this site. Be sure to verify the information against official EPFL sources.


Related concepts (10)

Gradient descent

Gradient descent, also called the gradient descent algorithm, is an optimization algorithm for differentiable functions. It is therefore intended to minimize a real-valued differentiable function.

Algorithm

An algorithm is a finite, unambiguous sequence of instructions and operations for solving a class of problems.

Method of least squares

The method of least squares, developed independently by Legendre and Gauss at the beginning of the 19th century, is used to compare experimental data, generally affected by measurement errors, to a mathematical model.

Related publications (10)


We have developed a new derivative-free algorithm based on Radial Basis Functions (RBFs). Derivative-free optimization is an active field of research, and several algorithms have been proposed recently. Problems of this nature are quite frequent in industrial settings, because in many applications the optimization process involves simulation packages that are treated as black boxes. The development of our algorithm was originally motivated by an application in biomedical imaging: the medical image registration problem. The particular characteristics of this problem led us to develop a new optimization algorithm based on trust-region methods; it is, however, designed to be generic and applicable to a wide range of problems. The main originality of our approach is the use of RBFs to build the models. In particular, we have adapted the existing theory based on quadratic models to our own models and developed new procedures especially designed for RBF-based models. We have tested our algorithm, called BOOSTERS, against state-of-the-art methods (UOBYQA, NEWUOA, DFO). On the medical image registration problem, BOOSTERS appears to be the method of choice. Tests on problems from the CUTEr collection show that BOOSTERS is comparable to, but not better than, other methods on small problems (dimension 2-20). It performs very well on medium-size problems (20-80). Moreover, it is able to solve problems of dimension 200, which is considered very large in derivative-free optimization.

We have also developed a new class of algorithms combining the robustness of derivative-free algorithms with the faster convergence rate of Newton-like methods. They define a new class of algorithms lying between derivative-free optimization and quasi-Newton methods. These algorithms are built on the skeleton of our derivative-free algorithm but can incorporate the gradient when it is available; they can be interpreted as a way of doping derivative-free algorithms with derivatives. If the derivatives are available at each iteration, our method can be seen as an alternative to quasi-Newton methods. Conversely, if the derivatives are never evaluated, the algorithm is identical to BOOSTERS. It is a very interesting alternative to existing methods for problems whose objective function is expensive to evaluate and whose derivatives are not available. In this situation, the gradient can be approximated by finite differences, at a cost of n additional function evaluations when the objective function is defined on R^n. We have compared our method with CFSQP and BTRA, two gradient-based algorithms, and the results show that our doped method performs best.

We also present a theoretical analysis of the medical image registration problem based on maximization of mutual information. Most current research in this field concentrates on registration based on nonlinear image transformations, but little attention has been paid to the theoretical properties of the optimization problem. In our analysis, we focus on the continuity and differentiability of the objective function. We show in particular that performing a registration without extending the reference image may lead to discontinuities in the objective function, but we demonstrate that, under some mild assumptions, the function is differentiable almost everywhere. This analysis is important from an optimization point of view and conditions the choice of a solver. The usual practice is to use generic optimization packages without worrying about the differentiability of the objective function, but using gradient-based methods on a non-differentiable objective may result in poor performance or even failure to converge. One of our objectives with this analysis is to make practitioners aware of these problems and to propose new algorithms of potential interest for their applications.
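To make the model-building idea concrete, here is a minimal sketch (not the BOOSTERS implementation) of a Gaussian-RBF interpolation model and a crude trust-region step that minimizes the model over random candidate points; all function names and parameters are illustrative assumptions:

```python
import numpy as np

def rbf_model(points, values, eps=1.0):
    """Fit a Gaussian-RBF interpolation model m(x) ~ f(x).

    An illustrative stand-in for the kind of model used in RBF-based
    derivative-free trust-region methods.
    """
    P = np.asarray(points, float)
    dists = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
    Phi = np.exp(-(eps * dists) ** 2)          # interpolation matrix
    w = np.linalg.solve(Phi, np.asarray(values, float))
    def m(x):
        d = np.linalg.norm(P - x, axis=1)
        return np.exp(-(eps * d) ** 2) @ w     # model prediction at x
    return m

def trust_region_step(m, center, radius, n_cand=500, seed=0):
    """Approximately minimize the model over the trust region by
    evaluating it at random candidate points inside the box."""
    rng = np.random.default_rng(seed)
    cand = center + radius * rng.uniform(-1, 1, size=(n_cand, len(center)))
    vals = [m(c) for c in cand]
    return cand[int(np.argmin(vals))]
```

A real trust-region method would accept or reject the step by comparing predicted and actual decrease, and shrink or expand the radius accordingly; this sketch only shows the model-based subproblem.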

The analysis of collections of visual data, e.g., their classification, modeling and clustering, has become a problem of high importance in a variety of applications. Meanwhile, image data captured in uncontrolled environments by arbitrary users is very likely to be exposed to geometric transformations. Therefore, efficient methods are needed for analyzing high-dimensional visual data sets that can cope with geometric transformations of the visual content of interest. In this thesis, we study parametric models for transformation-invariant analysis of geometrically transformed image data, which provide low-dimensional image representations that capture relevant information efficiently. We focus on transformation manifolds, which are image sets created by parametrizable geometric transformations of a reference image model. Transformation manifolds provide a geometric interpretation of several image analysis problems. In particular, image registration corresponds to the computation of the projection of the target image onto the transformation manifold of the reference image. Similarly, in classification, the class label of a query image can be estimated in a transformation-invariant way by comparing its distance to transformation manifolds that represent different image classes. In this thesis, we explore several problems related to the registration, modeling, and classification of images with transformation manifolds. First, we address the problem of sampling transformation manifolds of known parameterization, where we focus on the target applications of image registration and classification in the sampling. We first propose an iterative algorithm for sampling a manifold such that the selected set of samples gives an accurate estimate of the distance of a query image to the manifold. We then extend this method to a classification setting with several transformation manifolds representing different image classes. 
We develop an algorithm to jointly sample multiple transformation manifolds such that the class label of query images can be estimated accurately by comparing their distances to the class-representative manifold samples. The proposed methods outperform baseline sampling schemes in image registration and classification. Next, we study the problem of learning transformation manifolds that are good models of a given set of geometrically transformed image data. We first learn a representative pattern whose transformation manifold fits the input images well, and then generalize the problem to a supervised classification setting, where we jointly learn multiple class-representative pattern transformation manifolds from training images with known class labels. The proposed manifold learning methods exploit the type of geometric transformation in the data to compute an accurate data model, information that previous manifold learning algorithms ignore. Finally, we focus on the use of transformation manifolds in multiscale image registration. We consider two different methods in image registration, namely the tangent distance method and the minimization of the image intensity difference with gradient descent, and present a multiscale performance analysis of these methods. We derive upper bounds for the alignment errors yielded by the two methods and analyze how these bounds vary with noise and low-pass filtering, which is useful for understanding the performance of these methods in image registration. To the best of our knowledge, these are the first such studies in multiscale registration settings. Geometrically transformed image sets have a particular structure, and classical image analysis methods are not always well suited to such data. This thesis is motivated by this observation and proposes new techniques and insights for handling geometric transformations in image analysis and processing.
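The manifold-distance idea can be sketched with a toy one-dimensional "translation manifold": circular shifts of a reference pattern stand in for the parametrizable geometric transformations, the distance of a query to a manifold is estimated from manifold samples, and classification compares distances to sampled class manifolds. This is an illustrative simplification, not the thesis's algorithms:

```python
import numpy as np

def manifold_samples(pattern, shifts):
    """Sample the 'translation manifold' of a 1-D reference pattern:
    each sample is the pattern circularly shifted by one parameter
    value (a toy stand-in for geometric image transformations)."""
    return np.stack([np.roll(pattern, s) for s in shifts])

def distance_to_manifold(query, samples):
    """Estimate the distance of a query to the manifold as the smallest
    Euclidean distance to any manifold sample (projection estimate)."""
    return np.min(np.linalg.norm(samples - query, axis=1))

def classify(query, class_manifolds):
    """Transformation-invariant classification: assign the query to
    the class whose sampled manifold is closest."""
    d = [distance_to_manifold(query, s) for s in class_manifolds]
    return int(np.argmin(d))
```

The sampling problem studied in the thesis is precisely the choice of which manifold points to keep so that these distance estimates, and hence the class decisions, remain accurate with few samples.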

This paper considers the multi-agent linear least-squares problem in a server-agent network architecture. The system comprises multiple agents, each with a set of local data points. The agents are connected to a server, and there is no inter-agent communication. The agents' goal is to compute a linear model that optimally fits the collective data; the agents, however, cannot share their data points. In principle, the agents can solve this problem by collaborating with the server using the server-agent network variant of the classical gradient-descent method. However, when the data points are ill-conditioned, the gradient-descent method requires a large number of iterations to converge. We propose an iterative pre-conditioning technique to mitigate the deleterious impact of the data points' conditioning on the convergence rate of the gradient-descent method. Unlike conventional pre-conditioning techniques, the pre-conditioner matrix used in our technique evolves iteratively. We show that our proposed algorithm converges linearly, with an improved rate of convergence in comparison to both the classical and the accelerated gradient-descent methods. In the special case when the solution of the least-squares problem is unique, our algorithm converges to the solution superlinearly. Through numerical experiments on benchmark least-squares problems, we validate our theoretical findings and demonstrate our algorithm's improved robustness against process noise. (C) 2021 Elsevier Ltd. All rights reserved.
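A single-machine sketch of the iterative pre-conditioning idea can illustrate the core mechanism (the paper's method runs over a server-agent network, and its exact update rule is not reproduced here): the pre-conditioner K is itself driven toward the inverse Gram matrix by gradient steps on ||AK - I||^2, while the estimate takes pre-conditioned gradient steps. The function name and step sizes are assumptions:

```python
import numpy as np

def iteratively_preconditioned_gd(X, y, steps, alpha, delta):
    """Single-machine sketch of iteratively pre-conditioned gradient
    descent for least squares (illustrative, not the distributed
    protocol of the paper).

    The pre-conditioner K is updated by gradient descent on
    ||A K - I||^2, so it converges toward A^{-1}, while the estimate
    w follows pre-conditioned gradient steps."""
    A = X.T @ X            # Gram matrix (ill-conditioning lives here)
    b = X.T @ y
    d = A.shape[0]
    K = np.zeros((d, d))   # evolving pre-conditioner
    w = np.zeros(d)
    for _ in range(steps):
        K = K - alpha * A @ (A @ K - np.eye(d))   # push A K -> I
        grad = A @ w - b                          # least-squares gradient
        w = w - delta * K @ grad                  # pre-conditioned step
    return w
```

Once K approaches A^{-1}, the w-update behaves like a Newton step, which is the intuition behind the accelerated (and, with a unique solution, superlinear) convergence claimed in the paper.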