**Are you an EPFL student looking for a semester project?**

Work with us on data science and visualisation projects, and deploy your project as an app on top of GraphSearch.

Publication# General Proximal Gradient Method: A Case for Non-Euclidean Norms

Abstract

In this paper, we consider composite convex minimization problems. We advocate the merit of considering Generalized Proximal gradient Methods (GPM) where the norm employed is not Euclidean. To that end, we show the tractability of the general proximity operator for a broad class of structure priors by proposing a polynomial-time approach to approximately compute it. We also identify a special case of regularizers whose proximity operator admits an efficient greedy algorithm. We then introduce a proximity/projection-free accelerated variant of GPM. We illustrate numerically the benefit of non-Euclidean norms, on the estimation quality of the Lasso problem and on the time-complexity of the latent group Lasso problem.

Official source

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Related MOOCs (21)

Related concepts (32)

Related publications (56)

Introduction to optimization on smooth manifolds: first order methods

Learn to optimize on smooth, nonlinear spaces: Join us to build your foundations (starting at "what is a manifold?") and confidently implement your first algorithm (Riemannian gradient descent).

Digital Signal Processing I

Basic signal processing concepts, Fourier analysis and filters. This module can
be used as a starting point or a basic refresher in elementary DSP

Digital Signal Processing II

Adaptive signal processing, A/D and D/A. This module provides the basic
tools for adaptive filtering and a solid mathematical framework for sampling and
quantization

Time complexity

In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by the algorithm, supposing that each elementary operation takes a fixed amount of time to perform. Thus, the amount of time taken and the number of elementary operations performed by the algorithm are taken to be related by a constant factor.

Lasso (statistics)

In statistics and machine learning, lasso (least absolute shrinkage and selection operator; also Lasso or LASSO) is a regression analysis method that performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the resulting statistical model. It was originally introduced in geophysics, and later by Robert Tibshirani, who coined the term. Lasso was originally formulated for linear regression models. This simple case reveals a substantial amount about the estimator.

Complexity class

In computational complexity theory, a complexity class is a set of computational problems "of related resource-based complexity". The two most commonly analyzed resources are time and memory. In general, a complexity class is defined in terms of a type of computational problem, a model of computation, and a bounded resource like time or memory. In particular, most complexity classes consist of decision problems that are solvable with a Turing machine, and are differentiated by their time or space (memory) requirements.

Nicolas Henri Bernard Flammarion, Scott William Pesme

In this paper we fully describe the trajectory of gradient flow over diagonal linear networks in the limit of vanishing initialisation. We show that the limiting flow successively jumps from a saddle of the training loss to another until reaching the minim ...

2023Mikhail Kapralov, Jakab Tardos, Navid Nouri, Aidasadat Mousavifar

In this paper, we resolve the complexity problem of spectral graph sparcification in dynamic streams up to polylogarithmic factors. Using a linear sketch we design a streaming algorithm that uses (O) over tilde (n) space, and with high probability, recover ...

Rachid Guerraoui, El Mahdi El Mhamdi, Alexandre David Olivier Maurer, Sébastien Louis Alexandre Rouault

Modern machine learning architectures distinguish servers and workers. Typically, a d-dimensional model is hosted by a server and trained by n workers, using a distributed stochastic gradient descent (SGD) optimization scheme. At each SGD step, the goal is ...