Masked Training of Neural Networks with Partial Gradients
We study the dynamics of optimization and the generalization properties of one-hidden-layer neural networks with quadratic activation function in the overparametrized regime where the layer width m is larger than the input dimension d. We conside ...
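Concretely, such a model computes f(x) = Σ_j a_j (w_j · x)². A minimal numpy sketch of this architecture (our own illustrative setup and initialization, not necessarily the paper's exact parametrization):

```python
import numpy as np

# Minimal sketch: a one-hidden-layer network with quadratic activation,
# f(x) = sum_j a_j * (w_j . x)^2, in the overparametrized regime m > d.
d, m = 10, 50          # input dimension d, hidden width m (m > d)
rng = np.random.default_rng(0)
W = rng.normal(size=(m, d)) / np.sqrt(d)   # hidden-layer weights
a = rng.normal(size=m) / np.sqrt(m)        # output weights

def forward(x):
    """f(x) = sum_j a_j * (w_j^T x)^2."""
    pre = W @ x            # pre-activations, shape (m,)
    return a @ (pre ** 2)  # quadratic activation, then linear readout

x = rng.normal(size=d)
print(forward(x))
```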
Omnidirectional images are the spherical visual signals that provide a wide, 360°, view of a scene from a specific position. Such images are becoming increasingly popular in fields like virtual reality and robotics. Compared to conventional 2D images, the ...
We analyze (stochastic) gradient descent (SGD) with delayed updates on smooth quasi-convex and non-convex functions and derive concise, non-asymptotic convergence rates. We show that the rate of convergence in all cases consists of two terms: (i) a stocha ...
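As an illustration of the setting (our own toy sketch, not the paper's analysis): with a delay of τ steps, the gradient applied at step t was evaluated at the iterate from step t − τ.

```python
import numpy as np
from collections import deque

# Illustrative sketch of gradient descent with delayed updates: the gradient
# applied at step t was evaluated at the iterate from step t - tau.
def delayed_gd(grad, w0, lr=0.1, tau=3, steps=100):
    w = np.asarray(w0, dtype=float)
    pending = deque()                    # gradients waiting tau steps
    for _ in range(steps):
        pending.append(grad(w))          # gradient at the current iterate
        if len(pending) > tau:           # apply it only tau steps later
            w = w - lr * pending.popleft()
    return w

# Toy quadratic objective f(w) = 0.5 * ||w||^2, so grad f(w) = w.
print(delayed_gd(lambda w: w, w0=np.ones(5)))  # converges toward 0
```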
Nonconvex minimax problems appear frequently in emerging machine learning applications, such as generative adversarial networks and adversarial learning. Simple algorithms such as gradient descent ascent (GDA) are common practice for solving these ...
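For reference, GDA takes a descent step on the min variable and an ascent step on the max variable at each iteration. A toy numpy sketch on a simple saddle problem (our own example, not drawn from the paper):

```python
import numpy as np

# Sketch of (deterministic) gradient descent ascent for min_x max_y f(x, y):
# x takes a descent step while y takes an ascent step.
def gda(grad_x, grad_y, x0, y0, lr=0.05, steps=500):
    x, y = float(x0), float(y0)
    for _ in range(steps):
        gx, gy = grad_x(x, y), grad_y(x, y)
        x -= lr * gx   # descent on x
        y += lr * gy   # ascent on y
    return x, y

# Toy game f(x, y) = x*y + 0.5*x**2 - 0.5*y**2,
# whose unique saddle point is (0, 0).
print(gda(lambda x, y: y + x, lambda x, y: x - y, x0=1.0, y0=1.0))
```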
Over the past few years, there have been fundamental breakthroughs in core problems in machine learning, largely driven by advances in deep neural networks. The amount of annotated data drastically increased and supervised deep discriminative models exceed ...
We propose the Square Attack, a new score-based black-box l2 and l∞ adversarial attack that does not rely on local gradient information and thus is not affected by gradient masking. The Square Attack is based on a randomized search scheme where ...
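A heavily simplified caricature of such a score-based random search (our own sketch; the actual Square Attack uses a specific square-size schedule and sampling distribution not reproduced here):

```python
import numpy as np

# Caricature of a score-based random-search l_inf attack: propose a
# square-shaped perturbation, keep it only if the black-box loss increases.
# No gradient of the model is ever queried.
def square_search(loss, x, eps=0.05, side=4, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    x_adv, best = x.copy(), loss(x)
    h, w = x.shape
    for _ in range(iters):
        cand = x_adv.copy()
        r = rng.integers(0, h - side + 1)
        c = rng.integers(0, w - side + 1)
        # set a random square patch to a corner of the l_inf ball around x
        cand[r:r+side, c:c+side] = x[r:r+side, c:c+side] \
            + eps * rng.choice([-1.0, 1.0])
        val = loss(cand)
        if val > best:                   # score-based acceptance
            x_adv, best = cand, val
    return x_adv

# Toy "model": the attack only queries this scalar score.
x0 = np.zeros((16, 16))
adv = square_search(lambda z: float(np.abs(z).sum()), x0)
print(np.abs(adv - x0).max())  # stays within the eps ball by construction
```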
In this paper we investigate how gradient-based algorithms such as gradient descent (GD), (multi-pass) stochastic GD, its persistent variant, and the Langevin algorithm navigate non-convex loss-landscapes and which of them is able to reach the best general ...
Deep learning networks are typically trained by Stochastic Gradient Descent (SGD) methods that iteratively improve the model parameters by estimating a gradient on a very small fraction of the training data. A major roadblock faced when increasing the batc ...
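The gradient estimate in question is the mini-batch gradient. A toy least-squares sketch (our own example) in which each step touches only a small random fraction of the data:

```python
import numpy as np

# Sketch of the mini-batch gradient estimate at the heart of SGD,
# on a toy least-squares problem.
def sgd_least_squares(X, y, batch=32, lr=0.01, steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        idx = rng.choice(n, size=batch, replace=False)  # small data fraction
        Xb, yb = X[idx], y[idx]
        grad = Xb.T @ (Xb @ w - yb) / batch   # unbiased gradient estimate
        w -= lr * grad
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
w_true = np.arange(5.0)
y = X @ w_true
print(sgd_least_squares(X, y))   # approaches w_true
```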
Combining different models is a widely used paradigm in machine learning applications. While the most common approach is to form an ensemble of models and average their individual predictions, this approach is often rendered infeasible by given resource co ...
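The averaging baseline mentioned here is straightforward; a minimal sketch (illustrative names and toy models, not any paper's code):

```python
import numpy as np

# Ensemble several models by averaging their individual predictions
# (here, class-probability vectors).
def ensemble_predict(models, x):
    probs = np.stack([m(x) for m in models])   # (n_models, n_classes)
    return probs.mean(axis=0)                  # averaged prediction

# Three toy "models" returning fixed class-probability vectors.
models = [lambda x: np.array([0.7, 0.3]),
          lambda x: np.array([0.6, 0.4]),
          lambda x: np.array([0.2, 0.8])]
print(ensemble_predict(models, x=None))  # [0.5, 0.5]
```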
Second-order information, in the form of Hessian- or Inverse-Hessian-vector products, is a fundamental tool for solving optimization problems. Recently, there has been a tremendous amount of work on utilizing this information for the current compute and me ...
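A Hessian-vector product can be formed matrix-free. One standard route, sketched below with a central finite difference of gradients (autodiff frameworks obtain the same quantity exactly via double backpropagation); the example problem is ours:

```python
import numpy as np

# Matrix-free Hessian-vector product: H(w) v is approximated by a central
# finite difference of the gradient, so the full Hessian is never formed.
def hvp(grad, w, v, eps=1e-5):
    return (grad(w + eps * v) - grad(w - eps * v)) / (2 * eps)

# Toy quadratic f(w) = 0.5 * w^T A w, so grad f(w) = A w and H = A.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
w = np.array([1.0, -1.0])
v = np.array([0.5, 2.0])
print(hvp(lambda u: A @ u, w, v))  # matches A @ v = [3., 6.5]
```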