Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks
Our brain continuously self-organizes to construct and maintain an internal representation of the world based on the information arriving through sensory stimuli. Remarkably, cortical areas related to different sensory modalities appear to share the same f ...
The backpropagation algorithm is widely used for training multilayer neural networks. In this publication, the gain of its activation function(s) is investigated. Specifically, it is proven that changing the gain of the activation function is equivalent to c ...
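The equivalence stated in this snippet is cut off, but the gain/weight side of the relationship is easy to verify numerically: for a logistic activation, multiplying the gain by a factor gives the same output as multiplying the incoming weights by that factor. The sketch below checks only this identity and assumes a standard logistic activation; the learning-rate side of the equivalence is not shown.

```python
import numpy as np

def logistic(z, gain=1.0):
    """Logistic activation with a gain (steepness) parameter; gain=1 is the usual sigmoid."""
    return 1.0 / (1.0 + np.exp(-gain * z))

rng = np.random.default_rng(0)
x = rng.normal(size=5)   # inputs to one unit
w = rng.normal(size=5)   # incoming weights
gain = 2.5

# Raising the gain of the activation is indistinguishable from scaling the weights instead.
print(np.allclose(logistic(w @ x, gain=gain), logistic((gain * w) @ x, gain=1.0)))  # True
```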
The article examines in some detail the convergence rate and mean-square-error performance of momentum stochastic gradient methods in the constant step-size and slow adaptation regime. The results establish that momentum methods are equivalent to the stand ...
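The precise form of the equivalence in this snippet is truncated; the sketch below illustrates the kind of comparison it suggests, assuming the commonly cited rescaling in which momentum SGD with step size mu and momentum beta is compared against plain SGD with step size mu/(1-beta). The rescaling factor and the least-squares test problem are assumptions made for illustration, not details taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 10))
x_star = rng.normal(size=10)
b = A @ x_star  # noiseless least-squares target

def constant_step_sgd(step, momentum=0.0, iters=20_000):
    """Single-sample SGD with optional heavy-ball momentum and a constant step size."""
    x = np.zeros(10)
    v = np.zeros(10)
    for _ in range(iters):
        i = rng.integers(len(A))
        grad = (A[i] @ x - b[i]) * A[i]
        v = momentum * v - step * grad
        x = x + v
    return np.linalg.norm(x - x_star)

mu, beta = 1e-3, 0.9
print("momentum SGD, step mu        :", constant_step_sgd(mu, momentum=beta))
print("plain SGD, step mu / (1-beta):", constant_step_sgd(mu / (1 - beta)))
```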
In this paper, we propose a scalable algorithm for spectral embedding. The latter is a standard tool for graph clustering. However, its computational bottleneck is the eigendecomposition of the graph Laplacian matrix, which prevents its application to larg ...
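As context for the bottleneck this snippet mentions, here is a minimal sketch of the standard spectral-embedding pipeline, in which a dense eigendecomposition of the graph Laplacian dominates the cost on large graphs. It illustrates the baseline being accelerated, not the scalable algorithm proposed in the paper.

```python
import numpy as np

def spectral_embedding(adjacency, k):
    """Embed graph nodes using the k smallest nontrivial eigenvectors of the Laplacian."""
    degrees = adjacency.sum(axis=1)
    laplacian = np.diag(degrees) - adjacency
    # Dense eigendecomposition is O(n^3): the computational bottleneck on large graphs.
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    return eigvecs[:, 1:k + 1]  # drop the constant eigenvector associated with eigenvalue 0

# Toy graph: two 3-node cliques joined by a single edge.
adj = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0
print(spectral_embedding(adj, k=1))  # the two cliques separate by sign (Fiedler vector)
```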
Data augmentation is the process of generating samples by transforming training data, with the aim of improving the accuracy and robustness of classifiers. In this paper, we propose a new automatic and adaptive algorithm for choosing the transformations ...
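As a generic illustration of the first sentence (generating samples by transforming training data), the toy sketch below appends horizontally flipped copies of each image to the training set; the adaptive transformation-selection algorithm the paper proposes is not reproduced here.

```python
import numpy as np

def augment_with_flips(images, labels):
    """Toy label-preserving augmentation: append a horizontally flipped copy of each image."""
    flipped = images[:, :, ::-1]
    return np.concatenate([images, flipped]), np.concatenate([labels, labels])

imgs = np.random.default_rng(0).normal(size=(4, 8, 8))   # 4 fake 8x8 "images"
lbls = np.array([0, 1, 0, 1])
aug_imgs, aug_lbls = augment_with_flips(imgs, lbls)
print(aug_imgs.shape, aug_lbls.shape)  # (8, 8, 8) (8,)
```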
The backpropagation algorithm is widely used for training multilayer neural networks. In this publication, the steepness of its activation functions is investigated. Specifically, it is discussed that changing the steepness of the activation function is equi ...