Publication

Momentum-Based Policy Gradient with Second-Order Information

Related publications (40)

Theory of representation learning in cortical neural networks

Carlos Stein Naves de Brito

Our brain continuously self-organizes to construct and maintain an internal representation of the world based on the information arriving through sensory stimuli. Remarkably, cortical areas related to different sensory modalities appear to share the same f ...

EPFL, 2016

Stochastic gradient descent with finite samples sizes

Ali H. Sayed, Stefan Vlaski, Bicheng Ying, Kun Yuan

The minimization of empirical risks over finite sample sizes is an important problem in large-scale machine learning. A variety of algorithms has been proposed in the literature to alleviate the computational burden per iteration at the expense of converge ...

IEEE, 2016

Stochastic Spectral Descent for Restricted Boltzmann Machines

Volkan Cevher

Restricted Boltzmann Machines (RBMs) are widely used as building blocks for deep learning models. Learning typically proceeds by using stochastic gradient descent, and the gradients are estimated with sampling methods. However, the gradient estimation is a ...

2015

Probabilistic inverse reinforcement learning in unknown environments

Christos Dimitrakakis

We consider the problem of learning by demonstration from agents acting in unknown stochastic Markov environments or games. Our aim is to estimate agent preferences in order to construct improved policies for the same task that the agents are trying to sol ...

2013

Fast Proximal Algorithms For Self-Concordant Function Minimization With Application To Sparse Graph Selection

Volkan Cevher, Anastasios Kyrillidis

The convex ℓ1-regularized log det divergence criterion has been shown to produce theoretically consistent graph learning. However, this objective function is challenging since the ℓ1-regularization is nonsmooth, the log det objective is not globally Li ...

IEEE, 2013

Wavelet Shrinkage With Consistent Cycle Spinning Generalizes Total Variation Denoising

Michaël Unser, Emrah Bostan, Ulugbek Kamilov

We introduce a new wavelet-based method for the implementation of Total-Variation-type denoising. The data term is least-squares, while the regularization term is gradient-based. The particularity of our method is to exploit a link between the discrete gra ...

IEEE, 2012

Robust Bayesian reinforcement learning through tight lower bounds

Christos Dimitrakakis

In the Bayesian approach to sequential decision making, exact calculation of the (subjective) utility is intractable. This extends to most special cases of interest, such as reinforcement learning problems. While utility bounds are known to exist for this ...

2011

Adding prediction risk to the theory of reward learning

This article analyzes the simple Rescorla-Wagner learning rule from the vantage point of least squares learning theory. In particular, it suggests how measures of risk, such as prediction risk, can be used to adjust the learning constant in reinforcement l ...

2007

Local Learning Algorithm For Optical Neural Networks

Demetri Psaltis

An anti-Hebbian local learning algorithm for two-layer optical neural networks is introduced. With this learning rule, the weight update for a certain connection depends only on the input and output of that connection and a global, scalar error signal. The ...

1992