Momentum-Based Policy Gradient with Second-Order Information
Graph Chatbot
Chat with Graph Search
Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.
DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.
We introduce a new wavelet-based method for the implementation of Total-Variation-type denoising. The data term is least-squares, while the regularization term is gradient-based. The particularity of our method is to exploit a link between the discrete gra ...
An anti-Hebbian local learning algorithm for two-layer optical neural networks is introduced. With this learning rule, the weight update for a certain connection depends only on the input and output of that connection and a global, scalar error signal. The ...
Restricted Boltzmann Machines (RBMs) are widely used as building blocks for deep learning models. Learning typically proceeds by using stochastic gradient descent, and the gradients are estimated with sampling methods. However, the gradient estimation is a ...
The minimization of empirical risks over finite sample sizes is an important problem in large-scale machine learning. A variety of algorithms has been proposed in the literature to alleviate the computational burden per iteration at the expense of converge ...
The convex l(1)-regularized log det divergence criterion has been shown to produce theoretically consistent graph learning. However, this objective function is challenging since the l(1)-regularization is nonsmooth, the log det objective is not globally Li ...
This article analyzes the simple Rescorla-Wagner learning rule from the vantage point of least squares learning theory. In particular, it suggests how measures of risk, such as prediction risk, can be used to adjust the learning constant in reinforcement l ...
We consider the problem of learning by demonstration from agents acting in unknown stochastic Markov environments or games. Our aim is to estimate agent preferences in order to construct improved policies for the same task that the agents are trying to sol ...
This article analyzes the simple Rescorla-Wagner learning rule from the vantage point of least squares learning theory. In particular, it suggests how measures of risk, such as prediction risk, can be used to adjust the learning constant in reinforcement l ...
Our brain continuously self-organizes to construct and maintain an internal representation of the world based on the information arriving through sensory stimuli. Remarkably, cortical areas related to different sensory modalities appear to share the same f ...
In the Bayesian approach to sequential decision making, exact calculation of the (subjective) utility is intractable. This extends to most special cases of interest, such as reinforcement learning problems. While utility bounds are known to exist for this ...