As a machine-learning algorithm, backpropagation performs a backward pass to adjust the model's parameters, aiming to minimize a loss function such as the mean squared error (MSE). In a multilayer network, backpropagation uses the following steps (sketched in code after the list):
Traverse the network from input to output, computing each hidden layer's outputs and then the output layer's output (the feedforward step).
Starting at the output layer, compute the derivative of the cost function with respect to the outputs, and propagate these derivatives backward through the hidden layers using the chain rule to obtain the gradient with respect to each weight.
Repeatedly update the weights with these gradients until they converge or the model has undergone enough iterations.
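The sketch below is a minimal, illustrative implementation of these three steps for a network with one hidden layer, sigmoid activations, and an MSE loss; the layer sizes, learning rate, and synthetic data are assumptions made for the example, not part of the description above.

```python
# Minimal sketch of the three steps above: feedforward, backward pass, update.
# All sizes, the learning rate, and the toy data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))          # 8 toy examples, 3 input features
y = rng.normal(size=(8, 1))          # toy regression targets

W1 = rng.normal(scale=0.1, size=(3, 4))   # input -> hidden weights
W2 = rng.normal(scale=0.1, size=(4, 1))   # hidden -> output weights
lr = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(1000):
    # 1. Feedforward: compute the hidden activations and the output.
    h = sigmoid(X @ W1)
    y_hat = h @ W2                          # linear output layer
    loss = np.mean((y_hat - y) ** 2)        # MSE

    # 2. Backward pass: derivatives of the loss, from the output layer
    #    back through the hidden layer (chain rule).
    d_out = 2 * (y_hat - y) / len(X)        # dL/d y_hat
    dW2 = h.T @ d_out                       # dL/dW2
    d_hidden = (d_out @ W2.T) * h * (1 - h) # dL/d(hidden pre-activation)
    dW1 = X.T @ d_hidden                    # dL/dW1

    # 3. Gradient-descent update; repeat until the loss stops improving.
    W1 -= lr * dW1
    W2 -= lr * dW2
```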
It is an efficient application of the chain rule, derived by Leibniz in 1673, to such networks. It is also known as the reverse mode of automatic differentiation, or reverse accumulation, due to Seppo Linnainmaa (1970). The term "back-propagating error correction" was introduced in 1962 by Frank Rosenblatt, but he did not know how to implement it, even though Henry J. Kelley had already described a continuous precursor of backpropagation in 1960 in the context of control theory.
Backpropagation computes the gradient of a loss function with respect to the weights of the network for a single input–output example, and does so efficiently, computing the gradient one layer at a time, iterating backward from the last layer to avoid redundant calculations of intermediate terms in the chain rule; this can be derived through dynamic programming. Gradient descent, or variants such as stochastic gradient descent, are commonly used.
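To make the layer-by-layer recursion concrete, the equations below give one standard way of writing it; the notation (pre-activations z^l = W^l a^{l-1} + b^l, activations a^l = σ(z^l), loss C, last layer index L) is assumed here for illustration and is not fixed by the text above.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Backward recursion of backpropagation (notation assumed for illustration):
% z^l = W^l a^{l-1} + b^l are pre-activations, a^l = \sigma(z^l) activations,
% C is the loss and L the index of the last layer.
\begin{align}
  \delta^{L} &= \nabla_{a^{L}} C \odot \sigma'(z^{L}) \\
  \delta^{l} &= \bigl((W^{l+1})^{\top}\,\delta^{l+1}\bigr) \odot \sigma'(z^{l}) \\
  \frac{\partial C}{\partial W^{l}} &= \delta^{l}\,(a^{l-1})^{\top},
  \qquad
  \frac{\partial C}{\partial b^{l}} = \delta^{l}
\end{align}
\end{document}
```

Each layer's error term δ^l is computed once and reused, which is exactly the reuse of intermediate terms that makes the dynamic-programming view natural.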
Strictly speaking, the term backpropagation refers only to the algorithm for computing the gradient, not to how the gradient is used; however, the term is often used loosely to refer to the entire learning algorithm, including how the gradient is used, for example by stochastic gradient descent. In 1986 David E. Rumelhart et al. published an experimental analysis of the technique, which contributed to the popularization of backpropagation and helped to initiate an active period of research in multilayer perceptrons.
This course explores how to design reliable discriminative and generative neural networks, the ethics of data acquisition and model deployment, as well as modern multi-modal models.
This course aims to introduce the basic principles of machine learning in the context of the digital humanities. We will cover both supervised and unsupervised learning techniques, and study and implement ...
Since 2010, approaches in deep learning have revolutionized fields as diverse as computer vision, machine learning, and artificial intelligence. This course gives a systematic introduction to influential ...
Learn to optimize on smooth, nonlinear spaces: Join us to build your foundations (starting at "what is a manifold?") and confidently implement your first algorithm (Riemannian gradient descent).
Stochastic gradient descent (often abbreviated SGD) is an iterative method for optimizing an objective function with suitable smoothness properties (e.g. differentiable or subdifferentiable). It can be regarded as a stochastic approximation of gradient descent optimization, since it replaces the actual gradient (calculated from the entire data set) by an estimate thereof (calculated from a randomly selected subset of the data).
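As an illustration of that idea, the sketch below runs minibatch SGD on a least-squares problem; the synthetic data, batch size, and learning rate are assumptions made for the example.

```python
# Sketch: replace the full-data gradient with an estimate computed on a
# randomly selected minibatch. Data, batch size, and learning rate are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.1 * rng.normal(size=1000)

w = np.zeros(5)
lr, batch_size = 0.1, 32

def grad(w, Xb, yb):
    # Gradient of the mean squared error on the given (sub)set of the data.
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

for step in range(500):
    idx = rng.choice(len(X), size=batch_size, replace=False)  # random subset
    w -= lr * grad(w, X[idx], y[idx])       # stochastic gradient step

# For comparison, grad(w, X, y) would be the "actual" gradient computed
# from the entire data set, as in plain (batch) gradient descent.
```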
In mathematics, gradient descent (also often called steepest descent) is an iterative optimization algorithm for finding a local minimum of a differentiable function. The idea is to take repeated steps in the opposite direction of the gradient (or approximate gradient) of the function at the current point, because this is the direction of steepest descent. Conversely, stepping in the direction of the gradient will lead to a local maximum of that function; the procedure is then known as gradient ascent.
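Written as an update rule, with a step size gamma_k > 0 chosen by the user (the symbols below are an assumption of this note, not fixed by the text above):

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Gradient-descent step for a differentiable f, with step size \gamma_k > 0:
\[
  x_{k+1} = x_k - \gamma_k \,\nabla f(x_k),
\]
% and the corresponding gradient-ascent step toward a local maximum:
\[
  x_{k+1} = x_k + \gamma_k \,\nabla f(x_k).
\]
\end{document}
```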
Deep learning is part of a broader family of machine learning methods; it is based on artificial neural networks with representation learning. The adjective "deep" in deep learning refers to the use of multiple layers in the network. The methods used can be supervised, semi-supervised, or unsupervised.
Introduces deep learning, from logistic regression to neural networks, emphasizing the need for handling non-linearly separable data.
Discusses automatic differentiation, emphasizing reverse mode differentiation for optimizing convolutional layer filters by gradient descent.
Covers the concept of gradient descent for linear regression, explaining the iterative process of updating parameters.
Recent years have witnessed significant advancement in face recognition (FR) techniques, with their applications widely spread in people's lives and security-sensitive areas. There is a growing need for reliable interpretations of decisions of such systems ...
2024
In this work, we investigate the effect of momentum on the optimisation trajectory of gradient descent. We leverage a continuous-time approach in the analysis of momentum gradient descent with step size γ and momentum parameter β that allows us ...
In this PhD manuscript, we explore optimisation phenomena which occur in complex neural networks through the lens of 2-layer diagonal linear networks. This rudimentary architecture, which consists of a two-layer feedforward linear network with a diagonal ...