As a machine-learning algorithm, backpropagation performs a backward pass to adjust the model's parameters, aiming to minimize the mean squared error (MSE). In a single-layered network, backpropagation uses the following steps: Traverse through the network from the input to the output by computing the hidden layers' output and the output layer. (the feedforward step) In the output layer, calculate the derivative of the cost function with respect to the input and the hidden layers. Repeatedly update the weights until they converge or the model has undergone enough iterations. It is an efficient application of the Leibniz chain rule (1673) to such networks. It is also known as the reverse mode of automatic differentiation or reverse accumulation, due to Seppo Linnainmaa (1970). The term "back-propagating error correction" was introduced in 1962 by Frank Rosenblatt, but he did not know how to implement this, even though Henry J. Kelley had a continuous precursor of backpropagation already in 1960 in the context of control theory. Backpropagation computes the gradient of a loss function with respect to the weights of the network for a single input–output example, and does so efficiently, computing the gradient one layer at a time, iterating backward from the last layer to avoid redundant calculations of intermediate terms in the chain rule; this can be derived through dynamic programming. Gradient descent, or variants such as stochastic gradient descent, are commonly used. Strictly the term backpropagation refers only to the algorithm for computing the gradient, not how the gradient is used; but the term is often used loosely to refer to the entire learning algorithm – including how the gradient is used, such as by stochastic gradient descent. In 1986 David E. Rumelhart et al. published an experimental analysis of the technique. This contributed to the popularization of backpropagation and helped to initiate an active period of research in multilayer perceptrons.

About this result
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
Related courses (32)
EE-559: Deep learning
This course explores how to design reliable discriminative and generative neural networks, the ethics of data acquisition and model deployment, as well as modern multi-modal models.
DH-406: Machine learning for DH
This course aims to introduce the basic principles of machine learning in the context of the digital humanities. We will cover both supervised and unsupervised learning techniques, and study and imple
CS-456: Deep reinforcement learning
This course provides an overview and introduces modern methods for reinforcement learning (RL.) The course starts with the fundamentals of RL, such as Q-learning, and delves into commonly used approac
Show more
Related lectures (116)
Deep Learning Fundamentals
Introduces deep learning, from logistic regression to neural networks, emphasizing the need for handling non-linearly separable data.
Automatic Differentiation: BackProp revisited
Discusses automatic differentiation, emphasizing reverse mode differentiation for optimizing convolutional layer filters by gradient descent.
Neural Networks: Training and Activation
Explores neural networks, activation functions, backpropagation, and PyTorch implementation.
Show more
Related publications (127)

Explainable Face Verification via Feature-Guided Gradient Backpropagation

Touradj Ebrahimi, Yuhang Lu, Zewei Xu

Recent years have witnessed significant advance- ment in face recognition (FR) techniques, with their applications widely spread in people’s lives and security-sensitive areas. There is a growing need for reliable interpretations of decisions of such syste ...
2024

Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks

Nicolas Henri Bernard Flammarion, Hristo Georgiev Papazov, Scott William Pesme

In this work, we investigate the effect of momentum on the optimisation trajectory of gradient descent. We leverage a continuous-time approach in the analysis of momentum gradient descent with step size γ\gamma and momentum parameter β\beta that allows u ...
2024

Deep Learning Theory Through the Lens of Diagonal Linear Networks

Scott William Pesme

In this PhD manuscript, we explore optimisation phenomena which occur in complex neural networks through the lens of 22-layer diagonal linear networks. This rudimentary architecture, which consists of a two layer feedforward linear network with a diagonal ...
EPFL2024
Show more
Related concepts (25)
Stochastic gradient descent
Stochastic gradient descent (often abbreviated SGD) is an iterative method for optimizing an objective function with suitable smoothness properties (e.g. differentiable or subdifferentiable). It can be regarded as a stochastic approximation of gradient descent optimization, since it replaces the actual gradient (calculated from the entire data set) by an estimate thereof (calculated from a randomly selected subset of the data).
Gradient descent
In mathematics, gradient descent (also often called steepest descent) is a iterative optimization algorithm for finding a local minimum of a differentiable function. The idea is to take repeated steps in the opposite direction of the gradient (or approximate gradient) of the function at the current point, because this is the direction of steepest descent. Conversely, stepping in the direction of the gradient will lead to a local maximum of that function; the procedure is then known as gradient ascent.
Deep learning
Deep learning is part of a broader family of machine learning methods, which is based on artificial neural networks with representation learning. The adjective "deep" in deep learning refers to the use of multiple layers in the network. Methods used can be either supervised, semi-supervised or unsupervised.
Show more
Related MOOCs (1)
Introduction to optimization on smooth manifolds: first order methods
Learn to optimize on smooth, nonlinear spaces: Join us to build your foundations (starting at "what is a manifold?") and confidently implement your first algorithm (Riemannian gradient descent).

Graph Chatbot

Chat with Graph Search

Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.

DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.