In mathematics, gradient descent (also often called steepest descent) is an iterative optimization algorithm for finding a local minimum of a differentiable function. The idea is to take repeated steps in the opposite direction of the gradient (or approximate gradient) of the function at the current point, because this is the direction of steepest descent. Conversely, stepping in the direction of the gradient will lead to a local maximum of that function; the procedure is then known as gradient ascent. Gradient descent is particularly useful in machine learning for minimizing the cost or loss function. It should not be confused with local search algorithms, although both are iterative methods for optimization.

Gradient descent is generally attributed to Augustin-Louis Cauchy, who first suggested it in 1847. Jacques Hadamard independently proposed a similar method in 1907. Its convergence properties for non-linear optimization problems were first studied by Haskell Curry in 1944, and the method became increasingly well studied and widely used in the following decades. A simple extension of gradient descent, stochastic gradient descent, serves as the most basic algorithm used for training most deep networks today.

Gradient descent is based on the observation that if the multi-variable function $F(\mathbf{x})$ is defined and differentiable in a neighborhood of a point $\mathbf{a}$, then $F(\mathbf{x})$ decreases fastest if one goes from $\mathbf{a}$ in the direction of the negative gradient of $F$ at $\mathbf{a}$, $-\nabla F(\mathbf{a})$. It follows that, if

$$\mathbf{a}_{n+1} = \mathbf{a}_n - \gamma \nabla F(\mathbf{a}_n)$$

for a small enough step size or learning rate $\gamma \in \mathbb{R}_{+}$, then $F(\mathbf{a}_n) \geq F(\mathbf{a}_{n+1})$. In other words, the term $\gamma \nabla F(\mathbf{a})$ is subtracted from $\mathbf{a}$ because we want to move against the gradient, toward the local minimum. With this observation in mind, one starts with a guess $\mathbf{x}_0$ for a local minimum of $F$ and considers the sequence $\mathbf{x}_0, \mathbf{x}_1, \mathbf{x}_2, \ldots$ such that

$$\mathbf{x}_{n+1} = \mathbf{x}_n - \gamma_n \nabla F(\mathbf{x}_n), \quad n \geq 0.$$

We then have a monotonic sequence

$$F(\mathbf{x}_0) \geq F(\mathbf{x}_1) \geq F(\mathbf{x}_2) \geq \cdots,$$

so, hopefully, the sequence $(\mathbf{x}_n)$ converges to the desired local minimum. Note that the value of the step size $\gamma_n$ is allowed to change at every iteration. With certain assumptions on the function $F$ (for example, $F$ convex and $\nabla F$ Lipschitz) and particular choices of $\gamma$ (e.g., chosen via a line search that satisfies the Wolfe conditions, or the Barzilai–Borwein method), convergence to a local minimum can be guaranteed.
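To make the update rule above concrete, here is a minimal Python sketch under simple assumptions: a fixed step size $\gamma$ and a one-dimensional quadratic objective. The names f and grad_f and all hyperparameter values are illustrative choices, not part of the text above.

```python
# Minimal sketch of gradient descent, assuming a fixed step size gamma
# and the illustrative objective f(x) = (x - 3)^2, minimized at x = 3.

def f(x):
    return (x - 3.0) ** 2

def grad_f(x):
    # Analytic gradient of f: f'(x) = 2(x - 3).
    return 2.0 * (x - 3.0)

x = 0.0       # initial guess x_0
gamma = 0.1   # fixed step size (learning rate)

for n in range(100):
    # Update rule: x_{n+1} = x_n - gamma * grad F(x_n)
    x = x - gamma * grad_f(x)

print(x, f(x))  # x approaches 3, f(x) approaches 0
```

For this quadratic the iterates satisfy $x_{n+1} - 3 = (1 - 2\gamma)(x_n - 3)$, so any fixed $0 < \gamma < 1$ yields the monotone decrease $f(x_n) \geq f(x_{n+1})$ described above, while larger step sizes overshoot the minimum and diverge.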

Related courses (29)
EE-612: Fundamentals in statistical pattern recognition
This course provides in-depth understanding of the most fundamental algorithms in statistical pattern recognition or machine learning (including Deep Learning) as well as concrete tools (as Python source code).
EE-613: Machine Learning for Engineers
The objective of this course is to give an overview of machine learning techniques used for real-world applications, and to teach how to implement and use them in practice. Laboratories will be done in Python.
DH-406: Machine learning for DH
This course aims to introduce the basic principles of machine learning in the context of the digital humanities. We will cover both supervised and unsupervised learning techniques, and study and implement these methods.
Related lectures (301)
Deep Learning Fundamentals
Introduces deep learning, from logistic regression to neural networks, emphasizing the need for handling non-linearly separable data.
Kernel Methods: Neural Networks
Covers the fundamentals of neural networks, focusing on RBF kernels and SVM.
Estimation Methods
Covers various methods for estimating model parameters, such as method of moments and maximum likelihood estimation.
Related publications (448)

Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks

Nicolas Henri Bernard Flammarion, Hristo Georgiev Papazov, Scott William Pesme

In this work, we investigate the effect of momentum on the optimisation trajectory of gradient descent. We leverage a continuous-time approach in the analysis of momentum gradient descent with step size γ and momentum parameter β that allows us ...
2024

Explainable Face Verification via Feature-Guided Gradient Backpropagation

Touradj Ebrahimi, Yuhang Lu, Zewei Xu

Recent years have witnessed significant advancement in face recognition (FR) techniques, with their applications widely spread in people's lives and security-sensitive areas. There is a growing need for reliable interpretations of decisions of such systems ...
2024
Related concepts (24)
Backpropagation
As a machine-learning algorithm, backpropagation performs a backward pass to adjust the model's parameters, aiming to minimize the mean squared error (MSE). In a single-layered network, backpropagation uses the following steps: traverse the network from input to output, computing the hidden layers' outputs and the output layer's output (the feedforward step); then, starting from the output layer, calculate the derivative of the cost function with respect to the input and the hidden layers. A minimal code sketch of these two passes appears after the related concepts below.
Convex optimization
Convex optimization is a subfield of mathematical optimization that studies the problem of minimizing convex functions over convex sets (or, equivalently, maximizing concave functions over convex sets). Many classes of convex optimization problems admit polynomial-time algorithms, whereas mathematical optimization is in general NP-hard.
Mathematical optimization
Mathematical optimization (alternatively spelled optimisation) or mathematical programming is the selection of a best element, with regard to some criterion, from some set of available alternatives. It is generally divided into two subfields: discrete optimization and continuous optimization. Optimization problems arise in all quantitative disciplines from computer science and engineering to operations research and economics, and the development of solution methods has been of interest in mathematics for centuries.
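As promised in the backpropagation summary above, here is a minimal sketch of the feedforward and backward passes for a tiny one-hidden-layer network trained with MSE loss via gradient descent. The array shapes, the sigmoid activation, and the names (W1, W2, lr) are illustrative assumptions, not drawn from the courses or publications listed on this page.

```python
import numpy as np

# Hedged sketch of backpropagation for a tiny one-hidden-layer network
# with a sigmoid hidden activation, a linear output, and MSE loss.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))          # 4 samples, 3 input features
y = rng.normal(size=(4, 1))          # regression targets

W1 = rng.normal(size=(3, 5)) * 0.1   # input -> hidden weights
W2 = rng.normal(size=(5, 1)) * 0.1   # hidden -> output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.1  # illustrative learning rate
for _ in range(1000):
    # Feedforward step: compute hidden and output activations.
    h = sigmoid(X @ W1)
    y_hat = h @ W2                      # linear output layer

    # Backward pass: derivative of MSE, then the chain rule per layer.
    d_out = 2 * (y_hat - y) / len(X)    # dL/dy_hat for MSE
    dW2 = h.T @ d_out
    d_h = (d_out @ W2.T) * h * (1 - h)  # through the sigmoid
    dW1 = X.T @ d_h

    # Gradient-descent update of the parameters.
    W2 -= lr * dW2
    W1 -= lr * dW1
```

The backward pass simply applies the chain rule layer by layer: the MSE derivative at the output is propagated through the linear output layer and the sigmoid to obtain the weight gradients, which then feed the gradient-descent update described at the top of this page.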

Graph Chatbot

Chat with Graph Search

Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.

DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.