Chain rule

Summary

In calculus, the chain rule is a formula that expresses the derivative of the composition of two differentiable functions f and g in terms of the derivatives of f and g. More precisely, if h=f\circ g is the function such that h(x)=f(g(x)) for every x, then the chain rule is, in Lagrange's notation,
:h'(x) = f'(g(x)) g'(x).
or, equivalently,
:h'=(f\circ g)'=(f'\circ g)\cdot g'.
The chain rule may also be expressed in Leibniz's notation. If a variable z depends on the variable y, which itself depends on the variable x (that is, y and z are dependent variables), then z depends on x as well, via the intermediate variable y. In this case, the chain rule is expressed as
:\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx},
and
:\left.\frac{dz}{dx}\right|_{x} = \left.\frac{dz}{dy}\right|_{y(x)} \cdot \left.\frac{dy}{dx}\right|_{x},
which indicates the points at which each derivative is evaluated.
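The formulas above can be checked numerically. A minimal sketch, taking f = sin and g(x) = x² as an illustrative (not canonical) choice, compares the chain-rule derivative of h(x) = sin(x²) against a finite-difference approximation:

```python
import math

# Chain rule check for h(x) = f(g(x)) with f = sin and g(x) = x**2.
# The rule predicts h'(x) = f'(g(x)) * g'(x) = cos(x**2) * 2*x.

def h(x):
    return math.sin(x ** 2)

def h_prime_chain_rule(x):
    return math.cos(x ** 2) * 2 * x

def h_prime_numeric(x, eps=1e-6):
    # Central finite-difference approximation of h'(x).
    return (h(x + eps) - h(x - eps)) / (2 * eps)

x = 1.3
print(h_prime_chain_rule(x))  # analytic value via the chain rule
print(h_prime_numeric(x))     # numeric approximation; the two agree closely
```

The two printed values agree to several decimal places, as the chain rule predicts.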

Official source

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Related publications (1)


We derive generalization and excess risk bounds for neural networks using a family of complexity measures based on a multilevel relative entropy. The bounds are obtained by introducing the notion of generated hierarchical coverings of neural networks and by using the technique of chaining mutual information introduced by Asadi et al. '18. The resulting bounds are algorithm-dependent and multiscale: they exploit the multilevel structure of neural networks. This, in turn, leads to an empirical risk minimization problem with a multilevel entropic regularization. The minimization problem is resolved by introducing a multiscale extension of the celebrated Gibbs posterior distribution, proving that the derived distribution achieves the unique minimum. This leads to a new training procedure for neural networks with performance guarantees, which exploits the chain rule of relative entropy rather than the chain rule of derivatives (as in backpropagation), and which takes into account the interactions between different scales of the hypothesis sets of neural networks corresponding to different depths of the hidden layers. To obtain an efficient implementation of the latter, we further develop a multilevel Metropolis algorithm simulating the multiscale Gibbs distribution, with an experiment for a two-layer neural network on the MNIST data set.
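The abstract contrasts the chain rule of relative entropy with the chain rule of derivatives. That information-theoretic identity, D(P_XY‖Q_XY) = D(P_X‖Q_X) + Σ_x P_X(x) D(P_{Y|X=x}‖Q_{Y|X=x}), can be checked numerically for small discrete distributions. The sketch below is only an illustration of the identity with made-up 2×2 joint distributions, not the multilevel training procedure described in the paper:

```python
import math

# Numeric check of the chain rule of relative entropy for two joint
# distributions P and Q over the alphabet {0,1} x {0,1}:
#   D(P_XY || Q_XY) = D(P_X || Q_X) + sum_x P_X(x) * D(P_{Y|X=x} || Q_{Y|X=x})
# (illustrative example distributions, not the paper's method).

P = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
Q = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

def kl(p, q):
    # Relative entropy D(p || q) for distributions given as dicts.
    return sum(p[k] * math.log(p[k] / q[k]) for k in p)

def marginal_x(joint):
    return {x: joint[(x, 0)] + joint[(x, 1)] for x in (0, 1)}

def conditional_y(joint, x):
    px = joint[(x, 0)] + joint[(x, 1)]
    return {y: joint[(x, y)] / px for y in (0, 1)}

lhs = kl(P, Q)
Px, Qx = marginal_x(P), marginal_x(Q)
rhs = kl(Px, Qx) + sum(
    Px[x] * kl(conditional_y(P, x), conditional_y(Q, x)) for x in (0, 1)
)
print(lhs, rhs)  # both sides of the chain rule agree
```

The two sides coincide up to floating-point error, which is the decomposition the paper exploits in place of the derivative chain rule of backpropagation.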

Related concepts (37)

In mathematics, the derivative measures the sensitivity to change of a function's output with respect to its input. Derivatives are a fundamental tool of calculus. For example, the derivative of the position of a moving object with respect to time is the object's velocity.
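A minimal numeric sketch of this idea, using a made-up position function p(t) = 5t², whose derivative p'(t) = 10t is the velocity:

```python
# Derivative as sensitivity to change: for position p(t) = 5 * t**2,
# the derivative p'(t) = 10 * t is the velocity at time t.
# (Illustrative example; the function p is an assumption, not from the source.)

def p(t):
    return 5 * t ** 2

def velocity_numeric(t, eps=1e-6):
    # Central finite-difference approximation of p'(t).
    return (p(t + eps) - p(t - eps)) / (2 * eps)

print(velocity_numeric(3.0))  # approximately 30.0, matching p'(3) = 10 * 3
```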

In mathematics, an integral is the continuous analog of a sum, which is used to calculate areas, volumes, and their generalizations. Integration, the process of computing an integral, is one of the two fundamental operations of calculus, the other being differentiation.
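The "continuous analog of a sum" can be made concrete with a Riemann sum, sketched here for the illustrative integrand f(x) = x² on [0, 1], whose exact integral is 1/3:

```python
# Integral as the continuous analog of a sum: approximate the area under
# f(x) = x**2 on [0, 1] with a Riemann sum; the exact integral is 1/3.

def riemann_sum(f, a, b, n):
    # Midpoint Riemann sum with n equal subintervals.
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) * width for i in range(n))

approx = riemann_sum(lambda x: x ** 2, 0.0, 1.0, 10_000)
print(approx)  # close to 1/3
```

As n grows, the sum of rectangle areas converges to the integral.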

Calculus is the mathematical study of continuous change, in the same way that geometry is the study of shape, and algebra is the study of generalizations of arithmetic operations.
It has two major branches, differential calculus and integral calculus.

Related courses (39)

Study the fundamental concepts of analysis, and the differential and integral calculus of real-valued functions of several variables.

Differentiable manifolds are a certain class of topological spaces which, in a way we will make precise, locally resemble R^n. We introduce the key concepts of this subject, such as vector fields, differential forms, integration of differential forms etc.

This course introduces the analysis and design of linear analog circuits based on operational amplifiers. The Laplace transform is introduced early and used to treat important concepts such as time and frequency responses, convolution, and filter design. The course is complemented with exercises and simulations.