Are you an EPFL student looking for a semester project?
Work with us on data science and visualisation projects, and deploy your project as an app on top of Graph Search.
In the context of artificial neural networks, the rectifier or ReLU (rectified linear unit) activation function is an activation function defined as the positive part of its argument: where x is the input to a neuron. This is also known as a ramp function and is analogous to half-wave rectification in electrical engineering. This activation function was introduced by Kunihiko Fukushima in 1969 in the context of visual feature extraction in hierarchical neural networks. It was later argued that it has strong biological motivations and mathematical justifications. In 2011 it was found to enable better training of deeper networks, compared to the widely used activation functions prior to 2011, e.g., the logistic sigmoid (which is inspired by probability theory; see logistic regression) and its more practical counterpart, the hyperbolic tangent. The rectifier is, , the most popular activation function for deep neural networks. Rectified linear units find applications in computer vision and speech recognition using deep neural nets and computational neuroscience. Sparse activation: For example, in a randomly initialized network, only about 50% of hidden units are activated (have a non-zero output). Better gradient propagation: Fewer vanishing gradient problems compared to sigmoidal activation functions that saturate in both directions. Efficient computation: Only comparison, addition and multiplication. Scale-invariant: . Rectifying activation functions were used to separate specific excitation and unspecific inhibition in the neural abstraction pyramid, which was trained in a supervised way to learn several computer vision tasks. In 2011, the use of the rectifier as a non-linearity has been shown to enable training deep supervised neural networks without requiring unsupervised pre-training. Rectified linear units, compared to sigmoid function or similar activation functions, allow faster and effective training of deep neural architectures on large and complex datasets.
The capabilities of deep learning systems have advanced much faster than our ability to understand them. Whilst the gains from deep neural networks (DNNs) are significant, they are accompanied by a growing risk and gravity of a bad outcome. This is tr ...
Michaël Unser, Alexis Marie Frederic Goujon
Marco Picasso, Alexandre Caboussat, Maude Girardin