**Êtes-vous un étudiant de l'EPFL à la recherche d'un projet de semestre?**

Travaillez avec nous sur des projets en science des données et en visualisation, et déployez votre projet sous forme d'application sur GraphSearch.

Concept# Redresseur (réseaux neuronaux)

Résumé

In the context of artificial neural networks, the rectifier or ReLU (rectified linear unit) activation function is an activation function defined as the positive part of its argument:
where x is the input to a neuron. This is also known as a ramp function and is analogous to half-wave rectification in electrical engineering. This activation function was introduced by Kunihiko Fukushima in 1969 in the context of visual feature extraction in hierarchical neural networks. It was later argued that it has strong biological motivations and mathematical justifications. In 2011 it was found to enable better training of deeper networks, compared to the widely used activation functions prior to 2011, e.g., the logistic sigmoid (which is inspired by probability theory; see logistic regression) and its more practical counterpart, the hyperbolic tangent. The rectifier is, , the most popular activation function for deep neural networks.
Rectified linear units find applications in computer vision and speech recognition using deep neural nets and computational neuroscience.
Sparse activation: For example, in a randomly initialized network, only about 50% of hidden units are activated (have a non-zero output).
Better gradient propagation: Fewer vanishing gradient problems compared to sigmoidal activation functions that saturate in both directions.
Efficient computation: Only comparison, addition and multiplication.
Scale-invariant: .
Rectifying activation functions were used to separate specific excitation and unspecific inhibition in the neural abstraction pyramid, which was trained in a supervised way to learn several computer vision tasks. In 2011, the use of the rectifier as a non-linearity has been shown to enable training deep supervised neural networks without requiring unsupervised pre-training. Rectified linear units, compared to sigmoid function or similar activation functions, allow faster and effective training of deep neural architectures on large and complex datasets.

Source officielle

Cette page est générée automatiquement et peut contenir des informations qui ne sont pas correctes, complètes, à jour ou pertinentes par rapport à votre recherche. Il en va de même pour toutes les autres pages de ce site. Veillez à vérifier les informations auprès des sources officielles de l'EPFL.

Publications associées (7)

Concepts associés (20)

Cours associés (38)

DH-406: Machine learning for DH

This course aims to introduce the basic principles of machine learning in the context of the digital humanities. We will cover both supervised and unsupervised learning techniques, and study and imple

PHYS-467: Machine learning for physicists

Machine learning and data analysis are becoming increasingly central in sciences including physics. In this course, fundamental principles and methods of machine learning will be introduced and practi

MATH-212: Analyse numérique et optimisation

L'étudiant apprendra à résoudre numériquement divers problèmes mathématiques. Les propriétés théoriques de ces
méthodes seront discutées.

Fonction d'activation

Dans le domaine des réseaux de neurones artificiels, la fonction d'activation est une fonction mathématique appliquée à un signal en sortie d'un neurone artificiel. Le terme de "fonction d'activation" vient de l'équivalent biologique "potentiel d'activation", seuil de stimulation qui, une fois atteint entraîne une réponse du neurone. La fonction d'activation est souvent une fonction non linéaire. Un exemple de fonction d'activation est la fonction de Heaviside, qui renvoie tout le temps 1 si le signal en entrée est positif, ou 0 s'il est négatif.

Fonction softmax

vignette|Fonction softmax utilisée après un CNN (Réseau neuronal convolutif). Ici le vecteur (35.4, 38.1, -5.0) est transformée en (0.06, 0.94, 0.00). Dans ce contexte de classification d'images, le chien est reconnu. En mathématiques, la fonction softmax, aussi appelée fonction softargmax ou fonction exponentielle normalisée, est une généralisation de la fonction logistique. Elle convertit un vecteur de K nombres réels en une distribution de probabilités sur K choix.

Vanishing gradient problem

In machine learning, the vanishing gradient problem is encountered when training artificial neural networks with gradient-based learning methods and backpropagation. In such methods, during each iteration of training each of the neural networks weights receives an update proportional to the partial derivative of the error function with respect to the current weight. The problem is that in some cases, the gradient will be vanishingly small, effectively preventing the weight from changing its value.

Arthur Ulysse Jacot-Guillarmod

In the recent years, Deep Neural Networks (DNNs) have managed to succeed at tasks that previously appeared impossible, such as human-level object recognition, text synthesis, translation, playing games and many more. In spite of these major achievements, our understanding of these models, in particular of what happens during their training, remains very limited. This PhD started with the introduction of the Neural Tangent Kernel (NTK) to describe the evolution of the function represented by the network during training. In the infinite-width limit, i.e. when the number of neurons in the layers of the network grows to infinity, the NTK converges to a deterministic and time-independent limit, leading to a simple yet complete description of the dynamics of infinitely-wide DNNs. This allowed one to give the first general proof of convergence of DNNs to a global minimum, and yielded the first description of the limiting spectrum of the Hessian of the loss surface of DNNs throughout training.More importantly, the NTK plays a crucial role in describing the generalization abilities of DNNs, i.e. the performance of the trained network on unseen data. The NTK analysis uncovered a direct link between the function learned by infinitely wide DNNs and Kernel Ridge Regression predictors, whose generalization properties are studied in this thesis using tools of random matrix theory. Our analysis of KRR reveals the importance of the eigendecomposition of the NTK, which is affected by a number of architectural choices. In very deep networks, an ordered regime and a chaotic regime appear, determined by the choice of non-linearity and the balance between the weights and bias parameters; these two phases are characterized by different speeds of decay of the eigenvalues of the NTK, leading to a tradeoff between convergence speed and generalization. In practical contexts such as Generative Adversarial Networks or Topology Optimization, the network architecture can be chosen to guarantee certain properties of the NTK and its spectrum.These results give an almost complete description DNNs in this infinite-width limit. It is then natural to wonder how it extends to finite-width networks used in practice. In the so-called NTK regime, the discrepancy between finite- and infinite-widths DNNs is mainly a result of the variance w.r.t. to the sampling of the parameters, as shown empirically and mathematically relying on the similarity between DNNs and random feature models.In contrast to the NTK regime, where the NTK remains constant during training, there exist so-called active regimes, where the evolution of the NTK is significant, which appear in a number of settings. One such regime appears in Deep Linear Networks with a very small initialization, where the training dynamics approach a sequence of saddle-points, representing linear maps of increasing rank, leading to a low-rank bias which is absent in the NTK regime.

Séances de cours associées (358)

Réseaux neuronaux : Réseaux neuronaux profondsMATH-212: Analyse numérique et optimisation

Explore les bases des réseaux neuraux, en mettant l'accent sur les réseaux neuraux profonds, leur architecture et leur formation.

Méthodes de noyau: Réseaux neuronauxPHYS-467: Machine learning for physicists

Couvre les fondamentaux des réseaux neuronaux, en mettant l'accent sur les noyaux RBF et SVM.

Réseaux neuronaux : formation et activationCIVIL-226: Introduction to machine learning for engineers

Explore les réseaux neuronaux, les fonctions d'activation, la rétropropagation et l'implémentation de PyTorch.

In this paper, we study an emerging class of neural networks, the Morphological Neural networks, from some modern perspectives. Our approach utilizes ideas from tropical geometry and mathematical morphology. First, we state the training of a binary morphological classifier as a Difference-of-Convex optimization problem and extend this method to multiclass tasks. We then focus on general morphological networks trained with gradient descent variants and show, quantitatively via pruning schemes as well as qualitatively, the sparsity of the resulted representations compared to FeedForward networks with ReLU activations as well as the effect the training optimizer has on such compression techniques. Finally, we show how morphological networks can be employed to guarantee monotonicity and present a softened version of a known architecture, based on Maslov Dequantization, which alleviates issues of gradient propagation associated with its "hard" counterparts and moderately improves performance.

In the last decade, deep neural networks have achieved tremendous success in many fields of machine learning.However, they are shown vulnerable against adversarial attacks: well-designed, yet imperceptible, perturbations can make the state-of-the-art deep neural networks output incorrect results.Understanding adversarial attacks and designing algorithms to make deep neural networks robust against these attacks are key steps to building reliable artificial intelligence in real-life applications.In this thesis, we will first formulate the robust learning problem.Based on the notations of empirical robustness and verified robustness, we design new algorithms to achieve both of these types of robustness.Specifically, we investigate the robust learning problem from the optimization perspectives.Compared with classic empirical risk minimization, we show the slow convergence and large generalization gap in robust learning.Our theoretical and numerical analysis indicates that these challenges arise, respectively, from non-smooth loss landscapes and model's fitting hard adversarial instances.Our insights shed some light on designing algorithms for mitigating these challenges.Robust learning has other challenges, such as large model capacity requirements and high computational complexity.To solve the model capacity issue, we combine robust learning with model compression.We design an algorithm to obtain sparse and binary neural networks and make it robust.To decrease the computational complexity, we accelerate the existing adversarial training algorithm and preserve its performance stability.In addition to making models robust, our research provides other benefits.Our methods demonstrate that robust models, compared with non-robust ones, usually utilize input features in a way more similar to the way human beings use them, hence the robust models are more interpretable.To obtain verified robustness, our methods indicate the geometric similarity of the decision boundaries near data points.Our approaches towards reliable artificial intelligence can not only render deep neural networks more robust in safety-critical applications but also make us better aware of how they work.