**Are you an EPFL student looking for a semester project?**

Work with us on data science and visualisation projects, and deploy your project as an app on top of GraphSearch.

Publication# Theory of Deep Learning: Neural Tangent Kernel and Beyond

Abstract

In the recent years, Deep Neural Networks (DNNs) have managed to succeed at tasks that previously appeared impossible, such as human-level object recognition, text synthesis, translation, playing games and many more. In spite of these major achievements, our understanding of these models, in particular of what happens during their training, remains very limited. This PhD started with the introduction of the Neural Tangent Kernel (NTK) to describe the evolution of the function represented by the network during training. In the infinite-width limit, i.e. when the number of neurons in the layers of the network grows to infinity, the NTK converges to a deterministic and time-independent limit, leading to a simple yet complete description of the dynamics of infinitely-wide DNNs. This allowed one to give the first general proof of convergence of DNNs to a global minimum, and yielded the first description of the limiting spectrum of the Hessian of the loss surface of DNNs throughout training.More importantly, the NTK plays a crucial role in describing the generalization abilities of DNNs, i.e. the performance of the trained network on unseen data. The NTK analysis uncovered a direct link between the function learned by infinitely wide DNNs and Kernel Ridge Regression predictors, whose generalization properties are studied in this thesis using tools of random matrix theory. Our analysis of KRR reveals the importance of the eigendecomposition of the NTK, which is affected by a number of architectural choices. In very deep networks, an ordered regime and a chaotic regime appear, determined by the choice of non-linearity and the balance between the weights and bias parameters; these two phases are characterized by different speeds of decay of the eigenvalues of the NTK, leading to a tradeoff between convergence speed and generalization. In practical contexts such as Generative Adversarial Networks or Topology Optimization, the network architecture can be chosen to guarantee certain properties of the NTK and its spectrum.These results give an almost complete description DNNs in this infinite-width limit. It is then natural to wonder how it extends to finite-width networks used in practice. In the so-called NTK regime, the discrepancy between finite- and infinite-widths DNNs is mainly a result of the variance w.r.t. to the sampling of the parameters, as shown empirically and mathematically relying on the similarity between DNNs and random feature models.In contrast to the NTK regime, where the NTK remains constant during training, there exist so-called active regimes, where the evolution of the NTK is significant, which appear in a number of settings. One such regime appears in Deep Linear Networks with a very small initialization, where the training dynamics approach a sequence of saddle-points, representing linear maps of increasing rank, leading to a low-rank bias which is absent in the NTK regime.

Official source

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Related MOOCs (8)

Related publications (26)

Related concepts (28)

Neural tangent kernel

In the study of artificial neural networks (ANNs), the neural tangent kernel (NTK) is a kernel that describes the evolution of deep artificial neural networks during their training by gradient descent. It allows ANNs to be studied using theoretical tools from kernel methods. In general, a kernel is a positive-semidefinite symmetric function of two inputs which represents some notion of similarity between the two inputs. The NTK is a specific kernel derived from a given neural network; in general, when the neural network parameters change during training, the NTK evolves as well.

Deep learning

Deep learning is part of a broader family of machine learning methods, which is based on artificial neural networks with representation learning. The adjective "deep" in deep learning refers to the use of multiple layers in the network. Methods used can be either supervised, semi-supervised or unsupervised.

Generative adversarial network

A generative adversarial network (GAN) is a class of machine learning framework and a prominent framework for approaching generative AI. The concept was initially developed by Ian Goodfellow and his colleagues in June 2014. In a GAN, two neural networks contest with each other in the form of a zero-sum game, where one agent's gain is another agent's loss. Given a training set, this technique learns to generate new data with the same statistics as the training set.

Seyedeh Sahar Sadrizadeh, Hatef Otroshi Shahreza

In digital imaging, especially in the process of data acquisition and transmission, images are often affected by impulsive noise. Therefore, it is essential to remove impulsive noise from images befor

In the last decade, deep neural networks have achieved tremendous success in many fields of machine learning.However, they are shown vulnerable against adversarial attacks: well-designed, yet impercep

Deep Neural Networks (DNNs) have achieved great success in a wide range of applications, such as image recognition, object detection, and semantic segmentation. Even thoughthe discriminative power of