
# Neural Tangent Kernel: Convergence and Generalization in Neural Networks (Invited Paper)

Abstract

The Neural Tangent Kernel (NTK) is a new way to understand gradient descent in deep neural networks that connects them with kernel methods. In this talk, I will introduce this formalism, present a number of results on the Neural Tangent Kernel, and explain how they give us insight into the dynamics of neural networks during training and into their generalization properties.
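To make the kernel concrete: for a network f(x; θ), the empirical NTK is Θ(x, x') = ∇θ f(x; θ) · ∇θ f(x'; θ), and under gradient flow on a loss Σ_i ℓ(f(x_i), y_i) the outputs evolve as ∂t f(x) = −Σ_i Θ(x, x_i) ∂ℓ/∂f(x_i). The following is a minimal sketch, not the talk's own code, that computes the empirical NTK Gram matrix of a one-hidden-layer ReLU network using the 1/√m output scaling of the NTK parameterization. The architecture (no biases), the width, the inputs, and the helper names `init_params`, `param_gradient` and `empirical_ntk` are illustrative assumptions.

```python
import math
import random


def init_params(m, d, seed=0):
    """Hypothetical helper: i.i.d. standard Gaussian weights (the 1/sqrt(m) factor lives in the forward pass)."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    a = [rng.gauss(0.0, 1.0) for _ in range(m)]
    return W, a


def param_gradient(W, a, x):
    """Flattened gradient of f(x) = (1/sqrt(m)) * sum_j a_j * relu(w_j . x) w.r.t. (a_j, w_j) for every j."""
    scale = 1.0 / math.sqrt(len(a))
    grad = []
    for w_j, a_j in zip(W, a):
        pre = sum(wk * xk for wk, xk in zip(w_j, x))
        active = 1.0 if pre > 0.0 else 0.0
        grad.append(scale * max(pre, 0.0))                  # df/da_j
        grad.extend(scale * a_j * active * xk for xk in x)  # df/dw_jk
    return grad


def empirical_ntk(W, a, xs):
    """Gram matrix Theta[i][k] = <grad_theta f(x_i), grad_theta f(x_k)>."""
    grads = [param_gradient(W, a, x) for x in xs]
    return [[sum(g * h for g, h in zip(gi, gk)) for gk in grads] for gi in grads]


W, a = init_params(m=1000, d=2)
xs = [[1.0, 0.0], [0.0, 1.0], [math.sqrt(0.5), math.sqrt(0.5)]]
theta = empirical_ntk(W, a, xs)  # 3x3 symmetric positive semi-definite matrix
```

Because Θ is a Gram matrix of gradients, it is symmetric and positive semi-definite for any parameters. As the width m grows, this random matrix concentrates around a deterministic limiting kernel, which is the regime the NTK results describe.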



Related concepts (8)

Neural network

A neural network can refer to a neural circuit of biological neurons (sometimes also called a biological neural network), a network of artificial neurons or nodes in the case of an artificial neur

Gradient descent

In mathematics, gradient descent (also often called steepest descent) is an iterative optimization algorithm for finding a local minimum of a differentiable function. The idea is to take repeated ste
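A minimal illustration of the update rule x_{k+1} = x_k − η ∇f(x_k) on the one-dimensional quadratic f(x) = (x − 3)², whose minimiser is x = 3. The step size η = 0.1 and the test function are illustrative choices; with them, each step multiplies the error x − 3 by 1 − 2η = 0.8.

```python
def gradient_descent(grad, x0, step_size, num_steps):
    """Plain gradient descent: x_{k+1} = x_k - step_size * grad(x_k)."""
    x = x0
    for _ in range(num_steps):
        x = x - step_size * grad(x)
    return x


# f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0, step_size=0.1, num_steps=100)
```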

Paper

Paper is a thin sheet material produced by mechanically or chemically processing cellulose fibres derived from wood, rags, grasses, or other vegetable sources in water, draining the water through a fi

Related publications

In this thesis, we propose new algorithms to solve inverse problems in the context of biomedical images. Because these problems are ill-posed, solving them requires some prior knowledge of the statistics of the underlying images. Traditional algorithms in the field assume prior knowledge related to the smoothness or sparsity of these images. Recently, they have been outperformed by second-generation algorithms, which harness the power of neural networks to learn the required statistics from training data. Even more recently, latest-generation deep-learning-based methods have emerged that require neither training nor training data. This thesis devises algorithms that progress through these generations. It extends these generations to novel formulations and applications while bringing more robustness. In parallel, it also progresses in terms of complexity, from algorithms for problems with 1D data and an exactly known forward model to algorithms for problems with 4D data and an unknown parametric forward model. We introduce five main contributions, the last three of which are deep-learning-based, latest-generation algorithms that require no prior training. 1) We develop algorithms to solve the continuous-domain formulation of inverse problems with both classical Tikhonov and total-variation regularizations. We formalize the problems, characterize the solution set, and devise numerical approaches to find the solutions. 2) We propose an algorithm that improves upon end-to-end neural-network-based second-generation algorithms. In our method, a neural network is first trained as a projector on a training set and is then plugged in as a projector inside projected gradient descent (PGD). Since the problem is nonconvex, we relax the PGD to ensure convergence to a local minimum under some constraints. This method outperforms all the previous-generation algorithms for computed tomography (CT).
3) We develop a novel time-dependent deep-image-prior algorithm for modalities that involve a temporal sequence of images. We parameterize the images as the output of an untrained neural network fed with a sequence of latent variables. To impose temporal directionality, the latent variables are assumed to lie on a 1D manifold. The network is then tuned to minimize the data-fidelity term. We obtain state-of-the-art results in dynamic magnetic resonance imaging (MRI) and even recover intra-frame images. 4) We propose a novel reconstruction paradigm for cryo-electron microscopy (CryoEM) called CryoGAN. Motivated by generative adversarial networks (GANs), we reconstruct a biomolecule's 3D structure such that its CryoEM measurements resemble the acquired data in a distributional sense. The algorithm requires neither pose nor likelihood estimation, needs no ab initio model, and comes with a theoretical guarantee of recovering the true structure. 5) We extend CryoGAN to reconstruct continuously varying conformations of a structure from heterogeneous data. We parameterize the conformations as the output of a neural network fed with latent variables on a low-dimensional manifold. The method is shown to recover continuous protein conformations and their energy landscape.
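Contribution 2) plugs a trained projector into projected gradient descent and relaxes the iteration. The following is a minimal sketch of a relaxed PGD loop, assuming a least-squares data term ½‖Hx − y‖², a fixed relaxation weight `alpha`, and a simple nonnegativity projection standing in for the trained CNN projector. The thesis's actual relaxation rule and network are not reproduced here, and the helper names and toy forward model are hypothetical.

```python
def matvec(H, x):
    """Compute H @ x for a list-of-rows matrix H."""
    return [sum(h * xj for h, xj in zip(row, x)) for row in H]


def rmatvec(H, r):
    """Compute H^T @ r."""
    return [sum(H[i][j] * r[i] for i in range(len(H))) for j in range(len(H[0]))]


def project_nonneg(x):
    """Stand-in projector (assumption): Euclidean projection onto x >= 0.
    In the thesis, this role is played by a trained neural network."""
    return [max(xj, 0.0) for xj in x]


def relaxed_pgd(H, y, project, gamma, alpha, num_iters, x0):
    """Relaxed PGD for min 0.5 * ||H x - y||^2 over the range of `project`:
    z_k = project(x_k - gamma * H^T (H x_k - y)),  x_{k+1} = (1 - alpha) * x_k + alpha * z_k."""
    x = list(x0)
    for _ in range(num_iters):
        residual = [hx - yi for hx, yi in zip(matvec(H, x), y)]
        grad = rmatvec(H, residual)
        z = project([xj - gamma * gj for xj, gj in zip(x, grad)])
        x = [(1.0 - alpha) * xj + alpha * zj for xj, zj in zip(x, z)]
    return x


# Toy forward model: three measurements of a two-pixel "image" (illustrative values).
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x_true = [2.0, 0.5]
y = matvec(H, x_true)
x_hat = relaxed_pgd(H, y, project_nonneg, gamma=0.3, alpha=0.5, num_iters=200, x0=[0.0, 0.0])
```

Here HᵀH has eigenvalues 3 and 1, so γ = 0.3 is below 2/λ_max = 2/3 and the gradient step is a contraction. With a nonexpansive projection and α = 0.5, the relaxed map contracts toward `x_true`, which is nonnegative and therefore a fixed point.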

In this thesis, we advocate that Computer-Aided Engineering could benefit from a Geometric Deep Learning revolution, similar to the way Deep Learning revolutionized Computer Vision. To do so, we consider a variety of Computer-Aided Engineering problems, including physics simulation, design optimization, shape parameterization and shape reconstruction. For each of these problems, we develop novel algorithms that use Geometric Deep Learning to improve the capabilities of existing systems. First, we demonstrate how Geometric Deep Learning architectures can learn to emulate physics simulations. Specifically, we design a neural architecture that, given a 3D surface mesh as input, directly regresses physical quantities of interest defined over the mesh surface. The key to making our approach practical is re-meshing the original shape using a polycube map, which makes it possible to perform computations efficiently on Graphics Processing Units (GPUs). This results in a speed-up of two orders of magnitude with respect to physics simulators, with little loss in accuracy; our main motivation is to provide lightweight performance feedback to improve interactivity in early design stages. Furthermore, being a neural network, our physics emulator is naturally differentiable with respect to input geometry parameters, which allows us to solve shape design problems through gradient descent. The resulting algorithm outperforms state-of-the-art methods by 5 to 20% on 2D optimization tasks and, in contrast to existing methods, can also be used to optimize raw 3D geometry. This could empower designers and engineers to improve the performance of a given design automatically, i.e. without requiring any specific knowledge about the physics of the problem they are trying to solve. To perform shape optimization robustly, we develop novel parametric representations for 3D surface meshes that can be used as strong priors during the optimization process.
To this end, we introduce a differentiable way to produce explicit surface mesh representations from Neural Signed Distance Functions. Our key insight is that, by reasoning about how perturbations of the implicit field affect local surface geometry, one can differentiate the 3D location of surface samples with respect to the underlying neural implicit field. This yields surface mesh parameterizations that can handle topology changes, something that is not feasible with currently available techniques. Finally, we propose a pipeline for reconstructing and editing 3D shapes from line drawings that leverages our end-to-end differentiable surface mesh representation. When it is integrated into a user interface that provides camera parameters for the sketches, we can exploit our latent parameterization to refine a 3D mesh so that its projections match the external contours outlined in the sketch. We show that this is crucial to make our approach robust to the domain gap. Furthermore, it can be used for shape refinement given only single pen strokes. This system could allow engineers and designers to translate legacy 2D sketches into real-world 3D models that can readily be used for downstream tasks such as physics simulation or fabrication, or to interact with and modify 3D geometry in the most natural way possible, i.e. with a pen stroke.
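The key insight about differentiating surface samples can be illustrated with a standard first-order argument (the thesis's exact formulation may differ). If a sample v lies on the zero level set, s(v) = 0, and the field changes by δs, keeping the sample on the new zero level set requires ∇s · δv + δs = 0 to first order. Taking δv along the normal, the minimal-norm choice, gives δv = −δs ∇s / ‖∇s‖². The minimal sketch below checks this on an analytic sphere SDF instead of a neural one; the helper names are hypothetical.

```python
import math


def sphere_sdf(x, radius):
    """Signed distance to a sphere of the given radius centred at the origin."""
    return math.sqrt(sum(c * c for c in x)) - radius


def sphere_sdf_gradient(x):
    """Gradient of ||x|| - r, i.e. the unit outward normal (valid away from the origin)."""
    norm = math.sqrt(sum(c * c for c in x))
    return [c / norm for c in x]


def move_surface_point(v, grad_s, delta_s):
    """First-order update of a zero-level-set point when the field changes by delta_s at v:
    dv = -delta_s * grad_s / ||grad_s||^2 (displacement along the normal)."""
    g2 = sum(g * g for g in grad_s)
    return [vi - delta_s * gi / g2 for vi, gi in zip(v, grad_s)]


r, delta = 1.0, 0.05
v = [r / math.sqrt(3.0)] * 3  # a point on the sphere of radius r
# Lowering the field by delta everywhere (delta_s = -delta) grows the sphere to radius r + delta.
v_new = move_surface_point(v, sphere_sdf_gradient(v), delta_s=-delta)
```

For this field the first-order update is exact: the moved point lands on the sphere of radius r + δ, which is the zero level set of the perturbed field.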

Nicolas Henri Bernard Flammarion, Loucas Pillaud-Vivien

The training of neural networks by gradient descent methods is a cornerstone of the deep learning revolution. Yet, despite some recent progress, a complete theory explaining its success is still missing. This article presents, for orthogonal input vectors, a precise description of the gradient flow dynamics of training one-hidden-layer ReLU neural networks on the mean squared error from a small initialisation. In this setting, despite non-convexity, we show that the gradient flow converges to zero loss and characterise its implicit bias towards minimum variation norm. Furthermore, some interesting phenomena are highlighted: a quantitative description of the initial alignment phenomenon and a proof that the process follows a specific saddle-to-saddle dynamics.
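The setting of this abstract can be reproduced at toy scale. The following is a minimal sketch, assuming full-batch gradient descent with a small step as a stand-in for gradient flow, two orthogonal inputs (the standard basis of R²) with labels +1 and −1, and a bias-free network f(x) = Σ_j a_j relu(w_j · x) of width 30. It only illustrates convergence to near-zero loss from a small initialisation and does not reproduce the paper's analysis of implicit bias, alignment, or saddle-to-saddle dynamics. The function name and all hyperparameters are illustrative.

```python
import random


def train_relu_net(xs, ys, width=30, init_scale=1e-3, lr=0.05, steps=2000, seed=0):
    """Full-batch gradient descent on L = 1/(2n) * sum_i (f(x_i) - y_i)^2
    with f(x) = sum_j a_j * relu(w_j . x). Returns (initial loss, final loss)."""
    rng = random.Random(seed)
    n, d = len(xs), len(xs[0])
    # Small initialisation: every weight is a Gaussian scaled down by init_scale.
    W = [[init_scale * rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(width)]
    a = [init_scale * rng.gauss(0.0, 1.0) for _ in range(width)]

    def forward(x):
        pre = [sum(wk * xk for wk, xk in zip(w, x)) for w in W]
        return pre, sum(aj * max(p, 0.0) for aj, p in zip(a, pre))

    def loss():
        return sum((forward(x)[1] - y) ** 2 for x, y in zip(xs, ys)) / (2 * n)

    initial = loss()
    for _ in range(steps):
        grad_a = [0.0] * width
        grad_W = [[0.0] * d for _ in range(width)]
        for x, y in zip(xs, ys):
            pre, out = forward(x)
            r = (out - y) / n  # dL/df(x) for this sample
            for j in range(width):
                if pre[j] > 0.0:  # only active ReLUs receive gradient
                    grad_a[j] += r * pre[j]
                    for k in range(d):
                        grad_W[j][k] += r * a[j] * x[k]
        # Simultaneous update of all parameters with gradients from the old iterate.
        for j in range(width):
            a[j] -= lr * grad_a[j]
            for k in range(d):
                W[j][k] -= lr * grad_W[j][k]
    return initial, loss()


# Two orthogonal inputs (standard basis of R^2) with labels +1 and -1.
initial_loss, final_loss = train_relu_net([[1.0, 0.0], [0.0, 1.0]], [1.0, -1.0])
```

With such a small initialisation the network output starts near zero, so the initial loss is close to 0.5. The loss only drops once a few neurons have grown along the inputs they fit.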

2022