
# Franck Raymond Gabriel



People doing similar research

No results

Related units (4)

Related research domains

No results

Courses taught by this person (1)

The goal of this class is to teach students how to look at two-dimensional field theories, how to analyse them, and how to equip them with additional structures. By the end, students should have a good picture of what is currently understood about what happens at the intersection of Quantum Field Theory and Statistical Mechanics.

Franck Raymond Gabriel, Clément Hongler

Related publications (2)

The Neural Tangent Kernel is a new way to understand gradient descent in deep neural networks, connecting them with kernel methods. In this talk, I'll introduce this formalism, give a number of results on the Neural Tangent Kernel, and explain how they give us insight into the dynamics of neural networks during training and into their generalization features.

Supervised deep learning involves training neural networks with a large number N of parameters. For large enough N, in the so-called over-parametrized regime, one can essentially fit the training data points. Sparsity-based arguments would suggest that the generalization error increases as N grows past a certain threshold N*. Instead, empirical studies have shown that in the over-parametrized regime, the generalization error keeps decreasing with N. We resolve this paradox through a new framework. We rely on the so-called Neural Tangent Kernel, which connects large neural nets to kernel methods, to show that the initialization causes finite-size random fluctuations ‖f_N − ⟨f_N⟩‖ ∼ N^(−1/4) of the neural net output function f_N around its expectation ⟨f_N⟩. These fluctuations affect the generalization error ε(f_N) for classification: under natural assumptions, it decays to a plateau value ε(f_∞) in a power-law fashion ∼ N^(−1/2). This description breaks down at a so-called jamming transition N = N*; at this threshold, we argue that ‖f_N‖ diverges. This result leads to a plausible explanation for the cusp in test error known to occur at N*. Our results are confirmed by extensive empirical observations on the MNIST and CIFAR image datasets. Our analysis finally suggests that, given a computational envelope, the smallest generalization error is obtained by training several networks of intermediate sizes, just beyond N*, and averaging their outputs.
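The Neural Tangent Kernel mentioned in both publications can be sketched numerically: for a network f with parameters θ, the empirical kernel is Θ(x, x′) = ⟨∇_θ f(x), ∇_θ f(x′)⟩. The snippet below is a minimal illustration for a hypothetical one-hidden-layer tanh network (the architecture, widths, and initialization here are assumptions for the example, not the setup used in the publications):

```python
import numpy as np

# Empirical Neural Tangent Kernel for a toy one-hidden-layer tanh network:
# Theta(x, x') = <grad_theta f(x), grad_theta f(x')>.
rng = np.random.default_rng(0)

n_hidden = 512
W1 = rng.standard_normal((n_hidden, 1))                    # input -> hidden
W2 = rng.standard_normal((1, n_hidden)) / np.sqrt(n_hidden)  # hidden -> output

def grad_params(x):
    """Analytic gradient of the scalar output f(x) w.r.t. (W1, W2), flattened."""
    h = np.tanh(W1 * x)          # hidden activations, shape (n_hidden, 1)
    dh = 1.0 - h**2              # tanh'(W1 * x)
    gW1 = (W2.T * dh) * x        # df/dW1, shape (n_hidden, 1)
    gW2 = h.T                    # df/dW2, shape (1, n_hidden)
    return np.concatenate([gW1.ravel(), gW2.ravel()])

def ntk(x1, x2):
    """Empirical NTK entry Theta(x1, x2) at the current parameters."""
    return float(grad_params(x1) @ grad_params(x2))

# The kernel is symmetric, and its diagonal is a squared gradient norm (>= 0).
print(ntk(0.3, 0.7), ntk(0.7, 0.3), ntk(0.5, 0.5))
```

In the infinite-width limit this kernel concentrates and stays fixed during training, which is what connects wide networks to kernel methods.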
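The closing suggestion, averaging the outputs of several independently trained networks, can be illustrated with a toy statistical model (an assumption for illustration, not the paper's experiments): if each network's output fluctuates around a common mean with variance σ², the average of k independent networks has fluctuation variance σ²/k.

```python
import numpy as np

# Toy model: outputs of k independently initialized networks at one test point,
# modeled as i.i.d. draws around a common mean. Averaging shrinks the
# fluctuation variance from sigma^2 to sigma^2 / k.
rng = np.random.default_rng(1)

sigma, k, n_trials = 1.0, 16, 20000
outputs = rng.normal(0.0, sigma, size=(n_trials, k))  # one row per trial
ensemble = outputs.mean(axis=1)                       # k-network average

print(outputs[:, 0].var())  # close to sigma^2
print(ensemble.var())       # close to sigma^2 / k
```

This is only the variance-reduction half of the story; the paper's point is that the bias term plateaus past N*, so ensembling intermediate-size networks can beat a single very large one for a fixed compute budget.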