In the past few years, Machine Learning (ML) techniques have ushered in a paradigm shift, allowing the harnessing of ever more abundant sources of data to automate complex tasks. The technical workhorse behind these breakthroughs arguably lies in the use of artificial neural networks to learn informative and actionable representations of data, from data. While empirical successes continue to accrue, a solid theoretical comprehension of the unreasonable effectiveness of ML methods in learning from high-dimensional data still proves largely elusive. This is the question addressed in this thesis, through the study of solvable models in high dimensions, satisfying the dual requirement of (a) capturing the key features of practical ML tasks while (b) remaining amenable to mathematical analysis. Borrowing ideas from statistical physics, this thesis presents sharp asymptotic incursions into a selection of central aspects of modern ML.

The remarkable versatility of ML models lies in their ability to extract informative features from data. The first part of the thesis analyzes which structural characteristics of these features condition the learning of ML methods. Specifically, it highlights how, in several settings, a theory formulated in terms of two statistical descriptors can tightly capture the learning curves of simple real tasks. For kernel methods in particular, this insight enables one to relate the error scaling laws to the structure of the features.

The second part then refines the focus to study which features are extracted by multi-layer neural networks, both (a) when untrained and (b) when trained in the framework of Bayesian learning, or after one large gradient step. In particular, it delineates cases in which Gaussian universality holds and limits the network expressivity, and cases in which neural networks succeed in learning non-trivial features.

Finally, supervised learning tasks with fully-connected architectures constitute but a small part of the zoology of modern ML tasks. The last part of the thesis extends the sharp asymptotic exploration to some modern aspects of the discipline, in particular transport-based generative models and dot-product attention mechanisms.