
# Fundamental Limits in Statistical Learning Problems: Block Models and Neural Networks

## Abstract

This thesis focuses on two selected learning problems: (1) statistical inference on graph models, and (2) gradient descent on neural networks, with the common objective of defining and analysing the measures that characterize their fundamental limits.

In the first part of the thesis, we consider spin synchronization problems on graphs, which consist of reconstructing a vector of n independent spins living on the vertices of a graph from noisy observations of their interactions on the edges. In particular, we consider synchronization models with erasure (BEC) side-information, where the spins of a small fraction of nodes are revealed, and investigate how the addition of such side-information influences the correlations between spins at distant sites. We show that on trees, whenever spins at distant sites are nearly independent given the edge observations, they remain nearly independent given the edge observations and the side-information. We conjecture this to hold for any graph. On the other hand, Kanade et al. (2014) conjectured that, on regular trees and on Galton-Watson trees, whenever any small fraction of node labels is revealed, the boundary at infinite depth becomes ineffective for detecting the root bit, even in the reconstruction regime. We explain how this can be used to compute the limiting entropy of the sparse Stochastic Block Model (SBM) with two symmetric communities. Finally, we show that the latter conjecture does not hold for every tree.

In the second part of the thesis, we consider the problem of learning Boolean target functions with gradient descent (GD) on fully connected neural networks. We introduce the notion of "Initial Alignment" (INAL) between a neural network at initialization and a target function, and prove that if a network and target do not have a noticeable INAL, then noisy gradient descent on a fully connected network with i.i.d. Gaussian initialization cannot learn the target in polynomial time.
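As a rough illustration of the INAL notion, the following is a minimal Monte-Carlo sketch (hypothetical helper names, not the thesis code; the thesis' formal definition may differ, e.g. in normalization): it estimates the correlation between a randomly initialized one-hidden-layer ReLU network and a Boolean target over uniform ±1 inputs.

```python
import math
import random

def init_mlp(d, width, rng):
    """i.i.d. Gaussian initialization of a one-hidden-layer ReLU net."""
    W = [[rng.gauss(0, 1 / math.sqrt(d)) for _ in range(d)] for _ in range(width)]
    a = [rng.gauss(0, 1 / math.sqrt(width)) for _ in range(width)]
    return W, a

def net(x, W, a):
    """Forward pass: sum_j a_j * relu(<w_j, x>)."""
    return sum(aj * max(0.0, sum(w * xi for w, xi in zip(wj, x)))
               for aj, wj in zip(a, W))

def inal_estimate(target, d, width, n_samples, rng):
    """Monte-Carlo estimate of E_x[ target(x) * N(x; w0) ] over
    uniform {-1,+1}^d inputs, for one draw of the initialization."""
    W, a = init_mlp(d, width, rng)
    total = 0.0
    for _ in range(n_samples):
        x = [rng.choice([-1, 1]) for _ in range(d)]
        total += target(x) * net(x, W, a)
    return total / n_samples
```

For a high-degree target such as a parity on many coordinates, this estimate concentrates around a value that vanishes with the input dimension, which is the regime where the lower bound applies.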
We show that for finite-depth networks trained with the correlation loss, the result extends beyond Boolean inputs. Moreover, we prove that in a similar setting, the generalization error can be lower-bounded in terms of the noise stability of the target function, supporting a conjecture made in Zhang et al. (2021).

We then show that in the distribution-shift setting, when the data withholding corresponds to freezing a single feature, the generalisation error admits a tight characterisation in terms of the Boolean influence for several relevant architectures. This is shown for linear models and supported experimentally on other models such as MLPs and Transformers. In particular, this puts forward the hypothesis that for such architectures, when learning logical functions, GD tends to have an implicit bias towards low-degree representations.

Finally, we consider a 'curriculum learning' (CL) strategy for learning k-parities on d bits of a binary string. We show that a wise choice of training examples, involving two or more product distributions, allows k-parities to be learned in d^{O(1)} time by a fully connected neural network trained with GD. We further show that for another class of functions, namely the 'Hamming mixtures', CL strategies involving a bounded number of product distributions are not beneficial.
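The mechanism behind the curriculum strategy for parities can be sketched numerically (a simplified illustration with hypothetical names, not the thesis construction): under the uniform distribution a k-parity has zero mean, giving GD no first-order signal, while under a biased product distribution its mean is (2*bias - 1)^k, a noticeable correlation that an early curriculum stage can exploit.

```python
import random

def parity(x, S):
    """k-parity chi_S(x) = prod_{i in S} x_i on {-1,+1}-valued bits."""
    p = 1
    for i in S:
        p *= x[i]
    return p

def mean_parity(d, S, bias, n_samples, rng):
    """Monte-Carlo estimate of E[chi_S(x)] when each of the d bits is
    +1 with probability `bias` (and -1 otherwise), independently.
    For bias = 0.5 (uniform inputs) the expectation is 0; for biased
    inputs it equals (2*bias - 1)**len(S)."""
    total = 0
    for _ in range(n_samples):
        x = [1 if rng.random() < bias else -1 for _ in range(d)]
        total += parity(x, S)
    return total / n_samples
```

For example, a 3-parity on 20 bits with bias 0.9 has mean (0.8)^3 = 0.512, while the same estimate under bias 0.5 hovers near zero.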


## Related MOOCs

Neuronal Dynamics - Computational Neuroscience of Single Neurons

The activity of neurons in the brain and the code used by these neurons is described by mathematical neuron models at different levels of detail.

Neuronal Dynamics 2- Computational Neuroscience: Neuronal Dynamics of Cognition

This course explains the mathematical and computational models that are used in the field of theoretical neuroscience to analyze the collective dynamics of thousands of interacting neurons.

## Related concepts

A recurrent neural network (RNN) is one of the two broad types of artificial neural network, characterized by the direction of the flow of information between its layers. In contrast to the uni-directional feedforward neural network, it is a bi-directional artificial neural network, meaning that it allows the output from some nodes to affect subsequent input to the same nodes. Their ability to use internal state (memory) to process arbitrary sequences of inputs makes them applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition.

Artificial neural networks (ANNs, also shortened to neural networks (NNs) or neural nets) are a branch of machine learning models that are built using principles of neuronal organization discovered by connectionism in the biological neural networks constituting animal brains. An ANN is based on a collection of connected units or nodes called artificial neurons, which loosely model the neurons in a biological brain. Each connection, like the synapses in a biological brain, can transmit a signal to other neurons.

There are many types of artificial neural networks (ANN). Artificial neural networks are computational models inspired by biological neural networks, and are used to approximate functions that are generally unknown. Particularly, they are inspired by the behaviour of neurons and the electrical signals they convey between input (such as from the eyes or nerve endings in the hand), processing, and output from the brain (such as reacting to light, touch, or heat). The way neurons semantically communicate is an area of ongoing research.

## Related publications

In this PhD manuscript, we explore optimisation phenomena which occur in complex neural networks through the lens of $2$-layer diagonal linear networks. This rudimentary architecture, which consists of a two-layer feedforward linear network with a diagonal ...

In inverse problems, the task is to reconstruct an unknown signal from its possibly noise-corrupted measurements. Penalized-likelihood-based estimation and Bayesian estimation are two powerful statistical paradigms for the resolution of such problems. They ...

Alexander Mathis, Alberto Silvio Chiappa, Alessandro Marin Vargas, Axel Bisi

Here we provide the neural data, activation and predictions for the best models and result dataframes of our article "Task-driven neural network models predict neural dynamics of proprioception". It contains the behavioral and neural experimental data (cu ...