
# High-Dimensional Inference on Dense Graphs with Applications to Coding Theory and Machine Learning

Abstract

We are living in the era of "Big Data", an era characterized by a voluminous amount of available data. This abundance is mainly due to continuing advances in the computational capabilities for capturing, storing, transmitting, and processing data. However, it is not always the volume of data that matters, but rather the "relevant" information that resides in it. Exactly 70 years ago, Claude Shannon, the father of information theory, quantified the amount of information in a communication scenario based on a probabilistic model of the data. It turns out that Shannon's theory can be adapted to various probability-based information processing fields, ranging from coding theory to machine learning. The computation of some information-theoretic quantities, such as the mutual information, can help in setting fundamental limits and devising more efficient algorithms for many inference problems. This thesis deals with two different, yet intimately related, inference problems in the fields of coding theory and machine learning. We use Bayesian probabilistic formulations for both problems, and we analyse them in the asymptotic high-dimensional regime. The goal of our analysis is, on the one hand, to assess the algorithmic performance and, on the other hand, to predict the Bayes-optimal performance, using an information-theoretic approach. To this end, we employ powerful analytical tools from statistical physics. The first problem concerns a recent family of forward-error-correction codes called sparse superposition codes. We consider the extension of these codes to a large class of noisy channels by exploiting the similarity with the compressed sensing paradigm. Moreover, we show that sparse superposition codes are amenable to performing joint distribution matching and channel coding.
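The connection to compressed sensing can be sketched with a toy sparse superposition encoder and a naive matched-filter decoder (all sizes and the noise level below are illustrative, not parameters from the thesis): the codeword is A·β, where β has exactly one non-zero entry per section, so decoding amounts to recovering a section-sparse vector from noisy linear measurements.

```python
import numpy as np

rng = np.random.default_rng(0)

B, M, n = 4, 8, 256          # B sections of M entries each; block length n
sigma = 0.1                  # AWGN noise level (illustrative)

# Message: one index per section -> section-sparse vector beta
msg = rng.integers(0, M, size=B)
beta = np.zeros(B * M)
beta[np.arange(B) * M + msg] = 1.0

# Random Gaussian dictionary, as in compressed sensing
A = rng.standard_normal((n, B * M)) / np.sqrt(n)

# Transmit over an AWGN channel
y = A @ beta + sigma * rng.standard_normal(n)

# Naive matched-filter decoder: correlate with each column and take
# the argmax within each section
scores = A.T @ y
decoded = scores.reshape(B, M).argmax(axis=1)
```

At this low noise level the naive decoder already recovers the message; the thesis studies far more powerful message-passing decoders and their fundamental limits.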
In the second problem, we study symmetric rank-one matrix factorization, a prominent model in machine learning and statistics with applications ranging from community detection to sparse principal component analysis. We provide an explicit expression for the normalized mutual information and the minimum mean-square error of this model in the asymptotic limit, which allows us to prove the optimality of a certain iterative algorithm over a large region of parameters. A common feature of the two problems stems from the fact that both are represented by dense graphical models, so similar message-passing algorithms and analysis tools can be adopted. Furthermore, spatial coupling, a technique introduced in the context of low-density parity-check (LDPC) codes, can be applied to both problems. Spatial coupling is used in this thesis as a "construction technique" to boost the algorithmic performance and as a "proof technique" to compute some information-theoretic quantities. Moreover, both of our problems have close connections with spin glass models studied in the statistical mechanics of disordered systems, which allows us to use sophisticated techniques developed in statistical physics. In this thesis, we use the potential function predicted by the replica method to prove the threshold saturation phenomenon associated with spatially coupled models. Moreover, one of the main contributions of this thesis is proving that the predictions given by the "heuristic" replica method are exact. Hence, our results could be of great interest to the statistical physics community as well, as they help to set a rigorous mathematical foundation for the replica predictions.
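The scalar fixed-point analysis behind such asymptotic results can be sketched with the standard state-evolution recursion for rank-one matrix estimation with a ±1 prior (a generic textbook recursion, not code from the thesis; λ is the signal-to-noise ratio and q the overlap with the hidden signal):

```python
import numpy as np

def se_fixed_point(lam, iters=200, n_nodes=80):
    """Iterate the state-evolution map q -> E_Z[tanh(lam*q + sqrt(lam*q)*Z)]
    for a Rademacher (+/-1) prior, using Gauss-Hermite quadrature for the
    Gaussian expectation."""
    t, w = np.polynomial.hermite.hermgauss(n_nodes)
    z = np.sqrt(2.0) * t          # nodes for Z ~ N(0, 1)
    w = w / np.sqrt(np.pi)        # matching quadrature weights
    q = 0.5                       # non-trivial initialization
    for _ in range(iters):
        q = np.sum(w * np.tanh(lam * q + np.sqrt(lam * q) * z))
    return q

# Above the transition (lam > 1) the overlap converges to a positive
# fixed point; below it, the overlap collapses to zero.
```

The asymptotic minimum mean-square error is then read off from the fixed point q*, which is how the optimality of iterative algorithms is assessed in this line of work.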



Related concepts (41)

Information theory

Information theory is the mathematical study of the quantification, storage, and communication of information. The field was originally established by the works of Harry Nyquist and Ralph Hartley in the 1920s and Claude Shannon in the 1940s.

Code

In communications and information processing, a code is a system of rules to convert information—such as a letter, word, sound, image, or gesture—into another form, sometimes shortened or secret, for communication through a communication channel or storage in a medium.

System

A system is a group of interacting or interrelated elements that act according to a set of rules to form a unified whole. A system, surrounded and influenced by its environment, is described by its boundaries, structure and purpose, and is expressed in its functioning.

Related publications (111)


For the past 70 years or so, coding theorists have been aiming to design transmission schemes with efficient encoding and decoding algorithms that achieve the capacity of various noisy channels. It was not until the '90s that graph-based codes, such as low-density parity-check (LDPC) codes, and their associated low-complexity iterative decoding algorithms were discovered and studied in depth. Although these schemes are efficient, they are not, in general, capacity-achieving. More specifically, these codes perform well up to some algorithmic threshold on the channel parameter, which is lower than the optimal threshold. The gap between the algorithmic and optimal thresholds was finally closed by spatial coupling. In the context of coding, the belief propagation algorithm on spatially coupled codes yields capacity-achieving low-complexity transmission schemes. The reason behind the optimal performance of spatially coupled codes is the "seeding" of perfect information on the replicas at the boundaries of the coupling chain. This extra information makes decoding easier near the boundaries, and this effect then propagates into the coupling chain over iterations of the decoding algorithm. Spatial coupling was also applied to various other problems that are governed by low-complexity message-passing algorithms, such as random constraint satisfaction problems, compressive sensing, and statistical physics. Each system has an associated algorithmic threshold and an optimal threshold. As with coding, once the underlying graphs are spatially coupled, the algorithms for these systems exhibit optimal performance.

In this thesis, we analyze the performance of iterative low-complexity message-passing algorithms on general spatially coupled systems, and we specialize our results to coding theory applications. To do this, we express the evolution of the state of the system (along iterations of the algorithm) in a variational form, in terms of the so-called potential functional, in the continuum limit approximation. This thesis consists of two parts. In the first part, we consider the dynamic phase of the message-passing algorithm, in which iterations of the algorithm modify the state of the spatially coupled system. Assuming that the boundaries of the coupled chain are appropriately "seeded", we find a closed-form analytical formula for the velocity with which the extra information propagates into the chain. We apply this result to coupled irregular LDPC code-ensembles with transmission over general BMS channels and to coupled general scalar systems. We perform numerical simulations for several applications and show that our formula gives values that match the empirically observed velocity. This confirms that the continuum limit is an approximation well suited to the derivation of the formula. In the second part of this thesis, we consider the static phase of the message-passing algorithm, when it can no longer modify the state of the system. We introduce a novel proof technique that employs displacement convexity, a mathematical tool from optimal transport, to prove that the potential functional is strictly displacement convex under an alternative structure in the space of probability measures. We hence establish the uniqueness of the state to which the spatially coupled system converges, and we characterize it. We apply this result to the (l,r)-regular Gallager ensemble with transmission over the BEC and to coupled general scalar systems.

This thesis is devoted to information-theoretic aspects of community detection. The importance of community detection is due to the massive amount of scientific data today that describes relationships between items from a network, e.g., a social network. Items from this network can be inherently partitioned into a known number of communities, but the partition can only be inferred from the data.
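The seeding and decoding-wave picture from the spatial-coupling abstract above can be sketched numerically (toy parameters, not from the theses): density evolution for a (3,6)-regular ensemble on the BEC, uncoupled versus spatially coupled with perfect boundary information. Above the BP threshold (ε ≈ 0.429) the uncoupled recursion stalls at a non-trivial fixed point, while the coupled chain is decoded by a wave propagating inward from the seeded boundaries.

```python
import numpy as np

def uncoupled_de(eps, l=3, r=6, iters=2000):
    """BEC density evolution for an uncoupled (l, r)-regular ensemble."""
    x = eps
    for _ in range(iters):
        x = eps * (1.0 - (1.0 - x) ** (r - 1)) ** (l - 1)
    return x

def coupled_de(eps, L=48, w=3, l=3, r=6, iters=20000):
    """Same recursion on a coupled chain of length L with coupling
    window w; positions outside the chain are seeded with x = 0."""
    kern = np.ones(w) / w
    x = np.full(L, eps)
    for _ in range(iters):
        # pad with zeros: the perfect-information boundary seed
        xp = np.concatenate([np.zeros(w - 1), x, np.zeros(w - 1)])
        avg_in = np.convolve(xp, kern, mode='valid')   # check-side average
        f = eps * (1.0 - (1.0 - avg_in) ** (r - 1)) ** (l - 1)
        x = np.convolve(f, kern, mode='valid')         # variable-side average
    return x

# At eps = 0.46 (above the BP threshold, below the coupled threshold)
# the uncoupled system is stuck while the coupled chain decodes to zero.
```

The velocity of the decoding wave in recursions of exactly this type is what the closed-form formula of the first thesis above predicts.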
To estimate the underlying partition, data scientists can apply advanced statistical techniques of any kind; but the data could be very noisy, or the amount of data inadequate. A fundamental question here concerns the possibility of weak recovery: does the data contain a sufficient amount of information to enable a non-trivial estimate of the partition?
For the purpose of mathematical analysis, the above problem can be formulated as Bayesian inference on generative models. These models, including the stochastic block model (SBM) and the censored block model (CBM), consider a random graph generated based on a hidden partition that divides the nodes of the graph into labelled groups. In the SBM, nodes are connected with a probability depending on the labels of the endpoints. In the CBM, by contrast, hidden variables are measured through a noisy channel, and the measurement outcomes form a weighted graph. In both models, inference is the task of recovering the hidden partition from the observed graph. The criterion for weak recovery can be studied via an information-theoretic quantity called the mutual information. Once the asymptotic mutual information is computed, phase transitions for weak recovery can be located.
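A toy instance of the SBM makes the setup concrete (parameters are illustrative, and a naive spectral method stands in for the sophisticated estimators studied in this line of work):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_sbm(n=400, p_in=0.12, p_out=0.02):
    """Two-community SBM: nodes get +/-1 labels; an edge appears with
    probability p_in inside a community and p_out across."""
    labels = rng.choice([-1, 1], size=n)
    probs = np.where(np.equal.outer(labels, labels), p_in, p_out)
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    A = (upper | upper.T).astype(float)
    return A, labels

def spectral_overlap(A, labels):
    """Naive estimate: sign of the top eigenvector of the centered
    adjacency matrix; overlap is measured up to a global sign flip."""
    w, v = np.linalg.eigh(A - A.mean())
    return abs(np.mean(np.sign(v[:, -1]) * labels))
```

With this parameter choice the signal eigenvalue is well separated from the bulk, so the spectral estimate achieves a large overlap; the weak-recovery question asks when any non-trivial overlap is possible at all.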
This thesis pertains to rigorous derivations of single-letter variational expressions for the asymptotic mutual information of models in community detection. These variational expressions, known as the replica predictions, come from heuristic methods of statistical physics. We develop new rigorous methods for confirming the replica predictions, based on extending the recently introduced adaptive interpolation method.
We prove the replica prediction for the SBM in the dense-graph regime with two groups of asymmetric sizes. The existing proofs in the literature are indirect, as they involve mapping the model to an external problem whose mutual information is determined by a combination of methods. Here, by contrast, we provide a self-contained and direct proof.
Next, we extend this method to sparse models. Before this thesis, adaptive interpolation was known to provide a conceptually simple proof of the replica predictions for dense graphs. For sparse graphs, in contrast, the replica prediction involves a more complicated variational expression, and rigorous confirmations are often lacking or obtained by rather complicated methods. We therefore focus on a simple version of the CBM on sparse graphs, where hidden variables are measured through a binary erasure channel, for which we fully prove the replica prediction by adaptive interpolation.
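A toy version of this censored model shows why the binary erasure channel is so tractable (hypothetical parameters; the thesis works on sparse graphs, whereas here every pair is measured): each product σᵢσⱼ is observed unless the channel erases it, and relative signs can be propagated with a union-find structure, recovering the labels exactly up to a global flip within each connected component of the revealed graph.

```python
import numpy as np

rng = np.random.default_rng(1)

def censored_bec(labels, eps):
    """Observe sigma_i * sigma_j for every pair i < j through a BEC(eps):
    with probability eps the measurement is erased (absent)."""
    n = len(labels)
    return {(i, j): labels[i] * labels[j]
            for i in range(n) for j in range(i + 1, n)
            if rng.random() >= eps}

def recover(n, obs):
    """Union-find with relative signs: rel[i] is the sign of node i
    with respect to the root of its component."""
    parent, rel = list(range(n)), [1] * n

    def find(i):
        if parent[i] == i:
            return i, 1
        root, s = find(parent[i])             # sign of parent w.r.t. root
        parent[i], rel[i] = root, rel[i] * s  # path compression + sign
        return root, rel[i]

    for (i, j), s in obs.items():
        ri, si = find(i)
        rj, sj = find(j)
        if ri != rj:
            parent[rj], rel[rj] = ri, s * si * sj
    return np.array([find(i)[1] for i in range(n)])
```

Weak recovery in this toy setting reduces to a percolation question: a non-trivial estimate exists exactly when the revealed graph has a large connected component.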
The key to extending the adaptive interpolation to a broader class of sparse models is a concentration result for the so-called "multi-overlaps". This concentration underlies the replica "symmetric" prediction. We prove this concentration result for a related sparse model arising in statistical physics, which provides inspiration for further development of the adaptive interpolation.

This thesis focuses on two kinds of statistical inference problems in signal processing and data science. The first problem is the estimation of a structured informative tensor from the observation of a noisy tensor in which it is buried. The structure comes from the possibility of decomposing the informative tensor as the sum of a small number of rank-one tensors (small compared to its size). Such structure has applications in data science, where data organized into arrays can often be explained by the interaction between a few features characteristic of the problem. The second problem is the estimation of a signal input to a feedforward neural network whose output is observed. It is relevant for many applications (phase retrieval, quantized signals) where the relation between the measurements and the quantities of interest is not linear.
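A toy instance of such a non-linear observation model is phase retrieval (sizes below are illustrative, and the simple spectral initializer shown is a standard baseline, not one of the estimators analyzed in the thesis): the sign of each linear measurement is lost, yet a spectral method already correlates with the signal.

```python
import numpy as np

rng = np.random.default_rng(2)

n, m = 100, 1000                       # signal dimension, measurements
x = rng.standard_normal(n)
x /= np.linalg.norm(x)                 # unit-norm ground truth
A = rng.standard_normal((m, n))
y = np.abs(A @ x)                      # non-linear channel: sign is lost

# Spectral initializer: top eigenvector of (1/m) sum_i y_i^2 a_i a_i^T,
# whose expectation is I + 2 x x^T for Gaussian measurements.
M = (A * (y ** 2)[:, None]).T @ A / m
w, v = np.linalg.eigh(M)
overlap = abs(v[:, -1] @ x)            # correlation, up to a global sign
```

The information-theoretic analysis in the thesis characterizes when any estimator, spectral or otherwise, can achieve a non-trivial overlap in such models.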
We look at these two statistical models in different high-dimensional limits corresponding to situations where the amount of observations and size of the signal become infinitely large. These asymptotic regimes are motivated by the ever-increasing computational power and storage capacity that make possible the processing and handling of large data sets. We take an information-theoretic approach in order to establish fundamental statistical limits of estimation in high-dimensional regimes. In particular, the main contributions of this thesis are the proofs of exact formulas for the asymptotic normalized mutual information associated with these inference problems. These are low-dimensional variational formulas that can nonetheless capture the behavior of a large system where each component interacts with all the others. Owing to the relationship between mutual information and Bayesian inference, we use the solutions to these variational problems to rigorously predict the asymptotic minimum mean-square error (MMSE), the error achieved by the (Bayes) optimal estimator. We can thus compare algorithmic performances to the statistical limit given by the MMSE.
Variational formulas for the mutual information are referred to as replica symmetric (RS) ansätze, owing to the predictions of the heuristic replica method from statistical physics. In the past decade, proofs of the validity of these predictions started to emerge.
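For the symmetric rank-one case mentioned earlier, the RS variational formula takes the following schematic form (notation simplified, with ρ = E[X₀²] and Z a standard Gaussian; this is the generic statement from the literature rather than a formula quoted from this thesis):

```latex
\lim_{n\to\infty} \frac{1}{n}\, I(\mathbf{X};\mathbf{Y})
  \;=\; \min_{q\in[0,\rho]} \left\{ \frac{\lambda}{4}\,(\rho-q)^2
    \;+\; I\!\left(X_0;\ \sqrt{\lambda q}\,X_0 + Z\right) \right\}
```

The minimizer q* then determines the asymptotic matrix MMSE, so solving this low-dimensional variational problem yields the statistical limit against which algorithms are compared.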
The general strategy is to show that the RS ansatz is both an upper and a lower bound on the asymptotic normalized mutual information. The present work builds on the adaptive interpolation method, which provides a unified way to prove the two bounds. We extend the adaptive interpolation to situations where the order parameter of the problem is not a scalar but a matrix, and to high-dimensional regimes that differ from the one for which the RS formula is usually conjectured. Our proofs also demonstrate the modularity of the method. Indeed, by using statistical models previously studied in the literature as building blocks of more complex ones (e.g., an estimated signal with a model of structure), we derive RS formulas for the normalized mutual information of estimation problems relevant to modern applications.
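Schematically, for the rank-one model the adaptive interpolation runs over t ∈ [0,1] between the matrix channel and a scalar Gaussian channel whose signal-to-noise ratio is chosen adaptively (again a generic sketch with hedged notation, not the precise construction of this thesis):

```latex
\mathbf{Y}_t = \sqrt{\tfrac{(1-t)\lambda}{n}}\; \mathbf{x}\mathbf{x}^{\intercal} + \mathbf{W},
\qquad
\mathbf{y}_t = \sqrt{R(t)}\,\mathbf{x} + \mathbf{z},
\qquad
R(t) = \lambda \int_0^t q(s)\,\mathrm{d}s
```

Differentiating the free energy along t yields a sum rule in which the interpolation path q(·) can be chosen to control the overlap fluctuations, giving the matching upper and lower bounds on the RS expression.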