# Sparse Probabilistic Models

## Abstract

This thesis is concerned with a number of novel uses of spatial coupling, applied to a class of probabilistic graphical models. These models include error-correcting codes, random constraint satisfaction problems (CSPs), and statistical-physics models called diluted spin systems. Spatial coupling is a technique initially developed for channel coding; it provides a recipe to transform a class of sparse linear codes into codes that are longer but more robust at high noise levels. Indeed, it was observed that coupled codes admit efficient algorithms whose decoding threshold is the optimal one, a phenomenon called threshold saturation. The main aim of this thesis is to explore alternative applications of spatial coupling: to study properties of uncoupled probabilistic models (not just codes) through the corresponding spatially coupled models. The methods employed range from the mathematically rigorous to the purely experimental.

We first explore spatial coupling as a proof technique in the realm of LDPC codes. The Maxwell conjecture states that, for arbitrary BMS channels, the optimal (MAP) threshold of standard (uncoupled) LDPC codes is given by the Maxwell construction. We prove the Maxwell conjecture for any smooth family of BMS channels by using (i) the fact that coupled codes perform optimally (which was already proved) and (ii) the fact that the optimal thresholds of the coupled and uncoupled LDPC codes coincide. The method is used to derive two more results, namely the equality of GEXIT curves above the MAP threshold and the exactness of the averaged Bethe free energy formula derived under the RS cavity method from statistical physics. As a second application of spatial coupling, we show how to derive novel bounds on the phase transitions in random constraint satisfaction problems, and possibly in a general class of diluted spin systems.
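The threshold saturation phenomenon invoked above can be illustrated numerically in the simplest setting, the (3,6)-regular LDPC ensemble over the binary erasure channel, whose BP threshold is ≈ 0.4294 and whose MAP threshold is ≈ 0.4881. The sketch below is our own illustration, not code from the thesis; the chain length `L`, coupling window `w`, and iteration caps are arbitrary choices. At an erasure probability between the two thresholds, the uncoupled density evolution recursion gets stuck, while a spatially coupled chain whose boundary sections are perfectly known ("seeded") decodes completely.

```python
def uncoupled_de(eps, l=3, r=6, iters=5000):
    """Density evolution for the uncoupled (l, r)-regular ensemble over BEC(eps).
    x is the erasure probability of variable-to-check messages."""
    x = eps
    for _ in range(iters):
        x = eps * (1 - (1 - x) ** (r - 1)) ** (l - 1)
    return x

def coupled_de(eps, L=50, w=3, l=3, r=6, max_iters=20000, tol=1e-8):
    """Density evolution for a spatially coupled (l, r) chain over BEC(eps).
    x[i] is the erasure probability at section i; sections outside the chain
    are perfectly known (x = 0), which seeds a decoding wave at the boundary.
    Returns the number of iterations until decoding, or None if stuck."""
    x = [eps] * L

    def get(z, i):
        return z[i] if 0 <= i < L else 0.0  # seeded boundary condition

    for t in range(max_iters):
        x_new = []
        for i in range(L):
            acc = 0.0
            for k in range(w):  # average over the w check positions of section i
                avg = sum(1.0 - get(x, i + k - j) for j in range(w)) / w
                acc += 1.0 - avg ** (r - 1)
            x_new.append(eps * (acc / w) ** (l - 1))
        if max(x_new) < tol:
            return t + 1  # decoded
        if max(abs(a - b) for a, b in zip(x, x_new)) < 1e-12:
            return None   # reached a nonzero fixed point: stuck
        x = x_new
    return None

# Erasure probability between the BP (~0.4294) and MAP (~0.4881) thresholds:
print(uncoupled_de(0.46))  # stuck at a nonzero fixed point
print(coupled_de(0.46))    # iterations until the seeded chain decodes
```

The coupled recursion is the standard protograph-style density evolution with a uniform coupling window; the decoding wave starts at both seeded ends and propagates inward until the whole chain is erasure-free.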
In the case of coloring, we investigate what happens to the dynamic and freezing thresholds. The phenomenon of threshold saturation is present in this case as well: the dynamic threshold moves to the condensation threshold, and the freezing threshold moves to the colorability threshold. These claims are supported by experimental evidence, and in some cases, such as the saturation of the freezing threshold, part of the claim can be made rigorous. This allows, in principle, for the computation of thresholds by use of spatial coupling. The proof is in the spirit of the potential method introduced by Kumar, Young, Macris and Pfister for LDPC codes.

Finally, we explore how to find solutions of (uncoupled) probabilistic models. We start with a typical instance of random K-SAT (the base problem) and build a spatially coupled structure that locally inherits the structure of the base problem. The goal is to run an algorithm that finds a suitable solution of the coupled structure and then to "project" this solution onto a solution of the base problem. Experimental evidence indicates that a form of unit-clause propagation (UCP), a simple algorithm, achieves this goal, and that the approach works even in regimes where standard UCP fails on the base problem.
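As an illustration of the base algorithm (the coupled construction itself is beyond a short sketch), here is a minimal unit-clause propagation routine. The clause encoding (signed integers, positive for a variable, negative for its negation) is our own choice, not the thesis's notation.

```python
def unit_propagate(clauses, assignment=None):
    """Repeatedly satisfy unit clauses until none remain.

    clauses: list of clauses, each a list of nonzero ints
             (positive = variable, negative = its negation).
    Returns (assignment, remaining clauses), or None on a conflict.
    """
    assignment = dict(assignment or {})
    clauses = [list(c) for c in clauses]
    changed = True
    while changed:
        changed = False
        new_clauses = []
        for clause in clauses:
            reduced = []
            satisfied = False
            for lit in clause:
                val = assignment.get(abs(lit))
                if val is None:
                    reduced.append(lit)          # literal still undecided
                elif (lit > 0) == val:
                    satisfied = True             # clause already satisfied
                    break
            if satisfied:
                continue
            if not reduced:                      # every literal falsified
                return None                      # conflict
            if len(reduced) == 1:                # unit clause: forced value
                lit = reduced[0]
                assignment[abs(lit)] = lit > 0
                changed = True
            else:
                new_clauses.append(reduced)
        clauses = new_clauses
    return assignment, clauses
```

For example, `unit_propagate([[1], [-1, 2], [-2, 3, 4]])` forces x1 and x2 true and leaves the clause `[3, 4]` untouched, since no further unit clauses appear.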




Related publications (33)


We are living in the era of "Big Data", an era characterized by a voluminous amount of available data, mainly due to continuing advances in the computational capabilities for capturing, storing, transmitting, and processing data. However, it is not always the volume of data that matters, but rather the "relevant" information that resides in it.
Exactly 70 years ago, Claude Shannon, the father of information theory, was able to quantify the amount of information in a communication scenario based on a probabilistic model of the data. It turns out that Shannon's theory can be adapted to various probability-based information processing fields, ranging from coding theory to machine learning. The computation of some information theoretic quantities, such as the mutual information, can help in setting fundamental limits and devising more efficient algorithms for many inference problems.
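As a toy illustration of such a computation (our own example, not from the text), the mutual information between the input and output of a binary symmetric channel can be evaluated directly from the joint distribution; with uniform inputs it equals the channel capacity 1 − h2(p), where h2 is the binary entropy function.

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_mutual_information(p, px=0.5):
    """I(X;Y) in bits for a binary symmetric channel with crossover
    probability p and input distribution P(X=1) = px, computed from
    the joint law P(X, Y)."""
    joint = {}
    for x in (0, 1):
        for y in (0, 1):
            joint[(x, y)] = (px if x else 1 - px) * (p if x != y else 1 - p)
    py = {y: joint[(0, y)] + joint[(1, y)] for y in (0, 1)}
    mi = 0.0
    for (x, y), pxy in joint.items():
        if pxy > 0:
            marg_x = px if x else 1 - px
            mi += pxy * math.log2(pxy / (marg_x * py[y]))
    return mi

# With uniform inputs, I(X;Y) = 1 - h2(p), the capacity of the BSC.
print(bsc_mutual_information(0.11), 1 - h2(0.11))
```

Such closed-form quantities are exactly the fundamental limits that the mutual-information computations in the thesis generalize to high-dimensional inference problems.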
This thesis deals with two different, yet intimately related, inference problems in the fields of coding theory and machine learning. We use Bayesian probabilistic formulations for both problems, and we analyse them in the asymptotic high-dimensional regime. The goal of our analysis is, on the one hand, to assess the algorithmic performance and, on the other hand, to predict the Bayes-optimal performance, using an information-theoretic approach. To this end, we employ powerful analytical tools from statistical physics.
The first problem concerns sparse superposition codes, a recently proposed class of forward-error-correction codes. We extend these codes to a large class of noisy channels by exploiting their similarity with the compressed sensing paradigm. Moreover, we show that sparse superposition codes are amenable to performing joint distribution matching and channel coding.
In the second problem, we study symmetric rank-one matrix factorization, a prominent model in machine learning and statistics with many applications ranging from community detection to sparse principal component analysis. We provide an explicit expression for the normalized mutual information and the minimum mean-square error of this model in the asymptotic limit. This allows us to prove the optimality of a certain iterative algorithm on a large set of parameters.
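A minimal numerical sketch of the symmetric rank-one model follows (our own illustration; the thesis's analysis relies on message passing and the replica method, not on the spectral estimator used here). One observes Y = sqrt(λ/n)·x·xᵀ + W, with x a ±1 signal and W a symmetric Gaussian noise matrix, and estimates x by the leading eigenvector of Y. For λ > 1 this PCA estimate correlates with the signal; its asymptotic squared overlap is the well-known 1 − 1/λ. The sizes and seed below are arbitrary.

```python
import math
import random

def spiked_wigner(n, lam, rng):
    """Sample Y = sqrt(lam/n) * x x^T + W, with x uniform in {-1,1}^n
    and W a symmetric matrix of independent N(0,1) entries."""
    x = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    Y = [[0.0] * n for _ in range(n)]
    scale = math.sqrt(lam / n)
    for i in range(n):
        for j in range(i, n):
            Y[i][j] = Y[j][i] = scale * x[i] * x[j] + rng.gauss(0.0, 1.0)
    return x, Y

def leading_eigenvector(Y, iters, rng):
    """Power iteration: converges to the eigenvector of largest |eigenvalue|."""
    n = len(Y)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        v = [sum(yij * vj for yij, vj in zip(row, v)) for row in Y]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
    return v

rng = random.Random(0)
n, lam = 200, 4.0
x, Y = spiked_wigner(n, lam, rng)
v = leading_eigenvector(Y, iters=100, rng=rng)
# Normalized overlap |<x, v>| / sqrt(n); lies in [0, 1] by Cauchy-Schwarz.
overlap = abs(sum(xi * vi for xi, vi in zip(x, v))) / math.sqrt(n)
print(round(overlap, 2))
```

At λ = 4 the overlap concentrates near sqrt(1 − 1/4) ≈ 0.87, well above the noise level; below λ = 1 the eigenvector would carry no information about x.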
A common feature of the two problems stems from the fact that both of them are represented on dense graphical models. Hence, similar message-passing algorithms and analysis tools can be adopted. Furthermore, spatial coupling, a new technique introduced in the context of low-density parity-check (LDPC) codes, can be applied to both problems. Spatial coupling is used in this thesis as a "construction technique" to boost the algorithmic performance and as a "proof technique" to compute some information theoretic quantities.
Both of our problems also have close connections with spin glass models studied in the statistical mechanics of disordered systems, which allows us to use sophisticated techniques developed in statistical physics. In this thesis, we use the potential function predicted by the replica method to prove the threshold saturation phenomenon associated with spatially coupled models. One of the main contributions of this thesis is proving that the predictions of the "heuristic" replica method are exact. Hence, our results should also be of interest to the statistical physics community, as they help to set a rigorous mathematical foundation for the replica predictions.

For the past 70 years or so, coding theorists have been aiming to design transmission schemes with efficient encoding and decoding algorithms that achieve the capacity of various noisy channels. It was not until the '90s that graph-based codes, such as low-density parity-check (LDPC) codes, and their associated low-complexity iterative decoding algorithms were discovered and studied in depth. Although these schemes are efficient, they are not, in general, capacity-achieving: they perform well up to some algorithmic threshold on the channel parameter, which is lower than the optimal threshold. The gap between the algorithmic and optimal thresholds was finally closed by spatial coupling. In the context of coding, the belief propagation algorithm on spatially coupled codes yields capacity-achieving low-complexity transmission schemes. The reason behind the optimal performance of spatially coupled codes is the "seeding" of perfect information on the replicas at the boundaries of the coupling chain. This extra information makes decoding easier near the boundaries, and the effect then propagates into the coupling chain over iterations of the decoding algorithm. Spatial coupling has also been applied to various other problems governed by low-complexity message-passing algorithms, such as random constraint satisfaction problems, compressive sensing, and statistical physics. Each system has an associated algorithmic threshold and an optimal threshold, and, as with coding, once the underlying graphs are spatially coupled, the algorithms for these systems exhibit optimal performance.

In this thesis, we analyze the performance of iterative low-complexity message-passing algorithms on general spatially coupled systems, and we specialize our results to coding theory applications. To do this, we express the evolution of the state of the system (along iterations of the algorithm) in a variational form, in terms of the so-called potential functional, in the continuum limit approximation. The thesis consists of two parts. In the first part, we consider the dynamic phase of the message-passing algorithm, in which iterations of the algorithm modify the state of the spatially coupled system. Assuming that the boundaries of the coupled chain are appropriately "seeded", we find a closed-form analytical formula for the velocity with which the extra information propagates into the chain. We apply this result to coupled irregular LDPC code ensembles with transmission over general BMS channels and to coupled general scalar systems. We perform numerical simulations for several applications and show that our formula gives values that match the empirically observed velocity, confirming that the continuum limit is an approximation well suited to the derivation of the formula. In the second part of the thesis, we consider the static phase of the message-passing algorithm, when it can no longer modify the state of the system. We introduce a novel proof technique that employs displacement convexity, a mathematical tool from optimal transport, to prove that the potential functional is strictly displacement convex under an alternative structure on the space of probability measures. We hence establish the uniqueness of the state to which the spatially coupled system converges, and we characterize it. We apply this result to the (l,r)-regular Gallager ensemble with transmission over the BEC and to coupled general scalar systems.

The two central topics of information theory are the compression and the transmission of data. Shannon, in his seminal work, formalized both of these problems and determined their fundamental limits. Since then, the main goal of coding theory has been to find practical schemes that approach these limits. Polar codes, recently invented by Arikan, are the first "practical" codes known to achieve the capacity of a large class of channels. Their construction is based on a phenomenon called "channel polarization". Both the encoding and the decoding of polar codes can be implemented with O(N log N) complexity, where N is the blocklength of the code.
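The polarization phenomenon underlying this construction is easiest to see on the binary erasure channel, where Arikan's transform acts in closed form on erasure probabilities: one polarization step maps ε to the pair 2ε − ε² and ε². This is a standard textbook computation, sketched here in Python rather than taken from the thesis; the depth and the "good channel" cutoff are arbitrary illustrative choices.

```python
def polarize_bec(eps, levels):
    """Erasure probabilities of the 2**levels synthetic channels that
    Arikan's recursive transform creates from BEC(eps)."""
    zs = [eps]
    for _ in range(levels):
        nxt = []
        for z in zs:
            nxt.append(2 * z - z * z)  # '-' channel: erased if either copy is erased
            nxt.append(z * z)          # '+' channel: erased only if both copies are
        zs = nxt
    return zs

zs = polarize_bec(0.5, 14)
mean = sum(zs) / len(zs)                    # one step preserves the mean exactly
good = sum(z < 1e-2 for z in zs) / len(zs)  # fraction of nearly perfect channels
print(round(mean, 3), round(good, 3))
```

Since (2ε − ε²) + ε² = 2ε, the average erasure probability is preserved at every depth, while the individual values drift toward 0 or 1; the fraction of nearly perfect synthetic channels approaches the capacity 1 − ε as the depth grows.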
We show that polar codes are suitable not only for channel coding but also achieve optimal performance for several other important problems in information theory. The first problem we consider is lossy source compression. We construct polar codes that asymptotically approach Shannon's rate-distortion bound for a large class of sources. We achieve this performance by designing polar codes according to the "test channel", which naturally appears in Shannon's formulation of the rate-distortion function. The encoding operation combines the successive cancellation algorithm of Arikan with a crucial new ingredient called "randomized rounding". As with channel coding, both the encoding and the decoding operation can be implemented with O(N log N) complexity. This is the first known "practical" scheme that approaches the optimal rate-distortion trade-off. We also construct polar codes that achieve the optimal performance for the Wyner-Ziv and the Gelfand-Pinsker problems. Both of these problems can be tackled using "nested" codes, and polar codes are naturally suited for this purpose. We further show that polar codes achieve the capacity of asymmetric channels and of multi-terminal scenarios like multiple access channels and degraded broadcast channels. For each of these problems, our constructions are the first known "practical" schemes that approach the optimal performance. The original polar codes of Arikan achieve a block error probability decaying exponentially in the square root of the blocklength. For source coding, the gap between the achieved distortion and the limiting distortion also vanishes exponentially in the square root of the blocklength. We explore other polar-like code constructions with better rates of decay and show that, with this generalization, nearly exponential decay can be obtained for both channel and source coding.
The new constructions mimic the recursive construction of Arikan and hence inherit the same encoding and decoding complexity. We also propose algorithms based on message passing to improve the finite-length performance of polar codes. In the final two chapters of this thesis we address two important problems in graphical models related to communications. The first problem lies in the area of low-density parity-check (LDPC) codes. For practical lengths, LDPC codes under message-passing decoding are still the codes to beat. The current analysis, using density evolution, evaluates the performance of these algorithms on a tree; the tree assumption corresponds to using a code of infinite length, whereas in practice codes have finite length. We analyze the message-passing algorithms in this scenario. The absence of the tree assumption introduces correlations between the various messages, but we show that, despite these correlations, the prediction of the tree analysis remains accurate. The second problem we consider is related to code-division multiple access (CDMA) communication using random spreading. The existing analysis mainly focuses on the information-theoretic limits, i.e., on Gaussian input distributions. In practice, however, one uses modulation schemes like binary phase-shift keying (BPSK), which is far from Gaussian. The effects of the modulation scheme cannot be analyzed with traditional tools based on the spectra of large random matrices. We follow a new approach using tools developed for random spin systems in statistical mechanics. We prove a tight upper bound on the capacity of the system when the user input is BPSK, and we show that the capacity depends only on the power of the spreading sequences and is independent of their exact distribution.