Concept# Claude Shannon

Summary

Claude Elwood Shannon (April 30, 1916 – February 24, 2001) was an American mathematician, electrical engineer and cryptographer known as the "father of information theory".
As a 21-year-old master's degree student at the Massachusetts Institute of Technology (MIT), he wrote his thesis demonstrating that electrical applications of Boolean algebra could construct any logical numerical relationship. Shannon contributed to the field of cryptanalysis for national defense of the United States during World War II, including his fundamental work on codebreaking and secure telecommunications. His mathematical theory of information became very well cited and laid the foundation for the field of information theory.
Biography
Childhood
The Shannon family lived in Gaylord, Michigan, and Claude was born in a hospital in nearby Petoskey. His father, Claude Sr. (1862–1934), was a businessman and for a while, a judge of probate in Gaylord. His mother, Mabel Wolf

Official source

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Related publications

Loading

Related people

Loading

Related units

Loading

Related concepts

Loading

Related courses

Loading

Related lectures

Loading

Related publications (15)

Loading

Loading

Loading

Related concepts (83)

Information theory

Information theory is the mathematical study of the quantification, storage, and communication of information. The field was originally established by the works of Harry Nyquist and Ralph Hartley, in

Bell Labs

Nokia Bell Labs, originally named Bell Telephone Laboratories (1925–1984),
then AT&T Bell Laboratories (1984–1996)
and Bell Labs Innovations (1996–2007),
is an American industrial research and scienti

Computer science

Computer science is the study of computation, information, and automation. Computer science spans theoretical disciplines (such as algorithms, theory of computation, and information theory) to applied

Related people (1)

Related courses (10)

COM-406: Foundations of Data Science

We discuss a set of topics that are important for the understanding of modern data science but that are typically not taught in an introductory ML course. In particular we discuss fundamental ideas and techniques that come from probability, information theory as well as signal processing.

COM-401: Cryptography and security

This course introduces the basics of cryptography. We review several types of cryptographic primitives, when it is safe to use them and how to select the appropriate security parameters. We detail how they work and sketch how they can be implemented.

CS-119(c): Information, Computation, Communication

L'objectif de ce cours est d'introduire les étudiants à la pensée algorithmique, de les familiariser avec les fondamentaux de l'Informatique et de développer une première compétence en programmation (langage C++).

We are living in the era of "Big Data", an era characterized by a voluminous amount of available data. Such amount is mainly due to the continuing advances in the computational capabilities for capturing, storing, transmitting and processing data. However, it is not always the volume of data that matters, but rather the "relevant" information that resides in it.
Exactly 70 years ago, Claude Shannon, the father of information theory, was able to quantify the amount of information in a communication scenario based on a probabilistic model of the data. It turns out that Shannon's theory can be adapted to various probability-based information processing fields, ranging from coding theory to machine learning. The computation of some information theoretic quantities, such as the mutual information, can help in setting fundamental limits and devising more efficient algorithms for many inference problems.
This thesis deals with two different, yet intimately related, inference problems in the fields of coding theory and machine learning. We use Bayesian probabilistic formulations for both problems, and we analyse them in the asymptotic high-dimensional regime. The goal of our analysis is to assess the algorithmic performance on the first hand and to predict the Bayes-optimal performance on the second hand, using an information theoretic approach. To this end, we employ powerful analytical tools from statistical physics.
The first problem is a recent forward-error-correction code called sparse superposition code. We consider the extension of such code to a large class of noisy channels by exploiting the similarity with the compressed sensing paradigm. Moreover, we show the amenability of sparse superposition codes to perform
joint distribution matching and channel coding.
In the second problem, we study symmetric rank-one matrix factorization, a prominent model in machine learning and statistics with many applications ranging from community detection to sparse principal component analysis. We provide an explicit expression for the normalized mutual information and the minimum mean-square error of this model in the asymptotic limit. This allows us to prove the optimality of a certain iterative algorithm on a large set of parameters.
A common feature of the two problems stems from the fact that both of them are represented on dense graphical models. Hence, similar message-passing algorithms and analysis tools can be adopted. Furthermore, spatial coupling, a new technique introduced in the context of low-density parity-check (LDPC) codes, can be applied to both problems. Spatial coupling is used in this thesis as a "construction technique" to boost the algorithmic performance and as a "proof technique" to compute some information theoretic quantities.
Moreover, both of our problems retain close connections with spin glass models studied in statistical mechanics of disordered systems. This allows us to use sophisticated techniques developed in statistical physics. In this thesis, we use the potential function predicted by the replica method in order to prove the threshold saturation phenomenon associated with spatially coupled models. Moreover, one of the main contributions of this thesis is proving that the predictions given by the "heuristic" replica method are exact. Hence, our results could be of great interest for the statistical physics community as well, as they help to set a rigorous mathematical foundation of the replica predictions.

Related units (1)

Claude Elwood Shannon in 1948, then of the Bell Labs, published one of the ground breaking papers in the history of engineering [1]. This paper (”A Mathematical Theory of Communication”, Bell System Tech. Journal, Vol. 27, July and October 1948, pp. 379 - 423 and pp. 623 - 656) laid the groundwork of an entirely new scientific discipline, information Theory, that enabled engineers for the first time to deal quantitatively with the elusive concept of information”. In his celebrated work, Shannon cleanly laid the foundation for transmission and storage of information. Using a probabilistic model, his theory helped to get further insight into the achievable limits of information transfer over perturbed medium called channel. Indeed the very same concept is used to predict the limits on data compression and achievable transmission rate on a probabilistic channel.These underlying concepts can be thought of as inequalities involving measures of probability distributions. Shannon defined several such basic measures in his original work. The field of Information Theory grew with researchers finding more results and insights into the fundamental problem of transmission of and storage using probabilistic models. By nature of the subject itself, the results obtained are usually inequalities involving basic Shannon’s measures such as entropies. Some of them are elementary, some rather complicated expressions. In order to prove further theorems as well it required to check whether certain expressions are true in an Information Theoretic sense. This motivated researchers to seek a formal method to check all possible inequalities. Raymond Yeung [2] in 1998 came out with a remarkable framework, which could verify many of the inequalities in this field. His framework thus enabled to verify all inequalities, derived from the basic Shannon measure properties. A central notion of Information Theory is entropy, which Shannon defines as measure of information itself. Given a set of jointly distributed random variables X1, X2, . . . , Xn, we can consider entropies of all random variables H(Xi), entropies of all pairs H (Xi , Xj ), etc. (2n − 1 entropy values for all nonempty subsets of {X1 , X2 , ..., Xn }). For every n-tuple of random variables we get a point in R2n−1, representing entropies of the given distribution. Following [2] we call a point in R2n−1 constructible if it represents entropy values of some collection of n random variables. The set of all constructible points is denoted b y Γ ∗n It is hard to characterize Γ∗n for an arbitrary n (for n ≥ 3, it is not even closed [?]). A more feasible (but also highly non- trivial) problem is to describe the closure Γ ̄∗n of the set Γ∗n. The set Γ ̄∗n n is a convex cone [?], and to characterize it we should describe the class of all linear inequalities of the form λ1H(X1) + . . . + λnH(Xn) + λ1,2H(X1X2) + . . . + λ1,2,3H(X1, X2, X3) + . . . + λ1,2,3,...,nH(X1, X2, X3, . . . , Xn) which are true for any random variables X1, X2, . . . , Xn (λi are real coefficients). Information inequalities are widely used for proving converse coding theorems in Information Theory. Recently interesting applications of information inequalities beyond Information Theory were found [10],[12],[14]. So investigation of the class of all valid information inequalities is an interesting problem. We refer the reader to [15] for a comprehensive treatment of the subject. Yeung’s framework thus helped to verify all the Shannon type inequalities. Yeung and Yan have also developed a software, to computationally verify such inequalities. Since the software is rather outdated, we have made an attempt to make a more efficient and user friendly implementation of the software, hinging from the original work of Yeung. The software, which we call information inequality solver (iis) is freely available for download from EPFL website. The new software suit has the added advantage that it is freed of dependencies on any licensed products such as Matlab (or toolboxes).

2007Shannon in his seminal work \cite{paper:shannon} formalized the framework on the problem of digital communication of information and storage. He quantified the fundamental limits of compression and transmission rates. The quantity \textit{channel capacity} was coined to give an operational meaning on the ultimate limit of information transfer between two entities, transmitter and receiver. By using a random coding argument, he showed the achievability of reliable communication on any noisy channel as long as the rate is less than the channel capacity. More precisely, he argued that, using a fixed composition random codebook at the sender and a maximum likelihood decoder in the receiver, one can achieve almost zero error communication of information. Shannon's result was a promise on the achievability on the amount of data one can transfer over a medium. Achieving those promises has spurred scientific interest since 1948. Significant effort has been spent to find practical codes which gets close to the Shannon limits. For some class of channels, codes which are capacity achieving has been found since then. Low Density Parity Check Codes (LDPC) is one such capacity achieving family of codes for the class of binary erasure channels. Recently, Arikan proposed a family of codes known as Polar codes which are found to be capacity achieving for binary input memoryless channels. Slowly but steadily the longstanding problem of achieving Shannon limit is thus getting settled. In order to realize a practical system which can achieve the Shannon limits, two important things are necessary. One of them is an efficient capacity achieving code sequence and the other is an implementable decoding rule. Both these require the knowledge of the probabilistic law of the channel through which communication take place. At the transmitter one needs to know the channel capacity whereas at the receiver a perfect knowledge of the channel is necessitated to build optimum decoder. The question of whether a decoder can be designed without knowing the underlying channel, yet we can do a reliable communication arises here. This is the subject of discussion in this report. Consider a situation where a communication system is to be designed without explicit knowledge about the channel. Here, neither the transmitter nor the receiver knows the exact channel law. Suppose, the possible list of channels is made available upfront to both entities (transmitter and receiver). Can we devise a reliable communication scheme, irrespective of the actual channel picked without both transmitter and receiver being aware of it. This is the central theme in what is known\footnote{Also referred as \textit{coding for class of channels} as discussed in \cite{paper:blackwell}, \cite{book:Wolfowitz}.} as \textit{coding for compound channels}\cite{book:Csiszar}. Designing a reliable communication system for compound channels essentially involve building a codebook at the transmitter and a decoding rule at the receiver. The encoder-decoder pair together must guarantee reliable communication over a set of channels. In the general setting, no feedback is assumed from the receiver to the transmitter. Thus, a single coding strategy (codebook) must be devised (and fixed) upfront prior to transmission. At the receiver end, the decoding strategy must be devised independent of any knowledge of the channel. Of course, the codebook devised at the sender is perfectly known to the receiver. The answer to the question on whether such a communication system can be build is on the affirmative. In order to talk about the existence of reliable communication strategies, one must first talk about the maximum possible rate, the capacity. The highest achievable rate in a compound setting is known as the \textit{compound capacity} or capacity of the family. This is analogous to the famous Shannon capacity being the ultimate limit on achievable rate for any single channel. The capacity of compound set of memoryless channels has been studied by Blackwell et al in \cite{paper:blackwell} and that of linear dispersive channels is investigated by Root and Varaiya in \cite{paper:root:varaiya}. Lapidoth and Telatar looked at the compound capacity for families of channels with memory, more specifically the finite state channels \cite{paper:lapidoth-telatar1998}. As a special case, they have also derived the compound channel capacity of a class of Gilbert-Elliot channels. A decoder which operates without the knowledge of the channel in this setup is called a universal decoder. It is known that Maximum Mutual Information (MMI) decoders proposed by Goppa are universal. MMI decoders compute the empirical mutual between a received codeword against the codebook and find the best matching word as the true estimate. The complexity of MMI decoder remain fixed even if we were to find structured codes. This motivate us to ask the question whether we can build a universal decoder which offer better structure. Decoding rules which brings additive nature were considered in the literature as a potential scheme. Our work in this has been driven by this line of thoughts. In this work, we focus on class of discrete memoryless channels (DMC) and more specifically binary memoryless channels. We show that it is possible to build a linear universal decoder for any compound binary memoryless channels. The recent introduction of Polar codes motivated us to look into their suitability to the compound channel setup. We have carried out a preliminary investigation in this direction. While it is not clear whether Polar codes are universal under the optimum universal decoding, we find that they are universal for certain restricted classes of compound BMC sets.

2009Related lectures (10)