**Are you an EPFL student looking for a semester project?**

Work with us on data science and visualisation projects, and deploy your project as an app on top of GraphSearch.

Publication# Cooperative Data Exchange and Private Information Retrieval

Abstract

Coding techniques have been well studied and used for improving communication quality by combating noise and mitigating interference. Recently, it has been shown that the same coding techniques can also be exploited to further improve communication performance and provide specific communication features even when the communication channel is ideal. In this thesis, we study two problems where coding techniques are used for improving communications in distributed systems and protecting the privacy of the client from untrusted servers, respectively.\

The first part of this thesis studies the cooperative data exchange problem for fully connected networks. While many previous studies have shown that the problem can be solved by algorithms based on submodular function minimization, we tackle this problem via a concept we refer to as "conditioning basis", which is closely linked to linear coding schemes with particular additional properties. We show that such special linear coding schemes are optimal for the cooperative data exchange problem. Hence, by searching the existence of such a conditioning basis and special linear coding schemes, we can solve this problem with lower complexity. We propose a deterministic algorithm for this problem and briefly show how to construct the optimal linear coding schemes starting from a Vandermonde matrix. Moreover, we show that our new method can be used to solve two generalized problems, which are cooperative data exchange with weighted cost and successive local omniscience problems.\

The second part of this thesis investigates the problem of private information retrieval with side information. Specifically, three different extensions are studied: multi-message, multi-server, and multi-user, respectively. For each problem, we provide a proof of the converse for the download rate as well as propose efficient approaches to construct optimal coding schemes. For multi-message and multi-server cases, we give closed-form expressions for the download rates and introduce two useful tools, {\it conditioning answer string} and {\it virtual private information}, to analyze the problem. For multi-user cases, we show that the optimal download rate can be obtained by solving an optimization problem over all partitions of the total number of messages and propose a novel algorithm based on dynamic programming to solve the optimization problem.\

Official source

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Related concepts

Loading

Related publications

Loading

Related concepts (30)

Virtual private network

A virtual private network (VPN) is a mechanism for creating a secure connection between a computing device and a computer network, or between two networks, using an insecure communication medium suc

Information asymmetry

In contract theory and economics, information asymmetry deals with the study of decisions in transactions where one party has more or better information than the other.
Information asymmetry creates

Kolmogorov complexity

In algorithmic information theory (a subfield of computer science and mathematics), the Kolmogorov complexity of an object, such as a piece of text, is the length of a shortest computer program (in

Related publications (133)

Loading

Loading

Loading

Recent advances in data processing and communication systems have led to a continuous increase in the amount of data communicated over today’s networks. These large volumes of data pose new challenges on the current networking infrastructure that only offers a best effort mechanism for data delivery. The emergence of new distributed network architectures, such as peer-to-peer networks and wireless mesh networks, and the need for efficient data delivery mechanisms have motivated researchers to reconsider the way that information is communicated and processed in the networks. This has given rise to a new research field called network coding. The network coding paradigm departs from the traditional routing principle where information is simply relayed by the network nodes towards the destination, and introduces some intelligence in the network through coding at the intermediate nodes. This in-network data processing has been proved to substantially improve the performance of data delivery systems in terms of throughput and error resilience in networks with high path diversity. Motivated by the promising results in the network coding research, we focus in this thesis on the design of network coding algorithms for simultaneous transmission of multiple data sources in overlay networks. We investigate several problems that arise in the context of inter-session network coding, namely (i) decoding delay minimization in inter-session network coding, (ii) distributed rate allocation for inter-session network coding and (iii) correlation-aware decoding of incomplete network coded data. We start by proposing a novel framework for data delivery from multiple sources to multiple clients in an overlay wireline network, where intermediate nodes employ randomized inter-session network coding. We consider networks with high resource diversity, which creates network coding opportunities with possibly large gains in terms of throughput, delay and error robustness. However, the coding operations in the intermediate nodes must be carefully designed in order to enable efficient data delivery. We look at the problem from the decoding delay perspective and design solutions that lead to a small decoding delay at clients through proper coding and rate allocation. We cast the optimization problem as a rate allocation problem, which seeks for the coding operations that minimize the average decoding delay in the client population. We demonstrate the validity of our algorithm through simulations in representative network topologies. The results show that an effective combination of intra- and inter-session network coding based on randomized linear coding permits to reach small decoding delays and to better exploit the available network resources even in challenging network settings. Next, we design a distributed rate allocation algorithm where the users decide locally how many intra- and inter-session network coded packets should be requested from the parent nodes in order to get minimal decoding delay. The capability to take coding decisions locally with only a partial knowledge of the network statistics is of crucial importance for applications where users are organized in dynamic overlay networks. We propose a receiver-driven communication protocol that operates in two rounds. First, the users request and obtain information regarding the network conditions and packet availability in their local neighborhood. Then, every user independently optimizes the rate allocation among different possible intra- and inter-session packet combinations to be requested from its parents. We also introduce the novel concept of equivalent flows, which permits to efficiently estimate the expected number of packets that are necessary for decoding and hence to simplify the rate allocation process. Experimental results indicate that our algorithm is capable of eliminating the bottlenecks and reducing the decoding delay of users with limited resources. We further investigate the application of the proposed distributed rate allocation algorithm to the transmission of video sequences and validate the performance of our system using the NS-3 simulator. The simulation results show that the proposed rate allocation algorithm is successful in improving the quality of the delivered video compared to intra-session network coding based solutions. Finally, we investigate the problem of decoding the source information from an incomplete set of network coded data with the help of source priors in a finite algebraic field. The inability to form a complete decoding system can be often caused by transmission losses or timing constraints imposed by the application. In this case, exact reconstruction of the source data by conventional algorithms such as Gaussian elimination is not feasible; however, partial recovery of the source data may still be possible, which can be useful in applications where approximate reconstruction is informative. We use the statistical characteristics of the source data in order to perform approximate decoding. We first analyze the performance of a hypothetical maximum a posteriori decoder, which recovers the source data from an incomplete set of network coded data given the joint statistics of the sources. We derive an upper bound on the probability of erroneous source sequence decoding as a function of the system parameters. We then propose a constructive solution to the approximate decoding problem and design an iterative decoding algorithm based on message passing, which jointly considers the network coding and the correlation constraints. We illustrate the performance of our decoding algorithm through extensive simulations on synthetic and real data sets. The results demonstrate that, even by using a simple correlation model expressed as a correlation noise between pairs of sources, the original source data can be partially decoded in practice from an incomplete set of network coded symbols. Overall, this thesis addresses several important issues related to the design of efficient data delivery methods with inter-session network coding. Our novel framework for decoding delay minimization can impact the development of practical inter-session network coding algorithms that are appropriate for applications with low delay requirements. Our rate allocation algorithms are able to exploit the high resource diversity of modern networking systems and represent an effective alternative in the development of distributed communication systems. Finally, our algorithm for data recovery from incomplete network coded data using correlation priors can contribute significantly to the improvement of the delivered data quality and provide new insights towards the design of joint source and network coding algorithms.

Razvan Cristescu, Martin Vetterli

We consider the problem of correlated data gathering by a network with a sink node and a tree based communication structure, where the goal is to minimize the total trans- mission cost of transporting the information collected by the nodes, to the sink node. For source coding of correlated data, we consider a joint entropy based coding model with explicit communication where coding is simple and the transmission structure optimiza- tion is difficult. We first formulate the optimization problem definition in the general case and then we study further a network setting where the entropy conditioning at nodes does not depend on the amount of side information, but only on its availability. We prove that even in this simple case, the optimization problem is NP-hard. We propose some efficient, scalable, and distributed heuristic approximation algorithms for solving this problem and show by numerical simulations that the total transmission cost can be significantly improved over direct transmission or the shortest path tree. We also present an approximation algorithm that provides a tree transmission structure with total cost within a constant factor from the optimal.

2006The main goal in network information theory is to identify fundamental limits of communication over networks, and design solutions which perform close to such limits. After several decades of effort, many important problems still do not have a characterization of achievable performance in terms of a finite dimensional description. Given this discouraging state of affairs, a natural question to ask is whether there are systematic approaches to make progress on these open questions. Recently, there has been significant progress on several open questions by seeking a (provably) approximate characterization for these open questions. The main goal of approximation in network information theory is to obtain a universal approximation gap between the achievable and the optimal performance. This approach consists of four ingredients: simplify the model, obtain optimal solution for the simplified model, translate this optimal scheme and outer bounds back to the original model, and finally bound the gap between what can be achieved using the obtained technique and the outer bound. Using such an approach, recent progress has been made in several problems such as the Gaussian interference channel, Gaussian relay networks, etc. In this thesis, we demonstrate that this approach is not only successful in problems of transmission over noisy networks, but gives the first approximation for a network data compression problem. We use this methodology to (approximately) resolve problems that have been open for several decades. Not only do we give theoretical characterization, but we also develop new coding schemes that are required to satisfy this approximate optimality property. These ideas could give insights into efficient design of future network communication systems. This thesis is split into two main parts. The first part deals with the approximation in lossy network data compression. Here, a lossy data compression problem is approximated by a lossless counterpart problem, where all the bits in the binary expansion of the source above the required distortion have to be losslessly delivered to the destination. In particular, we study the multiple description (MD) problem, based on the multi-level diversity (MLD) coding problem. The symmetric version of the MLD problem is well-studied, and we can directly use it to approximate the symmetric MD problem. We formulate the asymmetric multi-level diversity problem, and solve it for three-description case. The optimal solution for this problem, which will be later used to approximate the asymmetric multiple description problem, is based on jointly compressing of independent sources. In both symmetric and asymmetric cases, we derive inner and outer bounds for the achievable rate region, which together with the gap analysis, provide an approximate solution for the problem. In particular, we resolve the symmetric Gaussian MD problem, which has been open for three decades, to within 1 bit. In the second part, we initiate a study of a Gaussian relay-interference network, in which relay (helper) nodes are to facilitate competing information flows over a wireless network. We focus on a two-stage relay-interference network where there are weak cross-links, causing the networks to behave like a chain of Z Gaussian channels. For these Gaussian ZZ and ZS networks, we establish an approximate characterization of the rate region. The outer bounds to the capacity region are established using genie-aided techniques that yield bounds sharper than the traditional cut-set outer bounds. For the inner bound of the ZZ network, we propose a new interference management scheme, termed interference neutralization, which is implemented using structured lattice codes. This technique allows for over-the-air interference removal, without the transmitters having complete access to the interfering signals. We use insights gained from an exact characterization of the corresponding linear deterministic version of the problem, in order to study the Gaussian network. We resolve the Gaussian relay-interference network to within 2 bits. The new interference management technique (interference neutralization) shows the use of structured lattice codes in the problem. We also consider communication from a source to a destination over a wireless network with the help of a set of authenticated relays, and presence of an adversarial jammer who wishes to disturb communication. We focus on a special diamond network, and show that use of interference suppression (nulling) is crucial to approach the capacity of the network. The exact capacity characterization for the deterministic network, along with an approximate characterization (to within 4 bits) for the Gaussian network is provided. The common theme that binds the diverse network communication problems in this thesis is that of approximate characterization, when exact resolutions are difficult. The approach of focusing on the deterministic/lossless problems underlying the noisy/lossy network communication problems has allowed us to develop new techniques to study these questions. These new techniques might be of independent interest in other network information theory problems.