**Are you an EPFL student looking for a semester project?**

Work with us on data science and visualisation projects, and deploy your project as an app on top of GraphSearch.

Publication# Network Correlated Data Gathering with Explicit Communication: NP-Completeness and Algorithms

Abstract

We consider the problem of correlated data gathering by a network with a sink node and a tree based communication structure, where the goal is to minimize the total trans- mission cost of transporting the information collected by the nodes, to the sink node. For source coding of correlated data, we consider a joint entropy based coding model with explicit communication where coding is simple and the transmission structure optimiza- tion is difficult. We first formulate the optimization problem definition in the general case and then we study further a network setting where the entropy conditioning at nodes does not depend on the amount of side information, but only on its availability. We prove that even in this simple case, the optimization problem is NP-hard. We propose some efficient, scalable, and distributed heuristic approximation algorithms for solving this problem and show by numerical simulations that the total transmission cost can be significantly improved over direct transmission or the shortest path tree. We also present an approximation algorithm that provides a tree transmission structure with total cost within a constant factor from the optimal.

Official source

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Related concepts

Loading

Related publications

Loading

Related concepts (19)

Data

In common usage and statistics, data (USˈdætə; UKˈdeɪtə) is a collection of discrete or continuous values that convey information, describing the quantity, quality, fact, statistics, other basic uni

Algorithm

In mathematics and computer science, an algorithm (ˈælɡərɪðəm) is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algo

NP-completeness

In computational complexity theory, a problem is NP-complete when:
# It is a decision problem, meaning that for any input to the problem, the output is either "yes" or "no".

# When the answer is "ye

Related publications (123)

Loading

Loading

Loading

This thesis is concerned with problems in decentralized communication in large networks. Namely, we address the problems of joint rate allocation and transmission of data sources measured at nodes, and of controlling the multiple access of sources to a shared medium. In our study, we consider in particular the important case of a sensor network measuring correlated data. In the first part of this thesis, we consider the problem of correlated data gathering by a network with a sink node and a tree communication structure, where the goal is to minimize the total transmission cost of transporting the information collected by the nodes, to the sink node. Two coding strategies are analyzed: a Slepian-Wolf model where optimal coding is complex and transmission optimization is simple, and a joint entropy coding model with explicit communication where coding is simple and transmission optimization is difficult. This problem requires a joint optimization of the rate allocation at the nodes and of the transmission structure. For the Slepian-Wolf setting, we derive a closed form solution and an efficient distributed approximation algorithm with a good performance. We generalize our results to the case of multiple sinks. For the explicit communication case, we prove that building an optimal data gathering tree is NP-complete and we propose various distributed approximation algorithms. We compare asymptotically, for dense networks, the total costs associated with Slepian-Wolf coding and explicit communication, by finding their corresponding scaling laws and analyzing the ratio of their respective costs. We argue that, for large networks and under certain conditions on the correlation structure, "intelligent", but more complex Slepian-Wolf coding provides unbounded gains over the widely used straightforward approach of opportunistic aggregation and compression by explicit communication. In the second part of this thesis, we consider a queuing problem in which the service rate of a queue is a function of a partially observed Markov chain, and in which the arrivals are controlled based on those partial observations so as to keep the system in a desirable mildly unstable regime. The optimal controller for this problem satisfies a separation property: we first compute a probability measure on the state space of the chain, namely the information state, then use this measure as the new state based on which to make control decisions. We give a formal description of the system considered and of its dynamics, we formalize and solve an optimal control problem, and we show numerical simulations to illustrate with concrete examples properties of the optimal control law. We show how the ergodic behavior of our queuing model is characterized by an invariant measure over all possible information states, and we construct that measure. Our results may be applied for designing efficient and stable algorithms for medium access control in multiple accessed systems, in particular for sensor networks.

Recent advances in data processing and communication systems have led to a continuous increase in the amount of data communicated over today’s networks. These large volumes of data pose new challenges on the current networking infrastructure that only offers a best effort mechanism for data delivery. The emergence of new distributed network architectures, such as peer-to-peer networks and wireless mesh networks, and the need for efficient data delivery mechanisms have motivated researchers to reconsider the way that information is communicated and processed in the networks. This has given rise to a new research field called network coding. The network coding paradigm departs from the traditional routing principle where information is simply relayed by the network nodes towards the destination, and introduces some intelligence in the network through coding at the intermediate nodes. This in-network data processing has been proved to substantially improve the performance of data delivery systems in terms of throughput and error resilience in networks with high path diversity. Motivated by the promising results in the network coding research, we focus in this thesis on the design of network coding algorithms for simultaneous transmission of multiple data sources in overlay networks. We investigate several problems that arise in the context of inter-session network coding, namely (i) decoding delay minimization in inter-session network coding, (ii) distributed rate allocation for inter-session network coding and (iii) correlation-aware decoding of incomplete network coded data. We start by proposing a novel framework for data delivery from multiple sources to multiple clients in an overlay wireline network, where intermediate nodes employ randomized inter-session network coding. We consider networks with high resource diversity, which creates network coding opportunities with possibly large gains in terms of throughput, delay and error robustness. However, the coding operations in the intermediate nodes must be carefully designed in order to enable efficient data delivery. We look at the problem from the decoding delay perspective and design solutions that lead to a small decoding delay at clients through proper coding and rate allocation. We cast the optimization problem as a rate allocation problem, which seeks for the coding operations that minimize the average decoding delay in the client population. We demonstrate the validity of our algorithm through simulations in representative network topologies. The results show that an effective combination of intra- and inter-session network coding based on randomized linear coding permits to reach small decoding delays and to better exploit the available network resources even in challenging network settings. Next, we design a distributed rate allocation algorithm where the users decide locally how many intra- and inter-session network coded packets should be requested from the parent nodes in order to get minimal decoding delay. The capability to take coding decisions locally with only a partial knowledge of the network statistics is of crucial importance for applications where users are organized in dynamic overlay networks. We propose a receiver-driven communication protocol that operates in two rounds. First, the users request and obtain information regarding the network conditions and packet availability in their local neighborhood. Then, every user independently optimizes the rate allocation among different possible intra- and inter-session packet combinations to be requested from its parents. We also introduce the novel concept of equivalent flows, which permits to efficiently estimate the expected number of packets that are necessary for decoding and hence to simplify the rate allocation process. Experimental results indicate that our algorithm is capable of eliminating the bottlenecks and reducing the decoding delay of users with limited resources. We further investigate the application of the proposed distributed rate allocation algorithm to the transmission of video sequences and validate the performance of our system using the NS-3 simulator. The simulation results show that the proposed rate allocation algorithm is successful in improving the quality of the delivered video compared to intra-session network coding based solutions. Finally, we investigate the problem of decoding the source information from an incomplete set of network coded data with the help of source priors in a finite algebraic field. The inability to form a complete decoding system can be often caused by transmission losses or timing constraints imposed by the application. In this case, exact reconstruction of the source data by conventional algorithms such as Gaussian elimination is not feasible; however, partial recovery of the source data may still be possible, which can be useful in applications where approximate reconstruction is informative. We use the statistical characteristics of the source data in order to perform approximate decoding. We first analyze the performance of a hypothetical maximum a posteriori decoder, which recovers the source data from an incomplete set of network coded data given the joint statistics of the sources. We derive an upper bound on the probability of erroneous source sequence decoding as a function of the system parameters. We then propose a constructive solution to the approximate decoding problem and design an iterative decoding algorithm based on message passing, which jointly considers the network coding and the correlation constraints. We illustrate the performance of our decoding algorithm through extensive simulations on synthetic and real data sets. The results demonstrate that, even by using a simple correlation model expressed as a correlation noise between pairs of sources, the original source data can be partially decoded in practice from an incomplete set of network coded symbols. Overall, this thesis addresses several important issues related to the design of efficient data delivery methods with inter-session network coding. Our novel framework for decoding delay minimization can impact the development of practical inter-session network coding algorithms that are appropriate for applications with low delay requirements. Our rate allocation algorithms are able to exploit the high resource diversity of modern networking systems and represent an effective alternative in the development of distributed communication systems. Finally, our algorithm for data recovery from incomplete network coded data using correlation priors can contribute significantly to the improvement of the delivered data quality and provide new insights towards the design of joint source and network coding algorithms.

,

We consider the problem of correlated data gathering by a network with a sink node and a tree communication structure, where the goal is to minimize the total transmission cost of transporting the information collected by the nodes, to the sink node. Two coding strategies are analyzed: a Slepian-Wolf model where optimal coding is complex and transmission optimization is simple, and a joint entropy coding model with explicit communication where coding is simple and transmission optimization is difficult. This problem requires a joint optimization of the rate allocation at the nodes and of the transmission structure. For the Slepian-Wolf setting, we derive a closed form solution and an efficient distributed approximation algorithm with a good performance. For the explicit communication case, we prove that building an optimal data gathering tree is NP-complete and we propose various distributed approximation algorithms.

2004