**Êtes-vous un étudiant de l'EPFL à la recherche d'un projet de semestre?**

Travaillez avec nous sur des projets en science des données et en visualisation, et déployez votre projet sous forme d'application sur GraphSearch.

Publication# Efficiently Maintaining Distributed Model-Based Views on Real-Time Data Streams

Karl Aberer, Petre Alexandru Arion, Ho Young Jeung, Sebastian Michel

2011

Rapport ou document de travail

2011

Rapport ou document de travail

Résumé

Minimizing communication cost is a fundamental problem in large-scale federated sensor networks. Existing solutions applicable for the problem are often ad-hoc for specific query types, or they are inefficient when query results contain large volumes of data to be transferred over the networks. Maintaining model-based views of data streams has been recently highlighted because it permits the data communication over networks to be efficient by transmitting parameter values for the models, instead of sending original data streams. This paper proposes a novel framework that employs the advantages of using model-based views for communication-efficient stream data processing over federated sensor networks, yet it significantly improves state-of-the-art approaches. The framework is generic and any time-parameterized models can be plugged, as well as accuracy guarantees for query results are ensured throughout the large-scale networks. In addition, we boost the performance of the framework by the coded model update that enables efficient model update from one node to another. It predetermines parameter values for the model, updates only identifiers of the parameter values, and compresses the identifiers by utilizing bitmaps. Moreover, we propose a novel correlation model, named coded inter-variable model, that integrates the efficiency of the coded model update into more precise predictions of correlated models. Empirical studies with real data demonstrate that our proposal achieves substantial amounts of communication reduction, outperforming a state-of-the art method.

Official source

Cette page est générée automatiquement et peut contenir des informations qui ne sont pas correctes, complètes, à jour ou pertinentes par rapport à votre recherche. Il en va de même pour toutes les autres pages de ce site. Veillez à vérifier les informations auprès des sources officielles de l'EPFL.

Concepts associés

Chargement

Publications associées

Chargement

Concepts associés (12)

Réseau de capteurs sans fil

Un réseau de capteurs sans fil est un réseau ad hoc d'un grand nombre de nœuds, qui sont des micro-capteurs capables de recueillir et de transmettre des données d'une manière autonome. La position de

Donnée

Une donnée est ce qui est connu et qui sert de point de départ à un raisonnement ayant pour objet la détermination d'une solution à un problème en relation avec cette donnée. Cela peut être une descr

Correlation coefficient

A correlation coefficient is a numerical measure of some type of correlation, meaning a statistical relationship between two variables. The variables may be two columns of a given data set of observa

Publications associées (37)

Chargement

Chargement

Chargement

Recent advances in data processing and communication systems have led to a continuous increase in the amount of data communicated over today’s networks. These large volumes of data pose new challenges on the current networking infrastructure that only offers a best effort mechanism for data delivery. The emergence of new distributed network architectures, such as peer-to-peer networks and wireless mesh networks, and the need for efficient data delivery mechanisms have motivated researchers to reconsider the way that information is communicated and processed in the networks. This has given rise to a new research field called network coding. The network coding paradigm departs from the traditional routing principle where information is simply relayed by the network nodes towards the destination, and introduces some intelligence in the network through coding at the intermediate nodes. This in-network data processing has been proved to substantially improve the performance of data delivery systems in terms of throughput and error resilience in networks with high path diversity. Motivated by the promising results in the network coding research, we focus in this thesis on the design of network coding algorithms for simultaneous transmission of multiple data sources in overlay networks. We investigate several problems that arise in the context of inter-session network coding, namely (i) decoding delay minimization in inter-session network coding, (ii) distributed rate allocation for inter-session network coding and (iii) correlation-aware decoding of incomplete network coded data. We start by proposing a novel framework for data delivery from multiple sources to multiple clients in an overlay wireline network, where intermediate nodes employ randomized inter-session network coding. We consider networks with high resource diversity, which creates network coding opportunities with possibly large gains in terms of throughput, delay and error robustness. However, the coding operations in the intermediate nodes must be carefully designed in order to enable efficient data delivery. We look at the problem from the decoding delay perspective and design solutions that lead to a small decoding delay at clients through proper coding and rate allocation. We cast the optimization problem as a rate allocation problem, which seeks for the coding operations that minimize the average decoding delay in the client population. We demonstrate the validity of our algorithm through simulations in representative network topologies. The results show that an effective combination of intra- and inter-session network coding based on randomized linear coding permits to reach small decoding delays and to better exploit the available network resources even in challenging network settings. Next, we design a distributed rate allocation algorithm where the users decide locally how many intra- and inter-session network coded packets should be requested from the parent nodes in order to get minimal decoding delay. The capability to take coding decisions locally with only a partial knowledge of the network statistics is of crucial importance for applications where users are organized in dynamic overlay networks. We propose a receiver-driven communication protocol that operates in two rounds. First, the users request and obtain information regarding the network conditions and packet availability in their local neighborhood. Then, every user independently optimizes the rate allocation among different possible intra- and inter-session packet combinations to be requested from its parents. We also introduce the novel concept of equivalent flows, which permits to efficiently estimate the expected number of packets that are necessary for decoding and hence to simplify the rate allocation process. Experimental results indicate that our algorithm is capable of eliminating the bottlenecks and reducing the decoding delay of users with limited resources. We further investigate the application of the proposed distributed rate allocation algorithm to the transmission of video sequences and validate the performance of our system using the NS-3 simulator. The simulation results show that the proposed rate allocation algorithm is successful in improving the quality of the delivered video compared to intra-session network coding based solutions. Finally, we investigate the problem of decoding the source information from an incomplete set of network coded data with the help of source priors in a finite algebraic field. The inability to form a complete decoding system can be often caused by transmission losses or timing constraints imposed by the application. In this case, exact reconstruction of the source data by conventional algorithms such as Gaussian elimination is not feasible; however, partial recovery of the source data may still be possible, which can be useful in applications where approximate reconstruction is informative. We use the statistical characteristics of the source data in order to perform approximate decoding. We first analyze the performance of a hypothetical maximum a posteriori decoder, which recovers the source data from an incomplete set of network coded data given the joint statistics of the sources. We derive an upper bound on the probability of erroneous source sequence decoding as a function of the system parameters. We then propose a constructive solution to the approximate decoding problem and design an iterative decoding algorithm based on message passing, which jointly considers the network coding and the correlation constraints. We illustrate the performance of our decoding algorithm through extensive simulations on synthetic and real data sets. The results demonstrate that, even by using a simple correlation model expressed as a correlation noise between pairs of sources, the original source data can be partially decoded in practice from an incomplete set of network coded symbols. Overall, this thesis addresses several important issues related to the design of efficient data delivery methods with inter-session network coding. Our novel framework for decoding delay minimization can impact the development of practical inter-session network coding algorithms that are appropriate for applications with low delay requirements. Our rate allocation algorithms are able to exploit the high resource diversity of modern networking systems and represent an effective alternative in the development of distributed communication systems. Finally, our algorithm for data recovery from incomplete network coded data using correlation priors can contribute significantly to the improvement of the delivered data quality and provide new insights towards the design of joint source and network coding algorithms.

Karl Aberer, Petre Alexandru Arion, Ho Young Jeung

Minimizing communication cost is a fundamental problem in large-scale federated sensor networks. Maintaining model-based views of data streams has been highlighted because it permits efficient data communication by transmitting parameter values of models, instead of original data streams. We propose a framework that employs the advantages of using model-based views for communication-efficient stream data processing over federated sensor networks, yet it significantly improves state-of-the-art approaches. The framework is generic and any time-parameterized models can be plugged, while accuracy guarantees for query results are ensured throughout the large-scale networks. In addition, we boost the performance of the framework by the coded model update that enables efficient model update from one node to another. It predetermines parameter values for the model, updates only identifiers of the parameter values, and compresses the identifiers by utilizing bitmaps. Moreover, we propose a correlation model, named coded inter-variable model, that merges the efficiency of the coded model update with that of correlation models. Empirical studies with real data demonstrate that our proposal achieves substantial amounts of communication reduction, outperforming state-of-the art methods.

Sensor networks measuring correlated data are considered, where the task is to gather data from the network nodes to a sink. A specific scenario is addressed, where data at nodes are lossy coded with high-resolution, and the information measured by the nodes has to be reconstructed at the sink within both certain total and individual distortion bounds. The first problem considered is to find the optimal transmission structure and the rate-distortion allocations at the various spatially located nodes, such as to minimize the total power consumption cost of the network, by assuming fixed nodes positions. The optimal transmission structure is the shortest path tree and the problems of rate and distortion allocation separate in the high-resolution case, namely, first the distortion allocation is found as a function of the transmission structure, and second, for a given distortion allocation, the rate allocation is computed. The second problem addressed is the case when the node positions can be chosen, by finding the optimal node placement for two different targets of interest, namely total power minimization and network lifetime maximization. Finally, a node placement solution that provides a tradeoff between the two metrics is proposed.

2006