With technological advances, the sources of available information have become increasingly diverse. Recently, a new source of information has gained importance: sensor data. Sensors are devices that sense their environment in various ways and generally report a numeric value. Because a sensor reports values continuously, the resulting flow of information is also continuous, like a stream. As the field has developed, the usage paradigm has shifted from stand-alone sensors to interconnected sensors, or sensor networks. Sensors have become more complex, generating larger quantities of data and including wireless communication modules for transmitting it. Initially, data from sensor networks was stored first and processed afterwards, so classical database technologies could be used. However, the focus soon shifted towards reacting to sensor data in real time. A user query that reacts in real time to a stream of data is called a continuous query; answering such a query requires processing it continuously, as new values arrive in the sensor stream. As sensor networks and sensor-based applications have become more popular, users have identified the need to query sensor data spanning different sensor networks. This setting of interconnected sensor networks consists of more powerful computing devices, connected by wired links, that can process and relay sensor data. Users can launch queries at any node to query sensor events coming from any part of the interconnected network. In this setting, the number of data sources (sensors) is orders of magnitude smaller than the number of user queries, which in turn is orders of magnitude smaller than the full content of the (sensor) data streams, and communication becomes by far the greatest bottleneck. In this thesis, we present our research on reducing the communication cost generated by applications that access large-scale interconnected sensor networks.
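As a minimal illustration of the continuous-query idea (a sketch, not the system developed in this thesis), the code models a continuous query as a standing predicate that is evaluated on each reading as it arrives, rather than on data that was stored first. The names `Reading`, `ContinuousQuery`, and `run_continuous` are hypothetical, and each reading is assumed to carry a sensor identifier, a timestamp, and a numeric value.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Sequence, Tuple


@dataclass(frozen=True)
class Reading:
    """Hypothetical sensor reading: which sensor, when, and the numeric value."""
    sensor_id: str
    timestamp: float
    value: float


@dataclass(frozen=True)
class ContinuousQuery:
    """Hypothetical continuous query: a standing filter over a sensor stream."""
    name: str
    predicate: Callable[[Reading], bool]


def run_continuous(queries: Sequence[ContinuousQuery],
                   stream: Iterable[Reading]) -> Iterator[Tuple[str, Reading]]:
    # The stream is consumed lazily: each reading is matched against every
    # registered query as soon as it arrives, and matches are emitted
    # immediately, without first storing the stream.
    for reading in stream:
        for query in queries:
            if query.predicate(reading):
                yield query.name, reading


if __name__ == "__main__":
    stream = [Reading("t1", 0.0, 21.5), Reading("t2", 1.0, 35.2), Reading("t1", 2.0, 36.0)]
    hot = ContinuousQuery("hot", lambda r: r.value > 30.0)
    for name, r in run_continuous([hot], stream):
        print(name, r.sensor_id, r.value)
```

Because `run_continuous` is a generator, it also works on an unbounded stream: results are produced as matching readings appear, which is the behavior that distinguishes a continuous query from a one-shot query over stored data.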
Our first contribution is a probabilistic algorithm for detecting and exploiting subsumption of queries over correlated data sources. This technique reduces the communication traffic generated by query forwarding in an interconnected sensor network by filtering out queries that are subsumed by a set of existing queries. It also reduces the number of results that need to be transmitted. We propose an efficient algorithm for forwarding the elements of the result sets using publish/subscribe data dissemination. To support the general setting of distributed data sources in an interconnected sensor network, we propose a Filter-Split-Forward approach that adapts set subsumption to join queries over distributed data sources. The approach relies on filter, split, and forward phases for efficient query filtering and placement inside the network, combined with efficient publish/subscribe forwarding of matching events. We also propose distributed adaptations of state-of-the-art solutions for continuous query
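The idea of filtering out a query subsumed by a set of existing queries can be illustrated with a Monte Carlo check, a standard probabilistic technique for testing whether one query is covered by the union of others. This is a sketch under stated assumptions, not necessarily the algorithm developed in this thesis, and it ignores correlation between data sources. It assumes each query is a conjunction of range predicates (an axis-aligned box over sensor attributes); `RangeQuery`, `covers`, `samples_needed`, and `probably_subsumed` are hypothetical names, and `epsilon` and `delta` are illustrative tolerance parameters.

```python
import math
import random
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class RangeQuery:
    """Hypothetical query: one closed range (lo, hi) per attribute, i.e. a box."""
    bounds: Tuple[Tuple[float, float], ...]

    def contains(self, point: Sequence[float]) -> bool:
        return all(lo <= x <= hi for (lo, hi), x in zip(self.bounds, point))


def covers(outer: RangeQuery, inner: RangeQuery) -> bool:
    # Exact pairwise test: every range of `inner` lies within the matching range of `outer`.
    return all(olo <= ilo and ihi <= ohi
               for (olo, ohi), (ilo, ihi) in zip(outer.bounds, inner.bounds))


def samples_needed(epsilon: float, delta: float) -> int:
    # If a fraction >= epsilon of the new query is uncovered, k uniform samples
    # all miss that part with probability <= (1 - epsilon) ** k; this returns
    # the smallest k for which that bound is <= delta.
    return math.ceil(math.log(delta) / math.log(1.0 - epsilon))


def probably_subsumed(q: RangeQuery, existing: Sequence[RangeQuery],
                      epsilon: float = 0.01, delta: float = 1e-6,
                      rng: Optional[random.Random] = None) -> bool:
    if any(covers(e, q) for e in existing):
        return True  # a single existing query already covers q exactly
    if rng is None:
        rng = random.Random()
    for _ in range(samples_needed(epsilon, delta)):
        point = [rng.uniform(lo, hi) for lo, hi in q.bounds]
        if not any(e.contains(point) for e in existing):
            return False  # witness point outside every existing query: not subsumed
    return True  # covered by the union of existing queries, up to the (epsilon, delta) bound
```

A `False` answer is always exact, because the sampled point is a witness that no existing query covers. A `True` answer can be wrong in two ways: with probability at most `delta` when at least a fraction `epsilon` of the new query is uncovered, and with no such guarantee when the uncovered part is smaller than that. Under these assumptions, a node could skip forwarding a query judged subsumed and serve it from the results of the existing queries.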
Andreas Mortensen, David Hernandez Escobar, Léa Deillon, Alejandra Inés Slagter, Eva Luisa Vogt, Jonathan Aristya Setyadji
Andrea Rinaldo, Gianluca Botter