GSN (Global Sensor Networks) manages configurable virtual sensors through a wide range of wrappers and handles both one-shot and continuous queries, even in a distributed environment with several GSN instances. However, each GSN instance runs on a single machine and uses a relational data store underneath. This is sufficient for most medium-sized sensor deployments, but when very large numbers of sensor observations must be processed at very high incoming rates, scalability can become a problem at various stages. The project aims to integrate Spark Streaming, an extension of the core Spark API, with GSN to speed up query processing over streams in a multi-node environment and achieve better scalability. We show the feasibility of our approach and demonstrate its scalability through two applications: linear segmentation, which discovers trends in weather data, and anomaly detection, which identifies occasions when live temperature data delivers unreasonable values.
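To make the two applications concrete, here is a minimal single-machine sketch in plain Python. It is not the project's implementation and uses neither GSN nor Spark Streaming. It assumes a greedy sliding-window variant of linear segmentation and a simple range-and-jump rule for anomaly detection. All function names, parameters, and thresholds (`segment`, `find_anomalies`, `max_error`, `low`, `high`, `max_jump`) are hypothetical choices made for illustration.

```python
def fit_line(ys):
    """Least-squares line through the points (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    if n == 1:
        return 0.0, ys[0]
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def max_residual(ys):
    """Largest absolute deviation of ys from its least-squares line."""
    slope, intercept = fit_line(ys)
    return max(abs(y - (slope * i + intercept)) for i, y in enumerate(ys))


def segment(values, max_error):
    """Greedy sliding-window segmentation (an assumed variant).

    Each segment grows until a straight-line fit deviates from some
    reading by more than max_error. Returns inclusive (start, end) pairs.
    """
    segments = []
    start = 0
    for end in range(1, len(values)):
        if max_residual(values[start:end + 1]) > max_error:
            segments.append((start, end - 1))
            start = end
    if values:
        segments.append((start, len(values) - 1))
    return segments


def find_anomalies(values, low, high, max_jump):
    """Indices of readings outside [low, high], or differing by more
    than max_jump from the last accepted reading (hypothetical rule)."""
    anomalies = []
    last_ok = None
    for i, v in enumerate(values):
        out_of_range = v < low or v > high
        jump = last_ok is not None and abs(v - last_ok) > max_jump
        if out_of_range or jump:
            anomalies.append(i)
        else:
            last_ok = v
    return anomalies


# A rising then falling trend splits into two linear segments.
print(segment([0, 1, 2, 3, 4, 3, 2, 1, 0], max_error=0.1))
# → [(0, 4), (5, 8)]

# Made-up temperature readings in degrees Celsius, for illustration only.
temps = [10.0, 10.5, 85.0, 11.0, -60.0, 11.2, 25.0]
print(find_anomalies(temps, low=-40.0, high=50.0, max_jump=5.0))
# → [2, 4, 6]
```

In a streaming setting, functions of this kind would presumably run per sensor over windows of incoming observations. How the work is partitioned across nodes is outside the scope of this sketch.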
Michael Lehning, Wolf Hendrik Huwald, Jérôme François Sylvain Dujardin, Franziska Gerber, Fanny Kristianti
Hussein Fadl Hassan Hassan Osman
Alessandro Cicoira, Matthias Meyer, Hugo Raetzo