Rising demand for large-scale data processing has driven growing interest in both batch processing, where large volumes of data are processed offline, and stream processing, where large quantities of streaming data are processed online. The dichotomy between these two computing paradigms has led to substantially different methodologies and systems. As a growing number of applications require both stream and batch processing, there is a need to bridge this gap and support both paradigms. We introduce a new direction for unifying stream and batch processing which, contrary to other proposed approaches, uses a stream processing platform as its foundation and supports batch processing on top. Our proof-of-concept implementation of such a middleware layer, called Cyclone, offers the widely popular MapReduce programming model and translates MapReduce jobs for execution on the underlying streaming platform. Cyclone not only achieves a tight integration of batch and stream processing; our evaluation also shows significant performance gains, in particular for sequential and iterative jobs, which arise naturally in many applications.
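To make the idea of running a MapReduce job on a streaming substrate more concrete, the following is a minimal sketch in Python. It is not Cyclone's API or implementation, which the abstract does not describe. All names (`map_operator`, `reduce_operator`, `run_mapreduce`, and the word-count functions) are hypothetical. Python generators stand in for streaming operators, and the batch input is assumed to be a finite stream whose end triggers the reduce step.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

KV = Tuple[str, int]


def map_operator(records: Iterable[str],
                 map_fn: Callable[[str], Iterable[KV]]) -> Iterator[KV]:
    """Streaming map stage: emit key/value pairs as each record arrives."""
    for record in records:
        yield from map_fn(record)


def reduce_operator(pairs: Iterable[KV],
                    reduce_fn: Callable[[str, List[int]], int]) -> Dict[str, int]:
    """Streaming reduce stage: group values by key while the stream is
    consumed, then apply reduce_fn once the finite input stream ends."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def run_mapreduce(records: Iterable[str],
                  map_fn: Callable[[str], Iterable[KV]],
                  reduce_fn: Callable[[str, List[int]], int]) -> Dict[str, int]:
    """Hypothetical translation step: wire the two user functions into a
    two-operator streaming pipeline."""
    return reduce_operator(map_operator(records, map_fn), reduce_fn)


# Example job (assumed for illustration): word count.
def word_count_map(line: str) -> Iterator[KV]:
    for word in line.split():
        yield (word.lower(), 1)


def word_count_reduce(key: str, values: List[int]) -> int:
    return sum(values)


if __name__ == "__main__":
    # The batch input is presented as a finite stream of lines.
    lines = iter(["stream and batch", "batch on stream"])
    print(run_mapreduce(lines, word_count_map, word_count_reduce))
```

In this sketch the map stage never waits for the full input, so records flow through the pipeline one at a time. Only the reduce stage needs the end of the stream before producing its result.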
Michael Lehning, Irina Gorodetskaya, Katherine Colby Leonard, Etienne Gabriel Henri Vignon