This lecture covers the evolution of stream processing systems from centralized in-memory systems to distributed dataflow systems such as MapReduce, Spark Streaming, and Flink. It explains the concept of state in stream processing, including windows, aggregates, and user-defined variables. The lecture also discusses state management issues such as scalability, persistence, and consistency, and explores different approaches to handling state: synopses, user-defined state, and system-managed state. Examples of state manipulation in stream processing systems highlight the importance of state in expressing operators and the trade-offs between system-managed and user-defined state.
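
As a rough illustration of what state means for a windowed aggregate, the sketch below keeps a per-key running sum for the currently open tumbling window and discards it when the window closes. It is not taken from the lecture: the function name `tumbling_window_sums`, the `(timestamp, key, value)` event layout, and the assumption that events arrive in timestamp order are all hypothetical choices made for the example. Here the state is an ordinary user-defined variable inside the operator; a system-managed approach would instead hand this dictionary to the engine so it can be partitioned, persisted, and restored.

```python
from collections import defaultdict


def tumbling_window_sums(events, window_size):
    """Sum values per key over fixed, non-overlapping windows.

    events: iterable of (timestamp, key, value), assumed ordered by timestamp.
    Returns a list of (window_start, key, total) tuples.
    """
    state = defaultdict(int)  # operator state: running sum per key for the open window
    current_window = None     # start timestamp of the open window
    results = []

    for timestamp, key, value in events:
        window_start = (timestamp // window_size) * window_size
        if current_window is not None and window_start != current_window:
            # The previous window is complete: emit its aggregates, then drop its state.
            results.extend((current_window, k, total) for k, total in sorted(state.items()))
            state.clear()
        current_window = window_start
        state[key] += value

    # Flush the last open window once the (finite) input ends.
    if current_window is not None:
        results.extend((current_window, k, total) for k, total in sorted(state.items()))
    return results


events = [(1, "a", 2), (3, "b", 5), (4, "a", 1), (11, "a", 7), (12, "b", 1)]
print(tumbling_window_sums(events, window_size=10))
# [(0, 'a', 3), (0, 'b', 5), (10, 'a', 7), (10, 'b', 1)]
```

Because the dictionary lives only in the memory of one process, it is lost on failure and cannot grow beyond a single machine, which is the kind of scalability and persistence concern the lecture raises about state management.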