This lecture covers stream processing and fault tolerance in big data analytics. It discusses how time is measured in data streams, techniques for managing streams efficiently, scale-out platforms such as Spark Streaming and Apache Flink, fault-tolerance strategies such as replication and upstream backup, and the use of DStreams for discretized stream processing. The instructor explains fault-tolerance techniques for stream processing systems, including state partitioning and immutable tasks. Streaming word count and sliding-window examples show how batch and streaming computations can be combined. The lecture concludes with a vision of unifying the batch and stream processing models in a single stack.
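As a rough illustration of the streaming word count and sliding-window examples mentioned above, here is a minimal sketch in plain Python. It is not the Spark Streaming API and not the lecture's own code. It only mimics the discretized idea: the stream is cut into small batches, an ordinary batch word count runs on each one, and a window combines the results of the most recent batches. The names `count_words`, `windowed_word_counts`, and `micro_batches`, as well as the sample data, are hypothetical.

```python
from collections import Counter, deque


def count_words(batch):
    """Ordinary batch word count over one micro-batch (a list of text lines)."""
    counts = Counter()
    for line in batch:
        counts.update(line.lower().split())
    return counts


def windowed_word_counts(batches, window_size):
    """Yield word counts over the last `window_size` micro-batches.

    The window slides by one micro-batch at a time. Counts are updated
    incrementally: add the newest batch, subtract the batch that expired.
    """
    window = deque()   # per-batch counts currently inside the window
    total = Counter()  # running counts for the whole window
    for batch in batches:
        counts = count_words(batch)
        window.append(counts)
        total.update(counts)
        if len(window) > window_size:
            expired = window.popleft()
            total.subtract(expired)
            total = +total  # unary plus drops words whose count fell to zero
        yield dict(total)


# Hypothetical input: three micro-batches, e.g. one per second of the stream.
micro_batches = [
    ["spark flink", "spark"],
    ["flink storm"],
    ["spark"],
]

results = list(windowed_word_counts(micro_batches, window_size=2))
for i, counts in enumerate(results):
    print(f"window ending at batch {i}: {counts}")
```

Because each micro-batch is processed by the same function a batch job would use, the same code path serves both batch and streaming input, which is the combination the summary refers to. In a real system the per-batch results would also be partitioned across workers and recomputed on failure; this sketch leaves that out.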