This lecture covers advanced concepts in data stream processing: stream processing with Apache Kafka and Spark Streaming, windowing over data streams in Spark, and end-to-end Kafka-Spark analytics pipelines. It also covers join operations, including stream-stream joins, and the handling of late and out-of-order data. The instructor explains event time versus processing time and watermarking in Spark Streaming, and leads practical exercises with Spark Streaming. Students are guided on how to model public transport infrastructure for route planning, build predictive models, implement robust algorithms, and validate results. The lecture emphasizes the importance of teamwork, reproducibility, and effective communication through video presentations.
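To make the distinction between event time and processing time concrete, the following is a minimal sketch in plain Python of how a watermark can decide whether a late event is still counted in its tumbling window. It is not Spark code and is not taken from the lecture: the function name `tumbling_window_counts`, its parameters, and the sample events are all hypothetical. It also simplifies by advancing the watermark after every event, whereas an engine such as Spark updates it between micro-batches.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size, allowed_lateness):
    """Count events per tumbling event-time window (illustrative sketch).

    events: iterable of (event_time, value) pairs, listed in arrival
            (processing-time) order; event_time is an integer timestamp.
    Returns (counts, dropped), where counts maps (window_start, window_end)
    to the number of accepted events and dropped lists the rejected events.
    """
    counts = defaultdict(int)
    dropped = []
    max_event_time = float("-inf")  # largest event time seen so far

    for event_time, value in events:
        # Assumption: watermark = max event time seen so far - allowed lateness.
        watermark = max_event_time - allowed_lateness

        # Assign the event to its window by event time, not arrival order.
        window_start = (event_time // window_size) * window_size
        window_end = window_start + window_size

        if window_end <= watermark:
            # The window is already considered complete: drop the event.
            dropped.append((event_time, value))
        else:
            counts[(window_start, window_end)] += 1

        max_event_time = max(max_event_time, event_time)

    return dict(counts), dropped


# Hypothetical events in arrival order; (3, "d") and (8, "f") arrive out of order.
events = [(1, "a"), (4, "b"), (12, "c"), (3, "d"), (27, "e"), (8, "f")]
counts, dropped = tumbling_window_counts(events, window_size=10, allowed_lateness=5)
print(counts)   # {(0, 10): 3, (10, 20): 1, (20, 30): 1}
print(dropped)  # [(8, 'f')]
```

In this run, the event at time 3 arrives late but is still counted, because the watermark (12 - 5 = 7) has not yet passed the end of window [0, 10). The event at time 8 arrives after the watermark has advanced to 22, so it is dropped.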