This lecture covers scalability, persistence, and consistency in database systems and data-intensive applications. It discusses techniques such as partitioning, out-of-core architectures, embedded state examples, Apache Flink, Spark, Google Dataflow, and MillWheel. The lecture also explores handling failures, reconfiguring systems correctly, stream processing transactions, exactly-once processing, and distributed stream processing. It examines action-level transactions, deterministic execution, epoch-level transactions, synchronous epoch commits, causal order in distributed streams, snapshots based on consistent cuts, and aligned snapshots in Flink. The instructor emphasizes the central role of state in addressing scalability, persistence, and consistency challenges, and highlights the trade-offs involved in data movement, load balancing, local data access, elasticity, transaction granularity, and consistency.
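To make the idea of aligned snapshots more concrete, the following is a minimal sketch of barrier alignment in a toy operator with several input channels. It is not Flink's API: the class `AligningOperator`, the `BARRIER` sentinel, and the summing state are all hypothetical names chosen for illustration. The sketch assumes that a barrier separates one epoch from the next on each channel, and that the operator holds back records from a channel whose barrier has already arrived until the barrier has arrived on every channel.

```python
from collections import deque

# Hypothetical sentinel marking an epoch boundary on a channel.
BARRIER = object()


class AligningOperator:
    """Toy multi-input operator that sums integers and snapshots its
    state once a barrier has arrived on every input channel."""

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        self.total = 0           # operator state
        self.blocked = set()     # channels whose barrier has already arrived
        self.buffered = deque()  # items held back from blocked channels
        self.snapshots = []      # state captured at each aligned barrier

    def receive(self, channel, item):
        if channel in self.blocked:
            # Items after the barrier belong to the next epoch: hold them.
            self.buffered.append((channel, item))
        elif item is BARRIER:
            self.blocked.add(channel)
            if len(self.blocked) == self.num_inputs:
                # Barriers aligned: the state reflects exactly the
                # pre-barrier records of every channel.
                self.snapshots.append(self.total)
                self.blocked.clear()
                # Replay held-back items in arrival order.
                pending, self.buffered = self.buffered, deque()
                for ch, held in pending:
                    self.receive(ch, held)
        else:
            self.total += item


op = AligningOperator(num_inputs=2)
op.receive(0, 1)
op.receive(1, 10)
op.receive(0, BARRIER)  # channel 0 is now blocked
op.receive(0, 2)        # held back until channel 1 catches up
op.receive(1, 20)
op.receive(1, BARRIER)  # alignment complete: snapshot, then replay
print(op.snapshots, op.total)  # [31] 33
```

The snapshot contains 31 rather than 33 because the record `2` arrived on channel 0 after that channel's barrier, so it belongs to the next epoch. Blocking the faster channel is what keeps the snapshot a consistent cut, at the cost of some added latency during alignment.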