This lecture discusses the design of a general-purpose distributed execution system, motivated by the observation that scaling applications through distributed execution is becoming the norm. It covers the challenges of building distributed systems, the role of specialized frameworks such as Spark and Flink, and the use of distributed futures to implement a high-performance, fault-tolerant shuffle. The lecture explores the benefits of decentralized control logic, of passing data by reference, and of using futures for asynchronous RPCs. It also presents results showing that Exoshuffle improves ML training speed and accuracy, and discusses advances in fault tolerance and in ownership models for distributed futures.
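To make the combination of futures and pass-by-reference more concrete, here is a toy, single-process sketch of a shuffle built on Python's standard `concurrent.futures` module. It is an illustration under stated assumptions, not the lecture's system, Exoshuffle, or any distributed framework's API. Each `submit()` call stands in for an asynchronous RPC that returns a future immediately. The driver hands those futures, unresolved, to the reduce tasks, so the intermediate data never passes through the driver. The names `map_task`, `reduce_task`, and `shuffle` are hypothetical, and partitioning by `key % num_reducers` is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical map task: split one block of integers into num_reducers
# buckets, using key % num_reducers as the partitioning rule (an assumption).
def map_task(block, num_reducers):
    buckets = [[] for _ in range(num_reducers)]
    for x in block:
        buckets[x % num_reducers].append(x)
    return buckets


# Hypothetical reduce task: it receives *references* (futures) to all map
# outputs and resolves them itself, pulling only its own bucket from each.
def reduce_task(map_futures, reducer_id):
    merged = []
    for fut in map_futures:
        merged.extend(fut.result()[reducer_id])
    return sorted(merged)


def shuffle(blocks, num_reducers):
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Each submit() acts like an asynchronous RPC and returns a future at once.
        map_futures = [pool.submit(map_task, b, num_reducers) for b in blocks]
        # Pass the futures by reference: the driver never calls .result() on
        # map outputs. Map tasks were queued first, so in this FIFO pool a
        # blocked reducer only ever waits on tasks that are already running.
        reduce_futures = [
            pool.submit(reduce_task, map_futures, r) for r in range(num_reducers)
        ]
        return [f.result() for f in reduce_futures]


if __name__ == "__main__":
    print(shuffle([[1, 2, 3, 4], [5, 6, 7, 8]], 2))
```

Running the script prints `[[2, 4, 6, 8], [1, 3, 5, 7]]`: reducer 0 collects the even keys and reducer 1 the odd ones. In a real distributed system the futures would refer to objects stored on remote nodes, and concerns such as fault tolerance and object ownership, which this sketch ignores entirely, come into play.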