Lecture

General-Purpose Distributed Execution System

Description

This lecture discusses the design of a general-purpose distributed execution system, motivated by distributed execution becoming the norm for scaling applications. It covers the challenges of building distributed systems, the role of specialized frameworks such as Spark and Flink, and the use of distributed futures to implement high-performance, fault-tolerant shuffle. The lecture explores the benefits of decentralized control logic, pass-by-reference strategies, and futures as a model for asynchronous RPCs. It also presents results from using Exoshuffle to improve ML training speed and accuracy, along with advances in fault tolerance and ownership models for distributed futures.
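To make the idea of a shuffle built on futures concrete, here is a minimal single-machine sketch. It is not the Exoshuffle implementation or any framework's API: a thread pool from Python's standard library stands in for a cluster, and the names `map_task`, `reduce_task`, `shuffle_word_count`, and `NUM_REDUCERS` are hypothetical. Each reducer receives the map futures themselves rather than copied data, which loosely mirrors pass-by-reference, and resolves them only when it needs the values.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

NUM_REDUCERS = 2  # hypothetical number of output partitions


def map_task(chunk):
    """Split one input chunk into NUM_REDUCERS partitions by a stable key hash."""
    parts = [[] for _ in range(NUM_REDUCERS)]
    for word in chunk:
        # Byte sum is a deterministic stand-in for a real partitioning hash.
        parts[sum(word.encode()) % NUM_REDUCERS].append(word)
    return parts


def reduce_task(map_futures, r):
    """Resolve every map future and count the words in partition r."""
    counts = Counter()
    for fut in map_futures:
        counts.update(fut.result()[r])  # blocks until that map task is done
    return counts


def shuffle_word_count(chunks):
    """Run an all-to-all map/reduce shuffle and merge the per-reducer counts."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Map tasks are submitted first, so they are dequeued before any reducer.
        map_futures = [pool.submit(map_task, c) for c in chunks]
        # Reducers are handed references (futures), not materialized data.
        reduce_futures = [
            pool.submit(reduce_task, map_futures, r) for r in range(NUM_REDUCERS)
        ]
        total = Counter()
        for fut in reduce_futures:
            total.update(fut.result())
    return dict(total)
```

Calling `shuffle_word_count([["a", "b"], ["a", "c"], ["b", "a"]])` counts each word across all chunks. In a real distributed setting the futures would refer to objects held on remote nodes, and the system would also have to handle node failures, which this sketch does not attempt.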
