Lecture

General-Purpose Distributed Execution System

Description

This lecture discusses the design of a general-purpose distributed execution system, motivated by the need to scale applications now that distributed execution is becoming the norm. It covers the challenges of building distributed systems, the use of specialized frameworks such as Spark and Flink, and the implementation of distributed futures for high-performance, fault-tolerant shuffle. It then explores the benefits of decentralized control logic, pass-by-reference strategies, and futures for asynchronous RPCs. Finally, it presents results from using Exoshuffle to improve ML training speed and accuracy, along with advances in fault tolerance and ownership models for distributed futures.
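To make the idea of futures as asynchronous calls and pass-by-reference arguments more concrete, here is a minimal single-process sketch using only Python's standard library. It is not Exoshuffle or the system presented in the lecture: threads stand in for distributed workers, and the names `map_task`, `reduce_task`, `shuffle`, and `NUM_REDUCERS` are hypothetical, chosen only for illustration.

```python
from concurrent.futures import Future, ThreadPoolExecutor

NUM_REDUCERS = 2  # hypothetical number of output partitions


def map_task(block: list[int]) -> list[list[int]]:
    """Split one input block into NUM_REDUCERS partitions (key = value % NUM_REDUCERS)."""
    parts: list[list[int]] = [[] for _ in range(NUM_REDUCERS)]
    for value in block:
        parts[value % NUM_REDUCERS].append(value)
    return parts


def reduce_task(partition: int, map_refs: list[Future]) -> list[int]:
    """Take references to map outputs and resolve each one only when its data is needed."""
    merged: list[int] = []
    for ref in map_refs:
        merged.extend(ref.result()[partition])
    return sorted(merged)


def shuffle(blocks: list[list[int]]) -> list[list[int]]:
    with ThreadPoolExecutor(max_workers=4) as pool:
        # submit() returns a future immediately, much like an asynchronous RPC.
        map_refs = [pool.submit(map_task, block) for block in blocks]
        # The futures themselves are passed to the reducers (pass by reference);
        # the driver never copies the intermediate map outputs.
        reduce_refs = [
            pool.submit(reduce_task, p, map_refs) for p in range(NUM_REDUCERS)
        ]
        return [ref.result() for ref in reduce_refs]


if __name__ == "__main__":
    print(shuffle([[5, 2, 8], [1, 4], [7, 6, 3]]))
    # prints [[2, 4, 6, 8], [1, 3, 5, 7]]
```

In this sketch the driver only wires futures together, and each reducer decides when to fetch the data behind a reference. A real distributed futures system would additionally have to handle data placement, worker failures, and ownership of the references, none of which is modeled here.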

