Lecture

Distributed Computing Execution Models

Description

This lecture addresses the problem of minimizing job completion time in distributed computing, with a focus on data skew and its impact on performance. It examines how a skewed data distribution affects reducers, where standard approaches fall short, and which optimization goals improve efficiency. The presentation covers execution models such as MapReduce and Spark, emphasizing parallelism and efficient processing. It then examines several algorithms for theta-joins, including the 1-Bucket-Theta algorithm, and shows how randomization reduces output skew. The lecture concludes with the challenges that remain in computing joins optimally over distributed data.
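The randomization idea behind 1-Bucket-Theta can be illustrated with a small in-memory simulation. This is a minimal sketch under simplifying assumptions, not the lecture's own formulation: the join matrix is cut into a uniform grid of rows x cols regions (one per reducer), and the names one_bucket_theta, rows, cols, and seed are hypothetical. Each tuple of S is sent to every region in one randomly chosen row, and each tuple of T to every region in one randomly chosen column, so every (s, t) pair meets in exactly one region regardless of the join predicate.

```python
import random
from collections import defaultdict


def one_bucket_theta(S, T, theta, rows, cols, seed=0):
    """Simulate a simplified 1-Bucket-Theta join on a rows x cols grid of regions.

    Assumption: a uniform grid stands in for the region-to-reducer mapping;
    everything runs in one process rather than on a real cluster.
    """
    rng = random.Random(seed)
    # Each region (i, j) collects the S tuples and T tuples routed to it.
    regions = defaultdict(lambda: ([], []))

    # Map phase: an S tuple picks a random row and is replicated across it.
    for s in S:
        i = rng.randrange(rows)
        for j in range(cols):
            regions[(i, j)][0].append(s)

    # Map phase: a T tuple picks a random column and is replicated down it.
    for t in T:
        j = rng.randrange(cols)
        for i in range(rows):
            regions[(i, j)][1].append(t)

    # Reduce phase: each region evaluates the predicate on its local pairs.
    output = []
    for s_part, t_part in regions.values():
        output.extend((s, t) for s in s_part for t in t_part if theta(s, t))
    return output


if __name__ == "__main__":
    S = list(range(20))
    T = list(range(15))
    result = one_bucket_theta(S, T, lambda s, t: s < t, rows=3, cols=4)
    print(len(result), "matching pairs")
```

Because the routing never inspects tuple values, the expected load per region depends only on the input sizes and the grid shape, not on how join keys are distributed. The cost is replication: each S tuple is copied cols times and each T tuple rows times.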
