Resource Management and Fault Tolerance

In course

This course is intended for students who want to understand modern large-scale data analysis systems and database systems. It covers a wide range of topics and technologies, and will prepare students

Description

This lecture covers the design choices of Big Data systems, focusing on the storage layer, programming model, execution engine, resource management, and fault tolerance. It explains how systems like YARN let multiple frameworks coexist on a shared cluster, how resource-management decisions are made at varying granularities, the architectural choices behind Spark, and why fault tolerance matters in the face of hardware and software failures. The lecture also discusses data safety, job recovery in Spark, and the impact of failures on performance.

Instructor

Anastasia Ailamaki
