This lecture covers the role of data locality in scheduling decisions, with a focus on multi-tenant platforms. It discusses Hadoop's architectural choices, along with optimizations to the execution engine and to the programming model. The lecture also explores alternatives beyond MapReduce for distributed processing of Big Data, as well as fault-tolerance requirements, data safety strategies, and job recovery techniques.
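To make the data-locality idea concrete, here is a minimal sketch of a locality-aware task pick, assuming the usual preference order of node-local, then rack-local, then off-rack. The names `Node`, `locality_level`, and `pick_task` are hypothetical and used only for illustration; this is not Hadoop's actual scheduler code, and it ignores multi-tenant concerns such as queues and fairness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A cluster node identified by its name and the rack it sits in (illustrative)."""
    name: str
    rack: str


def locality_level(node, replica_nodes):
    """Return 0 for node-local, 1 for rack-local, 2 for off-rack."""
    if any(node.name == r.name for r in replica_nodes):
        return 0
    if any(node.rack == r.rack for r in replica_nodes):
        return 1
    return 2


def pick_task(free_node, pending):
    """Pick the pending task whose input data is closest to free_node.

    pending maps a task id to the list of nodes holding replicas of
    that task's input block. Returns None when nothing is pending.
    """
    if not pending:
        return None
    return min(pending, key=lambda task: locality_level(free_node, pending[task]))


if __name__ == "__main__":
    n1, n2, n3 = Node("n1", "r1"), Node("n2", "r1"), Node("n3", "r2")
    pending = {"t1": [n3], "t2": [n2], "t3": [n1]}
    print(pick_task(n1, pending))
```

A real scheduler would typically weigh this preference against waiting time and per-tenant shares, rather than always taking the closest task.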