This lecture discusses the evolution of execution models for distributed computing, focusing on the Spark ecosystem and its architectural choices, the Spark SQL interface, and the problems that data skew causes in Spark. It also addresses the limitations of MapReduce: extensive I/O requirements, a limited programming model, and an implementation that database experts consider suboptimal.