This lecture covers advanced Spark optimization techniques: data partitioning, shuffle operations, memory management, and Spark's architecture. Topics include manipulating RDDs, Spark's units of work (jobs, stages, and tasks), memory optimization, and partitioning strategies. The instructor explains how to minimize shuffling, use memory efficiently, and make data processing more efficient.
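One common way to minimize shuffling is to combine values per key inside each partition before any data crosses the network. This is the idea behind Spark's `reduceByKey`, which performs a map-side combine, whereas `groupByKey` ships every record. The sketch below is a plain-Python simulation, not Spark code: the function names and the example data are hypothetical, and `hash(key) % num_out` only approximates the spirit of Spark's hash partitioning. It counts how many records would be shuffled under each strategy.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, int]


def merge(out_partitions: List[Dict[str, int]]) -> Dict[str, int]:
    # Keys are disjoint across output partitions, so a plain update is safe.
    result: Dict[str, int] = {}
    for part in out_partitions:
        result.update(part)
    return result


def shuffle_without_combine(partitions: List[List[Record]], num_out: int):
    """groupByKey-like (hypothetical): every input record is sent across the shuffle."""
    out = [defaultdict(int) for _ in range(num_out)]
    moved = 0
    for part in partitions:
        for key, value in part:
            out[hash(key) % num_out][key] += value  # route by key hash
            moved += 1
    return merge(out), moved


def shuffle_with_combine(partitions: List[List[Record]], num_out: int):
    """reduceByKey-like (hypothetical): pre-aggregate per partition, then shuffle."""
    out = [defaultdict(int) for _ in range(num_out)]
    moved = 0
    for part in partitions:
        local: Dict[str, int] = defaultdict(int)
        for key, value in part:
            local[key] += value  # map-side combine, no network traffic yet
        for key, value in local.items():
            out[hash(key) % num_out][key] += value  # one record per distinct key
            moved += 1
    return merge(out), moved


# Hypothetical word-count input spread over three partitions.
example_partitions: List[List[Record]] = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("b", 1), ("a", 1)],
    [("c", 1), ("a", 1), ("c", 1)],
]

if __name__ == "__main__":
    _, moved_plain = shuffle_without_combine(example_partitions, 2)
    _, moved_combined = shuffle_with_combine(example_partitions, 2)
    print(moved_plain)     # 10 records shuffled
    print(moved_combined)  # 6 records shuffled
```

Both strategies produce the same per-key totals; only the volume of shuffled data differs. The saving grows with the number of repeated keys per partition, which is why aggregating early is usually preferred when the reduction function is associative.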