This lecture covers advanced Spark optimization and partitioning techniques, including handling data skew and partition imbalance and using persistence (caching). It also presents an optimization checklist, best practices, and Spark's persistence levels. Finally, it introduces Spark MLlib for machine learning tasks such as classification (e.g., logistic regression) and clustering, and provides references for further learning.
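Key salting is one widely used way to deal with data skew in a shuffle. It is not necessarily the technique this lecture presents. The plain-Python sketch below (no Spark involved) assumes the skew comes from an aggregation by key in which one "hot" key dominates. In a shuffle, every record with the same key ends up in one task, so the largest key group roughly bounds the slowest task. The sketch appends a salt to each key, aggregates on the salted keys, then strips the salt and combines the partial results. All function names (`salt_keys`, `sum_by_key`, `salted_sum_by_key`, `largest_group`) are hypothetical and are not part of any Spark API.

```python
from collections import Counter


def salt_keys(records, num_salts):
    """Turn each key into (key, salt) so one hot key becomes num_salts keys.

    Round-robin salting (index % num_salts) is used here for determinism;
    random salts are also common in practice.
    """
    return [((key, i % num_salts), value) for i, (key, value) in enumerate(records)]


def sum_by_key(pairs):
    """Sum values per key, like a single reduce-by-key step."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def salted_sum_by_key(records, num_salts):
    """Two-stage aggregation: salted partial sums, then merge per original key."""
    # Stage 1: aggregate on (key, salt), spreading the hot key over num_salts groups.
    partial = sum_by_key(salt_keys(records, num_salts))
    # Stage 2: drop the salt and combine the (few) partial sums per key.
    return sum_by_key((key, total) for (key, _salt), total in partial.items())


def largest_group(keys):
    """Size of the biggest key group, a proxy for the largest shuffle task."""
    return max(Counter(keys).values())


# Hypothetical skewed data: 900 records for one hot key, 100 records spread over 100 keys.
records = [("hot", 1)] * 900 + [(f"k{i}", 1) for i in range(100)]

print(largest_group(k for k, _ in records))                 # largest group without salting
print(largest_group(k for k, _ in salt_keys(records, 8)))   # largest group with 8 salts
print(salted_sum_by_key(records, 8)["hot"])                 # final result is unchanged
```

With 8 salts, the hot key's 900 records split into groups of at most 113, while the final totals match the unsalted aggregation. In Spark, the same idea is usually written as two successive aggregations, first on the key plus a salt column and then on the key alone. The trade-off is an extra aggregation step in exchange for a more even spread of work.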