This lecture covers execution models for distributed computing in data-intensive applications and systems. It introduces data frames as a space-efficient and computationally efficient data representation, queried through an extensible, SQL-like language. It explains the underlying data model, which supports nested and complex types, the key notion of a data frame, and the relational operators defined on data frames. The lecture also discusses optimization principles and the model's advantages over traditional SQL, and concludes by stressing the importance of distribution for scalability and of the programming model for expressiveness and performance.
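To make the relational operators on data frames a little more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the system the lecture describes. The class `DataFrame` and its methods `select`, `where`, and `group_count` are hypothetical names, not the API of Spark or any other library. The sketch assumes a column-oriented layout, with one list per column, as one common way to keep such a representation compact. It also shows a nested column (`address`) whose cells are themselves records. It does no query optimization and no distribution.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterator


@dataclass
class DataFrame:
    """Toy column-oriented data frame (hypothetical; not a real library API).

    Each column is stored as its own Python list, so a projection can reuse
    whole columns instead of rebuilding every row.
    """

    columns: dict[str, list[Any]]

    def num_rows(self) -> int:
        # All columns have the same length; an empty frame has zero rows.
        return len(next(iter(self.columns.values()), []))

    def rows(self) -> Iterator[dict[str, Any]]:
        # Reassemble row-shaped records on demand, e.g. to evaluate predicates.
        for i in range(self.num_rows()):
            yield {name: col[i] for name, col in self.columns.items()}

    def select(self, *names: str) -> "DataFrame":
        # Projection: keep only the named columns, sharing the column lists.
        return DataFrame({name: self.columns[name] for name in names})

    def where(self, predicate: Callable[[dict[str, Any]], bool]) -> "DataFrame":
        # Selection: find the positions of rows that satisfy the predicate...
        keep = [i for i, row in enumerate(self.rows()) if predicate(row)]
        # ...then gather those positions from every column.
        return DataFrame({name: [col[i] for i in keep] for name, col in self.columns.items()})

    def group_count(self, key: Callable[[dict[str, Any]], Any]) -> "DataFrame":
        # Grouping with a COUNT aggregate; groups appear in first-seen order.
        counts: dict[Any, int] = {}
        for row in self.rows():
            k = key(row)
            counts[k] = counts.get(k, 0) + 1
        return DataFrame({"key": list(counts), "count": list(counts.values())})


# A nested "address" column: each cell is itself a record.
people = DataFrame({
    "name": ["Ana", "Ben", "Chloe"],
    "age": [31, 24, 45],
    "address": [
        {"city": "Lausanne", "zip": "1015"},
        {"city": "Geneva", "zip": "1201"},
        {"city": "Lausanne", "zip": "1004"},
    ],
})

# Roughly: SELECT name, age FROM people WHERE address.city = 'Lausanne'
lausanne = people.where(lambda r: r["address"]["city"] == "Lausanne").select("name", "age")
print(lausanne.columns)  # {'name': ['Ana', 'Chloe'], 'age': [31, 45]}

# Roughly: SELECT address.city, COUNT(*) FROM people GROUP BY address.city
by_city = people.group_count(lambda r: r["address"]["city"])
print(by_city.columns)  # {'key': ['Lausanne', 'Geneva'], 'count': [2, 1]}
```

This sketch executes each operator eagerly, one after another. A real data frame engine would typically treat such a chain as a logical plan that it can optimize and run in a distributed fashion. That is where the optimization principles and the role of distribution discussed in the lecture come in.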