Spark SQL: Schema, Optimization, and UDFs

In course

This course is intended for students who want to understand modern large-scale data analysis systems and database systems. It covers a wide range of topics and technologies, and will prepare students

Description

This lecture covers the second generation of execution models for distributed computing, focusing on Spark SQL. It introduces the Spark Unified Stack, the Spark SQL programming interface, DataFrames and their operators, and the principles behind query optimization. The instructor walks through the optimization process, including logical and physical optimization and the role of Catalyst rules. The lecture also covers user-defined functions (UDFs) and how they are optimized, highlighting the advantages of Spark SQL over traditional SQL.
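
To give a feel for rule-based logical optimization, here is a minimal sketch in plain Python. It is not Spark's API or Catalyst's actual implementation: the tree classes (`Lit`, `Attr`, `Add`), the helper `transform_up`, and the two rules are hypothetical names made up for illustration. The only idea taken from Catalyst is the general one, namely that a plan is a tree and that rules rewrite matching subtrees repeatedly until nothing changes.

```python
from dataclasses import dataclass
from typing import Callable, Union


# Toy expression tree (hypothetical, not Spark classes).
@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class Attr:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Lit, Attr, Add]
Rule = Callable[[Expr], Expr]


def transform_up(expr: Expr, rule: Rule) -> Expr:
    """Apply `rule` to every node, children first (bottom-up)."""
    if isinstance(expr, Add):
        expr = Add(transform_up(expr.left, rule), transform_up(expr.right, rule))
    return rule(expr)


def fold_constants(expr: Expr) -> Expr:
    """Rule: replace Add(Lit, Lit) with a single literal."""
    if isinstance(expr, Add) and isinstance(expr.left, Lit) and isinstance(expr.right, Lit):
        return Lit(expr.left.value + expr.right.value)
    return expr


def drop_zero(expr: Expr) -> Expr:
    """Rule: x + 0 and 0 + x simplify to x."""
    if isinstance(expr, Add):
        if expr.left == Lit(0):
            return expr.right
        if expr.right == Lit(0):
            return expr.left
    return expr


def optimize(expr: Expr, rules: list, max_iters: int = 100) -> Expr:
    """Run all rules repeatedly until the tree stops changing (fixed point)."""
    for _ in range(max_iters):
        new = expr
        for rule in rules:
            new = transform_up(new, rule)
        if new == expr:
            return expr
        expr = new
    return expr


# x + (1 + 2): the constant subtree can be folded before any data is read.
plan = Add(Attr("x"), Add(Lit(1), Lit(2)))
optimized = optimize(plan, [fold_constants, drop_zero])
print(optimized)
```

Each rule only inspects one node and returns either a rewritten node or the node unchanged, which keeps rules small and independently composable. The fixed-point loop matters because one rewrite can expose an opportunity for another rule.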

Instructor

Anastasia Ailamaki