Lecture

Hadoop: Execution Models

Related lectures (32)

Hadoop Ecosystem: Architectural Choices & MapReduce Programming

Explores the Hadoop ecosystem's architecture and MapReduce programming model, emphasizing strengths and limitations.

Covers data science tools, Hadoop, Spark, data lake ecosystems, CAP theorem, batch vs. stream processing, HDFS, Hive, Parquet, ORC, and MapReduce architecture.

Data Wrangling with Hive: Managing Big Data Efficiently

Covers data wrangling techniques using Apache Hive for efficient big data management.

Big Data Ecosystems: Technologies and Challenges

Covers the fundamentals of big data ecosystems, focusing on technologies, challenges, and practical exercises with Hadoop's HDFS.

Big Data: Best Practices and Guidelines

Covers best practices and guidelines for big data, including data lakes, typical architecture, challenges, and technologies used to address them.

Optimizing Recursive Queries

Explores optimizing recursive queries in database systems using Datalog and semirings, discussing the challenges and solutions in data analytics.

Big Data Best Practices and Guidelines

Covers best practices and guidelines for big data, including data lakes, architecture, challenges, and technologies like Hadoop and Hive.

Spark Storage Layer

Explores the Spark ecosystem, Resilient Distributed Datasets, and the storage layer abstraction in Spark.

Data Wrangling with Hadoop

Covers data wrangling techniques using Hadoop, focusing on row-oriented versus column-oriented databases, popular storage formats, and HBase-Hive integration.

Scheduling Decisions: Data Locality and Multitenancy

Explores data locality in scheduling decisions for multi-tenant platforms and discusses Hadoop's architecture, execution engine optimizations, and fault tolerance strategies.

Big Data Challenges: Distributed Computing with Spark

Explores big data challenges, distributed computing with Spark, RDDs, hardware requirements, MapReduce, transformations, and Spark DataFrames.

Gossip Efficiency: Decentralized Systems

Explores gossip efficiency in decentralized systems, covering protocols, interaction needs, and bandwidth optimization, along with search algorithms and optimizations.

Programming Models: Overview and Examples

Explores programming models for big data processing, including Spark's RDDs and optimizations.

Advanced Spark Optimization

Delves into advanced Spark optimization techniques, emphasizing data partitioning, shuffle operations, and memory management.

Integrating Scalable Data Storage and Map Reduce Processing with Hadoop

Covers the integration of scalable data storage and MapReduce processing using Hadoop, including HDFS, Hive, Parquet, ORC, Spark, and HBase.

Data-Parallel Programming: Vector & SIMD Processors

Explores data-parallel programming with vector processors and SIMD, and introduces MapReduce, Pregel, and TensorFlow.

Spark Ecosystem: Architectural Choices

Explores the Spark ecosystem's architectural choices, including RDDs and fault tolerance.

Code Optimization: Speeding-up Analyses

Explores techniques to speed up dataflow analyses and discusses the importance of node ordering and post-order traversal.

Spark Data Frames

Covers Spark DataFrames (distributed collections of data organized into named columns) and the benefits of using them over RDDs.

Advanced Spark Optimization Techniques: Managing Big Data

Discusses advanced Spark optimization techniques for managing big data efficiently, focusing on parallelization, shuffle operations, and memory management.