Lecture

MapReduce: Execution Models for Distributed Computing

Related lectures (31)

Covers data science tools, Hadoop, Spark, data lake ecosystems, CAP theorem, batch vs. stream processing, HDFS, Hive, Parquet, ORC, and MapReduce architecture.

Data-Parallel Programming: Vector & SIMD Processors

Explores data-parallel programming with vector processors and SIMD, and introduces MapReduce, Pregel, and TensorFlow.

Advanced Spark Optimization Techniques: Managing Big Data

Discusses advanced Spark optimization techniques for managing big data efficiently, focusing on parallelization, shuffle operations, and memory management.

Data Wrangling with Hive: Managing Big Data Efficiently

Covers data wrangling techniques using Apache Hive for efficient big data management.

Big Data Best Practices and Guidelines

Covers best practices and guidelines for big data, including data lakes, architecture, challenges, and technologies like Hadoop and Hive.

Big Data: Best Practices and Guidelines

Covers best practices and guidelines for big data, including data lakes, typical architecture, challenges, and technologies used to address them.

Parallel Programming I

Covers the basics of parallel programming, including concurrency, forms of parallelism, synchronization, and programming models like Pthreads and OpenMP.

Big Data Ecosystems: Technologies and Challenges

Covers the fundamentals of big data ecosystems, focusing on technologies, challenges, and practical exercises with Hadoop's HDFS.

GPU Introduction: Performance and Programming

Explores the motivation and advantages of using GPUs for computation, focusing on their performance and programming through CUDA.

Parallel Computing: Principles and OpenMP

Covers the principles of parallel computing and introduces OpenMP for creating concurrent code from serial code.

Big Data Challenges: Scaling to Massive Data

Explores challenges of handling massive data in the era of big data, discussing solutions like MapReduce and Spark.

Big Data Challenges: Distributed Computing with Spark

Explores big data challenges, distributed computing with Spark, RDDs, hardware requirements, MapReduce, transformations, and Spark DataFrames.

Technological Evolution: CPUs & Storage

Explores the evolution of CPUs and storage technologies, RAM bit-flipping, Big Data applications, and technical challenges.

Hadoop Ecosystem: Architectural Choices & MapReduce Programming

Explores the Hadoop ecosystem's architecture and MapReduce programming model, emphasizing strengths and limitations.

Data Wrangling with Hadoop

Covers data wrangling techniques using Hadoop, focusing on row-oriented versus column-oriented databases, popular storage formats, and HBase-Hive integration.

Introduction to LabVIEW Programming

Introduces LabVIEW programming, covering memory management, data types, and parallel programming concepts, with hands-on demonstrations.

Introduction to Multiprocessor Architecture

Introduces the fundamentals of multiprocessor architecture, covering post-Moore servers, sustainable datacenters, parallel programming, and GPU utilization.

Scaling up: Spark and Big Data

Explores the challenges of big data processing and introduces Spark as a solution.

Parallel Scan Left

Introduces parallel scan left in Scala, covering its properties, sequential solutions, and efficient parallel computation techniques.

Principles of Parallel Computing: OpenMP

Explores the principles of parallel computing, focusing on OpenMP as a tool for creating concurrent code from serial code.