Lecture

Hadoop: Execution Models

Description

This lecture covers the execution models for distributed computing, focusing on Hadoop and MapReduce. It explains the concepts of primary and backup copies, interleaved declustering, and failure management in distributed systems. The instructor discusses the challenges of fault tolerance, data locality, and scheduling in Hadoop, as well as the architectural choices and optimizations in the Hadoop ecosystem. The lecture also delves into the limitations of MapReduce, the importance of data safety, fault tolerance mechanisms, and alternative distributed processing frameworks like Spark and Pregel.

This video is available exclusively on MediaSpace for a restricted audience. Please log in to MediaSpace to access it if you have the necessary permissions.

Watch on MediaSpace
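
For readers unfamiliar with the MapReduce programming model mentioned in the description, the sketch below shows the canonical word-count job written against Hadoop's Java MapReduce API (org.apache.hadoop.mapreduce). It is a reference illustration only, not material from the lecture; the class names WordCount, TokenizerMapper, and IntSumReducer are illustrative.

// Minimal word-count sketch using the Hadoop MapReduce Java API.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: emit (word, 1) for every token in the input split.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: sum the counts for each word after the shuffle.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        // Using the reducer as a combiner computes partial sums on the map
        // side before the shuffle, reducing network traffic.
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
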
About this result
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
Related lectures (42)
General Introduction to Big Data
Covers data science tools, Hadoop, Spark, data lake ecosystems, CAP theorem, batch vs. stream processing, HDFS, Hive, Parquet, ORC, and MapReduce architecture.
Big Data Best Practices and Guidelines
Covers best practices and guidelines for big data, including data lakes, architecture, challenges, and technologies like Hadoop and Hive.
Spark Data Frames
Covers Spark Data Frames, distributed collections of data organized into named columns, and the benefits of using them over RDDs.
Data Wrangling with Hadoop
Covers data wrangling techniques using Hadoop, focusing on row versus column-oriented databases, popular storage formats, and HBase-Hive integration.
Big Data Ecosystems: Technologies and Challenges
Covers the fundamentals of big data ecosystems, focusing on technologies, challenges, and practical exercises with Hadoop's HDFS.
