Lecture

Data Wrangling with Hadoop

Description

This lecture covers data wrangling with Hadoop: row-oriented versus column-oriented databases, popular HDFS storage formats, and the integration between HBase and Hive. Students learn about Hive tables and HBase architecture, and how HBase and Hive differ in big data processing.
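
As a rough illustration of the row-oriented versus column-oriented distinction named in the description, the sketch below stores the same records both ways in plain Python. It is not taken from the lecture: the records, the field names, and the `to_columns` helper are all hypothetical, and real storage formats add encoding, compression, and on-disk layout that this sketch ignores.

```python
# Hypothetical records, invented purely for illustration.
rows = [
    {"id": 1, "city": "Lausanne", "temp_c": 21.5},
    {"id": 2, "city": "Geneva", "temp_c": 23.0},
    {"id": 3, "city": "Zurich", "temp_c": 19.5},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: fetching one complete record is a single lookup.
second_record = rows[1]

# Column layout: aggregating one field scans only that field's list
# and never touches the other fields.
mean_temp = sum(columns["temp_c"]) / len(columns["temp_c"])
```

The row layout favors reading or writing whole records, while the column layout favors scans and aggregates over a few fields, which is the usual motivation for columnar formats in analytical workloads.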

Related lectures (440)
Spark Data Frames
Covers Spark Data Frames, distributed collections of data organized into named columns, and the benefits of using them over RDDs.
General Introduction to Big Data
Covers data science tools, Hadoop, Spark, data lake ecosystems, CAP theorem, batch vs. stream processing, HDFS, Hive, Parquet, ORC, and MapReduce architecture.
Data Science Visualization with Pandas
Covers data manipulation and exploration using Python with a focus on visualization techniques.
Data Wrangling with Hadoop: Storage Formats and Hive
Explores data wrangling with Hadoop, emphasizing storage formats and Hive for big data processing.
Introduction to Data Science
Introduces the basics of data science, covering decision trees, machine learning advancements, and deep reinforcement learning.