Lecture

Data Wrangling with Hadoop: Advanced Techniques

In course
DEMO: ea cupidatat sunt
Aliqua reprehenderit sunt in fugiat ullamco id occaecat culpa non eiusmod consequat exercitation voluptate. Excepteur culpa in exercitation officia duis do dolore. Sit minim ex enim proident sit et qui irure. Sit ex consectetur nostrud et et nostrud quis. Exercitation ipsum velit exercitation culpa do enim eiusmod adipisicing consequat duis fugiat ea elit sunt. Laboris ad duis ullamco proident ad duis quis qui occaecat veniam. Consequat in amet proident proident ipsum eu laborum aliquip ut est id nostrud.
Description

This lecture covers advanced data wrangling techniques in Hadoop, focusing on how scalable data storage and processing are integrated through tools such as Hive and HBase. The instructor explains why columnar data formats such as Parquet and ORC matter and how they make data processing more efficient. The lecture also covers querying data with HiveQL and writing user-defined functions (UDFs) to handle geospatial and JSON data. In practical exercises, students create and manage Hive tables, load data into them, and run complex queries. The session emphasizes the Extract, Transform, Load (ETL) process, showing how to connect to Hive, create databases, and optimize data storage. It also shows how partitioning data in Hive improves query performance: queries that filter on the partition column only need to read the matching partitions. By the end of the session, students should understand how to use Hadoop's capabilities for data wrangling in large-scale data environments.
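
The following Python sketch is a toy illustration of why columnar formats such as Parquet and ORC speed up analytical queries. It stores the same small table in a row layout and in a column layout. It then counts how many values an average over a single column has to read in each layout. It also applies run-length encoding, one of the encoding techniques these formats use, to a column of repeated values. The table and its columns (`station`, `year`, `temp_c`) are hypothetical. The code only models the idea and does not reproduce the actual Parquet or ORC file structure.

```python
from typing import Any

# Hypothetical toy table (illustration only): one dict per row.
ROWS: list[dict[str, Any]] = [
    {"station": "Lausanne", "year": 2020, "temp_c": 11.2},
    {"station": "Lausanne", "year": 2021, "temp_c": 10.8},
    {"station": "Geneva", "year": 2020, "temp_c": 12.0},
    {"station": "Geneva", "year": 2021, "temp_c": 11.5},
]


def to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot row dicts into one list per column (a columnar layout)."""
    columns: dict[str, list[Any]] = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def avg_temp_row_layout(rows: list[dict[str, Any]]) -> tuple[float, int]:
    """Row layout: every field of every row is read, even the unused ones."""
    values_read = 0
    total = 0.0
    for row in rows:
        values_read += len(row)  # the whole row is deserialized
        total += row["temp_c"]
    return total / len(rows), values_read


def avg_temp_column_layout(columns: dict[str, list[Any]]) -> tuple[float, int]:
    """Column layout: only the temp_c column is read."""
    temps = columns["temp_c"]
    return sum(temps) / len(temps), len(temps)


def run_length_encode(values: list[Any]) -> list[tuple[Any, int]]:
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    runs: list[list[Any]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


if __name__ == "__main__":
    cols = to_columns(ROWS)
    print(avg_temp_row_layout(ROWS)[1])      # 12 (4 rows x 3 fields)
    print(avg_temp_column_layout(cols)[1])   # 4 (one column only)
    print(run_length_encode(cols["station"]))  # [('Lausanne', 2), ('Geneva', 2)]
```

The gap between 12 and 4 values read grows with the number of columns in the table. Sorted or low-cardinality columns also compress well because equal values sit next to each other.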

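This Python sketch makes the effect of Hive partitioning concrete. Hive stores each partition of a table declared with `PARTITIONED BY` as a subdirectory named `column=value`. A query that filters on the partition column can therefore skip the other directories entirely. The sketch imitates that directory layout on the local file system with CSV files and counts how many data files a filtered scan opens. The function names `write_partitioned` and `scan`, the file name `part-00000.csv`, and the sample data are hypothetical. Real Hive tables live in HDFS and are usually stored in formats such as ORC or Parquet rather than CSV.

```python
import csv
import tempfile
from pathlib import Path
from typing import Any


def write_partitioned(root: Path, rows: list[dict[str, Any]], partition_col: str) -> None:
    """Write rows into Hive-style key=value subdirectories (hypothetical helper)."""
    by_value: dict[str, list[dict[str, Any]]] = {}
    for row in rows:
        by_value.setdefault(str(row[partition_col]), []).append(row)
    for value, part_rows in by_value.items():
        part_dir = root / f"{partition_col}={value}"
        part_dir.mkdir(parents=True, exist_ok=True)
        # As in Hive, the partition column is encoded in the directory name,
        # so it is not repeated inside the data file.
        fields = [c for c in part_rows[0] if c != partition_col]
        with open(part_dir / "part-00000.csv", "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fields)
            writer.writeheader()
            for r in part_rows:
                writer.writerow({c: r[c] for c in fields})


def scan(root: Path, partition_col: str, wanted: str) -> tuple[list[dict[str, str]], int]:
    """Emulate WHERE partition_col = wanted; return (rows, data files opened)."""
    rows: list[dict[str, str]] = []
    files_opened = 0
    for part_dir in sorted(root.iterdir()):
        key, _, value = part_dir.name.partition("=")
        if key != partition_col or value != wanted:
            continue  # pruned: files in this directory are never opened
        for data_file in sorted(part_dir.glob("*.csv")):
            files_opened += 1
            with open(data_file, newline="") as f:
                for r in csv.DictReader(f):
                    r[partition_col] = value  # re-attach the partition column
                    rows.append(r)
    return rows, files_opened


if __name__ == "__main__":
    data = [
        {"station": "Lausanne", "year": 2020, "temp_c": 11.2},
        {"station": "Geneva", "year": 2020, "temp_c": 12.0},
        {"station": "Lausanne", "year": 2021, "temp_c": 10.8},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "weather"
        write_partitioned(root, data, "year")
        rows, opened = scan(root, "year", "2021")
        print(opened)  # 1: the year=2020 directory is skipped
```

The values read back from CSV are strings, which is why the filter value is passed as `"2021"`. A real engine would apply the partition column's declared type instead.
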
Instructors (3)
officia sint
Id est sit voluptate nostrud fugiat dolor anim. Sit mollit officia eu eu do aute velit. Ullamco qui eu proident aliquip culpa aliqua reprehenderit sunt. Sunt reprehenderit aliquip dolor pariatur. Laborum cillum commodo ea ad labore ad laborum nulla Lorem id cillum exercitation.
sint occaecat quis amet
Culpa mollit eiusmod consequat aute duis occaecat laborum dolor consectetur sint. Ullamco culpa officia fugiat nisi commodo deserunt reprehenderit ut labore fugiat deserunt aliquip in sint. Consectetur ad adipisicing reprehenderit commodo culpa ex nostrud aliqua. Enim elit reprehenderit in reprehenderit enim commodo veniam aliqua eiusmod velit non occaecat et dolor.
ipsum veniam magna exercitation
Laboris nulla eu aliqua id aute pariatur duis cupidatat. Cupidatat non voluptate cillum proident est cupidatat Lorem esse ipsum aliquip. Dolore do ea exercitation esse enim esse enim eiusmod anim elit id. Ea amet labore culpa adipisicing aute occaecat ipsum irure do sit ex deserunt amet. Minim excepteur magna irure enim labore aute anim deserunt eu aliquip adipisicing non aliquip est.
About this result
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
Related lectures (32)
Data Wrangling with Hive: Managing Big Data Efficiently
Covers data wrangling techniques using Apache Hive for efficient big data management.
Advanced Pandas Functions
Focuses on advanced pandas functions for data manipulation, exploration, and visualization with Python, emphasizing the importance of understanding and preparing data.
General Introduction to Big Data
Covers data science tools, Hadoop, Spark, data lake ecosystems, CAP theorem, batch vs. stream processing, HDFS, Hive, Parquet, ORC, and MapReduce architecture.
Data Wrangling with Hadoop
Covers data wrangling techniques using Hadoop, focusing on row versus column-oriented databases, popular storage formats, and HBase-Hive integration.
Spark Data Frames
Covers Spark Data Frames, distributed collections of data organized into named columns, and the benefits of using them over RDDs.
