Concept

Apache Hadoop

Related lectures (29)

Explores the challenges of minimizing job completion time in distributed computing, focusing on the impact of data skew and on efficient processing.

Big Data Challenges: Scaling to Massive Data

Explores the challenges of handling massive datasets in the big data era and discusses solutions such as MapReduce and Spark.

Fast Data Formats

Compares fast data formats and provides tips for efficient data processing.

Advanced Spark Optimization Techniques: Managing Big Data

Discusses advanced Spark optimization techniques for managing big data efficiently, focusing on parallelization, shuffle operations, and memory management.

Data Wrangling Techniques: HBase and Hive Integration

Covers data wrangling techniques using HBase and Hive, focusing on integration and practical applications.

Collaborative Data Science

Covers collaborative data science tools, big data concepts, Spark, and data stream processing, with tips for the final project.

Advanced Spark Optimization

Delves into advanced Spark optimization techniques, emphasizing data partitioning, shuffle operations, and memory management.

Introduction to Data Stream Processing

Covers the fundamentals of data stream processing, including tools like Apache Storm and Kafka, key concepts like event time and window operations, and the challenges of stream processing.

Indexing for Information Retrieval

Explores indexing techniques, inverted files, MapReduce models, and the use of tries for efficient information retrieval.