Lecture

Introduction to Data Stream Processing

Related lectures (32)

Covers data science tools, Hadoop, Spark, data lake ecosystems, CAP theorem, batch vs. stream processing, HDFS, Hive, Parquet, ORC, and MapReduce architecture.

Introduction to Data Stream Processing: Concepts and Applications

Covers data stream processing concepts, focusing on Apache Kafka and Spark Streaming integration, event time management, and project implementation guidelines.

Introduction to Data Stream Processing: Concepts and Applications

Covers the principles of data stream processing and its applications in real-time data analysis.

Introduction to Data Stream Processing

Covers the fundamentals of data stream processing, including tools like Apache Storm and Kafka, key concepts like event time and window operations, and the challenges of stream processing.

Analytics on Data at Rest and Data in Motion

Explores combining data at rest with data in motion, emphasizing the complexities of the Lambda architecture and the quality assessment of streams and batches.

Data Stream Processing: Apache Kafka and Spark

Covers data stream processing with Apache Kafka and Spark, including event time vs. processing time, stream processing operations, and stream-stream joins.

Advanced Data Stream Processing Concepts

Explores event time vs. processing time, stream processing operations, stream-stream joins, and handling late/out-of-order data in data stream processing.

Big Data Best Practices and Guidelines

Covers best practices and guidelines for big data, including data lakes, architecture, challenges, and technologies like Hadoop and Hive.

Advanced Spark Optimization Techniques: Managing Big Data

Discusses advanced Spark optimization techniques for managing big data efficiently, focusing on parallelization, shuffle operations, and memory management.

Introduction to Data Stream Processing

Covers the fundamentals of data stream processing, including real-time insights, industry applications, and practical exercises on Kafka and Spark Streaming.

Introduction to Spark Runtime Architecture

Covers the Spark runtime architecture, including RDDs, transformations, actions, and caching for performance optimization.

Data Wrangling with Hive: Managing Big Data Efficiently

Covers data wrangling techniques using Apache Hive for efficient big data management.

Data Wrangling with Hadoop: Storage Formats and Hive

Explores data wrangling with Hadoop, emphasizing storage formats and Hive for big data processing.

Introduction to Applied Data Analysis

Introduces the Applied Data Analysis course at EPFL, covering a broad range of data analysis topics and emphasizing continuous learning in data science.

Advanced Data Stream Processing Concepts

Explores advanced data stream processing concepts, including Kafka, Spark Streaming, joins, and route planning models.

Big Data: Best Practices and Guidelines

Covers best practices and guidelines for big data, including data lakes, typical architecture, challenges, and technologies used to address them.

Introduction to Data Science

Introduces the basics of data science, covering decision trees, machine learning advancements, and deep reinforcement learning.

Big Data Challenges: Scaling to Massive Data

Explores challenges of handling massive data in the era of big data, discussing solutions like MapReduce and Spark.

Stream Processing and Fault Tolerance

Explores stream processing, fault tolerance, DStreams, and sliding window operations in big data analytics.

Data Science Essentials

Covers the essentials of data science, including data handling, visualization, and analysis, emphasizing practical skills and active engagement.