Big Data Challenges: Scaling to Massive Data

Related lectures (31)

Data Wrangling with Hive: Managing Big Data Efficiently

Covers data wrangling techniques using Apache Hive for efficient big data management.

Covers best practices and guidelines for big data, including data lakes, architecture, challenges, and technologies like Hadoop and Hive.

Scaling up: Spark and Big Data

Explores the challenges of big data processing and introduces Spark as a solution.

General Introduction to Big Data

Covers data science tools, Hadoop, Spark, data lake ecosystems, CAP theorem, batch vs. stream processing, HDFS, Hive, Parquet, ORC, and MapReduce architecture.

Introduction to Spark Runtime Architecture

Introduces Apache Spark, covering its architecture, RDDs, transformations, actions, fault tolerance, deployment options, and practical exercises in Jupyter notebooks.

Integrating Scalable Data Storage and MapReduce Processing with Hadoop

Covers the integration of scalable data storage and MapReduce processing using Hadoop, including HDFS, Hive, Parquet, ORC, Spark, and HBase.

Introduction to Spark Runtime Architecture

Covers the Spark runtime architecture, including RDDs, transformations, actions, and caching for performance optimization.

Data Analysis to AI and ML, Social Media

Explores the evolution from data analysis to AI and ML, emphasizing big data, machine learning, and social media interaction.

Big Data Challenges: Distributed Computing with Spark

Explores big data challenges, distributed computing with Spark, RDDs, hardware requirements, MapReduce, transformations, and Spark DataFrames.

Data Wrangling with Hadoop: Advanced Techniques

Covers advanced data wrangling techniques using Hadoop, focusing on Hive and HBase integration.

Introduction to Applied Data Analysis

Introduces the Applied Data Analysis course at EPFL, covering a broad range of data analysis topics and emphasizing continuous learning in data science.

Big Data Ecosystems: Technologies and Challenges

Covers the fundamentals of big data ecosystems, focusing on technologies, challenges, and practical exercises with Hadoop's HDFS.

Data Wrangling with Hadoop

Covers data wrangling techniques using Hadoop, focusing on row-oriented versus column-oriented databases, popular storage formats, and HBase-Hive integration.

Digital Transformation: Solutions and Data

Explores digital transformation opportunities, big data, analytics, and technology innovations in business and research.

Big Data: Best Practices and Guidelines

Covers best practices and guidelines for big data, including data lakes, typical architecture, challenges, and technologies used to address them.

Logistic Regression: Vegetation Prediction

Explores logistic regression for predicting vegetation proportions in the Amazon region through remote sensing data analysis.

Linear Regression and Logistic Regression

Covers linear and logistic regression for regression and classification tasks, focusing on loss functions and model training.

Advanced Spark Optimization Techniques: Managing Big Data

Discusses advanced Spark optimization techniques for managing big data efficiently, focusing on parallelization, shuffle operations, and memory management.

Data Wrangling Techniques: HBase and Hive Integration

Covers data wrangling techniques using HBase and Hive, focusing on integration and practical applications.

Scaling to Massive Data: Spark Fundamentals

Covers the fundamentals of scaling to massive data using Spark, focusing on RDDs, transformations, actions, Spark architecture, and Spark's machine learning toolkit.