Lecture

Spark Data Frames: Overview and Performance Analysis

Related lectures (30)
Introduction to Data Science
Introduces the basics of data science, covering decision trees, machine learning advancements, and deep reinforcement learning.
Spark Data Frames
Covers Spark Data Frames (distributed collections of data organized into named columns) and the benefits of using them over RDDs.
Logistic Regression: Vegetation Prediction
Explores logistic regression for predicting vegetation proportions in the Amazon region through remote sensing data analysis.
Data Wrangling with Hadoop
Covers data wrangling techniques using Hadoop, focusing on row-oriented versus column-oriented databases, popular storage formats, and HBase-Hive integration.
Decision Tree Classification
Covers decision tree classification using the KNIME Analytics Platform for data preprocessing and model creation.
Logistic Regression: Fundamentals and Applications
Explores logistic regression fundamentals, including cost functions, regularization, and classification boundaries, with practical examples using scikit-learn.
Decision Trees: Classification
Explores decision trees for classification, entropy, information gain, one-hot encoding, hyperparameter optimization, and random forests.
Advanced Pandas Functions
Focuses on advanced pandas functions for data manipulation, exploration, and visualization with Python, emphasizing the importance of understanding and preparing data.
Data Science Visualization with Pandas
Covers data manipulation and exploration using Python with a focus on visualization techniques.
General Introduction to Big Data
Covers data science tools, Hadoop, Spark, data lake ecosystems, the CAP theorem, batch vs. stream processing, HDFS, Hive, Parquet, ORC, and MapReduce architecture.