Lecture

Apache Spark Ecosystem: Basics and Operations

Description

This lecture covers the basics of the Apache Spark ecosystem. It presents in-house applications such as Cancer Genomics and Energy Debugging, introduces the main components (Spark Core, Spark SQL, MLlib, and Spark Streaming), and then works through RDD operations, the SparkContext, and the spark-submit command.
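The RDD operations mentioned above fall into two groups: transformations (such as map and filter), which are only recorded, and actions (such as collect and reduce), which trigger the actual computation over the partitions. The sketch below illustrates that idea in plain Python so it runs without a Spark installation. ToyRDD and parallelize are hypothetical names invented for this illustration; they are not Spark's API and say nothing about how the lecture itself presents the topic.

```python
from functools import reduce


class ToyRDD:
    """Hypothetical stand-in for an RDD (illustration only, not Spark's API)."""

    def __init__(self, partitions, steps=()):
        self._partitions = partitions  # list of lists, one per "partition"
        self._steps = steps            # recorded transformations, not yet run

    # Transformations: return a new ToyRDD and only record the step.
    def map(self, fn):
        return ToyRDD(self._partitions, self._steps + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self._partitions, self._steps + (("filter", pred),))

    def _compute(self, partition):
        # Replay the recorded steps over a single partition.
        items = iter(partition)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

    # Actions: these are the calls that actually evaluate the data.
    def collect(self):
        return [x for p in self._partitions for x in self._compute(p)]

    def reduce(self, fn):
        # Reduce each non-empty partition first, then combine the partial results.
        computed = [self._compute(p) for p in self._partitions]
        partials = [reduce(fn, c) for c in computed if c]
        return reduce(fn, partials)


def parallelize(data, num_partitions=2):
    """Split a list into roughly equal chunks (hypothetical helper)."""
    size = max(1, -(-len(data) // num_partitions))  # ceiling division
    return ToyRDD([data[i:i + size] for i in range(0, len(data), size)])


numbers = parallelize(list(range(1, 11)), num_partitions=3)
even_squares = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0)

result_list = even_squares.collect()
result_sum = even_squares.reduce(lambda a, b: a + b)
print(result_list, result_sum)
```

In PySpark, the same pipeline would typically start from a SparkContext (for example sc.parallelize(range(1, 11))) followed by the same map, filter, collect, and reduce calls, and the script would be launched with spark-submit. The toy version only mirrors the lazy-transformation and eager-action split, not distribution or fault tolerance.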

Related lectures
General Introduction to Big Data
Covers data science tools, Hadoop, Spark, data lake ecosystems, CAP theorem, batch vs. stream processing, HDFS, Hive, Parquet, ORC, and MapReduce architecture.
Introduction to R Programming for Genetics & Genomics
Introduces a course on Genetics & Genomics, focusing on R programming with interactive exercises.
Spark Data Frames
Covers Spark Data Frames, distributed collections of data organized into named columns, and the benefits of using them over RDDs.
Big Data Best Practices and Guidelines
Covers best practices and guidelines for big data, including data lakes, architecture, challenges, and technologies like Hadoop and Hive.
Analytics on Data at Rest and Data in Motion
Explores combining data at rest with data in motion, emphasizing the Lambda architecture complexities and quality assessment of streams and batches.