This lecture gives a general introduction to big data, with best practices and guidelines. It explains the concept of a data lake, outlines a typical big data architecture, and discusses the challenges that big data poses. The instructor stresses that data must be ingested, cleaned, and integrated before any analytics can be done. The lecture then covers the CAP theorem for distributed data stores, the tension between batch and stream processing, and the technologies used to tackle big data challenges, including the Hadoop Distributed File System (HDFS), MapReduce, and popular HDFS storage formats. Finally, it previews the next topic, Hive, the Hadoop data warehouse, and presents a graded assignment on CO2 time series modeling and data visualization.
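The summary does not show how MapReduce works. The Python below is a minimal, single-process sketch of the MapReduce programming model, assuming the classic word-count example. It runs the map, shuffle, and reduce phases in memory on a list of strings. It does not use Hadoop or any real MapReduce API, and the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Mapper (hypothetical name): emit a (word, 1) pair for every word in one line.
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group all intermediate values by key, as a MapReduce framework
    # would do between the map and reduce stages.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reducer: combine all values collected for one key into a single result.
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    # Each line stands in for one input split handled by a mapper.
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["big data needs big storage", "HDFS stores big data"]
    print(word_count(docs))
```

In a real cluster the mappers and reducers run in parallel on different nodes and the shuffle moves data over the network; keeping the three phases as separate functions here only makes that structure visible.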