Lecture

General Introduction to Big Data

Related lectures (32)

Data Wrangling with Hive: Managing Big Data Efficiently

Covers data wrangling techniques using Apache Hive for efficient big data management.

Covers best practices and guidelines for big data, including data lakes, architecture, challenges, and technologies like Hadoop and Hive.

Data Wrangling with Hadoop

Covers data wrangling techniques using Hadoop, focusing on row versus column-oriented databases, popular storage formats, and HBase-Hive integration.

Introduction to Data Science

Introduces the basics of data science, covering decision trees, machine learning advancements, and deep reinforcement learning.

Data Science Visualization with Pandas

Covers data manipulation and exploration using Python with a focus on visualization techniques.

Data Wrangling with Hadoop: Storage Formats and Hive

Explores data wrangling with Hadoop, emphasizing storage formats and Hive for big data processing.

Big Data Ecosystems: Technologies and Challenges

Covers the fundamentals of big data ecosystems, focusing on technologies, challenges, and practical exercises with Hadoop's HDFS.

General Introduction to Data Science

Offers a comprehensive introduction to data science, covering Python, NumPy, Pandas, Matplotlib, and Scikit-learn, with a focus on practical exercises and collaborative work.

Decision Tree Classification

Covers decision tree classification using KNIME Analytics Platform for data preprocessing and model creation.

GitLab Agent for Kubernetes (`agentk`)

Covers the setup of a GitLab agent for Kubernetes, focusing on installation, version control, and troubleshooting.

Analytics on Data at Rest and Data in Motion

Explores combining data at rest with data in motion, emphasizing the complexities of the Lambda architecture and the quality assessment of streams and batches.

Big Data: Best Practices and Guidelines

Covers best practices and guidelines for big data, including data lakes, typical architecture, challenges, and technologies used to address them.

Critical Data Studies: Reproducibility and Renku

Explores the significance of reproducibility in data science and introduces Renku, a platform for managing data-driven projects.

Elements of Collaborative Data Science

Introduces collaborative data science tools like Jupyter notebooks, Docker, and Git, emphasizing data versioning and containerization.

Data Wrangling with Hadoop: Advanced Techniques

Covers advanced data wrangling techniques using Hadoop, focusing on Hive and HBase integration.

Collaborative Data Science

Covers collaborative data science tools, big data concepts, Spark, and data stream processing, with tips for the final project.

Statistical Signal Processing

Covers Gaussian mixture models, denoising, data classification, and spike sorting using principal component analysis.

Data formats and data wrangling with Hadoop

Explores Apache Hive for data warehousing, data formats, and partitioning, with practical exercises in querying and connecting to Hive.

Spark Data Frames

Covers Spark Data Frames (distributed collections of data organized into named columns) and the benefits of using them over RDDs.

Data, big data, clouds and IoT

Explores data representation, databases, cloud computing, and challenges in the cloud environment.