Covers data wrangling techniques using Hadoop, focusing on row-oriented versus column-oriented databases, popular storage formats, and HBase-Hive integration.
Explores how views simplify query writing and are key in data warehouses.
Explores convergence, numerical pathologies, and mesh convergence studies in finite element discretizations.
Covers Spark DataFrames, their performance benefits, the Catalyst optimizer, PySpark integration, and an analysis of the Gutenberg corpus.
Introduces the relational model, SQL, keys, integrity constraints, ER translation, weak entities, ISA hierarchies, and SQL vs. NoSQL.
Explores run-length encoding, query optimization, and execution models for efficient query processing.
Introduces relational query languages, focusing on relational algebra operators and query optimization.
Covers data wrangling techniques using Apache Hive for efficient big data management.
Explores replication strategies for fault tolerance, load balancing, and eventual consistency in distributed transactions.
Covers the CS422 project on the iterator model and relational operators.