Explores parallelism in programming, emphasizing trade-offs between programmability and performance, and introduces shared memory parallel programming using OpenMP.
Covers data science tools, Hadoop, Spark, data lake ecosystems, CAP theorem, batch vs. stream processing, HDFS, Hive, Parquet, ORC, and MapReduce architecture.