Covers the exponential growth of data, the challenges it poses for processing technology, data variety, data cleaning, approximate query processing, multi-query analytics, and hybrid transactional/analytical processing (HTAP).
Covers data science tools, Hadoop, Spark, data lake ecosystems, the CAP theorem, batch vs. stream processing, HDFS, Hive, the Parquet and ORC file formats, and the MapReduce architecture.