Post-Moore's Law Fusion: High-Bandwidth Memory, Accelerators, and Native Half-Precision Processing for CPU-Local Analytics

Modern data management systems aim to provide both cutting-edge functionality and hardware efficiency. With the advent of AI-driven data processing and the post-Moore's Law era, traditional memory-bound scale-up data management operations face scalability challenges. Meanwhile, accelerators such as GPUs have long been explored as a way to offload complex analytical patterns, at the cost of data movement over an interconnect. GPUs typically provide massive parallelism and high-bandwidth memory, while CPUs are near-data processors and coordinators that are often memory-bound. In this work, we provide a first look at an architecture that combines the best of the CPU and GPU worlds: high-bandwidth memory (HBM), core-local accelerators for matrix multiplication (AMX), and native half-precision data processing inside 4th Generation Intel Xeon Scalable processors, known as Sapphire Rapids. We analyze the system, provide an overview of its hierarchical NUMA architecture, focus on individual components, and explore their interplay and their impact on the traditional DRAM bandwidth wall for typical data access patterns and for novel AI-DB interactions in vector data processing.


