While DRAM latency has long been recognized as a major bottleneck in servers, DRAM bandwidth is emerging as an important bottleneck as server processors shift to many-core architectures to sustain throughput improvements. The rapid expansion of the digital universe, increasingly stored in memory, also pushes the need for higher DRAM density. Emerging die-stacked DRAM technology dramatically improves all three major DRAM properties: latency, bandwidth, and density. Recent advancements in die-stacking technology have made it possible to integrate a sizeable amount of DRAM directly on top of the processor. While feasible on-chip DRAM capacities are insufficient to satisfy the memory needs of modern servers, architecting on-chip DRAM as a high-capacity, low-latency, high-bandwidth cache has the potential to significantly reduce both off-chip memory traffic and average memory access latency. We make the observation that high-capacity on-chip DRAM caches expose the abundant spatial locality present in server applications, along with a modest amount of temporal data reuse. As a consequence, DRAM caches that manage and fetch data at a coarse granularity (e.g., in 2KB pages) exhibit superior properties overall compared to caches that perform fine-grained management using 64B blocks, including substantially higher hit rates, smaller tag storage, higher energy efficiency, and higher set-associativity. Unfortunately, naively employing page-based caches results in excessive data overfetch and capacity waste, as some of the fetched and allocated blocks are never accessed before their eviction. We demonstrate that if the cache is organized in pages, then page footprints (i.e., the set of blocks that are touched while the page is in the cache) are highly predictable using well-established code-correlation techniques. Accurately predicting access patterns within a page can eliminate most of the bandwidth overhead and capacity waste that page-based caches suffer from.
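To make the idea concrete, the sketch below illustrates one plausible shape of a code-correlated footprint predictor for such a page-based cache. All names in it are hypothetical, not taken from the paper. With 2KB pages and 64B blocks there are 32 blocks per page, so each footprint fits in a 32-bit bitvector; following the code-correlation approach the abstract names, the predictor is indexed by the PC of the access that triggered the page fetch together with the offset of the first block touched.

```python
# A minimal sketch of code-correlated footprint prediction for a
# page-based DRAM cache. All class and method names are hypothetical;
# this illustrates the idea, not the paper's implementation.

BLOCK_SIZE = 64
PAGE_SIZE = 2048
BLOCKS_PER_PAGE = PAGE_SIZE // BLOCK_SIZE  # 32 blocks -> 32-bit footprint


class FootprintPredictor:
    """Maps (trigger PC, first-block offset) to a footprint bitvector,
    where bit i is set if block i of the page is expected to be used."""

    def __init__(self):
        self.table = {}

    def predict(self, trigger_pc, first_offset):
        # On a page miss, fetch only the predicted blocks; with no
        # recorded history, fall back to fetching just the triggering block.
        return self.table.get((trigger_pc, first_offset), 1 << first_offset)

    def train(self, trigger_pc, first_offset, observed_footprint):
        # On page eviction, record which blocks were actually touched
        # while the page was resident in the cache.
        self.table[(trigger_pc, first_offset)] = observed_footprint


class ResidentPage:
    """Tracks the blocks touched while a page is cached, so the
    observed footprint can train the predictor at eviction time."""

    def __init__(self, trigger_pc, first_offset):
        self.trigger_pc = trigger_pc
        self.first_offset = first_offset
        self.touched = 1 << first_offset  # the triggering block counts

    def access(self, block_offset):
        self.touched |= 1 << block_offset


# Usage: a miss to block 3 under PC 0x400A10 initially fetches only that
# block; once the page's footprint {3, 4, 5} has been observed and trained,
# the next miss with the same (PC, offset) fetches all three blocks at once.
predictor = FootprintPredictor()
page = ResidentPage(trigger_pc=0x400A10, first_offset=3)
page.access(4)
page.access(5)
predictor.train(page.trigger_pc, page.first_offset, page.touched)
assert predictor.predict(0x400A10, 3) == 0b111000
```

The (PC, offset) index reflects the code-correlation intuition: the same instruction that triggers a page miss tends to be followed by the same access pattern within the page, so a footprint observed once can be reused to fetch exactly the useful blocks and avoid the overfetch and capacity waste described above.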