Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache

Babak Falsafi, Ilknur Cansu Kaynak
2014
Conference paper

Abstract

Recent research advocates large die-stacked DRAM caches in manycore servers to break the memory latency and bandwidth wall. To realize their full potential, die-stacked DRAM caches require low lookup latencies, high hit rates, and efficient use of off-chip bandwidth. Today's stacked DRAM cache designs fall into two categories based on the granularity at which they manage data: block-based and page-based. The state-of-the-art block-based design, called Alloy Cache, colocates a tag with each data block (e.g., 64B) in the stacked DRAM to provide fast access to data in a single DRAM access. However, such a design suffers from low hit rates due to poor temporal locality in the DRAM cache. In contrast, the state-of-the-art page-based design, called Footprint Cache, organizes the DRAM cache at page granularity (e.g., 4KB) but fetches only the blocks within a page that are likely to be touched. In doing so, Footprint Cache achieves high hit rates with moderate on-chip tag storage and reasonable lookup latency. However, multi-gigabyte stacked DRAM caches will soon be both practical and needed by server applications, and at these capacities even page-based DRAM caches would require tens of MBs of tag storage. We introduce a novel stacked-DRAM cache design, Unison Cache. Like Alloy Cache, Unison Cache stores the tag metadata directly in the stacked DRAM, allowing it to scale to arbitrary stacked-DRAM capacities. Leveraging insights from the Footprint Cache design, Unison Cache also employs large, page-sized allocation units to achieve high hit rates and reduce tag overheads, while predicting and fetching only the useful blocks within each page to minimize off-chip traffic.
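The page-granular allocation with footprint-based fetching described above can be illustrated with a toy simulator. This is a minimal sketch under stated assumptions, not the paper's implementation: the class name `FootprintPageCache`, LRU replacement within a set, and a footprint predictor keyed by the (PC, block offset) of the access that allocates a page are hypothetical choices made for illustration, loosely modeled on the Footprint Cache idea. The tag is kept in the same per-page entry as the block-valid vector, echoing the placement of tag metadata alongside data, but DRAM timing and writebacks are not modeled.

```python
from dataclasses import dataclass

BLOCK_SIZE = 64                            # bytes per block (the 64B example above)
PAGE_SIZE = 4096                           # bytes per page (the 4KB example above)


@dataclass
class PageEntry:
    tag: int                               # page tag, kept with the page's data
    valid: int = 0                         # bit i set => block i has been fetched
    touched: int = 0                       # bit i set => block i was actually accessed
    trigger: tuple = (0, 0)                # (pc, offset) of the allocating access


class FootprintPageCache:
    """Hypothetical page-granular cache that fetches only predicted blocks."""

    def __init__(self, num_sets: int, ways: int):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [[] for _ in range(num_sets)]  # each set: PageEntry list, MRU first
        self.predictor = {}                # assumed: (pc, offset) -> footprint bit vector
        self.hits = 0
        self.misses = 0
        self.blocks_fetched = 0            # proxy for off-chip traffic, in blocks

    def access(self, pc: int, addr: int) -> bool:
        """Return True on a block hit, False on any miss."""
        page, byte_off = divmod(addr, PAGE_SIZE)
        offset = byte_off // BLOCK_SIZE
        index, tag = page % self.num_sets, page // self.num_sets
        entries = self.sets[index]

        for i, e in enumerate(entries):
            if e.tag == tag:               # page hit: move entry to MRU position
                entries.insert(0, entries.pop(i))
                e.touched |= 1 << offset
                if (e.valid >> offset) & 1:
                    self.hits += 1
                    return True
                # Block was not predicted: fetch just this one block.
                e.valid |= 1 << offset
                self.misses += 1
                self.blocks_fetched += 1
                return False

        # Page miss: allocate the page and fetch its predicted footprint.
        self.misses += 1
        trigger = (pc, offset)
        footprint = self.predictor.get(trigger, 0) | (1 << offset)
        if len(entries) == self.ways:
            victim = entries.pop()         # evict the LRU page
            # Train the predictor with the blocks the victim actually used.
            self.predictor[victim.trigger] = victim.touched
        entries.insert(0, PageEntry(tag, footprint, 1 << offset, trigger))
        self.blocks_fetched += bin(footprint).count("1")
        return False
```

With a single one-way set, touching blocks 0 to 2 of one page and then evicting it trains the predictor; a later page allocated by the same (PC, offset) fetches all three blocks at once, so its next two accesses hit instead of each missing.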
Our evaluation using server workloads and caches of up to 8GB reveals that Unison Cache improves performance by 14% compared to Alloy Cache due to its high hit rate, while outperforming the state-of-the-art page-based designs that require impractical SRAM-based tags of around 50MB.
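The tag-storage problem follows from simple arithmetic: a page-based cache with SRAM tags needs one entry per resident page, so tag storage grows linearly with capacity. The sketch below estimates it for a set-associative page cache. The function name `tag_storage_bytes`, the 48-bit physical address, 32-way associativity, and 134 non-tag bits per entry (for example, 64-bit valid and dirty vectors for 64 blocks per page plus a few replacement bits) are all hypothetical parameters, so the result illustrates the scaling rather than reproducing the roughly 50MB figure reported above.

```python
def tag_storage_bytes(capacity: int, page: int, ways: int,
                      phys_addr_bits: int, extra_bits: int) -> int:
    """Estimate SRAM tag-array size for a page-based cache (hypothetical model).

    capacity, page: sizes in bytes, both assumed to be powers of two.
    extra_bits: assumed per-entry metadata besides the tag (e.g. valid/dirty
    block vectors and replacement state).
    """
    entries = capacity // page                 # one tag entry per cached page
    sets = entries // ways
    offset_bits = page.bit_length() - 1        # log2(page) for powers of two
    index_bits = sets.bit_length() - 1         # log2(sets) for powers of two
    tag_bits = phys_addr_bits - index_bits - offset_bits
    return entries * (tag_bits + extra_bits) // 8


GiB = 2**30
for cap in (1 * GiB, 8 * GiB):
    size = tag_storage_bytes(cap, 4096, 32, 48, 134)
    print(f"{cap // GiB} GiB cache -> {size / 2**20:.1f} MiB of tags")
```

Under these assumptions the loop prints about 4.9 MiB for a 1 GiB cache and 38.5 MiB for an 8 GiB cache: the tag field shrinks slightly as the set index grows, but the entry count scales with capacity, which is why storing tags in the stacked DRAM itself becomes attractive.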

Official source

https://infoscience.epfl.ch/record/202128?ln=en
