Multi-megabyte instruction working sets of server workloads defy the capacities of the latency-critical instruction-supply components of a core: the instruction cache (L1-I) and the branch target buffer (BTB). Recent work has proposed dedicated prefetching techniques aimed separately at the L1-I and the BTB, resulting in high metadata costs and/or only modest performance improvements due to the complex control-flow histories required to effectively fill the two components ahead of the core's fetch stream. This work makes the observation that the prefetchers for the L1-I and the BTB require essentially identical metadata: the control-flow history. While the L1-I prefetcher requires the history at instruction-block granularity, the BTB prefetcher needs knowledge of the individual branches inside each block. To eliminate redundant metadata and multiple prefetchers, we introduce Confluence -- a frontend design with unified prefetcher metadata that keeps the contents of the L1-I and the BTB synchronized. Confluence leverages a stream-based prefetcher to proactively fill both components ahead of the core's fetch stream. The prefetcher maintains the control-flow history at block granularity and, for each instruction block brought into the L1-I, eagerly inserts the branch targets contained in the block into the BTB. Confluence provides 85% of the performance improvement of an ideal frontend (with a perfect L1-I and BTB) at a 1% area overhead per core, whereas the highest-performance alternative delivers only 62% of the ideal improvement at a per-core area overhead of 8%.
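To make the eager BTB-fill mechanism concrete, the following is a minimal C++ sketch, under assumed names and parameters (a Frontend class, an OnBlockPrefetched callback, a 64-byte instruction block), of how a block-grain stream prefetch can install the branch targets known for a block into the BTB at the same time the block is brought into the L1-I. It illustrates the idea only and is not the paper's actual hardware design or metadata encoding.

```cpp
// Sketch of Confluence-style unified instruction supply (illustrative only):
// control-flow history is tracked at instruction-block granularity, and when a
// block is prefetched into the L1-I, all branch targets previously observed in
// that block are eagerly (re)inserted into the BTB.
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

constexpr uint64_t kBlockBytes = 64;  // assumed L1-I block size
inline uint64_t BlockAddr(uint64_t pc) { return pc & ~(kBlockBytes - 1); }

struct BranchEntry {        // one branch known to reside in a block
  uint64_t branch_pc;
  uint64_t target;
};

class Frontend {
 public:
  // Record a resolved branch so its target can be re-installed whenever the
  // block containing it is prefetched again.
  void OnBranchResolved(uint64_t branch_pc, uint64_t target) {
    auto& branches = targets_by_block_[BlockAddr(branch_pc)];
    for (auto& b : branches) {
      if (b.branch_pc == branch_pc) {  // update an existing entry
        b.target = target;
        btb_[branch_pc] = target;
        return;
      }
    }
    branches.push_back({branch_pc, target});
    btb_[branch_pc] = target;
  }

  // Stream-prefetcher callback: a block is being brought into the L1-I ahead
  // of the fetch stream; eagerly insert every branch target known for it.
  void OnBlockPrefetched(uint64_t block_addr) {
    l1i_.insert(block_addr);
    auto it = targets_by_block_.find(block_addr);
    if (it == targets_by_block_.end()) return;
    for (const BranchEntry& b : it->second) btb_[b.branch_pc] = b.target;
  }

  bool BtbHit(uint64_t branch_pc) const { return btb_.count(branch_pc) != 0; }

 private:
  std::unordered_map<uint64_t, std::vector<BranchEntry>> targets_by_block_;
  std::unordered_map<uint64_t, uint64_t> btb_;   // branch PC -> target
  std::unordered_set<uint64_t> l1i_;             // resident block addresses
};
```

The single per-block record of branch targets stands in for the unified metadata: one structure drives both the L1-I fill and the BTB fill, so the two components stay synchronized without a second prefetcher or a separate branch-grain history.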