NOC-Out: Microarchitecting a Scale-Out Processor

Babak Falsafi, Boris Robert Grot, Pejman Lotfi Kamran
2012
Conference paper

Abstract

Scale-out server workloads benefit from many-core processor organizations that enable high throughput thanks to abundant request-level parallelism. A key characteristic of these workloads is the large instruction footprint that exceeds the capacity of private caches. While a shared last-level cache (LLC) can capture the instruction working set, it necessitates a low-latency interconnect fabric to minimize the core stall time on instruction fetches serviced by the LLC. Many-core processors with a mesh interconnect sacrifice performance on scale-out workloads due to NOC-induced delays. Low-diameter topologies can overcome the performance limitations of meshes through rich inter-node connectivity, but at a high area expense. To address the drawbacks of existing designs, this work introduces NOC-Out, a many-core processor organization that affords low LLC access delays at a small area cost. NOC-Out is tuned to accommodate the bilateral core-to-cache access pattern, characterized by minimal coherence activity and lack of inter-core communication, that is dominant in scale-out workloads. Optimizing for the bilateral access pattern, NOC-Out segregates cores and LLC banks into distinct network regions and reduces costly network connectivity by eliminating the majority of inter-core links. NOC-Out further simplifies the interconnect through the use of low-complexity tree-based topologies. A detailed evaluation targeting a 64-core CMP and a set of scale-out workloads reveals that NOC-Out improves system performance by 17% and reduces network area by 28% over a tiled mesh-based design. Compared to a design with a richly-connected flattened butterfly topology, NOC-Out reduces network area by 9x while matching the performance.
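
The trade-off between mesh and low-diameter topologies described in the abstract can be illustrated with a rough hop-count and link-count comparison. The sketch below is not taken from the paper. It assumes an 8x8 grid of tiles (matching the 64-core scale mentioned above), uniform traffic between all pairs of distinct tiles, and a flattened butterfly in which every tile is linked directly to all tiles in its row and column. The function names are hypothetical, hop count is only a proxy for latency, link count is only a proxy for area, and NOC-Out itself is not modeled.

```python
from itertools import product


def mesh_avg_hops(k):
    """Average Manhattan distance over ordered pairs of distinct tiles in a k x k mesh."""
    nodes = list(product(range(k), repeat=2))
    total = sum(abs(ax - bx) + abs(ay - by)
                for (ax, ay) in nodes for (bx, by) in nodes)
    return total / (len(nodes) * (len(nodes) - 1))


def fbfly_avg_hops(k):
    """Average hops in a k x k flattened butterfly (assumed: one hop per differing dimension)."""
    nodes = list(product(range(k), repeat=2))
    total = sum((ax != bx) + (ay != by)
                for (ax, ay) in nodes for (bx, by) in nodes)
    return total / (len(nodes) * (len(nodes) - 1))


def mesh_links(k):
    """Bidirectional links in a k x k mesh: k rows and k columns of (k - 1) links each."""
    return 2 * k * (k - 1)


def fbfly_links(k):
    """Bidirectional links in a k x k flattened butterfly: each row and column is fully connected."""
    per_line = k * (k - 1) // 2  # links in one fully connected row or column
    return 2 * k * per_line


if __name__ == "__main__":
    k = 8  # assumption: 64 tiles arranged as an 8 x 8 grid
    print("mesh  avg hops:", round(mesh_avg_hops(k), 3), "links:", mesh_links(k))
    print("fbfly avg hops:", round(fbfly_avg_hops(k), 3), "links:", fbfly_links(k))
```

Under these assumptions the flattened butterfly needs far fewer hops on average but several times as many links (which are also longer), which is the kind of performance versus area tension the abstract describes.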

Official source

https://infoscience.epfl.ch/record/182174?ln=en

