Stretch: Balancing QoS and Throughput for Colocated Server Workloads on SMT Cores

Boris Robert Grot, Siddharth Gupta
2019
Conference paper

Abstract

In a drive to maximize resource utilization, today's datacenters are moving to colocate latency-sensitive and batch workloads on the same server. State-of-the-art deployments, such as those at Google, colocate such diverse workloads even on a single SMT core. This aggressive colocation is possible because a latency-sensitive service operating below its peak load has significant slack in its response latency with respect to its QoS target. This slack absorbs the degradation in single-thread performance that is inevitable under SMT colocation, without compromising QoS targets. This work observes that many batch applications can benefit greatly from a large instruction window, which lets them uncover instruction-level and memory-level parallelism (ILP and MLP). Under SMT colocation, conventional wisdom holds that individual hardware threads should be restricted from acquiring and holding a disproportionately large share of microarchitectural resources, so as not to compromise the performance of a co-running thread. We show that the performance slack inherent in latency-sensitive workloads operating at low to moderate load makes it safe to shift microarchitectural resources to a co-running batch thread without compromising QoS targets. Based on this insight, we introduce Stretch, a simple reorder buffer (ROB) partitioning scheme that system software invokes to give one hardware thread a much larger ROB partition at the expense of another thread. When Stretch is enabled for latency-sensitive workloads operating below their peak load on an SMT core, co-running batch applications gain 13% in performance on average (up to 30%) over a baseline SMT colocation, without compromising QoS constraints.
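To make the mechanism concrete, here is a toy Python model of the idea, not the authors' implementation. It contrasts a conventional equal ROB split between two SMT threads with a skewed, Stretch-style split that favors the batch thread, and adds a system-software policy that enables the skewed split only when the latency-sensitive service has slack relative to its QoS target. The 224-entry ROB, the 25% share left to the latency-sensitive thread, the slack-threshold policy, and all function names are hypothetical assumptions; the abstract specifies none of them.

```python
from dataclasses import dataclass

# Hypothetical total ROB size; the abstract does not give one.
ROB_SIZE = 224


@dataclass(frozen=True)
class RobPartition:
    ls_entries: int     # ROB entries reserved for the latency-sensitive thread
    batch_entries: int  # ROB entries reserved for the co-running batch thread


def baseline_partition(rob_size: int = ROB_SIZE) -> RobPartition:
    """Conventional SMT policy: split the ROB equally between the two threads."""
    half = rob_size // 2
    return RobPartition(half, rob_size - half)


def stretch_partition(rob_size: int = ROB_SIZE, ls_share: float = 0.25) -> RobPartition:
    """Stretch-style skewed split: the batch thread gets the larger partition.

    ls_share (the fraction left to the latency-sensitive thread) is a
    hypothetical parameter chosen for illustration.
    """
    ls = int(rob_size * ls_share)
    return RobPartition(ls, rob_size - ls)


def choose_partition(tail_latency_ms: float, qos_target_ms: float,
                     slack_threshold: float = 0.3,
                     rob_size: int = ROB_SIZE) -> RobPartition:
    """Hypothetical system-software policy.

    Slack is the fraction of the QoS target not yet consumed by the measured
    tail latency. Enable the skewed split only when slack is large enough;
    otherwise fall back to the equal split.
    """
    slack = (qos_target_ms - tail_latency_ms) / qos_target_ms
    if slack >= slack_threshold:
        return stretch_partition(rob_size)
    return baseline_partition(rob_size)


if __name__ == "__main__":
    # Low load: 4 ms tail latency against a 10 ms target leaves 60% slack.
    print(choose_partition(4.0, 10.0))
    # Near peak load: 9 ms against 10 ms leaves only 10% slack.
    print(choose_partition(9.0, 10.0))
```

With these assumed values, the low-load case selects the skewed 56/168 split and the near-peak case keeps the equal 112/112 split. A real policy would also need hysteresis and a way to revert quickly when load rises; the sketch omits both.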

Official source

https://infoscience.epfl.ch/record/266792?ln=fr
