Gem5-X: A Gem5-Based System Level Simulation Framework to Optimize Many-Core Platforms

The rapid expansion of online-based services requires novel energy- and performance-efficient architectures to meet power and latency constraints. Fast architectural exploration has become a key enabler of architectural innovation. In this paper, we present gem5-X, a gem5-based system-level simulation framework, and a methodology to optimize many-core systems for performance and power. As real-life case studies of many-core server workloads, we use real-time video transcoding and image classification using convolutional neural networks (CNNs). Gem5-X allows us to identify bottlenecks and evaluate the potential benefits of architectural extensions such as in-cache computing and 3D-stacked High Bandwidth Memory (HBM). For real-time video transcoding, we achieve a 15% speed-up using in-order cores with in-cache computing when compared to a baseline in-order system, and 76% energy savings when compared to an out-of-order system. When using HBM, we further accelerate real-time transcoding and CNNs by up to 7% and 8%, respectively.

Accelerator-driven Data Arrangement to Minimize Transformers Run-time on Multi-core Architectures

Intermediate Address Space: virtual memory optimization of heterogeneous architectures for cache-resident workloads

Imaging sensor device using an array of single-photon avalanche diode photodetectors
