Are you an EPFL student looking for a semester project?
Work with us on data science and visualisation projects, and deploy your project as an app on top of Graph Search.
The rapid expansion of online-based services requires novel energy and performance efficient architectures to meet power and latency constraints. Fast architectural exploration has become a key enabler in the proposal of architectural innovation. In this paper, we present gem5-X, a gem5-based system level simulation framework, and a methodology to optimize many-core systems for performance and power. As real-life case studies of many-core server workloads, we use real-time video transcoding and image classification using convolutional neural networks (CNNs). Gem5-X allows us to identify bottlenecks and evaluate the potential benefits of architectural extensions such as in-cache computing and 3D stacked High Bandwidth Memory. For real-time video transcoding, we achieve 15% speed-up using in-order cores with in-cache computing when compared to a baseline in-order system and 76% energy savings when compared to an Out-of-Order system. When using HBM, we further accelerate real-time transcoding and CNNs by up to 7% and 8% respectively.
David Atienza Alonso, Marina Zapater Sancho, Luis Maria Costero Valero, Darong Huang, Ali Pahlevan
David Atienza Alonso, Marina Zapater Sancho, Giovanni Ansaloni, Rafael Medina Morillas, Yasir Mahmood Qureshi, Joshua Alexander Harrison Klein