Chip designers place on-chip thermal sensors to measure local temperatures and thereby prevent thermal runaway in many-core processing architectures. However, the quality of the thermal reconstruction depends directly on the number of sensors placed, which should be minimized while still guaranteeing detection of all worst-case temperature gradients. In this paper, we present a complete framework for the thermal management of complex many-core architectures that precisely recovers the thermal distribution from a minimal number of sensors. The proposed sensor placement algorithm is guaranteed to reduce the impact of noisy measurements on the reconstructed thermal distribution. We achieve significant improvements over the state of the art in terms of both computational complexity and reconstruction precision. For example, for a 64-core SoC with 64 noisy sensors (σ² = 4), we achieve an average reconstruction error of 1.5 °C, which is less than half of what previous state-of-the-art methods achieve. We also study the practical limits of the proposed method and show that realistic workloads are not needed to learn the model and place the sensors efficiently. In fact, we show that the reconstruction error does not increase significantly if the power traces of the components are randomly generated or if only part of the correct workload is available.
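To make the idea of recovering a full thermal map from a few sensors concrete, the following is a minimal sketch and not the algorithm from the paper, whose details the abstract does not give. It assumes a low-dimensional linear model learned by principal component analysis, a greedy log-determinant placement heuristic, and least-squares reconstruction. The function names (`learn_basis`, `place_sensors`, `reconstruct`), the synthetic training maps, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def learn_basis(maps, k):
    """Return the mean map and the top-k principal components of the training maps."""
    mean = maps.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(maps - mean, full_matrices=False)
    return mean, vt[:k].T  # basis has shape (n_cells, k)


def place_sensors(basis, m, eps=1e-9):
    """Greedily pick m cells (assumed heuristic, not the paper's method).

    Each step adds the cell that maximises the log-determinant of the
    regularised matrix B_S^T B_S, where B_S holds the selected basis rows.
    """
    n_cells, k = basis.shape
    chosen = []
    for _ in range(m):
        best_cell, best_score = None, -np.inf
        for cell in range(n_cells):
            if cell in chosen:
                continue
            rows = basis[chosen + [cell]]
            _, score = np.linalg.slogdet(rows.T @ rows + eps * np.eye(k))
            if score > best_score:
                best_cell, best_score = cell, score
        chosen.append(best_cell)
    return sorted(chosen)


def reconstruct(mean, basis, sensors, readings):
    """Least-squares estimate of the full map from readings at the sensor cells."""
    coeffs, *_ = np.linalg.lstsq(
        basis[sensors], readings - mean[sensors], rcond=None
    )
    return mean + basis @ coeffs


# Synthetic stand-in for thermal maps: 64 cells driven by 4 hidden patterns.
# These are random data for illustration, not real power traces.
rng = np.random.default_rng(0)
n_cells, rank = 64, 4
patterns = rng.normal(size=(rank, n_cells))
train = 50.0 + rng.normal(size=(200, rank)) @ patterns

mean, basis = learn_basis(train, k=rank)
sensors = place_sensors(basis, m=8)

truth = 50.0 + rng.normal(size=rank) @ patterns
noisy = truth[sensors] + rng.normal(scale=2.0, size=len(sensors))  # variance 4
estimate = reconstruct(mean, basis, sensors, noisy)
print("mean absolute error:", np.abs(estimate - truth).mean())
```

In this toy setting the model is learned entirely from randomly generated training maps, which loosely mirrors the abstract's observation that realistic workloads are not required. The error printed here says nothing about the figures reported in the paper.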
David Atienza Alonso, Giovanni Ansaloni, Alireza Amirshahi
David Atienza Alonso, Marina Zapater Sancho, Giovanni Ansaloni, Rafael Medina Morillas, Yasir Mahmood Qureshi, Joshua Alexander Harrison Klein
David Atienza Alonso, Marina Zapater Sancho, Luis Maria Costero Valero, Darong Huang, Ali Pahlevan