Advancements in technology enable integration of multiple devices on a single core, resulting in increased on-chip power and temperature densities. Higher temperatures, in turn, present a significant challenge for reliability. In this work, we propose a comprehensive framework for analyzing the reliability of multi-core systems with respect to permanent faults. We show that aggressive power management can affect reliability because of temperature cycling. Our cycle-accurate simulation methodology exposes fine-grained variations in device failure rates over short time scales, thus enabling workload analysis and scheduling to control the reliability impact. The statistical reliability simulator and optimizer, on the other hand, provide a view of reliability over a long time horizon (the system lifetime) and help us optimize a power management policy under reliability and performance constraints. We show that our optimization strategy can achieve large power savings while still meeting the reliability and performance constraints.