Predictive Reliability and Fault Management in Exascale Systems
Related publications (33)
Reconfigurable parallel computing is required to provide high-performance embedded computing, hide hardware complexity, boost software development, and manage multiple workloads when multiple applications are running simultaneously on the emerging network- ...
Institute of Electrical and Electronics Engineers, 2013
In this paper, we investigate the impact of circuit misbehavior due to parametric variations and voltage scaling on the performance of wireless communication systems. Our study reveals the inherent error resilience of such systems and argues that sufficien ...
The invention of the integrated circuit and the manufacturing progress as well as continuing progress in the manufacturing process are the fundamental engines for the implementation of all technologies that support today's information society. The vast maj ...
Replication has recently gained attention in the context of fault tolerance for large scale MPI HPC applications. Existing implementations try to cover all MPI codes and to be independent from the underlying library. In this paper, we evaluate the advantag ...
High performance computing will probably reach exascale in this decade. At this scale, mean time between failures is expected to be a few hours. Existing fault tolerant protocols for message passing applications will not be efficient anymore since they eit ...
Energy consumption is today one of the major topics that the HPC community tries to tackle. In this paper, the authors present a thought experiment aiming at building a node of a supercomputer based on a GPU (Nvidia GTX280). The paper concentrates on BLAS2 ...
In the world of High Performance Computing (newly renamed "High Productivity Computing"), where the race for performance is on, where the hunt for the last Flop rages and where the developers swear by the Cult of Power, the different applications have diff ...
For the last thirty years, electronics, at first built with discrete components, and then as Integrated Circuits (IC), have brought diverse and lasting improvements to our quality of life. Examples might include digital calculators, automotive and airplane ...
Despite recent advances achieved by application of high-performance computing methods and novel algorithmic techniques to maximum likelihood (ML)-based inference programs, the major computational bottleneck still consists in the computation of bootstrap su ...
The reactions and corresponding system of equations for the inorganic SO₄²⁻-NO₃⁻-NH₄⁺ system have been studied with a new heterogeneous partitioning code, HETV. The code is based on the algorithms of ISORROPIA (Nenes et al., Aquat. Geochem. 4 (1998) 123 ...