Manetho: Transparent Rollback-Recovery with Low Overhead, Limited Rollback, and Fast Output Commit
Graph Chatbot
Chattez avec Graph Search
Posez n’importe quelle question sur les cours, conférences, exercices, recherches, actualités, etc. de l’EPFL ou essayez les exemples de questions ci-dessous.
AVERTISSEMENT : Le chatbot Graph n'est pas programmé pour fournir des réponses explicites ou catégoriques à vos questions. Il transforme plutôt vos questions en demandes API qui sont distribuées aux différents services informatiques officiellement administrés par l'EPFL. Son but est uniquement de collecter et de recommander des références pertinentes à des contenus que vous pouvez explorer pour vous aider à répondre à vos questions.
This paper presents the processreplication protocol of Manetho, a system whose goal is to provide efficient, applicationtransparent fault tolerance to longrunning distributed computations. Manetho uses a new negative acknowledgment multicast protocol t ...
Sender-based message logging, a low-overhead mechanism for providing transparent fault-tolerance in distributed systems, is described. It differs from conventional message logging mechanisms in that each message is logged in volatile memory on the machine ...
Area and power constrained edge devices are increasingly utilized to perform compute intensive workloads, necessitating increasingly area and power efficient accelerators. In this context, in-SRAM computing performs hundreds of parallel operations on spati ...
Commodity computer clusters are often composed of hundreds of computing nodes. These generally off-the-shelf systems are not designed for high reliability. Node failures therefore drive the MTBF of such clusters to unacceptable levels. The software framewo ...
Institute of Electrical and Electronics Engineers Computer Society, Piscataway, NJ 08855-1331, United States2005
The perfectly-synchronized round-based model provides the powerful abstraction of crash-stop failures with atomic and synchronous message delivery. This abstraction makes distributed programming very easy.We describe a technique to automatically transform ...
Message logging and check pointing can provide fault tolerance in distributed systems in which all process communication is through messages. This paper presents a general model for reasoning about recovery in these systems. Using this model_ we prove that ...