Recovery in Distributed Systems Using Optimistic Message Logging and Checkpointing

In a distributed system using message logging and checkpointing to provide fault tolerance, there is always a unique maximum recoverable system state, regardless of the message logging protocol used. The proof of this relies on the observation that the set of system states that have occurred during any single execution of a system forms a lattice, with the sets of consistent and recoverable system states as sublattices. The maximum recoverable system state never decreases, and if all messages are eventually logged, the domino effect cannot occur. This paper presents a general model for reasoning about recovery in such a system and, based on this model, an efficient algorithm for determining the maximum recoverable system state at any time. This work unifies existing approaches to fault tolerance based on message logging and checkpointing, and improves on existing methods for optimistic recovery in distributed systems.
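The abstract does not spell out the algorithm. As a rough illustration only, the Python sketch below encodes a simplified version of the model under stated assumptions. Each process's execution is split into numbered state intervals. Each interval carries a dependency vector giving the latest interval of every other process it depends on. A system state is the list of current interval indices, one per process. The set of stable intervals per process is supplied explicitly, and interval 0 is assumed stable and free of dependencies. The names `deps`, `stable`, `join` and `max_recoverable_state` are hypothetical. The computation is a simple rollback to a fixpoint, not the paper's incremental algorithm.

```python
from typing import List, Sequence, Set

BOTTOM = -1  # assumption: stands in for "no dependency on that process"

# deps[p][s][q]: latest state interval of process q on which state interval s of
# process p depends, with deps[p][s][p] == s (hypothetical encoding).
Deps = List[List[List[int]]]


def is_consistent(deps: Deps, state: Sequence[int]) -> bool:
    """True if no process depends on an interval another process has not reached."""
    n = len(state)
    return all(deps[p][state[p]][q] <= state[q] for p in range(n) for q in range(n))


def is_recoverable(deps: Deps, stable: Sequence[Set[int]], state: Sequence[int]) -> bool:
    """Consistent, and every process sits in a stable state interval."""
    return is_consistent(deps, state) and all(s in stable[p] for p, s in enumerate(state))


def join(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Join of two system states from the same execution: per-process maximum."""
    return [max(x, y) for x, y in zip(a, b)]


def max_recoverable_state(deps: Deps, stable: Sequence[Set[int]]) -> List[int]:
    """Roll back from the latest stable intervals until no orphan dependency remains."""
    n = len(deps)
    assert all(0 in stable[p] for p in range(n)), "interval 0 assumed stable"
    # Upper bound: no recoverable state can place p past its latest stable interval.
    cur = [max(stable[p]) for p in range(n)]
    changed = True
    while changed:
        changed = False
        for p in range(n):
            if any(deps[p][cur[p]][q] > cur[q] for q in range(n)):
                # p depends on an interval outside the candidate state, so p cannot
                # stay here in any recoverable state below the candidate; move it to
                # its latest stable interval strictly before cur[p].
                cur[p] = max(s for s in stable[p] if s < cur[p])
                changed = True
    return cur


# Hypothetical two-process execution, three state intervals each.
deps = [
    [[0, BOTTOM], [1, BOTTOM], [2, 1]],  # P0 interval 2 began with a message P1 sent in its interval 1
    [[BOTTOM, 0], [1, 1], [2, 2]],       # P1 intervals 1 and 2 began with messages P0 sent in its intervals 1 and 2
]

before = max_recoverable_state(deps, [{0, 1, 2}, {0}])     # [1, 0]: message starting P1 interval 1 not yet logged
after = max_recoverable_state(deps, [{0, 1, 2}, {0, 1}])   # [2, 1]: that message is now logged
orphan = max_recoverable_state(deps, [{0, 1}, {0, 2}])     # [1, 0]: P1 interval 2 depends on unstable P0 interval 2
print(before, after, orphan)
```

Each rollback is forced: if process p's candidate interval depends on an interval of q beyond the candidate, no recoverable state at or below the candidate can keep p there, so the candidate remains an upper bound on every recoverable state. When no rollback applies, the candidate is itself recoverable and therefore the maximum. The `join` helper reflects why that maximum is unique: the per-process maximum of two recoverable states from the same execution is again consistent and stable. Logging more messages only enlarges the stable sets, which is consistent with the maximum recoverable state never decreasing (compare `before` and `after`).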
