The Heard-Of Model: Computing in Distributed Systems with Benign Failures

André Schiper
2007
Report or working paper

Abstract

Problems in fault-tolerant distributed computing have been studied in a variety of models. These models are structured around two central ideas: (1) degree of synchrony and failure model are two independent parameters that determine a particular type of system, and (2) the notion of faulty component is helpful and even necessary for the analysis of distributed computations when failures occur. In this work, we question these two basic principles of fault-tolerant distributed computing, and show that it is both possible and worthwhile to renounce them in the context of benign failures: we present a computational model, suitable for systems with benign failures, which is based only on the notion of transmission failure. In this model, computations evolve in rounds, and messages missed at a round are lost. Only information transmission is represented: for each round r and each process p, our model provides the set of processes that p "hears of" at round r (the heard-of set), namely the processes from which p receives some message at round r. The features of a specific system are thus captured as a whole, just by a predicate over the collection of heard-of sets. We show that our model handles benign failures, be they static or dynamic, permanent or transient, in a unified framework. Using this new approach, we are able to give shorter and simpler proofs of important results (non-solvability, lower bounds). In particular, we prove that in general, Consensus cannot be solved without an implicit and permanent consensus on heard-of sets. We also examine Consensus algorithms in our model. In light of this specific agreement problem, we show how our approach allows us to devise new interesting solutions.
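To make the notion of heard-of sets concrete, the following is a minimal sketch in Python, not taken from the report itself. It assumes that each round is described by a hypothetical set of lost (sender, receiver) transmissions, and it shows two illustrative predicates over heard-of sets. The names `heard_of_sets`, `kernel`, `uniform`, and `holds_in_every_round` are assumptions introduced here for illustration.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Set, Tuple

Process = int
# One round of a run: maps each process p to its heard-of set HO(p, r).
Round = Dict[Process, FrozenSet[Process]]


def heard_of_sets(processes: Iterable[Process],
                  lost: Set[Tuple[Process, Process]]) -> Round:
    """Build HO(p, r) for one round from the lost (sender, receiver) pairs.

    Only transmission failures are modelled; no process is labelled faulty.
    """
    procs = list(processes)
    return {p: frozenset(q for q in procs if (q, p) not in lost) for p in procs}


def kernel(ho_round: Round) -> FrozenSet[Process]:
    """Processes heard of by every process in this round."""
    return frozenset.intersection(*ho_round.values())


def uniform(ho_round: Round) -> bool:
    """True if all processes have the same heard-of set in this round."""
    return len(set(ho_round.values())) == 1


def holds_in_every_round(predicate: Callable[[Round], bool],
                         run: List[Round]) -> bool:
    """Check a per-round predicate over the whole collection of heard-of sets."""
    return all(predicate(r) for r in run)


processes = [0, 1, 2]
# Round 1: a single transient loss, the message from 2 to 0.
round1 = heard_of_sets(processes, {(2, 0)})
# Round 2: every message sent by 2 is lost, which is how a crash of 2 appears.
round2 = heard_of_sets(processes, {(2, 0), (2, 1), (2, 2)})
run = [round1, round2]

print(sorted(round1[0]))  # [0, 1]
print(uniform(round1), uniform(round2))  # False True
print(holds_in_every_round(lambda r: len(kernel(r)) > 0, run))  # True
```

In this sketch a transient message loss and a process crash are described in exactly the same way, as transmissions missing from heard-of sets, which is the unification the abstract refers to.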

Official source

https://infoscience.epfl.ch/record/109375?ln=fr

Reliable Microsecond-Scale Distributed Computing

Special Session: Challenges and Opportunities for Sustainable Multi-Scale Computing Systems

Limiting Lamport Exposure to Distant Failures in Globally-Managed Distributed Systems
