Publication# On The Weakest Failure Detector Ever

Abstract

Many problems in distributed computing are impossible when no information about process failures is available. So what is the minimal yet non-trivial failure information? In other words, what is the minimal information about failures needed to circumvent any impossibility and sufficient to circumvent some impossibility. This paper proposes a candidate abstraction, denoted ?, to capture this failure information. In every run of the distributed system, ? eventually informs the processes that some set of processes in the system cannot be the set of correct processes in that run. Although seemingly weak, for it might provide random information for an arbitrarily long period of time, and it only excludes one possibility of correct set among many, ? still captures non-trivial failure information. We show that ? is sufficient to circumvent the celebrated wait-free set-agreement impossibility. While doing so, we (a) disprove previous conjectures about the weakest failure detector to solve set-agreement and we (b) prove that solving set-agreement with registers is strictly weaker than solving n + 1-process consensus using n-process consensus. We prove that ? is, in some sense, necessary to circumvent any wait-free impossibility. As a corollary, set-agreement is, from a failure information perspective, a minimal wait-free impossible problem in distributed computing. Our results are generalized through an abstraction ?f that we introduce and prove necessary to solve any problem that cannot be solved in an f -resilient manner, and yet sufficient to solve f -resilient f -set-agreement.

Official source

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Related concepts

Loading

Related publications

Loading

Related publications (37)

Loading

Loading

Loading

Related concepts (6)

Computing

Computing is any goal-oriented activity requiring, benefiting from, or creating computing machinery. It includes the study and experimentation of algorithmic processes, and developm

Consensus (computer science)

A fundamental problem in distributed computing and multi-agent systems is to achieve overall system reliability in the presence of a number of faulty processes. This often requires coordinating proce

Cloud computing

Cloud computing is the on-demand availability of computer system resources, especially data storage (cloud storage) and computing power, without direct active management by the user. Large clouds of

Many problems in distributed computing are impossible to solve when no information about process failures is available. It is common to ask what information about failures is necessary and sufficient to circumvent some specific impossibility, e. g., consensus, atomic commit, mutual exclusion, etc. This paper asks what information about failures is necessary to circumvent any impossibility and sufficient to circumvent some impossibility. In other words, what is the minimal yet non-trivial failure information. We present an abstraction, denoted gamma, that provides very little information about failures. In every run of the distributed system,. eventually informs the processes that some set of processes in the system cannot be the set of correct processes in that run. Although seemingly weak, for it might provide random information for an arbitrarily long period of time, and it eventually excludes only one set of processes (among many) that is not the set of correct processes in the current run, gamma still captures non-trivial failure information. We show that gamma is sufficient to circumvent the fundamental wait-free set-agreement impossibility. While doing so, (a) we disprove previous conjectures about the weakest failure detector to solve set-agreement and (b) we prove that solving set-agreement with registers is strictly weaker than solving n + 1-process consensus using n-process consensus. We show that. is the weakest stable non-trivial failure detector: any stable failure detector that circumvents some wait-free impossibility provides at least as much information about failures as. does. Our results are generalized, from the wait-free to the f-resilient case, through an abstraction gamma(f) that we introduce and prove minimal to solve any problem that cannot be solved in an f-resilient manner, and yet sufficient to solve f-resilient f-set-agreement.

Dan Alistarh, Seth Gilbert, Rachid Guerraoui

Set agreement is a fundamental problem in distributed computing in which processes collectively choose a small subset of values from a larger set of proposals. The impossibility of fault-tolerant set agreement in asynchronous networks is one of the seminal results in distributed computing. In synchronous networks, too, the complexity of set agreement has been a significant research challenge that has now been resolved. Real systems, however, are neither purely synchronous nor purely asynchronous. Rather, they tend to alternate between periods of synchrony and periods of asynchrony. Nothing specific is known about the complexity of set agreement in such a ``partially synchronous'' setting. In this paper, we address this challenge, presenting the first (asymptotically) tight bound on the complexity of set agreement in such systems. We introduce a novel technique for simulating, in a fault-prone asynchronous shared memory, executions of an asynchronous and failure-prone message-passing system in which some fragments appear synchronous to some processes. We use this simulation technique to derive a lower bound on the round complexity of set agreement in a partially synchronous system by a reduction from asynchronous wait-free set agreement. Specifically, we show that every set agreement protocol requires at least $\floor{t}{k} + 2$ synchronous rounds to decide. We present an (asymptotically) matching algorithm that relies on a distributed asynchrony detection mechanism to decide as soon as possible during periods of synchrony. From these two results, we derive the size of the minimal window of synchrony needed to solve set agreement. By relating synchronous, asynchronous and partially synchronous environments, our simulation technique is of independent interest. In particular, it allows us to obtain a new lower bound on the complexity of early deciding $k$-set agreement complementary to that of~\cite{Gafni2005}, and to re-derive the combinatorial topology lower bound of~\cite{Herlihy2006} in an algorithmic way.

2012The field of distributed computing has a long history, of more than fifty years. During that time, our understanding of the field has improved immensely and a certain body of folklore beliefs has formed. However, such folklore beliefs are not necessarily always true.
In this thesis, we identify hidden complexities (in the sense of intricacies or costs) of distributed systems that contradict some of these folklore beliefs. Specifically, in this thesis, we challenge accepted beliefs in the following way: i) consensus algorithms need not be leader-based: there exists a deterministic leaderless consensus algorithm that is robust to non-synchronous periods, ii) completing a state machine replication command can be arbitrarily more expensive than solving a consensus instance, and iii) no data store actually provides fast transactions because they are impossible. Our results are associated to some of the fundamental problems in the field of distributed computing and are of significant practical relevance.