Publication

Failure Detectors: implementation issues and impact on consensus performance

André Schiper, Xavier Défago, Nicoleta Sergent
1999
Report or working paper

Abstract

Due to their nature, distributed systems are vulnerable to failures of some of their parts. Conversely, distribution also provides a way to increase the fault tolerance of the overall system. However, achieving fault tolerance is not a simple problem and requires complex techniques. An agreement problem known as the problem of consensus is at the heart of most problems encountered during the design of a fault-tolerant system. This problem is, however, not solvable in the asynchronous system model unless the model is augmented with adequate failure detectors. The resulting system model is a time-free model, since all timing issues are abstracted by the characteristics of the failure detectors. It is sometimes claimed that time-based system models are more realistic than time-free models for solving distributed agreement problems. The goal of this paper is to show that solving consensus in the asynchronous system model augmented with failure detectors does not prevent timing issues from being considered. We consider the consensus algorithm with various implementations of failure detectors, and we analyse their impact on the termination time of the consensus algorithm. This study shows that the design of fault-tolerant distributed algorithms in the asynchronous system model augmented with failure detectors is orthogonal to the issue of implementing the actual failure detectors. This nicely decouples logical issues (proof of safety and liveness of an algorithm) from engineering issues (e.g., performance and timing constraints).

Official source

https://infoscience.epfl.ch/record/52313?ln=en

Related publications (70)

Planetary-Scale Byzantine Fault Tolerance

Building Strongly-Consistent Systems Resilient to Failures, Partitions, and Slowdowns

As easy as ABC: Optimal (A)ccountable (B)yzantine (C)onsensus is easy!
