Publication

Evaluating the performance of distributed agreement algorithms

Related publications (132)

Replication for Send-Deterministic MPI HPC Applications

André Schiper, Thomas Ropars, Arnaud Lefray

Replication has recently gained attention in the context of fault tolerance for large scale MPI HPC applications. Existing implementations try to cover all MPI codes and to be independent from the underlying library. In this paper, we evaluate the advantag ...

2013

State Machine Replication

Nuno Filipe de Sousa Santos

Since their invention more than half a century ago, computers have gone from being just a handful of expensive machines, each filling an entire room, to being an integral part of almost every aspect of modern life. Nowadays computers are everywhere: in our ...

EPFL, 2012

A High-Throughput Byzantine Fault-Tolerant Protocol

Nikola Knezevic

State-machine replication (SMR) is a software technique for tolerating failures and for providing high availability in large-scale systems, through the use of commodity hardware. A replicated state-machine comprises a number of replicas, each of which runs ...

EPFL, 2012

Quantitative Analysis of Consensus Algorithms

André Schiper, Martin Hutle, Fatemeh Borran, Nuno Filipe de Sousa Santos

Consensus is one of the key problems in fault-tolerant distributed computing. Although the solvability of consensus is now a well-understood problem, comparing different algorithms in terms of efficiency is still an open problem. In this paper, we address ...

2012

Round-Based Consensus Algorithms, Predicate Implementations and Quantitative Analysis

Fatemeh Borran

Fault-tolerant computing is the art and science of building computer systems that continue to operate normally in the presence of faults. The fault tolerance field covers a wide spectrum of research areas ranging from computer hardware to computer software. ...

EPFL, 2011

Model Checking of Distributed Algorithm Implementations

Maysam Yabandeh

It is notoriously difficult to develop reliable, high-performance distributed systems that run over asynchronous networks. Even if a distributed system is based on a well-understood distributed algorithm, its implementation can contain errors arising from ...

EPFL, 2011

Automated Vulnerability Discovery in Distributed Systems

Rachid Guerraoui, George Candea, Radu Banabic

In this paper we present a technique for automatically assessing the amount of damage a small number of participant nodes can inflict on the overall performance of a large distributed system. We propose a feedback-driven tool that synthesizes malicious node ...

2011

Verification of consensus algorithms using satisfiability solving

André Schiper

Consensus is at the heart of fault-tolerant distributed computing systems. Much research has been devoted to developing algorithms for this particular problem. This paper presents a semi-automatic verification approach for asynchronous consensus algorithms ...

Springer Verlag, 2011

A Fault-Tolerant Token-Based Atomic Broadcast Algorithm

André Schiper, Nils Richard Ekwall

Many atomic broadcast algorithms have been published in the last 20 years. Token-based algorithms represent a large class of these algorithms. Interestingly, all the token-based atomic broadcast algorithms rely on a group membership service and none of the ...

2011

The Failure Detector Abstraction

Rachid Guerraoui

A failure detector is a fundamental abstraction in distributed computing. This paper surveys this abstraction through two dimensions. First we study failure detectors as building blocks to simplify the design of reliable distributed algorithms. In particul ...

2011