Publication

Abstractions for Solving Consensus and Related Problems with Byzantine Faults

Zarko Milosevic
2013
Thèse EPFL
Résumé

We become increasingly dependent on online services; therefore, their availability and correct behavior become increasingly important. Software replication is a popular technique for ensuring that computer systems continue to provide a correct service even when some of their components fail. By replicating a service on multiple servers, clients are guaranteed that even if some replica fails, the service is still available. At the core of software replication is the consensus problem, where a set of processes has to agree on a single value. A large number of consensus algorithms for different system models have been proposed. The most general system models (for which consensus is solvable) do not make strong assumptions on the synchrony (allow period of asynchrony) and assume that a subset of processes can fail completely arbitrarily (Byzantine faults). However, solving consensus in the presence of arbitrary faults and asynchrony is hard and demands sophisticated algorithms. Most of the existing consensus algorithms that deal with arbitrary faults are monolithic and developed from scratch, or by modifying existing algorithms in a non-modular manner. As a consequence, these algorithms are rather complex and hard to understand. We impute this complexity to the lack of adequate abstractions. The motivation of this thesis is suggesting abstractions that simplify the understanding of existing consensus algorithms with arbitrary faults and allow modular design of novel algorithms. The thesis also aims to clarify relations between consensus and the total-order broadcast problem in the presence of arbitrary faults. In the context of the consensus problem with arbitrary process faults, the literature distinguishes (1) authenticated Byzantine faults, where messages can be signed by the sending process, and (2) Byzantine faults, where there is no mechanism for signatures. Consensus protocols that assume Byzantine faults (without authentication) are harder to develop and prove correct than algorithms that consider authenticated Byzantine faults, even when they are based on the same idea. We propose an abstraction called weak interactive consistency (or WIC), that allows us to design consensus algorithms that can be instantiated into algorithms for authenticated Byzantine faults (signed messages) and algorithms for Byzantine faults. In other words, WIC unifies Byzantine consensus algorithms with and without signatures. This is illustrated on two seminal Byzantine consensus algorithms: the Castro-Liskov PBFT algorithm (no signatures) and the Martin-Alvisi FaB Paxos algorithms (signatures). WIC allows a very concise expression of these two algorithms. Furthermore, WIC turns out to be fundamental abstraction for solving consensus in the transmission fault model. The transmission fault model captures faults without blaming a specific component for the fault, and it is well-adapted to dynamic and transient faults. Using WIC we designed a consensus algorithm that overcomes limitations of all existing solutions to consensus in this model, which assume the synchronous system model, or require strong conditions for termination that exclude the case where all messages of a process can be corrupted. Then we go one step further in unifying consensus algorithms by proposing a generic consensus algorithm that highlights, through well chosen parameters, the core mechanisms of a number of well-known consensus algorithms including Paxos, OneThirdRule, PBFT and FaB Paxos. Interestingly, the generic algorithm allows us to identify a new Byzantine consensus algorithm that requires n > 4b, in-between the requirement n > 5b of FaB Paxos and n > 3b of PBFT (b is the maximum number of Byzantine processes). Afterwards, we study the relation between consensus and total-order broadcast in the presence of Byzantine faults. Total-order broadcast is defined for a set of processes, where each process can broadcast messages, with the guarantee that all processes in this set see the same sequence of messages. Among the several definitions of Byzantine consensus that differ only by their validity property, we identify those equivalent to total-order broadcast. We also give the first deterministic total-order broadcast reduction to consensus with constant time complexity with respect to consensus. Finally, we consider state-machine replication (SMR) with Byzantine faults. State-machine replication is a general approach for replicating services that can be modeled as a state machine. The key idea of this approach is to guarantee that all replicas start in the same state and then apply requests from clients in the same order, thereby guaranteeing that the replica states do not diverge. Recent studies has shown that most BFT-SMR algorithms do not actually perform well under performance attacks by Byzantine processes. We propose a new BFT-SMR algorithm, called BFT-Mencius, that guarantees, assuming a partially synchronous system model, that the latency of updates of correct processes is eventually upper-bounded, even under performance attacks by Byzantine processes. BFT-Mencius is a modular, signature-free algorithm based on a new communication primitive called Abortable Timely Announced Broadcast (ATAB). We evaluate the performance of BFT-Mencius in cluster settings, and show that it performs comparably to the state-of-the-art algorithms such as PBFT and Spinning in fault-free configurations and outperforms these algorithms under performance attacks by Byzantine processes.

À propos de ce résultat
Cette page est générée automatiquement et peut contenir des informations qui ne sont pas correctes, complètes, à jour ou pertinentes par rapport à votre recherche. Il en va de même pour toutes les autres pages de ce site. Veillez à vérifier les informations auprès des sources officielles de l'EPFL.
Concepts associés (38)
Problème du consensus
Le problème du consensus est un problème fondamental en théorie du calcul distribué. Il consiste pour un ensemble de machines à se mettre d'accord sur une valeur ou, par extension, sur une séquence de valeurs. La résolution du consensus est primordiale pour la coordination des systèmes distribués. Elle permet notamment la consistance des systèmes répliqués malgré la défaillance d'une partie de leurs composants.
Problème des généraux byzantins
En informatique, le problème des généraux byzantins est une métaphore qui traite de la remise en cause de la fiabilité des transmissions et de l'intégrité des interlocuteurs. La question est donc de savoir comment, et dans quelle mesure, il est possible de prendre en compte une information dont la source ou le canal de transmission est suspect. La solution implique l'établissement d'un algorithme (d'une stratégie) adapté. Ce problème a été traité en profondeur pour la première fois dans l'article The Byzantine Generals Problem publié en 1982.
Tolérance aux pannes
vignette|Fichier GIF animé de 8 algorithmes ECT dans un réseau 802.1aq. La source est surlignée en violet, la destination en jaune. Les lignes violettes sont des chemins entre la source et la destination et l'épaisseur indique combien de chemins traversent un lien donné. La tolérance aux pannes (ou « insensibilité aux pannes ») désigne une méthode de conception permettant à un système de continuer à fonctionner, éventuellement de manière réduite (on dit aussi en « mode dégradé »), au lieu de tomber complètement en panne, lorsque l'un de ses composants ne fonctionne plus correctement.
Afficher plus
Publications associées (87)

Planetary-Scale Byzantine Fault Tolerance

Matteo Monti

The scale and pervasiveness of the Internet make it a pillar of planetary communication, industry and economy, as well as a fundamental medium for public discourse and democratic engagement. In stark contrast with the Internet's decentralized infrastructur ...
EPFL2024

Byzantine consensus is Θ(n^2): the Dolev-Reischuk bound is tight even in partial synchrony!

Rachid Guerraoui, Jovan Komatovic, Pierre Philippe Civit, Manuel José Ribeiro Vidigueira, Vincent Gramoli, Seth Gilbert

The Dolev-Reischuk bound says that any deterministic Byzantine consensus protocol has (at least) quadratic (in the number of processes) communication complexity in the worst case: given a system with n processes and at most f < n/3 failures, any solution t ...
2023

Byzantine-Resilient Learning Beyond Gradients: Distributing Evolutionary Search

Rachid Guerraoui, Andrei Kucharavy, Matteo Monti

Modern machine learning (ML) models are capable of impressive performances. However, their prowess is not due only to the improvements in their architecture and training algorithms but also to a drastic increase in computational power used to train them.|S ...
New York2023
Afficher plus

Graph Chatbot

Chattez avec Graph Search

Posez n’importe quelle question sur les cours, conférences, exercices, recherches, actualités, etc. de l’EPFL ou essayez les exemples de questions ci-dessous.

AVERTISSEMENT : Le chatbot Graph n'est pas programmé pour fournir des réponses explicites ou catégoriques à vos questions. Il transforme plutôt vos questions en demandes API qui sont distribuées aux différents services informatiques officiellement administrés par l'EPFL. Son but est uniquement de collecter et de recommander des références pertinentes à des contenus que vous pouvez explorer pour vous aider à répondre à vos questions.