Nikola Knezevic
State-machine replication (SMR) is a software technique for tolerating failures and for providing high availability in large-scale systems, through the use of commodity hardware. A replicated state-machine comprises a number of replicas, each of which runs an agreement protocol, with the goal of ensuring a consistent state across all of the replicas. In hostile environments, such as the Internet, Byzantine fault tolerant state-machine replication (BFT) is an important technique for providing robust services. During the past decade, we have seen an emergence of various BFT protocols. In order to be adopted, besides providing correctness, a BFT must provide good performance as well. Consequently, all of the new protocols focus on improving performance under various conditions. However, a closer look at the performance of state-of-the-art BFT protocols reveals that even in best-case execution scenarios, they still remain far behind their theoretical maximum. Based on exhaustive evaluation and monitoring of existing BFT protocols, we highlight a few impediments to their scalability. These obstructions include the use of IP multicast, the presence of bottlenecks due to asymmetric replica processing, and an unbalanced network bandwidth utilization. The goal of this thesis is to evaluate the actual impact of these scalability impediments, and to offer a solution for a high-throughput BFT protocol in the case in which the network itself is the bottleneck. To that end, we have developed Ring, a new BFT protocol which circumvents the aforementioned impediments. As its name suggests, Ring uses the ring communication topology, in the fault-free case. In the ring topology, each replica only performs point-to-point communications with two other replicas, namely its neighbors on the ring. Moreover, all of the replicas equally accept requests from clients and perform symmetric processing. Our performance evaluation shows that, with the network as the bottleneck, Ring outperforms all other, state-of-the-art BFT protocols. Ring achieves 118 Mbps on the Fast Ethernet – a 24% improvement in throughput over previous protocols. Finally, we conducted an extensive practical and analytic evaluation of Ring. In order to analyse benefits (and drawbacks) of Ring (and other protocols) under different settings, without resorting to costly experimentation, we developed an analytical performance model. Our performance model is based on queueing theory, and relies only on a handful of protocolagnostic measurements of the environment.
EPFL2012