State Machine Replication : from Analytical Evaluation to High-Performance Paxos

Since their invention more than half a century ago, computers have gone from being just an handful of expensive machines each filling an entire room, to being an integral part of almost every aspect of modern life. Nowadays computers are everywhere: in our planes, in our cars, on our desks, in our home appliances, and even in our pockets. This widespread adoption had a profound impact in our world and in our lives, so much that now we rely on them for many important aspects of everyday life, including work, communication, travel, entertainment, and even managing our money. Given our increased reliance on computers, their continuous and correct operation has become essential for modern society. However, individual computers can fail due to a variety of causes and, if nothing is done about it, these failures can easily lead to a disruption of the service provided by computer system. The field of fault tolerance studies this problem, more precisely, it studies how to enable a computer system to continue operation in spite of the failure of individual components. One of the most popular techniques of achieving fault tolerance is software replication, where a service is replicated on an ensemble of machines (replicas) such that if some of these machines fail, the others will continue providing the service. Software replication is widely used because of its generality (can be applied to most services) and its low cost (can use off-the-shelf hardware). This thesis studies a form of software replication, namely, state machine replication, where the service is modeled as a deterministic state machine whose state transitions consist of the execution of client requests. Although state machine replication was first proposed almost 30 years ago, the proliferation of online services during the last years has led to a renewed interest. Online services must be highly available and for that they frequently rely on state machine replication as part of their fault tolerance mechanisms. However, the unprecedented scale of these services, which frequently have hundreds of thousands or even millions of users, leads to a new set performance requirements on state machine replication. This thesis is organized in two parts. The goal of the first part is to study from a theoretical perspective the performance characteristics of the algorithms behind state machine replication and to propose improved variants of such algorithms. The second part looks at the problem from a practical perspective, proposing new techniques to achieve high-throughput and scalability. In the first part, we start with an analytical analysis of the performance of two consensus algorithms, one leader-free (an adaptation of the fast round of Fast Paxos) and another leader-based (an adaptation of classical Paxos). We express these algorithms in the Heard-Of round model and show that using this model it is fairly easy to determine analytically several interesting performance metrics. We then study the p

Oligolithic Cross-task Optimizations across Isolated Workloads

Anastasia Ailamaki, Panagiotis Sioulas, Eleni Zapridou

Enterprises collect data in large volumes and leverage them to drive numerous concurrent decisions and business processes. Their teams deploy multiple applications that often operate concurrently on the same data and infrastructure but have widely differen ...

2024

Modified capillary number to standardize droplet generation in suction-driven microfluidics

Jatin Panwar, Rahul Roy

In droplet microfluidic devices with suction-based flow control, the microchannel geometry and suction pressure at the outlet govern the dynamic properties of the two phases that influence the droplet generation. Therefore, it is critical to understand the ...

Springer Heidelberg2024

State Machine Replication : from Analytical Evaluation to High-Performance Paxos

Graph Chatbot

Planetary-Scale Byzantine Fault Tolerance

Oligolithic Cross-task Optimizations across Isolated Workloads

Modified capillary number to standardize droplet generation in suction-driven microfluidics

Planetary-Scale Byzantine Fault Tolerance

Oligolithic Cross-task Optimizations across Isolated Workloads

Modified capillary number to standardize droplet generation in suction-driven microfluidics