System Support for Efficient Replication in Distributed Systems

Dragos-Adrian Seredinschi
2019
EPFL thesis

Abstract

Current online applications, such as search engines, social networks, or file sharing services, execute across a distributed network of machines. They provide non-stop services to their users despite failures in the underlying network. To achieve such a high level of reliability, these applications rely on a simple technique: replication. Briefly, there are copies of the application on multiple machines, such that no specific machine represents a single point of failure. Under this apparent simplicity, however, reliability comes at a steep price. This is because replication entails certain side-effects (e.g., overheads or costs, inconsistencies, or tradeoffs) which often lead to inefficient designs and want for better performance.

The high-level problem we study in this dissertation is that of obtaining good performance in replicated systems. We seek to reduce--and when irreducible, hide--the effects of replication. We approach this problem from three fronts, as follows.

First, we look at State Machine Replication (SMR). This is a classic technique for achieving strong consistency in replicated systems. Similar to other techniques for strong consistency, SMR brings a substantial performance overhead. Additionally, this overhead gets worse with growing system size: There is a performance decay as an SMR system gets larger. We shed light on this issue by deploying five SMR systems on up to 100 replicas and reporting on their performance decay. Towards mitigating this issue, we introduce Carousel, an SMR system based on a ring overlay network which can alleviate performance decay.

Second, we discuss a technique for hiding the cost of strong consistency. Given the high overhead of accessing data with strong consistency, we propose that applications combine multiple consistency models. We introduce programming support for doing so, through an abstraction called Correctables. In conjunction with a speculation-based technique, we show how Correctables can lower the latency for strongly-consistent operations.

Third, we propose a technique to bypass SMR (and avoid its costs) in a concrete replicated application. We focus on the problem of implementing token transfer applications (e.g., online payments). We introduce the abstraction of exclusive token accounts, or Exa, supporting asynchronous transfers. We also design and build Astro, a system implementing the Exa abstraction. This system departs from classic consensus-based solutions (i.e., SMR), and relies instead on a broadcast primitive, a more efficient and tractable building block.

Official source

https://infoscience.epfl.ch/record/270110?ln=en

About this result

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

System Support for Efficient Replication in Distributed Systems

Graph Chatbot

Chat with Graph Search

Planetary-Scale Byzantine Fault Tolerance

Altruism, reciprocity, and tokens to reward forwarding data: Is that fair?

Advancing Cell-Free Synthetic Biology: From sustained PURE protein synthesis to an isothermal one-pot DNA replication system

Altruism, reciprocity, and tokens to reward forwarding data: Is that fair?

Planetary-Scale Byzantine Fault Tolerance

Advancing Cell-Free Synthetic Biology: From sustained PURE protein synthesis to an isothermal one-pot DNA replication system