
# The Universal Gossip Fighter

Rachid Guerraoui, Anne-Marie Kermarrec, Anastasiia Kucherenko, Rafaël Benjamin Pinot

2022

Conference paper


Abstract

The notion of adversary is a staple of distributed computing. An adversary typically models “hostile” assumptions about the underlying distributed environment, e.g., a network that can drop messages, an operating system that can delay processes or an attacker that can hack machines. So far, the goal of distributed computing researchers has mainly been to develop a distributed algorithm that can face a given adversary, the abstraction characterizing worst-case scenarios. This paper initiates the study of the somehow opposite approach. Given a distributed algorithm, the adversary is the abstraction we seek to implement. More specifically, we consider the problem of controlling the spread of messages in a large-scale system, conveying the practical motivation of limiting the dissemination of fake news or viruses. Essentially, we assume a general class of gossip protocols, called all-to-all gossip protocols, and devise a practical method to hinder the dissemination. We present the Universal Gossip Fighter (UGF). Just like classical adversaries in distributed computing, UGF can observe the status of a dissemination and decide to stop some processes or delay some messages. The originality of UGF lies in the fact that it is universal, i.e., it applies to any all-to-all gossip protocol. We show that any gossip protocol attacked by UGF ends up exhibiting a quadratic message complexity (in the total number of processes) if it achieves sublinear time of dissemination. We also show that if a gossip protocol aims to achieve a message complexity α times smaller than quadratic, then the time complexity rises exponentially in relation to α. We convey the practical relevance of our theoretical findings by implementing UGF and conducting a set of empirical experiments that confirm some of our results.
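The tension between dissemination time and message count can be felt in a toy simulation. The sketch below is an illustrative round-based push gossip protocol with a crude message-delaying adversary; it is not the paper's UGF strategy, and all names (`push_gossip`, `adversary_delay`) are our own.

```python
import random

def push_gossip(n, adversary_delay=0.0, seed=0):
    """Toy round-based push gossip: each informed process contacts one
    uniformly random peer per round. The 'adversary' merely drops a
    fraction of messages each round, a crude stand-in for an adversary's
    delaying power (not the paper's actual UGF strategy)."""
    rng = random.Random(seed)
    informed = {0}            # process 0 starts with the rumor
    rounds = messages = 0
    while len(informed) < n:
        rounds += 1
        newly = set()
        for p in informed:
            messages += 1     # every contact attempt costs a message
            if rng.random() < adversary_delay:
                continue      # message delayed/dropped this round
            newly.add(rng.randrange(n))
        informed |= newly
    return rounds, messages

# Unattacked push gossip informs all n processes in O(log n) rounds
# with O(n log n) messages in expectation; delaying messages stretches
# the dissemination time, forcing more total messages to finish.
r0, m0 = push_gossip(1000)
r1, m1 = push_gossip(1000, adversary_delay=0.5)
```

The point of the sketch is qualitative: the slower the rumor spreads, the more rounds (and hence messages) the protocol pays, which is the trade-off the paper makes precise for arbitrary all-to-all gossip protocols.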




Related concepts (16)

Related publications (32)

Operating system

In computing, an operating system (OS) is a set of programs that directs the use of a computer's resources by application software.

Computing

Computing is any goal-oriented activity requiring, benefiting from, or creating computing machinery. It includes the study and experimentation of algorithmic processes, and the development of both hardware and software.

Algorithm

An algorithm is a finite, unambiguous sequence of instructions and operations for solving a class of problems.


Partha Dutta, Rachid Guerraoui, Bastian Pochon

Agreement is at the heart of distributed computing. In its simple form, it requires a set of processes to decide on a common value out of the values they propose. The time-complexity of distributed agreement problems is generally measured in terms of the number of communication rounds needed to achieve a global decision; i.e., for all non-faulty (correct) processes to reach a decision. This paper studies the time-complexity of local decisions in agreement problems, which we define as the number of communication rounds needed for at least one correct process to decide. We explore bounds for early local decision, that depend on the number f of actual failures (that occur in a given run of an algorithm), out of the maximum number t of failures tolerated (by the algorithm). We first consider the synchronous message-passing model where we give tight local decision bounds for three variants of agreement: consensus, uniform consensus and (non-blocking) atomic commit. We use these results to (1) show that, for consensus, local decision bounds are not compatible with global decision bounds (roughly speaking, they cannot be reached by the same algorithm), and (2) draw the first sharp line between the time-complexity of uniform consensus and atomic commit. Then we consider the eventually synchronous model where we give tight local decision bounds for synchronous runs of uniform consensus. (In this model, consensus and uniform consensus are similar, atomic commit is impossible, and one cannot bound the number of rounds to reach a decision in non-synchronous runs of consensus algorithms.) We prove a counter-intuitive result that the early local decision bound is the same as the early global decision bound. We also give a matching early deciding consensus algorithm that is significantly better than previous eventually synchronous consensus algorithms.
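As background for the round-counting in this abstract, the textbook synchronous flooding algorithm for consensus can be sketched (this is not the paper's early-deciding algorithm, and all names here are illustrative): every process broadcasts the set of values it has seen for t+1 rounds and then decides the minimum, which guarantees a global decision despite up to t crashes.

```python
def flood_consensus(values, crashes, t):
    """Textbook synchronous flooding consensus (illustrative, not the
    paper's algorithm). values: one proposal per process. crashes maps
    a process p to (round, recipients), meaning p crashes in that round
    after sending only to `recipients`. Correct processes run t+1
    rounds, then decide the minimum value they have seen."""
    n = len(values)
    seen = [{p: values[p]} for p in range(n)]
    crashed = set()
    for rnd in range(1, t + 2):               # rounds 1 .. t+1
        msgs = [dict() for _ in range(n)]
        for p in range(n):
            if p in crashed:
                continue
            if p in crashes and crashes[p][0] == rnd:
                recipients = crashes[p][1]    # partial broadcast, then crash
                crashed.add(p)
            else:
                recipients = range(n)
            for q in recipients:
                msgs[q].update(seen[p])       # forward everything seen so far
        for q in range(n):
            if q not in crashed:
                seen[q].update(msgs[q])
    return {q: min(seen[q].values()) for q in range(n) if q not in crashed}

# Process 1 proposes the minimum but crashes in round 1 after reaching
# only process 0; the extra round lets the value flood to everyone.
decisions = flood_consensus([3, 1, 2], crashes={1: (1, [0])}, t=1)
```

This illustrates the global-decision measure the abstract contrasts with local decision: here no correct process decides before round t+1, whereas the paper studies how early a *single* correct process can decide.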

2007

In a distributed application, high availability of a critical online service is ensured despite failures by duplicating the vital components of the server. Whilst guaranteeing access to the server at all times, duplication requires particular care so as to maintain a consistent state amongst all components. A distributed computation plays a key role in preserving the consistency of the state in this case. Indeed, one of the fundamental forms of distributed computation is distributed agreement. In a distributed agreement problem, the components of the system, called processes, each propose a value and eventually decide on a common value that depends on the proposed values. Designing correct distributed agreement algorithms relies on a mathematical model abstracting the details of the underlying physical model. In practice, there are many models, inspired by various types of physical failures. For a distributed agreement algorithm, coping with one type of failure rather than another is not equally difficult: some types are intrinsically harder to handle than others. The thesis studies how distributed algorithms can be transformed from one model to another. The idea is to design distributed agreement algorithms in models in which the task is as easy and as little error-prone as possible, and to execute the transformed algorithms in the models imposed by the underlying physical constraints. We also investigate another aspect of transforming distributed computations: the notion of reducibility amongst distributed computations. Reducibility offers an elegant way of deducing lower or upper bounds on the complexity of a distributed algorithm from complexity bounds proved elsewhere, about a different distributed computation or in a different model.

To investigate reducibility, the thesis considers two very general classes of models: synchronous models, in which the communication delay and the process execution speed are known and bounded, and asynchronous models, in which they are unknown and potentially unbounded. The thesis investigates the reducibility of a distributed agreement problem called set agreement. Specifically, we consider the early-deciding variant of set agreement, in which processes must decide as fast as they can in every execution. We prove a lower bound on the complexity of early-deciding set agreement in the synchronous model. The proof works by reducing the impossibility of set agreement in the asynchronous model to the desired lower bound. In some cases, complexity bounds must still be derived with ad hoc arguments, for example when there is no apparent connection with existing results. The thesis presents an alternative approach to reducibility, namely the connection of distributed computing with algebraic topology. Within this framework, we derive a lower bound for the complexity of early-deciding set agreement in the synchronous model. We show that all the lower bounds derived in the thesis are tight, by presenting for each one an algorithm matching its complexity.

Naama Ben David, Rachid Guerraoui, Athanasios Xygkis, Mihail Igor Zablotchi

We consider the problem of making apps fault-tolerant through replication, when apps operate at the microsecond scale, as in finance, embedded computing, and microservices apps. These apps need a replication scheme that also operates at the microsecond scale, otherwise replication becomes a burden. We propose Mu, a system that takes less than 1.3 microseconds to replicate a (small) request in memory, and less than a millisecond to fail over the system; this cuts the replication and fail-over latencies of the prior systems by at least 61% and 90%. Mu implements bona fide state machine replication/consensus (SMR) with strong consistency for a generic app, but it really shines on microsecond apps, where even the smallest overhead is significant. To provide this performance, Mu introduces a new SMR protocol that carefully leverages RDMA. Roughly, in Mu a leader replicates a request by simply writing it directly to the log of other replicas using RDMA, without any additional communication. Doing so, however, introduces the challenge of handling concurrent leaders, changing leaders, garbage collecting the logs, and more: challenges that we address in this paper through a judicious combination of RDMA permissions and distributed algorithmic design. We implemented Mu and used it to replicate several systems: a financial exchange app called Liquibook, Redis, Memcached, and HERD [33]. Our evaluation shows that Mu incurs a small replication latency, in some cases being the only viable replication system that incurs an acceptable overhead.
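Mu's core mechanism, a leader that appends directly to remote logs guarded by write permissions, can be caricatured without RDMA. The sketch below is an assumed simplification in plain Python: the real system uses one-sided RDMA writes and RDMA permission changes, and the class and method names here are ours, not Mu's.

```python
class Replica:
    """Minimal caricature of Mu's log-and-permission idea: each replica
    holds a log that only the process currently granted write permission
    may append to. Changing the leader is modeled as revoking and
    regranting that permission, standing in for RDMA permission changes
    that fence off stale leaders."""
    def __init__(self):
        self.log = []
        self.writer = None              # id of the process allowed to write

    def grant_write(self, leader_id):
        self.writer = leader_id         # models granting RDMA write access

    def append(self, leader_id, entry):
        if leader_id != self.writer:
            return False                # stale leader: write is rejected
        self.log.append(entry)
        return True

replicas = [Replica() for _ in range(3)]
for r in replicas:
    r.grant_write(leader_id=0)
# Leader 0 replicates a request by writing it to every log directly,
# with no extra round of communication.
ok = all(r.append(0, "trade#1") for r in replicas)
# A leader change revokes the old permission; the stale leader's
# subsequent writes fail instead of corrupting the logs.
for r in replicas:
    r.grant_write(leader_id=1)
stale = replicas[0].append(0, "trade#2")
```

The design point the sketch mirrors is that rejecting a stale leader happens at the log itself (in Mu, at the NIC via permissions), so no replica-side protocol logic runs on the critical path of a normal-case write.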