**Êtes-vous un étudiant de l'EPFL à la recherche d'un projet de semestre?**

Travaillez avec nous sur des projets en science des données et en visualisation, et déployez votre projet sous forme d'application sur GraphSearch.

Personne# David Kozhaya

Cette page est générée automatiquement et peut contenir des informations qui ne sont pas correctes, complètes, à jour ou pertinentes par rapport à votre recherche. Il en va de même pour toutes les autres pages de ce site. Veillez à vérifier les informations auprès des sources officielles de l'EPFL.

Unités associées

Chargement

Cours enseignés par cette personne

Aucun résultat

Domaines de recherche associés

Chargement

Publications associées (8)

Chargement

Chargement

Chargement

Personnes menant des recherches similaires (85)

Domaines de recherche associés (3)

In a distributed computing system, a failure detector is a computer application or a subsystem that is responsible for the detection of node failures or crashes. Failure detectors were first introduce

La communication est l'ensemble des interactions avec un tiers humain ou animal qui véhiculent une ou plusieurs informations. En dehors de la communication animale, on distingue chez l'être humain,

En mathématiques et plus particulièrement en théorie des probabilités, une suite d'évènements, indexée par les entiers naturels, se réalise avec grande probabilité si la probabilité que le n-ième év

Unités associées (4)

The celebrated distributed computing approach for building systems and services using multiple machines continues to expand to new domains. Computation devices nowadays have additional sensing and communication capabilities, while becoming, at the same time, cheaper, faster and more pervasive. Consequently, areas like industrial control, smart grids and sensor networks are increasingly using such devices to control and coordinate system operations. However, compared to classic distributed systems, such real-world physical systems have different needs, e.g., real-time and energy efficiency requirements. Moreover, constraints that govern communication are also different. Networks become susceptible to inevitable random losses, especially when utilizing wireless and power line communication. This thesis investigates how to build various fundamental distributed computing abstractions (services) given the limitations, the performance and the application requirements and constraints of real-world control, smart grid and sensor systems. In quest of completeness, we discuss four distributed abstractions starting from the level of network links all the way up to the application level. At the link level, we show how to build an energy-efficient reliable communication service. This is especially important for devices with battery-powered wireless adapters where recharging might be unfeasible. We establish transmission policies that can be used by processes to decide when to transmit over the network in order to avoid losses and minimize re-transmissions. These policies allow messages to be reliably transmitted with minimum transmission energy. One level higher than links is failure detection, a software abstraction that relies on communication for identifying process crashes. We prove impossibility results concerning implementing classic eventual failure detectors in networks with probabilistic losses. We define a new implementable type of failure detectors, which preserves modularity. This means that existing deterministic algorithms using eventual failure detectors can still be used to solve certain distributed problems in lossy networks: we simply replace the existing failure detector with the one we define. Using failure detectors, processes might get information about failures at different times. However, to ensure dependability, environments such as distributed control systems (DCSs), require a membership service where processes agree about failures in real time. We prove that the necessary properties of this membership cannot be implemented deterministically, given probabilistic losses. We propose an algorithm that satisfies these properties, with high probability. We show analytically, as well as experimentally (within an industrial DCS), that our technique significantly enhances the DCS dependability, compared to classic membership services, at low additional cost. Finally, we investigate a real-time shared memory abstraction, which vastly simplifies programming control applications. We study the feasibility of implementing such an abstraction within DCSs, showing the impossibility of this task using traditional algorithms that are built on top of existing software blocks like failure detectors. We propose an approach that circumvents this impossibility by attaching information to the failure detection messages, analyze the performance of our technique and showcase ways of adapting it to various application needs and workloads.

Rachid Guerraoui, David Kozhaya

The failure detector approach for solving distributed computing problems has been celebrated for its modularity. This approach allows the construction of algorithms using abstract failure detection mechanisms, defined by axiomatic properties, as building blocks. The minimal synchrony assumptions on communication, which enable to implement the failure detection mechanism, are studied separately. Such synchrony assumptions are typically expressed as eventual guarantees that need to hold, after some point in time, forever and deterministically. But in practice, they never do. Synchrony assumptions may hold only probabilistically and temporarily. In this paper, we study failure detectors in a realistic distributed system N, with asynchrony inflicted by probabilistic synchronous communication. We address the following paradox: an implementation of "consensus with probability 1" is possible in N without using randomness in the algorithm itself, while an implementation of "lozenge S with probability 1" is impossible to achieve in N (lozenge S being the weakest failure detector to solve the consensus problem and many equivalent problems). We circumvent this paradox by introducing a new failure detector lozenge S*, a variant of lozenge S with probabilistic and temporal accuracy. We prove that lozenge S* is implementable in N and we provide an optimal lozenge S* algorithm. Interestingly, we show that lozenge S* can replace lozenge S, in several existing deterministic consensus algorithms using lozenge S, to yield an algorithm that solves "consensus with probability 1". In fact, we show that such result holds for all decisive problems (not only consensus) and also for failure detector lozenge P (not only lozenge S). The resulting algorithms combine the modularity of distributed computing practices with the practicality of networking ones.

Rachid Guerraoui, David Kozhaya, Yvonne Anne Pignolet-Oswald

The failure detector approach for solving distributed computing problems has been celebrated for its modularity. This approach allows the construction of algorithms using abstract failure detection mechanisms, defined by axiomatic properties, as building blocks. The minimal synchrony assumptions on communication, which enable to implement the failure detection mechanism, are studied separately. Such synchrony assumptions are typically expressed as eventual guarantees that need to hold, after some point in time, forever and deterministically. But in practice, they never do. Synchrony assumptions may hold only probabilistically and temporarily. In this paper, we study failure detectors in a realistic distributed system N, with asynchrony inflicted by probabilistic synchronous communication. We address the following paradox about the weakest failure detector to solve the consensus problem (and many equivalent problems), i.e., S: an implementation of “consensus with probability 1” is possible in N without using randomness in the algorithm itself, while an implementation of “S with probability 1” is impossible to achieve in N. We circumvent this paradox by introducing a new failure detector S*, a variant of S with probabilistic and temporal accuracy. We prove that S* is implementable in N and we provide an optimal S* implementation. Interestingly, we show that S* can replace S , in several existing deterministic consensus algorithms using S, to yield an algorithm that solves “consensus with probability 1”. In fact, we show that such result holds for all decisive problems (not only consensus) and also for failure detector P (not only S). The resulting algorithms combine the modularity of distributed computing practices with the practicality of networking ones.