**Are you an EPFL student looking for a semester project?**

Work with us on data science and visualisation projects, and deploy your project as an app on top of GraphSearch.

Concept# Self-stabilization

Summary

Self-stabilization is a concept of fault-tolerance in distributed systems. Given any initial state, a self-stabilizing distributed system will end up in a correct state in a finite number of execution steps.
At first glance, the guarantee of self stabilization may seem less promising than that of the more traditional fault-tolerance of algorithms, that aim to guarantee that the system always remains in a correct state under certain kinds of state transitions. However, that traditional fault tolerance cannot always be achieved. For example, it cannot be achieved when the system is started in an incorrect state or is corrupted by an intruder. Moreover, because of their complexity, it is very hard to debug and to analyze distributed systems. Hence, it is very hard to prevent a distributed system from reaching an incorrect state. Indeed, some forms of self-stabilization are incorporated into many modern computer and telecommunications networks, since it gives them the ability to cope with faults that were not foreseen in the design of the algorithm.
Many years after the seminal paper of Edsger Dijkstra in 1974, this concept remains important as it presents an important foundation for self-managing computer systems and fault-tolerant systems. As a result, Dijkstra's paper received the 2002 ACM PODC Influential-Paper Award, one of the highest recognitions in the distributed computing community.
Moreover, after Dijkstra's death, the award was renamed and is now called the Dijkstra Award.
E.W. Dijkstra in 1974 presented the concept of self-stabilization, prompting further research in this area. His demonstration involved the presentation of self-stabilizing mutual exclusion algorithms. It also showed the first self-stabilizing algorithms that did not rely on strong assumptions on the system. Some previous protocols used in practice did actually stabilize, but only assuming the existence of a clock that was global to the system, and assuming a known upper bound on the duration of each system transition.

Official source

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Related publications (7)

Related people (2)

Related units (1)

Rachid Guerraoui, Alexandre David Olivier Maurer

Shoals of small fishes can change their collective shape and form a specific pattern. They do so efficiently (in parallel) and without collision. In this paper, we study the analog problem of distributed pattern formation. A set of processes needs to move from a set of initial positions to a set of final positions. The processes are oblivious (no internal memory) and must preserve, at any time, a minimal distance between them. A naive solution would be to move the processes one by one, but this would take too long. The difficulty here is to move the processes simultaneously in clearly delimited phases, no matter how unfavorable the initial configuration may be. We solve this by treating the problem ``dimension by dimension'': the processes first form 1D trails, then gather into a 2D shape (this technique can be generalized to higher dimensions). We present an optimal algorithm which time complexity depends linearly on the radius of the smallest circle containing both initial and final positions. The algorithm is self-stabilizing, as the processes are oblivious and the initial positions are arbitrary.

2016Alexandre David Olivier Maurer

When a shoal of small fishes notices a predator, they typically change their collective shape and form a specific pattern. They do so efficiently (in parallel) and without collision. This motivates us to design algorithms with similar properties. In this paper, we study the problem of distributed pattern formation. A set of processes needs to move from a set of initial positions to a set of final positions. The processes are oblivious (no internal memory) and must preserve, at any time, a minimal distance between them. A naive solution would be to move the processes one by one, but this would have a poor time complexity. The difficulty here is to move the processes simultaneously in clearly delimited phases (as there is no internal memory), no matter how unfavorable the initial configuration may be. We solve this by treating the problem ``dimension by dimension'': the processes first form 1D trails, then gather into a 2D shape (this technique can be generalized to higher dimensions). We present an optimal algorithm which time complexity depends linearly on the radius of the smallest circle containing both initial and final positions. The algorithm is self-stabilizing, as the processes are oblivious and the initial positions are arbitrary.

Rachid Guerraoui, El Mahdi El Mhamdi, Alexandre David Olivier Maurer, Vladislav Tempez

Problems of pattern formation have been extensively studied in distributed computing. One of this problems is the gathering problem: agents must gather at a same position in a distributed manner. When gathering is not possible, a close problem is the convergence problem. In this article, we investigate the two following questions: (1) Can processes gather when each process cannot see more that one other process at the same time? (2) Can a gathering behavior be learned by processes? Regarding the first point, we introduce a new model with an extremely restricted visibility: each process can only see one other process (its closest neighbor). Our goal is to see if (and to what extent) the gathering and convergence problems can be solved in this setting. We first show that, surprisingly, the problem can be solved for a small number of processes (at most 5), but not beyond. This is due to indeterminacy in the case where there are several "closest neighbors" for a same process. By removing this indeterminacy with an additional hypothesis (choosing the closest neighbor according to an order on the positions of processes), we then show that the problem can be solved for any number of processes. We also show that up to one crash failure can be tolerated for the convergence problem. Regarding the second point, we present the first experimental evidence that a gathering behavior can be learned without explicit communication in a partially observable environment. The learned behavior has the same properties as a self-stabilizing distributed algorithm, as processes can gather from any initial state (and thus tolerate any transient failure). Besides, we show that it is possible to scale and then tolerate the brutal loss of up to 90% of agents without significant impact on the behavior.

2019