D2.4 Study of a “forward recovery” approach for the Petascale Runtime management infrastructure

VERSION: 1.0
DATE: 2010
EDITORIAL MANAGER: Sylvain Peyronnet
AUTHORS: Swan Dubois, Thomas Hérault, Toshimitsu Masuzawa, Olivier Pérès, Sylvain Peyronnet and
Sébastien Tixeuil

Copyright ANR SPADES. 08-ANR-SEGI-025.
Contents

1 Preamble
2 Scalable Overlay for Address-based Networks with Resource Discovery
  2.1 Introduction
  2.2 Algorithms
    2.2.1 Pack algorithm
    2.2.2 List algorithm
    2.2.3 Ranking algorithm
    2.2.4 Routing Algorithm
    2.2.5 Convergence time of the global algorithm
  2.3 Related Works
  2.4 Conclusion
3 Stabilizing Locally Maximizable Tasks in Unidirectional Networks is Hard
  3.1 Introduction
  3.2 Preliminaries
  3.3 Impossibility Results
  3.4 Possibility Results
    3.4.1 Deterministic solution with identifiers
    3.4.2 Probabilistic solution with unbounded memory in asynchronous anonymous networks
    3.4.3 Probabilistic solution with bounded memory in synchronous anonymous networks
  3.5 Conclusion
4 The Impact of Topology on Byzantine Containment in Stabilization
  4.1 Introduction
  4.2 Distributed System
  4.3 Self-Stabilizing Protocol Resilient to Byzantine Faults
  4.4 Maximum Metric Tree Construction
    4.4.1 Impossibility Result
    4.4.2 Topology-Aware Strict Stabilizing Protocol
  4.5 Conclusion
Chapter 1
Preamble
This document presents the work carried out within task 2.4 of the SPADES project. Its purpose is to
present work related to the theme of a “forward recovery” approach for building a management
infrastructure for a petascale runtime.
Among “forward recovery” approaches, self-stabilization holds a special place because of its apparent
simplicity. Indeed, a self-stabilizing approach is by nature robust to transient failures and attacks,
without any ad-hoc mechanism seemingly being involved in its behavior.
However, guaranteeing the compatibility of such an approach with a petascale-class architecture is not
straightforward: the impact of a failure can be very strong (affecting the whole system), and the time
needed to return to a legitimate state can be too long in practice (even though it is always
theoretically bounded).
All the work in this deliverable relies on the underlying assumption that failures are a priori
uncorrelated (no epidemic phenomena). This assumption removes the first problem mentioned above.
The three following chapters correspond to articles written by the SPADES project participants within
the Grand-Large project-team at INRIA Saclay-Île-de-France, together with their collaborators. Here is
a quick description of these chapters.
Chapter 2. It corresponds to the following article (under submission): Scalable Overlay for
Address-based Networks with Resource Discovery by Olivier Pérès, Thomas Hérault and Sylvain Peyronnet.
In this chapter we present an algorithm that builds a structure similar to a balanced spanning tree.
This guarantees that if the system contains n processes, then the distance between the root and any
leaf is at most ⌈log n⌉.
The algorithm is built by composing self-stabilizing algorithms, and is therefore itself
self-stabilizing [23]. The first algorithm groups the processes into packs of well-chosen sizes, then a
second algorithm chains the packs together. Once this global structure is in place, a third algorithm
distributes identifiers (in increasing order) to the processes of the system. Using this structure, an
additional algorithm routes messages efficiently.
The global algorithm thus builds exactly the kind of structure we are interested in for large-scale
systems: simple, robust and fast routing, and unique naming of processes.
Chapter 3. It corresponds to the following article (published in the proceedings of ICDCS 2010):
Stabilizing Locally Maximizable Tasks in Unidirectional Networks is Hard by Toshimitsu Masuzawa and
Sébastien Tixeuil [46].
This chapter presents advances on the problem of the self-stabilizing construction of locally
maximizable tasks (such as the construction of a maximal independent set) in unidirectional networks
of arbitrary topology. We first present negative results showing the impossibility of deterministic
self-stabilizing algorithms for this problem in a very general model. We then present algorithms that
work in models subject to more restrictive assumptions.
Chapter 4. It corresponds to the following article (published in the proceedings of DISC 2010): The
Impact of Topology on Byzantine Containment in Stabilization by Swan Dubois, Toshimitsu Masuzawa and
Sébastien Tixeuil [28].
Byzantine fault tolerance is a desirable property of distributed systems, since it allows malicious
behaviors within the system (typically a memory corruption) to be tolerated. This chapter addresses
the problem of constructing trees that maximize a given metric (meaning that the value of the metric
at each node of the system is maximized with respect to a pre-established order relation). This
problem is known to be hard. We first show an impossibility result on the containment of Byzantine
faults in a self-stabilizing context for this problem. We then present a more favorable setting that
makes it possible to solve a weaker version of the problem. Other works have been carried out in a
similar context; they led to a publication [30] but are not presented in this deliverable.
Chapter 2

Scalable Overlay for Address-based Networks with Resource Discovery

2.1 Introduction
Many systems, like peer-to-peer file sharing systems [1, 51] or runtime environments of parallel systems
[11, 12, 13, 9], rely on a resilient communication infrastructure to provide their service. This communication infrastructure is built on top of an existing network. In this chapter, we consider the prevalent
case of Address-Based networks. These are networks where each process possesses a unique address and
can communicate with any other process whose address is known. Addresses can be transmitted in messages, enabling processes to discover other processes and establish new communications. In this model,
opening a communication between two processes, and keeping this connection alive, is a major part of
the resources used by the processes. This model encompasses a realistic deployment of an application
over the Internet: any process can communicate with any other, as soon as it knows its IP address and
port, which can be communicated using existing connections. Connections consume significant resources
in kernel memory and processing, and induce some communication costs to establish and maintain.
The communication infrastructure that is built is an overlay network on the underlying, potentially
fully connected, network. The topology used to build this network has a significant impact on the
scalability of the infrastructure. The diameter of the overlay network must be small enough to guarantee
a small latency of communication from any point to any other; but at the same time, the number of
resources used by each process must also be constrained, to spare system resources for the application.
These two goals are contradictory, since increasing the number of connections reduces the diameter of
the network. In this work, we choose a simple tradeoff to bound the number of resources used as well as
the diameter of the system by log(n), where n is the number of processes in the system.
Another fundamental function of the communication infrastructure is to abstract out the system
enough to simplify the communications: the communication infrastructure provides names to processes,
and routing abilities to send a message from any process to any other, using the communication channels
of the overlay network. The last two algorithms that we present in this chapter use the overlay network
that is built to give each process a rank, i.e. a unique integer between 0 and n − 1, and provide efficient
routing using these ranks.
A major property of the communication infrastructure is that it must be reliable, even in case of
unexpected failures. All the algorithms that we present here are self-stabilizing, which means that they
will converge to an appropriate behavior starting from any configuration. As a consequence, if the
system is subject to an arbitrary transient failure (message loss or replication, process crash, memory
corruption, anything that does not modify the code of the processes and only has an effect limited in
duration), after a convergence time, the system will rebuild a correct overlay network and route messages
on it as expected. This property makes these algorithms suitable for use in highly volatile systems, where
it is hard to predict the possible failure scenarios, such as the Internet.
The first algorithm that we present builds packs of processes organized along complete binomial trees.
This algorithm uses a constant t = log D, where D is an upper bound on the number of processes in the
system. A value of t higher than necessary does not slow down the convergence of the algorithm.
Moreover, in a system where the domain I of process identifiers is known, if |I| is the number of bits
of I, then each process knows that D ≤ 2^|I|: a tighter estimate of D is therefore not necessary. Using
fair composition of self-stabilizing algorithms [23], the second algorithm links the packs together,
building a single tree with the desired properties. The ranking per se operates on top of the global
structure, for a total of three composed algorithms. We present in the end a routing algorithm that
allows the processes to communicate efficiently with each other on the topology built by the other
three algorithms.
As compared to our previous spanning tree algorithm [41], this one builds a less constrained topology:
many processes are eligible to become the root, the root's children need not have the immediately
smaller identifiers, etc. It also converges faster: Θ(nB), as opposed to Θ(n(nB)) for the previous
algorithm.
2.2 Algorithms
In this section, we present in detail the algorithms we introduced in the previous section.
As mentioned earlier, the top-level algorithms (ranking and routing) rely on the fair composition of
three algorithms that perform a global task together.
We express our algorithms in an asynchronous distributed algorithm model [41] where message passing is
used to communicate between processes. It has two additional abstractions to cope with the
address-based nature of the underlying network: an oracle that only gives a weak knowledge of the
system, and a failure detector. When queried, the oracle gives one process identifier, which can be
the identifier of a valid process, or not. We assume that the oracle is weakly fair: if queried
infinitely, it will give all process identifiers infinitely. The failure detector is necessary because
of the lack of synchronicity in the communications, and is an eventually perfect failure detector.
This failure detector is represented in the algorithms by a local function that any process p can
call: S(q) is true if and only if q is suspected by p to have failed at the time of the call.

Constants:
  t : N {the upper bound on log n}
  my_id : I {the process unique identifier}
Variables:
  neighbor[0..t] : I ∪ {⊥}
Definitions:
  active(0) ≡ true
  active(i) ≡ active(i − 1) ∧ neighbor[i − 1] ≠ ⊥, ∀i ∈ J1, tK
  level ≡ max{i | active(i)}
  leader ≡ (level = 0) ∨ neighbor[level − 1] < my_id
Figure 2.1: Pack Algorithm Constants, Variables and Definitions
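To make these two abstractions concrete, here is a minimal Python sketch of the interfaces they
expose. The class names and the explicit crash set are illustrative assumptions made for this example;
they are not part of the protocol.

import random

class Oracle:
    """Weakly fair resource discovery: each query returns one identifier,
    possibly a stale one; over infinitely many queries, every identifier
    is eventually returned infinitely often."""
    def __init__(self, identifiers):
        self.identifiers = identifiers    # may contain invalid identifiers
    def get_peer(self):
        return random.choice(self.identifiers)

class FailureDetector:
    """Eventually perfect: after some unknown time, S(q) holds exactly
    when q has crashed."""
    def __init__(self, crashed):
        self.crashed = set(crashed)
    def S(self, q):
        return q in self.crashed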
The first algorithm builds a forest of binomial trees. The first step is to pair processes together:
basically, each process queries its oracle, looking for a neighbor, until it finds one that has no
partner. Since the algorithm is self-stabilizing, a mechanism takes care of the cases where pairs are
not well-formed, by exchanging keep-alive messages and negative acknowledgements. The case of an
initialization with the identifier of a non-active process is handled using the failure detector. As a
result, the system eventually consists of a set of pairs. If there is an odd number of processes, one
of them remains unpaired.
Now, the same mechanism that worked with individual processes can be applied to the pairs themselves,
considering that each of them has a leader that executes the algorithm. Pairs of pairs are grouped
together, forming packs of four processes. Applying the same principle as long as larger groups can be
made results in a set of packs where no two packs have the same size. To describe the system after
convergence, let us consider the number of processes, n, in base 2: n = Σ_{j=0}^{log n} a_j · 2^j,
with a_j ∈ {0, 1}. For all j ∈ J0, log nK, there exists a pack of processes of size 2^j if and only if
a_j = 1. Any process in the system is part of a pack, and there are no two packs of the same size
(they would fuse if they existed).
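The converged decomposition can be checked in a few lines of Python; this is an illustration of the
claim above, not the distributed algorithm itself.

def pack_sizes(n: int) -> list:
    """One pack of size 2**j for every bit j set in the binary writing of n."""
    return [1 << j for j in range(n.bit_length()) if n & (1 << j)]

assert pack_sizes(18) == [2, 16]    # 18 = (10010)_2: packs of 2 and 16
assert sum(pack_sizes(26)) == 26    # every process belongs to some pack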
In the general case, the topology is not connected when this first algorithm has converged. A second
algorithm then builds a doubly-linked list connecting the pack leaders together, which yields a single
spanning tree. Pack leaders are detected in the same way, using the oracle to discover processes.
Then, it remains to give each process a unique identifier which is a number in J0, n − 1K. A third
algorithm, which assumes that the other two have converged, is responsible for allocating these
identifiers. This algorithm uses a simple weight-propagating protocol, from the leaves up to the root,
to compute the weight (expressed as the number of processes) of each branch of the tree. Ranking is
then done by propagating name-assigning tokens along the spanning tree.
Lastly, it is possible to ensure efficient routing using this topology. Each process can then send a
message to another process, knowing only its rank. The fourth algorithm solves this problem while
guaranteeing a maximum number of 2⌈log n⌉ hops.
We now present these four algorithms, along with their proofs of self-stabilization and convergence
time.
2.2.1 Pack algorithm
The goal of this algorithm is to build groups of processes, called packs, whose cardinality is a power
of two, and to elect a leader in each pack. Each process has a vector of neighbors holding up to t
identifiers of a neighbor process or the special value ⊥, which denotes no valid identifier and thus
the absence of a neighbor. If the vector of some process p at index i holds a valid identifier q, we
say that the neighbor of p at level i is q.
A process can be active at some level or not. All processes are active at level 0. Then, a process is
active at level i iff it is active at level i − 1 and it has a neighbor at level i − 1. Being active at
level i means, for a process, that it is looking for a neighbor at this level, or has an active
neighbor at this level. Using the active function, we can define the level of a process: it is the
highest level at which the process is active. A level of i denotes that the process has i neighbors.
A process that is active at its level l and that has an identifier greater than the identifier of its
neighbor at level l − 1 is the leader of its pack (any process of level 0 is also a leader). This
process continuously prospects to find an (l + 1)-th neighbor to increase the size of the pack. As a
consequence, the number of neighbors varies among processes: processes at a high level will have more
neighbors than processes at a lower level. For a single pack, this builds a binomial tree.
As an example of execution, each process p of level 0 first uses its oracle to look for a neighbor q
that is also at level 0. When a neighbor is found, which yields a pack of two processes, the process
that has the highest identifier becomes leader and begins looking for a neighbor at level 1. This has
to be the leader of a pack of two processes. The result is a graph defined recursively: a pack(0) is a
pair of processes, and a pack(k), k > 0, is a pair of pack(k − 1)s.
Rules:
Rule Cleanup:
  true −→
    for all i ∈ J0, tK do
      if neighbor[i] ≠ ⊥ ∧ (S(neighbor[i]) ∨ ¬active(i) ∨ neighbor[i] = my_id) then
        neighbor[i] ← ⊥
      end if
    end for
Rule Link Maintenance:
  ∃i ∈ J0, tK, active(i) ∧ neighbor[i] ≠ ⊥ −→
    send Hello(i) to neighbor[i]
Rule Prospection:
  leader −→
    v ← getPeer()
    if v > my_id then
      send Exists(level) to v
    end if
Rule Reaction to Exists:
  reception of Exists(j) sent by v −→
    if leader ∧ level = j then
      neighbor[j] ← v
    end if
Rule Reaction to Hello:
  reception of Hello(j) sent by v −→
    if neighbor[j] = ⊥ ∨ v > neighbor[j] then
      neighbor[j] ← v
    else if neighbor[j] ≠ v ∨ ¬active(j) then
      send Goodbye(j) to v
    end if
Rule Reaction to Goodbye:
  reception of Goodbye(j) sent by v −→
    if neighbor[j] = v then
      neighbor[j] ← ⊥
    end if
Figure 2.2: Pack Algorithm Rules
[Figure 2.3: Structure of an 8-process pack. Legend: node x is active at level y, with neighbor vector
neigh_x[0] = a, neigh_x[1] = b, neigh_x[2] = c.]
Figure 2.3 shows the structure of a pack. The eight processes, identified 0 to 7, are represented by
circles. Each contains a vector of t = 3 neighbors, ordered from left to right. Colors are also used
to denote the index of a neighbor: black is used for index 0, green for index 1 and red for index 2.
The number outside each circle represents the level of the process, while the colors of the links
express which index in the neighbor vector defines each link. Process 7 being the leader of this pack,
it is represented with a bold circle.
The tree that represents a pack of k processes has a subtree of size 2^i for each i ∈ J0, log k − 1K.
The longest distance, in number of hops, between the root of the pack and any process in the pack is
thus log k.
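The following Python sketch mimics this recursive pairing on 8 process identifiers and checks the
depth bound. It is a centralized illustration of the converged structure, with helper names invented
for the example; it is not the message-passing protocol.

class Proc:
    def __init__(self, pid: int):
        self.pid = pid
        self.neighbor = []               # neighbor[i] set at the i-th pairing

def pair(a: Proc, b: Proc) -> Proc:
    """Fuse two equal-size packs; the higher identifier stays leader."""
    leader, other = (a, b) if a.pid > b.pid else (b, a)
    leader.neighbor.append(other)
    return leader

def depth(p: Proc) -> int:
    return 1 + max(depth(c) for c in p.neighbor) if p.neighbor else 0

packs = [Proc(i) for i in range(8)]
while len(packs) > 1:                    # pair the packs, level by level
    packs = [pair(packs[2 * i], packs[2 * i + 1]) for i in range(len(packs) // 2)]
root = packs[0]
assert root.pid == 7 and depth(root) == 3    # 8 processes: depth log 8 = 3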
To build the pack, and recover from potential failures, the protocol uses three message types:
• Each process spontaneously sends Hello messages to each of its neighbors to allow them to check
that the links between the processes are symmetrical;
• When a process receives a Hello message, it can come from a neighbor at the same level (which
is correct and ignored), or from any other process. If it comes from any other process, it means
that the sender is incorrectly initialized, so it breaks the link by sending a Goodbye message to
the corresponding neighbor. A process receiving a Goodbye message removes the corresponding
process identifier from the neighbor vector.
• The processes looking for a neighbor send an Exists message to a process given by the oracle.
The Exists message holds the level of prospection and the sender's identifier. If the level of
prospection matches the level of the receiver, and the proposition of pairing is more advantageous
for the receiver (it replaces a ⊥ neighbor, thus increasing the level of activity of the receiver, or
comes from a neighbor with a higher identity, thus removing the burden of prospection from the
receiver), the receiver accepts the emitter as a neighbor at this level.
The formal version of the algorithm is given in Figure 2.2.
Since the algorithm permanently tries to fuse packs, each pack eventually reaches its maximum size in
the system. For example, in a system comprising 18 processes (18 = (10010)_2), there is a pack of 16
processes and a pack of 2 processes. At this stage, in the general case, the topology is not
connected, since the packs are not linked to one another.
Proof of self-stabilization
To prove the self-stabilizing property of the Pack algorithm, we first define the set of legitimate
configurations. To do so, we use the concept of stable processes, defined below.
Definition 1 (stable) Let p and q be two processes. A system σ is stable at level l if and only if the
following properties hold for all m ∈ J0, lK:
• if active(m)(p) ∧ active(m)(q) ∧ neighbor[m](p) = q then neighbor[m](q) = p.
• there are at most 2^(m+1) − 1 processes (p_i) s.t. ∀i, active(m)(p_i) and neighbor[m](p_i) = ⊥.
• if leader(p) ∧ leader(q) ∧ level(p) = level(q) then p = q.
• if Exists(m) ∈ c_{q→p} then level(p) ≠ m or ¬leader(p).
• Hello(m) ∈ c_{p→q} ⇒ neighbor[m](p) = q.
• Goodbye(m) ∉ c_{p→q}.
Let p be a process of a system σ. If the pack leader of p has level l and σ is stable at level l, then
p is stable.
Definition 2 (Lp) A system is in the set Lp of legitimate configurations if and only if it is stable
at level ⌈log n⌉.
Theorem 1 The pack algorithm is self-stabilizing to Lp .
Proof 1 This proof is divided into three parts: correctness (Lemma 1), closure (Lemma 2) and
convergence (Lemma 3).
Lemma 1 (correctness) Let σ be a system with n processes in a legitimate configuration. For any i,
there is one pack of size 2^i if and only if n.[i], the i-th binary digit of n, is 1.
Proof 2 For all i such that n.[i] = 1, we show that there exists a pack of size 2^i; then we show that
this
pack is unique.
First notice that the first point of the definition of stable indicates that the neighbor relationship
is symmetric.
Then, the second point of this definition implies that for any pack of size m, there cannot be m
processes without a neighbor at its level. That is, if the number of processes is sufficient to pair
another pack with it, the processes are already in other packs.
Lastly, because of the third point of this definition, there can be no two packs of the same size.
Definition 3 (paired) A process p is paired at level m iff p is stable at level m − 1 and there is a
process q, stable at level m − 1, s.t. neighbor[m](p) = q.
Lemma 2 Lp is closed under the execution of the algorithm.
Proof 3 This is a consequence of the fact that a system stable at level l remains so throughout any
execution, which we now prove.
Here are the possible transitions. None of them can change neighbor[k](q) for a process q stable at
level k ≤ l.
• Cleanup: no process is suspect because the failure detectors have converged, no process is its own
neighbor in the initial configuration by definition of stable(l), and the possible correction affecting
an inactive process does not make the configuration illegitimate since the conditions only concern
active processes.
• Link Maintenance and Prospection: the only messages that can be sent are Hello to a neighbor,
which obeys the rule on Hello messages, and Exists to an already paired process, which verifies the
rule on Exists messages.
• Reaction to Exists: by definition of stable, an Exists(m) message can only be received by a process
p s.t. neighbor[m](p) ≠ ⊥, thus p does nothing.
• Reaction to Hello: by definition of stable, a Hello(m) message can only be sent by p to q s.t.
neighbor[m](p) = q, thus q does nothing.
• Reaction to Goodbye: by definition of stable, there is no such message in the channels linking stable
processes together.
Lemma 3 (convergence) The pack algorithm converges to Lp from any configuration.
Proof 4 It is enough to prove that any system σ stable at level l − 1 and unstable at level l > 0, or
unstable at level l = 0, eventually becomes stable at level l.
First notice that the execution of the sanity checking rule eliminates the cases of suspect neighbors and
self-connections (m, p s.t. neighbor[m](p) = p). Since we suppose that the failure detectors are stabilized
at this point, we disregard crashed processes. Similarly, at no point in the algorithm is it possible for
a process to connect to itself. Also, all the messages present in the initial configuration are consumed
and all the processes have executed their sanity checking rule. Finally, the fact that σ is stable at level
l − 1 means that none of the values of active, level or leader can change in σ. This is because they only
depend on active itself and neighbor[m](p) for p stable at level m, and this cannot change.
Let z be the highest non-paired process at level l. Notice that no process can send Goodbye to z.
Thus, if z writes the identifier of a correct process p in its neighbor variable, then p and z become paired.
Suppose there is an execution of σ where no pair is formed at level l. Let p be a process distinct from z,
not paired at level l (p has to exist, or σ would be stable at level l). As part of its spontaneous prospection
rule, p sends out an infinite number of Exists messages and thus, because of the global condition on the
oracle, eventually sends Exists to z. Since z has ⊥ in its neighbor variable, it takes p as a neighbor:
contradiction.
Hence, eventually the number of process pairs at level l is maximal, which leaves at most 2^(l+1) − 1
unpaired processes.
Complexity of the Pack Algorithm
We evaluate the complexity of this algorithm using two significant measures: the convergence time and
the number of variables in I used.
Number of variables in I used. It is important to minimize this number since each variable
holding a different process identifier will request memory and processing time to establish, and maintain,
a communication channel with the destination.
With the vectorial notation used here, the algorithm needs O(log n) variables in I per process.
However, it is possible to implement this algorithm by dynamically allocating the memory. In this
case, each process only allocates a variable in I to record a non-⊥ value. The global leader still
needs ⌊log n⌋ variables for its neighbors, but the global memory usage is much lower.
Intuitively, since the topology is a tree, there are n − 1 edges, which means each process uses on
average two variables representing its neighbors. Formally, n processes have a neighbor at level 0,
n/2 processes have a neighbor at level 1, and so on. Thus, the average number of neighbors per process
is:

(1/n) · Σ_{k=0}^{⌊log n⌋} n/2^k = Σ_{k=0}^{⌊log n⌋} 1/2^k ∼ 2 as n → +∞
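A quick numeric check of this average (illustration only): every one of the n − 1 tree edges is stored
at both endpoints, so the total number of neighbor entries is 2(n − 1) and the per-process average
tends to 2. The function name below is invented for this sketch.

def avg_neighbors(n: int) -> float:
    """Average number of non-bottom neighbor entries per process,
    for a single pack of n = 2**k processes."""
    k = n.bit_length() - 1
    total = sum(n >> j for j in range(k))   # n/2**j entries at level j
    return total / n

assert avg_neighbors(8) == 14 / 8           # equals 2(n - 1)/n for n = 8
assert abs(avg_neighbors(2**20) - 2) < 1e-5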
Convergence time. Since the algorithm uses a resource discovery oracle, the convergence time of
the system depends on the time spent by the oracle to achieve its specification. We call this time B.
The self-stabilization proof allows us to characterize the convergence of the system. As shown in
Lemma 3, in any configuration in which convergence is not reached, the system is stable at a given
level i − 1 and unstable at level i (or unstable at level 0). We thus calculate the maximum time the
system needs to go from unstable at level 0 to stable at level log n. As shown above, the system stabilizes
at worst level by level. At level 0, n processes participate in the stabilization of the system by looking
for a neighbor. At level i > 0, only the n/2^i leaders look for a neighbor. When the system is stable at level
log n, convergence is achieved.
Lemma 4 Convergence is achieved, in the worst case, in Θ(nB) asynchronous rounds.
Proof 5 Let E be an upper bound on the number of rounds necessary for two processes p and q to become
neighbors at a given level, once p has obtained the identifier of q.
We first prove (Lemma 5) that the stabilization time of any level j is, in the worst case,
Θ(n(B + E)/2^j). The convergence time of the system is thus the sum of the convergence times of all
the levels, i.e.

Σ_{j=0}^{log n} Θ(n(B + E)/2^j) = Θ(n(B + E))

E is the number of asynchronous rounds necessary to send an Exists message and receive the Hello
answer, i.e. O(1). Therefore, the system converges in Θ(nB) asynchronous rounds.
Lemma 5 Consider a system stable at level j − 1 and unstable at level j, or unstable at level j = 0. In
the worst case, the system becomes stable at level j in Θ(B) asynchronous rounds.
Proof 6 Let P be the set of unstable processes at level j. Without loss of generality, we write
P = p_1, p_2, . . . , p_i such that if a > b, then the identifier of p_a is lower than that of p_b.
Since half the active processes at a given level are active at the higher level, at worst i = n/2^j.
By definition of unstable, these processes are leaders at level j and do not have a neighbor at level
j, so the guard of their prospection rule is true. Thus, in B asynchronous rounds, all the processes
in P obtain, by definition of the oracle, all the identifiers in P.
The proof of convergence implies that at least one pair of processes is formed. We now show that it is
possible to form exactly one pair.
We build a first asynchronous round as follows: all the processes execute their prospection rule such
that for all k ∈ J1, iK, p_k obtains the identifier of p_{k+1 (mod i)}. Each process thus sends Exists
to the corresponding process.
In the second round, each process receives the Exists message that was sent to it during the previous
round and takes the sender as its neighbor at level j.
In the i − 1 subsequent asynchronous rounds, which are still in the first B rounds, each process
executes the following actions:
• it executes its prospection rule;
• it receives the identifier of a process p ∈ P to which it has not yet sent an Exists message;
• it sends Exists to p.
All the Exists messages of these i − 1 asynchronous rounds are received and ignored, because each
process already has a neighbor.
During the next round, each process p_k sends Hello to its neighbor p_{k+1}. Except in the case of
p_i, this neighbor already has a neighbor with a higher identifier; it thus replies Goodbye.
In the last asynchronous round, upon reception of the Goodbye messages, each process replaces the
identifier of its neighbor at level j with ⊥. Finally, only one pair remains: p_{i−1} is paired with
p_i.
2.2.2 List algorithm

This algorithm can evaluate definitions and read variables from the Pack algorithm, but cannot modify
them. Being composed with the Pack algorithm, it connects the packs together through a doubly linked
list. The first process in the list is the leader of the highest-level pack, and all the other leaders
follow in decreasing order of level.
To serve this purpose, the algorithm adds two variables holding identifiers (namely prev and next)
that point to the predecessor and the successor of the process in the doubly-linked list of pack
leaders. Together with these output variables that define the linked list, we store the levels of the
predecessor and of the successor, which are numbers between 0 and t, plus two special values, +∞ and
−∞, to handle the special cases of the first and last leaders.
Figure 2.4 gives an example where n = 26. Processes P, Q and R are the leaders of their packs, of
respective sizes 16, 8 and 2 processes. The couple (next, next_level) is represented on the left of
each process, and the couple (prev, prev_level) on its right. P has no predecessor and R has no
successor; the successor of Q (resp. the predecessor of R) is R (resp. Q).
The list building uses three messages, and works similarly to the Pack algorithm: leaders prospect
continuously using their oracle and discover other leaders. A leader whose level is higher than their
own, but lower than that of the current predecessor, is taken as the new predecessor; a leader whose
level is lower than their own, but higher than that of the current successor, is taken as the new
successor. The message ListExists is used to prospect, while the messages ListHello and ListGoodBye
are used to ensure the coherency and symmetry of the prev / next variables.

[Figure 2.4: Example of a list with n = 26: packs of sizes 16, 8 and 2, led by P (level 4), Q (level
3) and R (level 1).]

The formal version of this algorithm is given in Algorithm 1.

Algorithm 1 List Algorithm
Variables:
  prev, next : I ∪ {⊥}
  prev_level, next_level : J0, tK ∪ {−∞, +∞}
Rules:
Rule Cleanup Prev:
  S(prev) ∨ prev_level ≤ level ∨ ¬leader −→
    (prev, prev_level) ← (⊥, +∞)
Rule Cleanup Next:
  S(next) ∨ next_level ≥ level ∨ ¬leader −→
    (next, next_level) ← (⊥, −∞)
Rule Link Maintenance:
  leader −→
    send ListHello(level) to prev
    send ListHello(level) to next
Rule Prospection:
  leader −→
    v ← getPeer()
    if ¬S(v) then
      send ListExists(level) to v
    end if
Rule Reaction to ListExists:
  reception of ListExists(l) sent by v −→
    if leader then
      if level < l < prev_level then
        (prev, prev_level) ← (v, l)
      else if next_level < l < level then
        (next, next_level) ← (v, l)
      end if
    end if
Rule Reaction to ListHello:
  reception of ListHello(l) sent by v −→
    if v = prev then
      prev_level ← l
    else if v = next then
      next_level ← l
    else
      send ListGoodBye to v
    end if
Rule Reaction to ListGoodBye:
  reception of ListGoodBye sent by v −→
    if leader then
      if v = prev then prev ← ⊥ end if
      if v = next then next ← ⊥ end if
    end if
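As an illustration of the converged state (not of the message exchange itself), the Python sketch
below links leaders by decreasing level and sets the sentinel values of the first and last leaders;
the function name and the dictionary encoding are assumptions made for this example.

INF = float("inf")

def link_leaders(levels: dict) -> dict:
    """levels maps each leader to its pack level (all levels distinct)."""
    order = sorted(levels, key=levels.get, reverse=True)  # largest pack first
    links = {}
    for i, p in enumerate(order):
        prev = order[i - 1] if i > 0 else None
        nxt = order[i + 1] if i + 1 < len(order) else None
        links[p] = {"prev": prev,
                    "prev_level": levels[prev] if prev else INF,
                    "next": nxt,
                    "next_level": levels[nxt] if nxt else -INF}
    return links

# Figure 2.4: packs of 16, 8 and 2 processes led by P (level 4), Q (3), R (1).
links = link_leaders({"P": 4, "Q": 3, "R": 1})
assert links["P"]["prev"] is None and links["P"]["prev_level"] == INF
assert links["R"]["next"] is None and links["R"]["next_level"] == -INF
assert links["Q"]["prev"] == "P" and links["Q"]["next"] == "R"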
Proof of self-stabilization
We now define the set of legitimate configurations for the List algorithm, Ll :
Definition 4 (steady) Let m be the number of packs formed by the pack algorithm. Iff the l lowest
pack leaders verify the following conditions, they and their packs are steady and the system is steady at
level l. The other leaders and their packs are unsteady.
• if p is the leader of the smallest pack, then next(p) = ⊥ and next_level(p) = −∞, else next(p) = q
such that q is the leader of the largest pack smaller than that of p and next_level(p) = level(q).
• if p is the leader of the largest pack, then prev(p) = ⊥ and prev_level(p) = +∞, else prev(p) = q such
that q is the leader of the smallest pack larger than that of p and prev_level(p) = level(q).
• in all other cases, prev_level(p) = level(prev(p)) and next_level(p) = level(next(p)).
• no channel contains a ListGoodBye message.
• if the channel c_{p→q} contains a message ListHello(l), then prev(p) = q or next(p) = q.
Definition 5 (Ll) A configuration that is steady at level m and in which the leader p of the largest
pack is such that prev(p) = ⊥ and prev_level(p) = +∞ is legitimate. The set of such configurations is
called Ll.
Theorem 2 The list algorithm is self-stabilizing to Ll .
Proof 7 This proof is divided into three parts: correction (Lemma 6), closure (Lemma 7) and
convergence (Lemma 8).
Lemma 6 (correction) In any legitimate configuration, the doubly-linked list of the list algorithm includes the leaders of all the packs, from the largest down to the smallest.
Proof 8 For this algorithm, the definition of a legitimate configuration is the same as the one of a
correct configuration.
Lemma 7 (closure) The set Ll is closed under the execution of the list algorithm.
Proof 9 None of the possible transitions in a legitimate configuration leads to an illegitimate configuration.
• Cleanup (next and prev): all the conditions are false by definition of Ll .
• Link maintenance: the messages sent verify the condition on ListHello messages.
• Prospection: sending a ListExists message cannot make the configuration illegitimate.
• Reaction to ListHello: the transition has no effect.
• Reaction to ListExists: the transition has no effect.
• Reaction to ListGoodBye: there is no such message in a legitimate configuration.
Lemma 8 The list algorithm converges to Ll from any configuration.
Proof 10 We consider that the failure detectors are stabilized, all initial messages are consumed and the
pack algorithm is stabilized. We first show that undesirable values are eventually eliminated in
Lemma 9; then we show that the system eventually becomes steady, level by level, in Lemma 10. As a consequence,
the system eventually stabilizes to Ll .
Definition 6 (spurious) Let p be a pack leader. The value of prev(p) or next(p) is spurious if the
associated level does not match that of the corresponding process, i.e. next_level(p) ≠ level(next(p))
or prev_level(p) ≠ level(prev(p)).
Lemma 9 All spurious values are eventually eliminated.
Proof 11 Suppose next_level(p) ≠ level(q), where q = next(p). Eventually, p executes its sanity
checking rule and sends ListHello to q. If q is not a pack leader, it replies with ListGoodBye, on
reception of which p executes next ← ⊥. If q is a pack leader, then level(q) > level(p) by definition
of p, so q also replies with ListGoodBye. The other case, prev_level(p) ≠ level(prev(p)), is symmetric.
The level variables are updated at the same time as prev and next, which makes it impossible to
introduce a spurious value in the system. We then suppose that no spurious value exists in the system.
An immediate consequence of this is that the system eventually becomes steady at level 0.
Lemma 10 A system steady at level l < t eventually becomes steady at level l + 1.
Proof 12 Let p be the leader of the smallest unsteady pack and q be the leader of the largest steady
pack. We prove, using the fairness of the oracle, that eventually next(p) = q and that next(p) cannot
change afterwards.
First, notice that if next(p) = q, then p cannot change the value of its next variable. This would
require one of the following:
• a process r such that level(p) > level(r) > level(q) sends ListExists to p, but by definition of p
and q, there is no such r.
• q sends ListGoodBye to p. Since q is a pack leader, this can only happen if q receives ListHello
from p, but in this case, since level(p) > level(q) and there is no level between those two, q does
not send a ListGoodBye message.
Now, consider an execution where next(p) is never q. By its prospection rule, q sends an infinite
number of times ListExists to all the processes returned by its oracle, including p (by the global condition
on oracles). Since level(p) > level(q) > next_level(p), p writes the identifier of q in its next variable.
This is a contradiction.
Convergence time
Theorem 3 After the pack algorithm is stabilized, the list algorithm converges in Θ(B) asynchronous
rounds in the worst case.
Proof 13 First, the variables prev of the first process and next of the last process take the value ⊥
during the first execution of the spontaneous rule.
Notice that, through the use of the prospection rule, all the processes obtain the identifiers of all
the other processes in B asynchronous rounds. Therefore, in B rounds, each pack leader receives the
identifier and level of all the other leaders.
Then, as seen in the self-stabilization proof, when a leader p receives the ListExists message from the
leader q whose level is immediately higher (resp. lower) than its own, it takes q for prev (resp. next).
Therefore, after B rounds, the list is stabilized.
2.2.3 Ranking algorithm
This algorithm, to be composed with the previous two, gives all the processes in the system
consecutive integer identifiers, starting with 0. To achieve its goal, it uses the tree structure of
the packs (see Figure 2.3).
Each process has a variable called name, distinct from the constant my_id. The root, i.e. the leader
of the largest pack, spontaneously sends Rank(0) to itself. Each pack being constituted of a regular
structure (a binomial tree), it is possible to recursively assign a unique rank to each of the
children of the root, knowing only the depth of the tree (the rank sent to neighbor i is determined by
i and the level alone). It is also possible to compute the size of the tree knowing its level: a pack
whose root is of level l holds 2^l processes.
Thus, when any process receives a Rank(r) message, it takes r as its name, then assigns the name
r + 1 + Σ_{j=1}^{i} 2^{level−j} to each neighbor i such that neighbor[i] ≠ ⊥, by sending a
corresponding Rank message to its neighbors. Lastly, if the process is a leader and has a successor,
it sends the next available rank to its successor, which is r + 2^level.
Note that name assignments can occur in parallel: knowing only the level of the process receiving the
Rank message and the next available identifier, which is held in the Rank message, is enough to assign
a name to all of its direct neighbors.
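The wave can be simulated sequentially. The sketch below mirrors the Rank rule on the pack shapes of
Figure 2.5; the Node class and the recursive driver are constructs of this example, not of the
protocol (in this chapter's convention, neighbor[i] roots a subtree of 2^(level−i−1) processes).

class Node:
    def __init__(self, level: int):
        self.level = level
        self.name = None
        # neighbor[i] roots a subtree of 2**(level - i - 1) processes
        self.neighbor = [Node(level - i - 1) for i in range(level)]
        self.next = None                       # next pack leader, for leaders

def rank(p: Node, r: int) -> None:
    """Sequential rendition of Rule Reaction to Rank."""
    p.name = r
    n = r + 1
    for i, child in enumerate(p.neighbor):
        rank(child, n)
        n += 2 ** (p.level - i - 1)            # size of the subtree just named
    if p.next is not None:
        rank(p.next, n)                        # here n = r + 2**p.level

P, Q, R = Node(3), Node(2), Node(1)            # packs of 8, 4 and 2 processes
P.next, Q.next = Q, R
rank(P, 0)
assert (P.name, Q.name, R.name) == (0, 8, 12)
assert [c.name for c in P.neighbor] == [1, 5, 7]   # as in Figure 2.5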
Algorithm 2 Ranking Algorithm
Variables:
  name : J0, n − 1K
Rules:
Rule Spontaneous Rule:
  prev = ⊥ −→
    send Rank(0) to my_id
Rule Reaction to Rank:
  reception of Rank(r) −→
    name ← r
    n ← r + 1
    for all i = 1 to level do
      send Rank(n) to neighbor[i − 1]
      n ← n + 2^(level−i)
    end for
    if next ≠ ⊥ then
      send Rank(n) to next
    end if
[Figure 2.5: Example of the Ranking Algorithm on a 14-process system. Each arrow is labeled
"phase: Rank(r)", from "1: Rank(0)" sent by P to itself up to "4: Rank(13)" sent to the last process.]
The formal algorithm is given in Algorithm 2.
Figure 2.5 shows a system comprising 14 processes. Thus, there are three packs: one of 8 processes,
led by P, one of 4 processes led by Q and one of 2 processes, led by R. P is the root of the tree, and
the black arrows in the figure represent the next pointer of each process. The number inside each
process represents its level. The number next to the links between processes inside a pack represents
the index of the process in the neighbor vector of the parent. Blue arrows and text represent Rank
messages: first, P sends Rank(0) to itself, according to the spontaneous rule. Then, in phase 2, it
sends Rank(0 + 1 = 1) to its neighbor at index 0, Rank(1 + 2^(3−1) = 5) to its neighbor at index 1,
Rank(5 + 2^(3−2) = 7) to its neighbor at index 2, and, because it is a leader, Rank(7 + 2^(3−3) = 8)
to its next neighbor.
Self-stabilization
This algorithm is self-stabilizing because once the pack and list algorithms are stabilized, the topology
does not change anymore. Eventually, the root sends Rank(0) to itself, which launches a new distribution
of names. As soon as the wave has finished propagating along the tree, each process has a unique name
that is suitable for routing.
Convergence time
Let E be an upper bound on the transmission time of a message. Messages are created by the root of
the spanning tree and forwarded down to the leaves, which takes E log n asynchronous rounds. Since E
is normally Θ(1) for the system to be usable, the convergence time of this algorithm is Θ(log n).
2.2.4 Routing Algorithm
The routing algorithm provides a procedure post that takes a message and an integer which is the name of
the final destination. The goal of the routing algorithm is to deliver the message at the final destination,
using the shortest established route between the caller of post and the destination. The message is
delivered at the destination by calling the deliver function with the message as a parameter.
The routing algorithm is done with a single message Route, that carries the destination rank and the
message to deliver. The algorithm is straightforward: at the reception of a Route message, any process
that is the destination delivers the message. Otherwise, using its name and its level, the process
computes whether the message must be routed to one of its children (because of the ranking algorithm,
the descendants of any process of rank r and level l are ranked r + 1 to r + 2^l − 1), in which case
the appropriate child is computed by iterating on the sizes of the subtrees rooted in each child. If
the message is not directed to one of the children, then it must be forwarded to the parent in the
tree (neighbor[level − 1]) if the process is not a leader, to the leader pointed to by next if the
destination possesses a higher rank than the process, or by prev if the destination possesses a lower
rank than the process.
Because the resulting tree has a diameter of at most 2 log n, a message is routed in at most 2 log n
hops. The formal algorithm is given in Algorithm 3.
Since this algorithm uses no memory, it stabilizes instantly: no additional convergence time is needed
in order to be able to route once the ranking algorithm has converged. Note however that, because of
the composition, since the global algorithm relies on regular self-stabilizing algorithms, routing is
not guaranteed to work before all the other algorithms have converged.
2.2.5 Convergence time of the global algorithm
In the worst case, the algorithms converge one after another. The global convergence time is thus
Θ(nB) + Θ(B) + Θ(log n) = Θ(nB). This improves over the previous fastest spanning tree algorithm
known in this model [41], which converges in Θ(n(nB)) [50].
Algorithm 3 Routing Algorithm
Definitions:
  SendMessage(m, d) ≡ send Route(m, d) to my_id
Rules:
Rule Reaction to Route:
  reception of Route(m, r) −→
    if r = name then
      Deliver m
    else if r ≥ name + 1 ∧ r < name + 2^level then
      n ← name + 1
      i ← 0
      found ← false
      while ¬found do
        if r ≥ n ∧ r < n + 2^(level−i−1) then
          send Route(m, r) to neighbor[i]
          found ← true
        end if
        n ← n + 2^(level−i−1)
        i ← i + 1
      end while
    else if ¬leader then
      send Route(m, r) to neighbor[level − 1]
    else if r < name then
      send Route(m, r) to prev
    else {r ≥ name + 2^level ∧ leader}
      send Route(m, r) to next
    end if
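As a sanity check of the while loop above, here is the child-selection step in isolation, under the
ranking of Figure 2.5; next_hop_child is a name invented for this sketch.

def next_hop_child(name: int, level: int, r: int) -> int:
    """Index of the child whose subtree contains rank r, assuming
    name < r < name + 2**level (the descendant range of the process)."""
    n = name + 1
    i = 0
    while True:
        size = 2 ** (level - i - 1)        # neighbor[i] subtends size ranks
        if n <= r < n + size:
            return i
        n += size
        i += 1

# P has name 0 and level 3; its children are named 1, 5 and 7.
assert next_hop_child(0, 3, 4) == 0        # ranks 1..4 live under neighbor[0]
assert next_hop_child(0, 3, 6) == 1        # ranks 5..6 under neighbor[1]
assert next_hop_child(0, 3, 7) == 2        # rank 7 is neighbor[2] itself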
2.3 Related Works
Many self-stabilizing algorithms that build spanning trees can be found in the literature, mostly
using the state reading model. They can build depth-first search trees, or breadth-first search trees,
depending on the algorithm. Much information can be found in the survey of Gärtner [35].
Sandeep Gupta and Pradip Srimani [39] presented an algorithm to build and maintain trees in an ad-hoc
message-passing network. This algorithm assumes a fixed range of communication and the discovery of
the list of neighbors. In our case, this algorithm is not applicable, because we would have to model
the range of communication as infinite, and this would require each process to know about all the
processes in the system.
Yehuda Afek and Anat Bremler [2] define the principle of power supply to build a spanning tree over a
unidirectional network with state reading. The algorithms presented here use a similar concept,
messages being sent permanently to supply the topology. This concept is common in self-stabilization
and peer-to-peer systems, where it is known as gossiping.
Vijay Garg and Anurag Agarwal [34] gave a self-stabilizing spanning tree algorithm for large scale
systems under the assumption that processes already possess consecutive ranks, and are joinable using
these ranks. Our algorithms provide this abstraction, and do not make the same assumption.
Brian Bourgon, Ajoy Datta and Viruthagiri Natarajan [10] introduced a self-stabilizing ranking
algorithm for a tree-based network. Their algorithm could be used on top of the tree built in this work,
instead of the ranking and routing algorithms. However, our ranking algorithm uses specificities of the
tree that is built, in order to improve the convergence time.
The self-stabilizing algorithm proposed by Dolev, Israeli and Moran [24] addresses the issue of ranking
in an anonymous network. It first builds a spanning tree, using random choices to break topological
symmetries. However, the algorithm is built for a shared memory / state reading model, and is thus not
fitted for large scale address-based networks.
In a previous work [41], we introduced the abstractions that we reuse in this chapter to model a
large-scale address-based network using message passing. We presented a first self-stabilizing algorithm
to build a spanning tree. This work improves on the previous result by giving a much more efficient
algorithm to build a tree, removing many constraints on the order of nodes in the tree that were necessary
in the previous algorithm. We also solve higher-level problems, using the tree that is built, the specificities
of the topology, and the fair composition of self-stabilizing algorithms.
2.4 Conclusion
We presented self-stabilizing algorithms for large scale address-based systems. Address-based systems,
such as the Internet, enable communication between any pair of processes, as long as one of them
knows the address of the other. Maintaining such addresses, and corresponding communication channels,
is costly at large scale, and the algorithms we presented limit the amount of such resources, while
still building a resilient and efficient communication infrastructure for classical peer-to-peer distributed
algorithms.
The first algorithm packs processes together in a forest of complete binomial trees. Composed with
the second algorithm that doubly links trees together, this creates a single tree whose diameter and depth
are both logarithmic in the number of processes in the system. Each process, moreover, has at most
log n communication channels to maintain, where n is the number of processes in the system.
The third algorithm assigns ranks (consecutive unique identifiers, ranging from 0 to n − 1) to the
processes of the tree, creating a higher-level abstraction for the communication layer. The fourth
algorithm presents a routing mechanism using these ranks, thus completing the creation of a fully
usable communication infrastructure. Since the distance between two nodes is bounded by 2 log n hops,
this infrastructure is efficient. It is also reliable, since the whole structure is built in a
self-stabilizing manner: in case of failures, the system converges back to a normal behavior.
The algorithm built from the composition of the four aforementioned algorithms relies on a computation model that replaces the traditional neighbor list with an oracle. This weakening of the system
assumptions allows scaling up to very large systems. The algorithm converges in Θ(nB) asynchronous
rounds, which improves over the best previously known spanning tree algorithm in such settings.
Chapter 3

Stabilizing Locally Maximizable Tasks in Unidirectional Networks is Hard

3.1 Introduction
One of the most versatile techniques to ensure forward recovery of distributed systems is that of
self-stabilization [20, 27, 53]. A distributed algorithm is self-stabilizing if, after faults and
attacks hit the system and place it in some arbitrary global state, the system recovers from this
catastrophic situation without external (e.g. human) intervention in finite time.
The vast majority of self-stabilizing solutions in the literature [27] considers bidirectional
communication capabilities, i.e. if a process u is able to send information to another process v, then
v is always able to send information back to u. This assumption is valid in many cases, but cannot
capture the fact that asymmetric situations may occur, e.g. in wireless networks, it is possible that
u is able to send information to v yet v cannot send any information back to u (u may have a wider
range antenna than v). Asymmetric situations, which we denote in the following under the term of
unidirectional networks, preclude many common techniques in self-stabilization from being used, such
as preserving local predicates (a process u may take an action that violates a predicate involving its
outgoing neighbors without u knowing it, since u cannot get any input from its outgoing neighbors).
Self-stabilizing solutions are considered easier to implement in bidirectional networks since detecting
incorrect situations requires less memory and computing power [5], recovering can be done locally [4],
and Byzantine containment can be guaranteed [44, 45, 48].
Investigating the possibility of self-stabilization in unidirectional networks was recently emphasized
in several papers [3, 8, 14, 16, 18, 19, 25, 32, 33]. However, topology or knowledge about the system
varies: [16] considers acyclic unidirectional networks, where erroneous initial information may not loop;
[3, 14, 19, 25] assume unique identifiers and strongly connected communication graph so that global
communication can be implemented; [18, 32, 33] make use of distinguished processes yet operate on
arbitrary unidirectional networks.
Tackling arbitrary uniform unidirectional networks in the context of self-stabilization proved to be
hard. In particular, [8, 7] studied the self-stabilizing vertex coloring problem in unidirectional uniform
networks (where adjacent nodes must ultimately output different colors). Deterministic and probabilistic
solutions to the vertex coloring problem [37, 47] in bidirectional networks have local complexity (∆
states per process are required, and O(∆) (resp. O(1)) actions per process are needed to recover from
an arbitrary state in the case of a deterministic (resp. probabilistic) algorithm, where ∆ denotes the
maximum degree of a process). By contrast, in unidirectional networks, [8] proves a lower bound of n
states per process (where n is the network size) and a recovery time of at least n(n − 1)/2 actions in total
(and thus Ω(n) actions per process) in the case of deterministic uniform algorithms, while [7] provides a
probabilistic solution that remains either local in space or local in time, but not both.
In this chapter, we consider the problem of constructing, in a self-stabilizing manner, a locally maximizable task
(e.g. maximal independent set, maximal matching, Grundy coloring) in uniform unidirectional networks
of arbitrary shape. It turns out that local maximization is strictly more difficult than local predicate
maintenance (i.e. vertex coloring). On the negative side, we present evidence that in uniform networks,
deterministic self-stabilization of this problem is impossible. Also, the silence property (i.e. having the
communicated state fixed from some point onward in every execution) is impossible to guarantee, whether
for deterministic or for probabilistic variants of protocols.
On the positive side, we present a series of generic protocols that can be instantiated for all considered locally maximizable tasks. First, we design a deterministic protocol for arbitrary unidirectional
networks with unique identifiers that exhibits O(m log n) space complexity and O(D) time complexity in
asynchronous scheduling, where n (resp. m) is the number of processes (resp. links) in the network and
D is the network diameter. We complement the study with probabilistic generic protocols for the uniform
case: the first probabilistic protocol requires infinite memory but copes with asynchronous scheduling
(stabilizing in time O(log n + log ℓ + D), where ℓ denotes the number of fake identifiers in the initial
configuration), while the second probabilistic protocol has polynomial space complexity (in O(m log n))
but can only handle synchronous scheduling (stabilizing in time O(n log n + ℓ)).
The remainder of the chapter is organized as follows: Section 3.2 presents the programming model and
the problem specifications. Section 3.3 presents our negative results, while Section 3.4 details the protocols.
Section 3.5 gives some concluding remarks and open questions.
3.2 Preliminaries
Program model A program consists of a set V of n processes. A process maintains a set of variables
that it can read or update, which define its state. A process also contains a set of constants that it can
read but not update. A binary relation E of cardinality m is defined over distinct processes such that
(i, j) ∈ E if and only if j can read the variables maintained by i; i is a predecessor of j, and j is a
successor of i. The set of predecessors (resp. successors) of i is denoted by P.i (resp. S.i), and the union
of the predecessors and successors of i is denoted by N.i, the neighbors of i. The ancestors of process i
are recursively defined as follows: i itself is an ancestor of i, and the ancestors of each predecessor of i
are also ancestors of i. The descendants of i are similarly defined using successors (instead of
predecessors). The relation E is not necessarily symmetric, which reflects the assumption that the network we
consider is unidirectional. Another remarkable point that distinguishes our model from the ordinary
unidirectional model is that each process is aware of its predecessors but is unaware of its successors;
each process knows how many predecessors it has and can distinguish them, but has no knowledge about
its successors. Notice that the unawareness of successors is inherent to some unidirectional networks
such as wireless networks.
For processes i and j in V, d(i, j) denotes the distance (the length of a shortest path) from i to
j in the directed graph (V, E). We define, for convenience, d(i, i) = 0 and d(i, j) = ∞ if j is not
reachable from i. The diameter D is defined as D = max{d(i, j) | (i, j) ∈ V × V, d(i, j) ≠ ∞}. A
graph G = (V, E) is strongly connected if for any two vertices i and j, both d(i, j) ≠ ∞ and d(j, i) ≠ ∞
hold. The strongly connected components (abbreviated as SCC) of G are its maximal strongly connected
subgraphs.
An action has the form ⟨name⟩ : ⟨guard⟩ −→ ⟨command⟩. A guard is a Boolean predicate over
the variables of the process and its predecessors. A command is a sequence of statements assigning new
values to the variables of the process. Recall that a process is unaware of its successors, so the actions
of a process can depend on its predecessors but are completely independent of its successors. A parameter
is used to define a set of actions as one parameterized action.
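As an illustration of this execution model, the following Python sketch (ours, not part of the deliverable; all class and function names are illustrative assumptions) represents a process as a set of (guard, command) pairs and lets a distributed scheduler fire enabled actions.

import random

class Process:
    # A process: its variables (state), readable predecessors, guarded actions.
    def __init__(self, actions):
        self.state = {}              # the variables of the process
        self.predecessors = []      # processes whose variables it can read
        self.actions = actions      # list of (guard, command) pairs

    def enabled_actions(self):
        # an action is enabled when its guard evaluates to true
        return [a for a in self.actions if a[0](self)]

def step(processes, rng=random):
    # Distributed scheduler: an arbitrary non-empty subset of the enabled
    # processes each executes one of its enabled actions.
    enabled = [p for p in processes if p.enabled_actions()]
    if not enabled:
        return False                 # terminal (silent) configuration
    for p in rng.sample(enabled, rng.randint(1, len(enabled))):
        guard, command = rng.choice(p.enabled_actions())
        command(p)
    return True

A weakly fair computation corresponds to calling step repeatedly while ensuring no continuously enabled process is neglected forever.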
A configuration of the program is the assignment of a value to every variable of each process from
its corresponding domain. Each process contains a set of actions. In some configuration, an action is
enabled if its guard is true in the configuration, and a process is enabled if it has at least one enabled
action in the configuration. A computation is a maximal sequence of configurations γ0 , γ1 , . . . such that
for each configuration γi , the next configuration γi+1 is obtained by executing the command of at least
one action that is enabled in γi . Maximality of a computation means that the computation is infinite
or it terminates in a configuration where none of the actions are enabled. A program that only has
terminating computations is silent.
A scheduler is a predicate on computations, that is, a scheduler is a set of possible computations,
such that every computation in this set satisfies the scheduler predicate. We consider only weakly fair
schedulers, where no process can remain enabled in a computation without executing any action. We
distinguish three particular schedulers in the sequel of the chapter: the distributed scheduler corresponds
to predicate true (that is, all weakly fair computations are allowed). The locally central scheduler implies
that in any configuration belonging to a computation satisfying the scheduler, no two enabled actions
are executed simultaneously on neighboring processes. The synchronous scheduler implies that in any
configuration belonging to a computation satisfying the scheduler, every enabled process executes one of
its enabled actions.
The distributed and locally central schedulers model asynchronous distributed systems. In asynchronous distributed systems, time is usually measured by asynchronous rounds (simply called rounds).
Let C = γ0 , γ1 , . . . be a computation. The first round of C is the minimum prefix of C, C1 = γ0 , γ1 , . . . , γk ,
such that every enabled process in γ0 executes its action or becomes disabled in C1. Round t (t ≥ 2) is
defined recursively, by applying the above definition of the first round to C′ = γk, γk+1, . . .. Intuitively,
every process has a chance to update its state in every round.
A configuration conforms to a predicate if this predicate is true in this configuration; otherwise the
configuration violates the predicate. By this definition every configuration conforms to predicate true
and none conforms to false. Let R and S be predicates over the configurations of the program. Predicate
R is closed with respect to the program actions if every configuration of the computation that starts in
a configuration conforming to R also conforms to R. Predicate R converges to S if R and S are closed
and any computation starting from a configuration conforming to R contains a configuration conforming
to S. The program deterministically stabilizes to R if and only if true converges to R. The program
probabilistically stabilizes to R if and only if true converges to R with probability 1.
Problem specification In this chapter we consider locally maximizable tasks, and instantiate them
using the following three classical problems:
UMIS Each process i defines a function mis.i that takes as input the states of i and its predecessors,
and outputs a value in {true, false}. The unidirectional maximal independent set (denoted by
UMIS in the sequel) predicate is satisfied if and only if for every i ∈ V , either mis.i = true ∧ ∀j ∈
N.i, mis.j = false or mis.i = false ∧ ∃j ∈ N.i, mis.j = true.
UGC Each process i defines a function col.i that takes as input the states of i and its predecessors,
and outputs a non-negative integer (or color). The unidirectional Grundy coloring (denoted by
UGC) predicate is satisfied if and only if for every i ∈ V, ∀j ∈ N.i, col.i ≠ col.j and col.i =
min(Z≥0 − {col.j | j ∈ N.i}), where Z≥0 denotes the set of non-negative integers.
UMM Each process i defines a function match.i that takes as input the states of i and its predecessors,
and outputs one of its predecessors (actually the local label of the incoming link from the
predecessor) or a symbol ⊥. The unidirectional maximal matching (denoted by UMM) predicate
is satisfied if and only if
• for any two distinct processes i and j such that match.i ≠ ⊥ and match.j ≠ ⊥, {i, match.i} ∩
{j, match.j} = ∅, and
• for any neighboring processes i and j such that match.i = match.j = ⊥, ∃g ∈ N.i, match.g = i
or ∃h ∈ N.j, match.h = j.
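As a concrete reading of the UMIS predicate, here is a small Python checker (our sketch; the dictionary-based encoding of the network is an assumption, not the chapter's formal model): preds maps each process to its predecessor set, and N.i is recovered as predecessors together with successors.

def neighbours(preds):
    # N.i = P.i ∪ S.i, recovered from the predecessor relation
    nbrs = {i: set(ps) for i, ps in preds.items()}
    for i, ps in preds.items():
        for j in ps:
            nbrs[j].add(i)          # j is a predecessor of i, hence i ∈ N.j
    return nbrs

def umis_holds(preds, mis):
    # mis: dict process -> bool, the value of mis.i at each process
    nbrs = neighbours(preds)
    for i in preds:
        if mis[i] and any(mis[j] for j in nbrs[i]):
            return False            # independence is violated
        if not mis[i] and not any(mis[j] for j in nbrs[i]):
            return False            # maximality is violated
    return True

# On the directed 3-cycle a -> b -> c -> a, setting only mis.a = true
# satisfies the predicate:
assert umis_holds({'b': {'a'}, 'c': {'b'}, 'a': {'c'}},
                  {'a': True, 'b': False, 'c': False})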
3.3 Impossibility Results
In this section, we consider anonymous and uniform networks, where processes of the same in-degree
execute exactly the same code (note however that probabilistic protocols may exhibit different actual
behaviors when making use of a random variable).
[Figure: (a) System A, three processes a, b, c with states among S, S′, S″; (b) System B, five processes d, e, f, g, h with states among S, S′, S″.]
Figure 3.1: Impossibility of silent self-stabilizing locally maximizable tasks
Definition 7 (Local View) The local view of a process p consists of its own state and the states of its
predecessors (with local labels on the incoming links). Two processes are called locally equivalent if they
have the same local view.
Since all the processes execute the same program, and the program depends only on the process's own
state and the states of its predecessors, locally equivalent processes make the same action when activated.
In particular, if a process is disabled in some configuration, then all the processes locally equivalent to it
are also disabled. Using the concept of local equivalence, we can characterize the problems
for which no silent self-stabilizing solution exists.
Theorem 4 A problem allows no silent self-stabilizing solution if it satisfies the following condition:
from any configuration, say γ, satisfying the problem predicate, a configuration γ′ (possibly of a network
different from that of γ) can be constructed such that
S1. γ′ does not satisfy the problem predicate, and
S2. every process at γ′ is locally equivalent to some process at γ.
Proof 14 Assume, for the sake of contradiction, that a silent self-stabilizing solution A exists. Starting
from any configuration, A reaches a silent configuration, say γ, satisfying the problem predicate. Now
construct a configuration γ′ satisfying conditions S1 and S2. The configuration γ′ is silent, since all the
processes are disabled at γ and, from S2, are also disabled at γ′. Thus A remains at γ′ forever when
starting from γ′. This contradicts the assumption that A is a self-stabilizing solution, since γ′ does not
satisfy the problem predicate (from S1).
Notice that the impossibility result of Theorem 4 holds even for candidate probabilistic solutions.
Impossibility results for the UMIS, the UGC and the UMM problems are easily obtained from Theorem 4.
Corollary 1 There exists no silent self-stabilizing solution for the UMIS problem.
Proof 15 Consider System A as depicted in Figure 3.1 (a). In any configuration γ satisfying the UMIS
predicate, exactly one of the three processes, say a, has mis.a = true. Now consider System B in
Figure 3.1 (b) and construct a configuration γ′ where process d has the same state as a, e and g have
the same state as b, and f and h have the same state as c. Configuration γ′ does not satisfy the UMIS
predicate since only d has mis.d = true (i.e. S1 holds). It is easy to see that d is locally equivalent to a,
e and g are locally equivalent to b, and f and h are locally equivalent to c (i.e. S2 holds). Thus, the
corollary holds from Theorem 4.
Corollary 2 There exists no silent self-stabilizing solution for the UGC problem.
Proof 16 Consider System A as depicted in Figure 3.1 (a). In any configuration γ satisfying the UGC
predicate, col.a, col.b and col.c return mutually distinct colors drawn from {0, 1, 2}. Without loss of
generality, assume col.a = 0. Now consider System B in Figure 3.1 (b) and construct a configuration γ′
where process d has the same state as a, e and g have the same state as b, and f and h have the same
state as c. Configuration γ′ does not satisfy the UGC predicate since h does not satisfy the requirement
of the minimum color (i.e. S1 holds): col.g and col.h return mutually distinct colors drawn from {1, 2},
and thus the minimum color 0 is used neither at h nor at its neighbor g. It is easy to see that d is locally
equivalent to a, e and g are locally equivalent to b, and f and h are locally equivalent to c (i.e. S2 holds).
Thus, the corollary holds from Theorem 4.
Corollary 3 There exists no silent self-stabilizing solution for the UMM problem.
Proof 17 Consider System A as depicted in Figure 3.1 (a). In any configuration γ satisfying the UMM
predicate, exactly one of the three processes, say a, has match.a ≠ ⊥ (actually match.a = c). Now
consider System B in Figure 3.1 (b) and construct a configuration γ′ where process d has the same state
as a, e and g have the same state as b, and f and h have the same state as c. Configuration γ′ does
not satisfy the UMM predicate since match.g = match.h = ⊥ and match.d ≠ g (i.e. S1 holds). It is easy
to see that d is locally equivalent to a, e and g are locally equivalent to b, and f and h are locally equivalent
to c (i.e. S2 holds). Thus, the corollary holds from Theorem 4.
Initial symmetry can be broken self-stabilizingly by a probabilistic approach [37, 38, 47]; deterministic
protocols, however, cannot use such techniques. Thus, even relaxing the silence property does not enable
deterministic solutions, since symmetry breaking is impossible in some situations (e.g. a
ring where all the processes are initially in the same state). We can thus obtain the following theorem.
First, we introduce the unidirectional view, which is simply a unidirectional version of the view introduced in [54].
Definition 8 (Unidirectional View) The unidirectional view V_p^1 at distance 1 of a node p is the
local view of p. The unidirectional view at distance k of p is a tree V_p^k of height k that contains one
unidirectional view V_q^{k−1} as a subtree of p for each predecessor q of p.
In what follows, a unidirectional view is simply called a view. The following theorem is derived from
the result of [54]. Intuitively, the view at infinite distance of process p is the maximum information that
p can use. Thus, processes with the same view cannot make their states distinct from each other when
all the processes are activated at every step. It is also known that two processes have the same view at
infinite distance if their views at distance n are the same.
Theorem 5 A problem allows no deterministic self-stabilizing solution if the following configuration γ
exists in some network G: for any configuration γ′ of G satisfying the problem predicate, there exist
processes p and q that have the same unidirectional view at distance n in γ but have different states in
γ′, where n is the number of processes in G.
Theorem 5 implies the impossibility of symmetry breaking. Thus, the following corollary holds
from Theorem 5 by considering as γ, for example, a ring network consisting of processes with the same
state.
Corollary 4 There exists no deterministic self-stabilizing solution for the UMIS, the UGC and the
UMM problems.
3.4 Possibility Results
The previous impossibility results imply that, for the deterministic case, only non-uniform networks admit
a self-stabilizing solution for the UMIS, the UGC and the UMM problems. In Section 3.4.1, we present
such a deterministic solution.
For anonymous and uniform networks, there remains the probabilistic case. We proved that probabilistic yet silent solutions are impossible, so both our solutions are non-silent. The one that is presented
in Section 3.4.2 performs in asynchronous networks but requires unbounded memory, while the one that
is presented in Section 3.4.3 performs in synchronous networks and uses O(m log n) memory per process.
3.4.1 Deterministic solution with identifiers
This subsection deals with networks where each process has a unique identifier. First, we present a
deterministic scheme of self-stabilizing solutions and then characterize a problem class that can be solved
by the scheme. The problem class contains the UMIS, the UGC and the UMM problems as explained
later.
The intuition of the scheme is as follows. Every process collects the predecessor information from all
of its ancestors using the self-stabilizing approach given in [19, 22, 33]. From the collected information,
each process i can reconstruct the exact topology of the subgraph consisting of all ancestors of i. In the
case that each process has a given input value of the problem to be solved, the input values of all the
ancestors are also collected. Then, using the topology and the input values, each process locally solves
the problem and changes its state according to the solution.
The details of the scheme are given in Algorithm 1. Each process i maintains a variable Topology_i to
store tuples of the form (id, ID, inp, d), where id is a process identifier, ID is the (identifier) set of the
predecessors of process id, inp is the input value of id, and d is the distance from id to i. For Topology_i,
G(Topology_i) denotes the directed graph G = (V, E) obtained from the predecessor information contained
in Topology_i: V = {id | (id, ∗, ∗, ∗) ∈ Topology_i}, and E = {(j, k) ∈ V × V | ∃(k, ID, ∗, ∗) ∈ Topology_i
s.t. j ∈ ID}.
Lemma 11 Let i be any process. At the end of the k-th round (k ≥ 1) and later, variable Topology_i
stores correct tuples up to distance k − 1:

{(id, ID, inp, d) ∈ Topology_i | d ≤ k − 1} = {(j, P.j, inp_j, d(j, i)) | d(j, i) ≤ k − 1}
The following corollary is derived from Lemma 11.
Corollary 5 Let i be any process and D(i) be the maximum distance to i from all the ancestors of i.
At the end of the (D(i) + 1)-th round and later, Topology_i stores exactly the correct tuples of all the
ancestors of i.
Corollary 5 shows that the scheme of Algorithm 1 eventually provides each process i with the topology
of its ancestors, including their identifiers and input values. This is the maximum information that process
i can use in the unidirectional network, which intuitively implies that the scheme allows us to solve
any problem that is solvable in our model. The following theorem characterizes the problems that can be
solved by the scheme of Algorithm 1.
Theorem 6 In asynchronous networks with identifiers, the scheme of Algorithm 1 can provide a
self-stabilizing solution to any problem such that each process can find its final state (or its output) solely
from the topology of its ancestors, including their identifiers and input values. Its convergence time is
D + 1 rounds and the memory space required at each process is O(m log n) bits.
Algorithm 1 A generic deterministic scheme in asynchronous networks with identifiers

constants of process i
    id_i: identifier of i;
    P_i: identifier set of its predecessors P.i;
    inp_i: input value of i (of the problem to be solved);

variables of process i
    Topology_i: set of (id, ID, inp, d) tuples;
        // topology that i is currently aware of:
        // id: a process identifier
        // ID: identifier set of P.(id)
        // inp: input value of id
        // d: distance from id to i in Topology_i

function update(Topology_i)
    Topology_i := {(id_i, P_i, inp_i, 0)} ∪ ⋃_{j ∈ P.i} {(id, ID, inp, d + 1) | (id, ID, inp, d) ∈ Topology_j};
    while ∃(id, ID, inp, d), (id′, ID′, inp′, d′) ∈ Topology_i s.t. id = id′ and d < d′
        remove (id′, ID′, inp′, d′) from Topology_i;
    while ∃(id, ID, inp, d), (id′, ID′, inp′, d′) ∈ Topology_i s.t. id = id′ and (ID ≠ ID′ or inp ≠ inp′)
        remove all the tuples (id, ∗, ∗, ∗) from Topology_i;
    while ∃(id, ID, inp, d) ∈ Topology_i s.t. id cannot reach i in G(Topology_i)
        remove (id, ID, inp, d) from Topology_i;

function solve(Topology_i)
    change the state of i to the task-dependent solution computed from Topology_i

actions of process i
    true −→ update(Topology_i); solve(Topology_i);
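For concreteness, the following Python sketch (ours; the attribute names are hypothetical) mirrors the three cleaning loops of function update in Algorithm 1.

from dataclasses import dataclass, field

@dataclass
class Proc:
    # Illustrative stand-in for a process of Algorithm 1 (names are ours).
    id: str
    pred_ids: frozenset          # P_i, the identifiers of the predecessors
    inp: object                  # inp_i
    predecessors: list = field(default_factory=list)
    topology: set = field(default_factory=set)   # (id, ID, inp, d) tuples

def update(p: Proc):
    top = {(p.id, p.pred_ids, p.inp, 0)}
    for j in p.predecessors:
        top |= {(pid, ids, inp, d + 1) for (pid, ids, inp, d) in j.topology}

    # 1st while loop: keep, per identifier, only the tuples at minimum distance.
    min_d = {}
    for (pid, _, _, d) in top:
        min_d[pid] = min(d, min_d.get(pid, d))
    top = {t for t in top if t[3] == min_d[t[0]]}

    # 2nd while loop: identifiers with conflicting (ID, inp) info are dropped.
    info = {}
    for (pid, ids, inp, _) in top:
        info.setdefault(pid, set()).add((ids, inp))
    top = {t for t in top if len(info[t[0]]) == 1}

    # 3rd while loop: drop identifiers that cannot reach p in G(Topology),
    # where (j, k) is an edge iff j belongs to the ID entry of k's tuple.
    reach = {p.id}
    changed = True
    while changed:
        changed = False
        for (pid, ids, _, _) in top:
            if pid in reach and not (ids <= reach):
                reach |= ids
                changed = True
    p.topology = {t for t in top if t[0] in reach}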
Modification to a silent protocol: Algorithm 1 can be easily modified to be silent. For simplicity
of our presentation, every process always has an enabled action with guard true, and thus, Algorithm 1
is not silent. But, Algorithm 1 becomes silent by changing the guard so that the action becomes enabled
only when Topology_i needs to be updated. Precisely, the guard is changed to

Topology_i ≠ {(id_i, P_i, inp_i, 0)} ∪ ⋃_{j ∈ P.i} {(id, ID, inp, d + 1) | (id, ID, inp, d) ∈ Topology_j}.
The rest of this subsection shows that the UMIS, the UGC, and the UMM problems are contained in the
problem class of Theorem 6.
Corollary 6 The scheme of Algorithm 1 can provide self-stabilizing deterministic solutions for the
UMIS, the UGC and the UMM problems in asynchronous networks with identifiers.
Algorithm 2 A task-dependent function at process i for the UMIS problem

function UMIS(Topology_i)
    WorkingTp_i := Topology_i;
    UMIS_i := ∅;
    while ∃(id_i, P_i, inp_i, 0) ∈ WorkingTp_i {
        Let W be a source SCC of WorkingTp_i;
        for each id ∈ W in the descending order of identifiers
            if UMIS_i ∪ {id} is an independent set
                UMIS_i := UMIS_i ∪ {id};
        WorkingTp_i := WorkingTp_i − W;
    }
    if id_i ∈ UMIS_i
        output true;
    else
        output false;
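The following Python sketch (ours) illustrates the UMIS function of Algorithm 2 on a reconstructed ancestor topology, given as a dictionary preds from identifiers to predecessor identifier sets; a plain Kosaraju pass supplies the SCCs, and for brevity the loop peels source SCCs until the working topology is empty rather than stopping at i's own tuple.

def sccs(preds):
    # Kosaraju's algorithm on the graph with an edge j -> v for j in preds[v].
    succs = {v: set() for v in preds}
    for v, ps in preds.items():
        for j in ps:
            if j in succs:
                succs[j].add(v)
    order, seen = [], set()
    def dfs1(v):
        seen.add(v)
        for w in succs[v]:
            if w not in seen:
                dfs1(w)
        order.append(v)
    for v in preds:
        if v not in seen:
            dfs1(v)
    comp = {}
    def dfs2(v, c):
        comp[v] = c
        for w in preds[v]:
            if w in preds and w not in comp:
                dfs2(w, c)
    for v in reversed(order):
        if v not in comp:
            dfs2(v, v)
    groups = {}
    for v, c in comp.items():
        groups.setdefault(c, set()).add(v)
    return list(groups.values())

def umis(preds, my_id):
    # neighbours = predecessors union successors, for the independence test
    nbrs = {v: {j for j in ps if j in preds} for v, ps in preds.items()}
    for v, ps in preds.items():
        for j in ps:
            if j in nbrs:
                nbrs[j].add(v)
    working = {v: {j for j in ps if j in preds} for v, ps in preds.items()}
    chosen = set()
    while working:
        # a source SCC has no edge entering it from outside itself
        w = next(c for c in sccs(working)
                 if all(p in c for v in c for p in working[v]))
        for v in sorted(w, reverse=True):     # descending identifiers
            if not (nbrs[v] & chosen):        # keeps `chosen` independent
                chosen.add(v)
        for v in w:
            del working[v]
        for v in working:
            working[v] -= w
    return my_id in chosen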
3.4.2 Probabilistic solution with unbounded memory in asynchronous anonymous networks
In this subsection, we present a probabilistic scheme of self-stabilizing solutions for locally maximizable
tasks in asynchronous anonymous networks. The scheme is based on a probabilistic unique naming of
processes, which allows each process, in the same way as Algorithm 1, to deterministically find the exact
topology consisting of all of its ancestors. In the naming algorithm, each process is given a name variable
that can be arbitrarily large (hence the unbounded memory requirement). The naming is unique with
probability 1 after a bounded number of new name draws. A new name draw consists in appending a
random bit at the end of the current identifier. Each time the process is activated, a new random bit is
appended. In parallel, we essentially run the deterministic algorithm to find the topology consisting of
all ancestors of the process.
The main difference from Algorithm 1 is in the handling of process identifiers. The variable Topology
(similar to that of Algorithm 1) of a particular process may contain several different identifiers of the same
process, since the identifier of a process keeps growing at every execution of the algorithm. To circumvent
the problem, we consider two distinct identifiers to be the same if one is a prefix of the other, and anytime
such identifiers conflict, only the longest one is retained.
Another difference is that we do not need the distance information. The distance information is used
in Algorithm 1 to remove fake tuples (i, ID, inp, d) of a process i such that i is an identifier of a
non-existing process or ID ≠ P.i, which may exist in the initial configuration. In our scheme, fake tuples
with identifiers that are prefixes of identifiers of real processes are eventually removed, since every correct
identifier eventually becomes longer than any fake identifier. Notice that tuples with fake identifiers
eventually become disconnected from the constructed subgraph topology and are thus removed.
The details of the algorithm are given in Algorithm 3.
Algorithm 3 A probabilistic scheme in asynchronous anonymous networks

constants of process i
    inp_i: input value of i (of the problem to be solved);

variables of process i
    id_i: identifier (binary string) of i;
    P_i: identifier set of P.i;
    Topology_i: set of (id, ID, inp) tuples;
        // topology that i is currently aware of:
        // id: a process identifier
        // ID: identifier set of P.(id)
        // inp: input value of id

function update(Topology_i)
    id_i := append(id_i, random_bit); // append a random bit to the current id
    P_i := identifier set of P.i; // update the identifier set of i's predecessors
    Topology_i := {(id_i, P_i, inp_i)} ∪ ⋃_{j ∈ P.i} Topology_j;
    while ∃(id, ID, inp), (id′, ID′, inp′) ∈ Topology_i s.t. id′ is a prefix of id
        remove (id′, ID′, inp′) from Topology_i;
    while ∃(id, ID, inp) ∈ Topology_i s.t. id cannot reach i in G(Topology_i)
        remove (id, ID, inp) from Topology_i;

function solve(Topology_i)
    change the state of i to the task-dependent solution computed from Topology_i

actions of process i
    true −→ update(Topology_i); solve(Topology_i);
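The identifier handling specific to Algorithm 3 can be sketched in Python as follows (our illustration; bit strings stand for identifiers and the function names are ours): one helper performs a new name draw, the other keeps only the longest representative among identifiers related by the prefix relation.

import random

def new_name_draw(identifier, rng=random):
    # one activation appends one random bit to the current identifier
    return identifier + rng.choice("01")

def merge_prefixes(topology):
    # topology: set of (id, ID, inp) tuples with bit-string identifiers;
    # among identifiers where one is a prefix of the other, keep the longest.
    kept = []
    for t in sorted(topology, key=lambda t: len(t[0]), reverse=True):
        if any(k[0].startswith(t[0]) for k in kept):
            continue            # a longer version of this identifier exists
        kept.append(t)
    return set(kept)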
Theorem 7 In asynchronous anonymous networks, the scheme of Algorithm 3 can provide a
self-stabilizing probabilistic solution to any problem such that each process can find its final state (or its
output) solely from the following information:
(i) the topology of its ancestors including their input values, and
(ii) a total order over its ancestors that is consistent with an arbitrarily given total order over all the
processes.
Its expected convergence time is O(log n + log ℓ + D) rounds, where ℓ is the number of fake identifiers
in the initial configuration.
By an argument similar to that of Corollary 6, we can obtain the following corollary from Theorem 7.
Corollary 7 The scheme of Algorithm 3 can provide self-stabilizing probabilistic solutions for the UMIS,
the UGC and the UMM problems in asynchronous anonymous networks.
3.4.3 Probabilistic solution with bounded memory in synchronous anonymous networks
The scheme in Algorithm 3 is based on global unique naming; however, self-stabilizing global unique
naming in unidirectional networks inherently requires unbounded memory. The goal of this subsection
is to present a scheme, with bounded memory, of self-stabilizing solutions. To avoid usage of unbounded
memory space, the scheme attains and utilizes a local unique naming instead of the global one. The local
unique naming guarantees that two processes have distinct identifiers whenever one is reachable from
the other. Indeed, such a local naming is sufficient for each process to recognize the strongly connected
component it belongs to. Once the component is recognized, some problems such as the UMIS, the UGC
and the UMM problems can be solved by a method similar to that in Section 3.4.2.
In our scheme to achieve local unique naming, each process extends its identifier by appending
a random bit when it finds an ancestor with the same identifier as its own. To be able to perform
such a detection, a process needs to distinguish any of its ancestors from itself even when they have
the same identifier. The detection mechanism is basically executed as follows: each process draws a
random number, and disseminates its identifier together with the random number to its descendants.
When process i receives the same identifier as its own, it checks whether the attached random number
is the same as its own. If they differ, the process detects a distinct process (that is, a
real ancestor) with the same identifier as its own current identifier. When the process receives the same
identifier with the same random number as its own for a given period of time, it draws a new random
number and repeats the above procedure. Hence, as two different processes eventually draw different
random numbers, eventually every process is able to detect an ancestor with the same identifier if such
an ancestor exists.
The above method may cause false detection (or false positive) when a process receives its own
identifier but with an old random number. To avoid such false detection, each identifier is relayed with
a distance counter and is removed when the counter becomes sufficiently large. Moreover, the process
repeats the detection checks while keeping sufficiently long periods of time between them. The details
of the self-stabilizing probabilistic algorithm for the local naming are presented in Algorithm 4.
Lemma 12 Algorithm 4 presents a self-stabilizing probabilistic local naming algorithm in synchronous
anonymous networks. Its expected convergence time is O(n log n + ℓ) rounds, where ℓ is the number of
fake identifiers in the initial configuration.
Proof Sketch: First we show that the algorithm is a self-stabilizing probabilistic local naming algorithm.
For contradiction, assume that two processes i and j (where j is an ancestor of i) keep the same identifier
from some configuration onward. Without loss of generality, the distance from j to i is minimum among
process pairs keeping the same identifiers. Let j, u1, u2, . . . , um, i be the shortest path from j to i. Since
all processes on the path have mutually distinct identifiers except for the pair i and j, (id_j, rnd_j, k) is
not discarded at any intermediate process u_k (1 ≤ k ≤ m) (because k ≤ |{id | (id, ∗, ∗) ∈ ID_{u_k}}|)
and is delivered to i. Thus, eventually i detects id_i = id_j and rnd_i ≠ rnd_j. Then i extends its
identifier by adding a random bit, which is a contradiction.
We evaluate the expected convergence time of the algorithm. By an argument similar to the proof of
Theorem 7, we can show that the expected number of bits added to a process identifier is O(log n).
Notice that the number ℓ of fake identifiers has no influence on this evaluation, since the distance d of a
fake identifier becomes larger than the timer value (once the timer is reset) and the identifier is thus
removed (because dist > |{id | (id, ∗, ∗) ∈ ID_i}|) when function naming is executed. Actually, we can
show that all the fake identifiers existing in the initial configuration are removed in O(n + ℓ) rounds.
On the other hand, the time between two executions of function naming at a process depends on the
number of currently existing identifiers (including the fake ones), which is initially O(n + ℓ) and becomes
O(n) within O(n + ℓ) rounds. Thus, the expected convergence time is O(n log n + ℓ) rounds.
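The core detection rule of Algorithm 4 can be condensed into a few lines of Python (a sketch under our own naming; the timer and distance pruning of the full algorithm are elided):

import random

def naming_check(my_id, my_rnd, known, k=2, rng=random):
    # known: the ID_i set, as (id, rnd, dist) tuples. Extend the identifier
    # exactly when the same identifier arrives carrying a different random
    # tag, which certifies a distinct ancestor with the same name.
    if any(ident == my_id and rnd != my_rnd for (ident, rnd, _) in known):
        my_id += rng.choice("01")           # append a random bit
        my_rnd = rng.randint(1, k)          # redraw the random number
    return my_id, my_rnd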
Algorithm 5 presents a scheme of self-stabilizing solutions in networks with local naming. Thus, the
fair composition [27] with the local-naming algorithm in Algorithm 4 provides probabilistic self-stabilizing
algorithms in synchronous anonymous networks.
Algorithm 4 Probabilistic local naming in synchronous anonymous networks

variables of process i
    id_i: identifier (binary string) of i;
    rnd_i: random number selected from {1, 2, . . . , k}; // k (≥ 2) is a constant
    ID_i: set of (id, rnd, d) tuples;
        // identifiers that i is currently aware of:
        // id: a process identifier
        // rnd: random number of id
        // d: distance that id has traversed

function update(ID_i)
    ID_i := {(id_i, rnd_i, 0)} ∪ ⋃_{j ∈ P.i} {(id, rnd, d + 1) | (id, rnd, d) ∈ ID_j};
    while ∃(id, rnd, d) ∈ ID_i s.t. d > |{id | (id, ∗, ∗) ∈ ID_i}|
        remove (id, rnd, d) from ID_i;
    if timer > |{id | (id, ∗, ∗) ∈ ID_i}| // timer is incremented by one every round
        naming(ID_i)

function naming(ID_i)
    if ∃(id_i, rnd, ∗) ∈ ID_i s.t. rnd ≠ rnd_i
        id_i := append(id_i, random_bit); // append a random bit to the current id
    rnd_i := number randomly selected from {1, 2, . . . , k};
    reset_timer; // reset timer to 0
    update(ID_i);

actions of process i
    true −→ update(ID_i);
For simplicity, we omit the code for removing fake initial information in Algorithm 5, since such fake
initial information can be removed in a similar way to Algorithm 4.
Similarly to previous schemes, each process i has a variable Topology_i to store the topology of the
ancestors of i. However, unlike in previous algorithms, the exact topology of the ancestors of i cannot be
constructed, because two distinct ancestors may have the same identifier when they are mutually
unreachable. This may make it difficult or impossible for each process to find the solution of the problem
solely from the topology information. Instead, as shown in Lemma 13, each process i can exactly construct
the topology of the strongly connected component it belongs to. To compensate for the weakness of the
topology information, a tuple stored in Topology_i is of the form (id, ID, inp, lview, d) and has an
additional entry lview to store the local view of process id (the other entries id, ID, inp and d are the
same as those in Algorithm 1). The final states of external predecessors of the strongly connected
component (i.e. the processes that are predecessors of processes in the component but are not in the
component) can be obtained from the local views and used to find the solution of the problem.
Algorithm 5 A scheme in networks with local naming

constants of process i
    id_i: identifier of i; // distinct from that of any ancestor
    P_i: identifier set of P.i;
    inp_i: input value of i (of the problem to be solved);

variables of process i
    st_i: state of i to be communicated (task-specific);
    lview_i: set of (id, st, label) tuples; // local view of i
        // id: an identifier of a predecessor of i
        // st: a state of id
        // label: a link label at i assigned to the incoming link from id
    Topology_i: set of (id, ID, inp, lview, d) tuples;
        // topology that i is currently aware of:
        // id: a process identifier
        // ID: identifier set of P.(id)
        // inp: input value of id
        // lview: local view of id
        // d: distance from id to i

function update(Topology_i)
    lview_i := ⋃_{j ∈ P.i} {(id_j, st_j, label_i(j))}; // label_i(j): local label at i for link (j, i)
    Topology_i := {(id_i, P_i, inp_i, lview_i, 0)} ∪ ⋃_{j ∈ P.i} {(id, ID, inp, lview, d + 1) | (id, ID, inp, lview, d) ∈ Topology_j};
    while ∃(id, ID, inp, lview, d), (id′, ID′, inp′, lview′, d′) ∈ Topology_i s.t. id = id′ and d < d′
        remove (id′, ID′, inp′, lview′, d′) from Topology_i;

function solve(Topology_i)
    change the state of i to the task-dependent solution computed from Topology_i

actions of process i
    true −→ update(Topology_i); solve(Topology_i);
Lemma 13 In synchronous locally-named networks, the scheme presented in Algorithm 5 allows each
process to exactly recognize the topology of the strongly connected component it belongs to in O(D) rounds.
Proof Sketch: It is obvious that, after D rounds, variable Topology_i of each process i consists of tuples
(id, P.(id), inp_id, lview_id, d(id, i)) for all the ancestors id of i. Notice that the local naming allows two
distinct processes to have the same identifier if they are mutually unreachable. Thus, Topology_i may
contain one tuple (id, P.(id), inp_id, lview_id, d(id, i)) standing for two or more distinct processes, and/or
may contain two tuples (id, P, inp, lview, d) and (id, P′, inp′, lview′, d) with the same id and d but
different values in some other entry.

[Figure: (a) an actual locally-named graph G over identifiers 1, 3, 4, 5, 7, 8, 9, 10, 11; (b) graph G1 (and G3, G8, and G9) constructed at process 1.]
Figure 3.2: An actual locally-named graph G and graph G1 constructed at process 1.
Each process i recognizes the topology of the strongly connected component it belongs to as the one in
the following graph G_i = (V_i, E_i): V_i = {id | (id, ∗, ∗, ∗, ∗) ∈ Topology_i} and E_i = {(u, v) | ∃(v, P, ∗, ∗, ∗) ∈
Topology_i s.t. u ∈ P} (see Figure 3.2). In other words, G_i can be obtained from the actual graph G
as follows: first consider the subgraph G′_i induced by all ancestors of i, and then merge the processes
with the same identifier into a single process. What we have to show is that G_i and G′_i are the same with
respect to the topology of the strongly connected component i belongs to.
It is obvious that all processes in G_i and G′_i can reach i. What we have to show is that a process
j is reachable from i in G_i (i.e. j belongs to the strongly connected component of i) if and only if j is
also reachable from i in G′_i. The if part is obvious since G_i is obtained from G′_i by merging processes.
The only if part holds as follows. Consider two distinct processes j and j′ with the same identifier, if they
exist. Since they are mutually unreachable but can both reach i, they are unreachable from i in G′_i
(otherwise one of them would be reachable from the other). This implies that, in the construction of G_i
from G′_i, merging is applied only to processes unreachable from i, that is, the merging has no influence
on reachability from i. Thus, any process unreachable from i in G′_i remains unreachable from i in G_i.
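The construction of G_i used in this proof is direct to express in code; the sketch below (ours) builds the vertex and edge sets from a set of (id, ID, inp, lview, d) tuples.

def build_Gi(topology):
    # V_i: identifiers occurring in Topology_i; E_i: (u, v) whenever u appears
    # in the predecessor-identifier entry of some tuple for v.
    V = {t[0] for t in topology}
    E = {(u, v) for (v, ID, _, _, _) in topology for u in ID if u in V}
    return V, E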
Theorem 8 In synchronous anonymous networks, fair composition of the local-naming algorithm of
Algorithm 4 and the scheme of Algorithm 5 can provide a probabilistic self-stabilizing solution to any
problem such that each process can find its final state (or its output) solely from the following information:
(i) the topology of its strongly connected component, including the identifiers and input values of its processes, and
(ii) the final states of external predecessors of its strongly connected component (given as local views of
processes in the component).
Its expected convergence time is O(n log n + ℓ) rounds, where ℓ is the number of fake identifiers in the
initial configuration. The expected space complexity of the resulting algorithm is O(m log n).
Corollary 8 Fair composition of the local-naming algorithm of Algorithm 4 and the scheme of Algorithm
5 can provide probabilistic self-stabilizing solutions for the UMIS, the UGC and the UMM problems in
synchronous anonymous networks.
Algorithm 6 A problem-dependent function at a process i for the UMIS problem

function UMIS(Topology_i)
    UMIS_i := ∅;
    Let W be the SCC of Topology_i that i belongs to;
    for each id ∈ W in the descending order of identifiers
        if ∄(∗, true, ∗) ∈ lview_id, where (id, P.id, inp_id, lview_id, d) ∈ Topology_i, and P.id ∩ UMIS_i = ∅
            then UMIS_i := UMIS_i ∪ {id};
    if id_i ∈ UMIS_i
        output true;
    else
        output false;
3.5 Conclusion
Although in bidirectional networks self-stabilizing maximal independent set construction is as difficult as
(non-Grundy) vertex coloring [37], this work proves that in unidirectional networks the computing power
and memory required to solve these problems vary greatly. Silent solutions to coloring in unidirectional
uniform networks do exist and require Θ(n²) (resp. Θ(1)) stabilization time when deterministic (resp.
probabilistic) solutions are considered. By contrast, deterministic maximal independent set construction
in uniform unidirectional networks is impossible, and silent maximal independent set construction is
impossible regardless of the deterministic or probabilistic nature of the protocols. Similar differences can
be observed for maximal matching and Grundy coloring.
The self-stabilizing probabilistic naming techniques (defining equivalence classes over identifiers)
that we introduced here could be of independent interest for solving other tasks in cases where similar
impossibility results hold.
While we presented positive results for the deterministic case with identifiers and for the non-silent
probabilistic cases, there remains the immediate open question of whether a probabilistic solution with
bounded memory can be devised in the asynchronous setting.
Another interesting issue for further research relates to global tasks. The global unique naming that
we present in Section 3.4.2 solves a truly global problem in networks where global communication is not
feasible, by defining proper equivalence classes between the various identifiers. The case of other classical
global tasks in distributed systems (e.g. leader election) is worth investigating.
Chapter 4

The Impact of Topology on Byzantine Containment in Stabilization
4.1 Introduction
The advent of ubiquitous large-scale distributed systems advocates that tolerance to various kinds of
faults and hazards must be included from the very early design of such systems. Byzantine fault-tolerance
[43, 49] is traditionally used to mask the effect of a limited number of malicious faults. Making
distributed systems tolerant to both transient and malicious faults is appealing, yet has proved difficult [26,
15, 48], as impossibility results are expected in many cases.
Two main paths have been followed to study the impact of Byzantine faults in the context of
self-stabilization:
- Byzantine fault masking (every correct process eventually satisfies its specification). In completely
connected synchronous systems, one of the most studied problems in the context of self-stabilization
with Byzantine faults is that of clock synchronization. In [6, 26], probabilistic self-stabilizing protocols
were proposed for up to one third of Byzantine processes, while in [21, 42] deterministic solutions tolerate
up to one fourth and one third of Byzantine processes, respectively.
- Byzantine containment. For local tasks (i.e. tasks whose correctness can be checked locally, such as
vertex coloring, link coloring, or dining philosophers), the notion of strict stabilization was proposed [48,
52, 45, 31]. Strict stabilization guarantees that there exists a containment radius outside which the
effect of permanent faults is masked, provided that the problem specification makes it possible to break
the causality chain that is caused by the faults. As many problems are not local, it turns out that it
is impossible to provide strict stabilization for them. Note that a strictly stabilizing algorithm with a
containment radius of 0 running on a completely connected system provides a masking approach.
In this chapter, we investigate the possibility of Byzantine containment in a self-stabilizing setting
for tasks that are global (i.e. for which there exists a causality chain of size r, where r depends on
n the size of the network), and focus on a global problem, namely maximum metric tree construction
(see [40, 36]). As strict stabilization is impossible with such global tasks, we weaken the containment
constraint by relaxing the notion of containment radius to containment area, that is Byzantine processes
may disturb infinitely often a set of processes which depends on the topology of the system and on the
location of Byzantine processes.
The main contribution of this chapter is to present new possibility results for containing the influence
of unbounded Byzantine behaviors. In more detail, we define the notion of topology-aware strict
stabilization as a novel form of containment, and introduce the containment area to quantify the quality of
the containment.
The notion of topology-aware strict stabilization is weaker than the strict stabilization but is stronger
than the classical notion of self-stabilization (i.e. every topology-aware strictly stabilizing protocol is
self-stabilizing, but not necessarily strictly stabilizing).
To demonstrate the possibility and effectiveness of our notion of topology-aware strict stabilization,
we consider maximum metric tree construction. It is shown in [48] that there exists no strictly stabilizing protocol with a constant containment radius for this problem. In this chapter, we provide a
topology-aware strictly stabilizing protocol for maximum metric tree construction and we prove that the
containment area of this protocol is optimal.
4.2 Distributed System
A distributed system S = (V, E) consists of a set V = {v1 , v2 , . . . , vn } of processes and a set E of
bidirectional communication links (simply called links). A link is an unordered pair of distinct processes.
A distributed system S can be regarded as a graph whose vertex set is V and whose link set is E, so we
use graph terminology to describe a distributed system S.
Processes u and v are called neighbors if (u, v) ∈ E. The set of neighbors of a process v is denoted
by Nv , and its cardinality (the degree of v) is denoted by ∆v (= |Nv |). The degree ∆ of a distributed
system S = (V, E) is defined as ∆ = max{∆v | v ∈ V }. We do not assume existence of a unique
identifier for each process. Instead we assume each process can distinguish its neighbors from each
other by locally arranging them in some arbitrary order: the k-th neighbor of a process v is denoted by
Nv (k) (1 ≤ k ≤ ∆v ). The distance between two processes u and v is the length of the shortest path
between u and v.
In this chapter, we consider distributed systems of arbitrary topology. We assume that a single
process is distinguished as a root, and all the other processes are not distinguishable.
We adopt the shared state model as a communication model in this chapter, where each process can
directly read the states of its neighbors.
The variables that are maintained by processes denote process states. A process may take actions
during the execution of the system. An action is simply a function that is executed in an atomic manner
by the process. The actions executed by each process are described by a finite set of guarded actions
of the form ⟨guard⟩ −→ ⟨statement⟩. Each guard of process u is a boolean expression involving the
variables of u and its neighbors.
A global state of a distributed system is called a configuration and is specified by a product of the states
of all processes. We define C to be the set of all possible configurations of a distributed system S. For a
process set R ⊆ V and two configurations ρ and ρ′, we denote ρ ↦_R ρ′ when ρ changes to ρ′ by executing
an action of each process in R simultaneously. Notice that ρ and ρ′ can differ only in the states
of processes in R. For completeness of the execution semantics, we should clarify the configuration resulting
from simultaneous actions of neighboring processes. The action of a process depends only on its state at
ρ and the states of its neighbors at ρ, and the result of the action is reflected in the state of the process at
ρ′.
A schedule of a distributed system is an infinite sequence of process sets. Let Q = R1, R2, . . . be a
schedule, where Ri ⊆ V holds for each i (i ≥ 1). An infinite sequence of configurations e = ρ0, ρ1, . . .
is called an execution from an initial configuration ρ0 by a schedule Q if e satisfies ρi−1 ↦_{Ri} ρi for
each i (i ≥ 1). Process actions are executed atomically, and we also assume that a distributed daemon
schedules the actions of processes, i.e. any subset of processes can simultaneously execute
their actions. A more constrained daemon is the central one, which must choose only one enabled
process at each step. Note that, as the central daemon allows executions that are also allowed under
the distributed daemon, an impossibility result under the central daemon is stronger than one under the
distributed one. In the same way, a possibility result under the distributed daemon is stronger than one
under the central one.
The set of all possible executions from ρ0 ∈ C is denoted by E_{ρ0}. The set of all possible executions
is denoted by E, that is, E = ⋃_{ρ∈C} E_ρ. We consider asynchronous distributed systems where we
make no assumption on schedules, except that any schedule is weakly fair: every process is contained in
an infinite number of the subsets appearing in any schedule.
In this chapter, we consider (permanent) Byzantine faults: a Byzantine process (i.e. a Byzantine-faulty
process) can behave arbitrarily, independently of its actions. In other words, a Byzantine
process always has an enabled rule, and the daemon arbitrarily chooses a new state for this process when
it is activated.
If v is a Byzantine process, v can repeatedly change its variables arbitrarily. The only restriction we
place on Byzantine processes is that the root process can never be Byzantine.
4.3 Self-Stabilizing Protocol Resilient to Byzantine Faults
Problems considered in this chapter are so-called static problems, i.e. they require the system to find a static
solution. For example, the spanning-tree construction problem is a static problem, while the mutual
exclusion problem is not. Some static problems can be defined by a local specification predicate (specification
for short), spec(v), for each process v: a configuration is a desired one (with a solution) if every
process v ∈ V satisfies spec(v) in this configuration. A specification spec(v) is a boolean expression on
variables of V_v (⊆ V ), where V_v is the set of processes whose variables appear in spec(v). The variables
appearing in the specification are called output variables (O-variables for short). In what follows, we
consider a static problem defined by a local specification predicate.
Self-Stabilization. A self-stabilizing protocol ([20]) is a protocol that eventually reaches a legitimate
configuration, where spec(v) holds at every process v, regardless of the initial configuration. Once
it reaches a legitimate configuration, no process ever changes its O-variables, and every process always
satisfies spec(v). From this definition, a self-stabilizing protocol is expected to tolerate any number and
any type of transient faults, since it can eventually recover from any configuration affected by transient
faults. However, recovery from an arbitrary configuration is guaranteed only when every process correctly
executes its actions from that configuration, i.e., we do not consider the existence of permanently faulty
processes.
Strict stabilization. When (permanent) Byzantine processes exist, they may not satisfy spec(v). In
addition, correct processes near the Byzantine processes can be influenced and may be unable to satisfy
spec(v). Nesterenko and Arora [48] define a strictly stabilizing protocol as a self-stabilizing protocol
resilient to an unbounded number of Byzantine processes.
Given an integer c, a c-correct process is a process defined as follows.
Definition 9 (c-correct process) A process is c-correct if it is correct ( i.e. not Byzantine) and located
at distance more than c from any Byzantine process.
Definition 10 ((c, f )-containment) A configuration ρ is (c, f )-contained for specification spec if,
given at most f Byzantine processes, in any execution starting from ρ, every c-correct process v always
satisfies spec(v) and never changes its O-variables.
The parameter c of Definition 10 refers to the containment radius defined in [48]. The parameter
f refers explicitly to the number of Byzantine processes, while [48] dealt with an unbounded number of
Byzantine faults (that is, f ∈ {0 . . . n}).
Definition 11 ((c, f )-strict stabilization) A protocol is (c, f )-strictly stabilizing for specification spec
if, given at most f Byzantine processes, any execution e = ρ0 , ρ1 , . . . contains a configuration ρi that is
(c, f )-contained for spec.
An important limitation of the model of [48] is the notion of r-restrictive specifications. Intuitively,
a specification is r-restrictive if it prevents combinations of states of two processes u and v that are
at least r hops apart. An important consequence related to Byzantine tolerance is that the containment
radius of protocols solving such specifications is at least r. For any global problem, such as the spanning
tree construction we consider in this chapter, r cannot be bounded by a constant. The results of [48]
show that there exists no (o(n), 1)-strictly stabilizing protocol for these problems, and in particular for
spanning tree construction.
Topology-aware strict stabilization. In the preceding paragraphs, we saw that there exist a number of
impossibility results on strict stabilization due to the notion of r-restrictive specifications. To circumvent
these impossibility results, we define here a new notion, weaker than strict stabilization: topology-aware
strict stabilization (TA-strict stabilization for short). Here, the requirement on the containment radius
is relaxed, i.e. the set of processes which may be disturbed by Byzantine ones is not restricted to the
union of the c-neighborhoods of the Byzantine processes, but can be defined depending on the topology
of the system and on the location of the Byzantine processes.
In the following, we give a formal definition of this new kind of Byzantine containment. From now on,
B denotes the set of Byzantine processes and SB (which is a function of B) denotes a subset of V
(intuitively, this set gathers all the processes which may be disturbed by Byzantine processes).
Definition 12 (SB-correct node) A node is SB-correct if it is a correct node (i.e. not Byzantine)
which does not belong to SB.
Definition 13 (SB-legitimate configuration) A configuration ρ is SB-legitimate for spec if every
SB-correct node v is legitimate for spec (i.e. if spec(v) holds).
Definition 14 ((SB, f)-topology-aware containment) A configuration ρ0 is (SB, f)-topology-aware
contained for specification spec if, given at most f Byzantine processes, in any execution e = ρ0, ρ1, . . .,
every configuration is SB-legitimate and every SB-correct process never changes its O-variables.
The parameter SB of Definition 14 refers to the containment area. Any process which belongs to this
set may be infinitely disturbed by Byzantine processes. The parameter f refers explicitly to the number
of Byzantine processes.
Definition 15 ((SB, f)-topology-aware strict stabilization) A protocol is (SB, f)-topology-aware
strictly stabilizing for specification spec if, given at most f Byzantine processes, any execution e =
ρ0, ρ1, . . . contains a configuration ρi that is (SB, f)-topology-aware contained for spec.
Note that, if B denotes the set of Byzantine processes and SB = {v ∈ V | min{d(v, b), b ∈ B} ≤ c},
then an (SB, f)-topology-aware strictly stabilizing protocol is a (c, f)-strictly stabilizing protocol. Hence,
a TA-strictly stabilizing protocol is generally weaker than a strictly stabilizing one, but stronger than
a classical self-stabilizing protocol (which may never meet its specification in the presence of Byzantine
processes).
The parameter SB is introduced to quantify the strength of fault containment; we do not require
each process to know the actual definition of this set. In fact, the protocol proposed in this chapter
assumes no knowledge of this parameter.
4.4 Maximum Metric Tree Construction
In this work, we deal with maximum (routing) metric spanning trees as defined in [36] (note that [40]
provides a self-stabilizing solution to this problem). Informally, the goal of a routing protocol is to
construct a tree that simultaneously maximizes the metric values of all of the nodes with respect to
some total ordering ≺. In [36], the authors give a general definition of a routing metric and provide a
characterization of maximizable metrics, that is, metrics which always allow the construction of a maximum
(routing) metric spanning tree. In the following, we recall the definitions and notations introduced in
[36].
Definition 16 (Routing metric) A routing metric is a five-tuple (M, W, met, mr, ≺) where:
- M is a set of metric values,
- W is a set of edge weights,
- met is a metric function whose domain is M × W and whose range is M ,
- mr is the maximum metric value in M with respect to ≺ and is assigned to the root of the system,
- ≺ is a less-than total order relation over M that satisfies the following three conditions for arbitrary
metric values m, m0 , and m00 in M :
- irreflexivity: m 6≺ m,
- transitivity : if m ≺ m0 and m0 ≺ m00 then m ≺ m00 ,
- totality: m ≺ m0 or m0 ≺ m or m = m0 .
Any metric value m ∈ M \ {mr} satisfies the utility condition (that is, there exist w0, …, wk−1 in W and m0 = mr, m1, …, mk−1, mk = m in M such that ∀i ∈ {1, …, k}, mi = met(mi−1, wi−1)).
For instance, we provide the definition of three classical metrics with this model: the shortest path
metric (SP), the flow metric (F), and the reliability metric (R).
SP = (M1, W1, met1, mr1, ≺1) where M1 = ℕ, W1 = ℕ, met1(m, w) = m + w, mr1 = 0, and ≺1 is the classical > relation.

F = (M2, W2, met2, mr2, ≺2) where mr2 ∈ ℕ, M2 = {0, …, mr2}, W2 = {0, …, mr2}, met2(m, w) = min{m, w}, and ≺2 is the classical < relation.

R = (M3, W3, met3, mr3, ≺3) where M3 = [0, 1], W3 = [0, 1], met3(m, w) = m ∗ w, mr3 = 1, and ≺3 is the classical < relation.
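To make these definitions concrete, here is a minimal Python sketch (our own illustration, not part of the original model of [36]) that encodes the three metrics; a metric is reduced to a triple (met, mr, lt), where lt implements the less-than relation ≺, and the bound mr2 = 10 chosen for F is an arbitrary illustration value.

# A metric is modelled as a triple (met, mr, lt); M and W are left implicit.
SP = (lambda m, w: m + w, 0, lambda x, y: x > y)       # shortest path: ≺ is >
F  = (lambda m, w: min(m, w), 10, lambda x, y: x < y)  # flow, with mr2 = 10
R  = (lambda m, w: m * w, 1.0, lambda x, y: x < y)     # reliability

def path_metric(metric, weights):
    """Fold met over the edge weights of a rooted path, starting from mr
    (this anticipates Definition 18 below)."""
    met, mr, _ = metric
    m = mr
    for w in weights:
        m = met(m, w)
    return m

# A rooted path r - v1 - v2 with weights 3 and 4 (0.5 and 0.25 for R):
assert path_metric(SP, [3, 4]) == 7          # total distance
assert path_metric(F, [3, 4]) == 3           # bottleneck capacity
assert path_metric(R, [0.5, 0.25]) == 0.125  # product of reliabilities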
Definition 17 (Assigned metric) An assigned metric over a system S is a six-tuple (M, W, met, mr, ≺, wf) where (M, W, met, mr, ≺) is a metric and wf is a function that assigns to each edge of S a weight in W.
Let a rooted path (from v) be a simple path from a process v to the root r. The next set of definitions is given with respect to an assigned metric (M, W, met, mr, ≺, wf) over a given system S.
Definition 18 (Metric of a rooted path) The metric of a rooted path in S is the prefix sum of met over the edge weights in the path, starting from mr.

For example, if a rooted path p in S is vk, …, v0 with v0 = r, then the metric of p is mk = met(mk−1, wf({vk, vk−1})) where m0 = mr and ∀i ∈ {1, …, k − 1}, mi = met(mi−1, wf({vi, vi−1})).
Definition 19 (Maximum metric path) A rooted path p from v in S is called a maximum metric
path with respect to an assigned metric if and only if for every other rooted path q from v in S, the
metric of p is greater than or equal to the metric of q with respect to the total order ≺.
Definition 20 (Maximum metric of a node) The maximum metric of a node v ≠ r (or simply metric value of v) in S is defined by the metric of a maximum metric path from v. The maximum metric of r is mr.
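On small systems, Definitions 19 and 20 can be checked by exhaustive enumeration of simple rooted paths. The following brute-force sketch (our own illustration, with hypothetical map names; ≺ is taken as <, as for F and R) computes the maximum metric of a node.

def all_rooted_paths(adj, v, r):
    """All simple paths from v to r; adj maps a process to its neighbor set."""
    def extend(path, seen):
        u = path[-1]
        if u == r:
            yield path
            return
        for nxt in adj[u]:
            if nxt not in seen:
                yield from extend(path + [nxt], seen | {nxt})
    yield from extend([v], {v})

def maximum_metric(adj, w, met, mr, v, r):
    """Maximum metric of v (Definition 20): evaluate every rooted path from
    the root end and keep the ≺-largest value. w[frozenset({a, b})] is the
    weight of edge {a, b}."""
    best = None
    for path in all_rooted_paths(adj, v, r):
        rp = path[::-1]          # walk the path from the root outwards
        m = mr
        for a, b in zip(rp, rp[1:]):
            m = met(m, w[frozenset((a, b))])
        best = m if best is None else max(best, m)
    return best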
Definition 21 (Maximum metric tree) A spanning tree T of S is a maximum metric tree with respect to an assigned metric over S if and only if every rooted path in T is a maximum metric path in S with respect to the assigned metric.
The goal of the work of [36] is the study of metrics that always allow the construction of a maximum
metric tree. More formally, the definition follows.
Definition 22 (Maximizable metric) A metric is maximizable if and only if for any assignment of
this metric over any system S, there is a maximum metric tree for S with respect to the assigned metric.
Note that [40] provides a self-stabilizing protocol to construct a maximum metric tree with respect to any maximizable metric. Moreover, [36] provides a full characterization of maximizable metrics, as follows.
Definition 23 (Boundedness) A metric (M, W, met, mr, ≺) is bounded if and only if: ∀m ∈ M, ∀w ∈ W, met(m, w) ≺ m or met(m, w) = m.
Definition 24 (Monotonicity) A metric (M, W, met, mr, ≺) is monotonic if and only if: ∀(m, m′) ∈ M², ∀w ∈ W, m ≺ m′ ⇒ (met(m, w) ≺ met(m′, w) or met(m, w) = met(m′, w)).
Theorem 9 (Characterization of maximizable metrics [36]) A metric is maximizable if and only
if this metric is bounded and monotonic.
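When M and W are finite, Definitions 23 and 24 can be verified by brute force. The following sketch (our own illustration, not taken from [36]) checks both properties for the flow metric with mr2 = 3:

from itertools import product

def is_bounded(M, W, met, lt):
    # Definition 23: met(m, w) ≺ m or met(m, w) = m for all m, w.
    return all(lt(met(m, w), m) or met(m, w) == m
               for m, w in product(M, W))

def is_monotonic(M, W, met, lt):
    # Definition 24: m ≺ m' implies met(m, w) ⪯ met(m', w) for all w.
    return all(not lt(m1, m2)
               or lt(met(m1, w), met(m2, w)) or met(m1, w) == met(m2, w)
               for m1, m2, w in product(M, M, W))

M = W = range(4)                    # flow metric with mr2 = 3
lt = lambda x, y: x < y
print(is_bounded(M, W, min, lt))    # True: min(m, w) never exceeds m
print(is_monotonic(M, W, min, lt))  # True: min is monotonic in m

By Theorem 9, the flow metric is therefore maximizable.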
Given a maximizable metric M = (M, W, met, mr, ≺), the aim of this work is to construct, in a self-stabilizing way, a maximum metric tree with respect to M which spans a system subject to permanent Byzantine failures. Since these Byzantine processes may obviously disturb some correct processes, we relax the problem in the following way: we want to construct a maximum metric forest with respect to M, in which the root of every tree must be either the real root or a Byzantine process.
Each process v has three O-variables: a pointer to its parent in its tree (prntv ∈ Nv ∪ {⊥}), a level which stores its current metric value (levelv ∈ M), and a variable which stores its distance to the root of its tree (distv ∈ {0, …, D}). Obviously, Byzantine processes may disturb (at least) their neighbors. We use the following specification of the problem.
We introduce new notations as follows. Given an assigned metric (M, W, met, mr, ≺, wf) over the system S and two processes u and v, we denote by µ(u, v) the maximum metric of node u when v plays the role of the root of the system. If u and v are two neighboring processes, we denote by wu,v the weight of the edge {u, v} (that is, the value of wf({u, v})).
Definition 25 (M-path) Given an assigned metric M = (M, W, met, mr, ≺, wf) over a system S, a M-path is a path (v0, …, vk) (k ≥ 1) such that: (i) prntv0 = ⊥, levelv0 = mr, distv0 = 0, and v0 ∈ B ∪ {r}; (ii) ∀i ∈ {1, …, k}, prntvi = vi−1, levelvi = met(levelvi−1, wvi,vi−1), and distvi = i; (iii) ∀i ∈ {1, …, k}, met(levelvi−1, wvi,vi−1) = max≺{met(levelu, wvi,u), u ∈ Nvi}; and (iv) levelvk = µ(vk, v0).
We define the specification predicate spec(v) of the maximum metric tree construction with respect to a maximizable metric M as follows:

spec(v): prntv = ⊥, levelv = mr, and distv = 0, if v is the root r;
         there exists a M-path (v0, …, vk) such that vk = v, otherwise.
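As an illustration (our own sketch, with hypothetical map names; ⊥ is modelled by None), conditions (i)-(iii) of Definition 25 can be checked mechanically on a candidate path; condition (iv) additionally requires computing µ(vk, v0), which is sketched after the definition of SB below.

def is_M_path_prefix(path, prnt, level, dist, adj, w, met, mr, root, byz):
    """Check conditions (i)-(iii) of Definition 25 for path = (v0, ..., vk).
    prnt/level/dist map each process to its O-variables, adj[v] is the set
    Nv, and w[frozenset({u, v})] is the weight of edge {u, v}."""
    v0 = path[0]
    if prnt[v0] is not None or level[v0] != mr or dist[v0] != 0:
        return False                              # condition (i)
    if v0 != root and v0 not in byz:
        return False                              # condition (i)
    for i in range(1, len(path)):
        vi, vp = path[i], path[i - 1]
        wi = w[frozenset((vi, vp))]
        if prnt[vi] != vp or dist[vi] != i or level[vi] != met(level[vp], wi):
            return False                          # condition (ii)
        # condition (iii): the parent's offer is the best offer around vi
        # (Python's max implements max≺ when ≺ is <, as for F and R).
        if met(level[vp], wi) != max(met(level[u], w[frozenset((vi, u))])
                                     for u in adj[vi]):
            return False
    return True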
Figure 4.1: Examples of containment areas for flow spanning tree construction.
Figure 4.2: Examples of containment areas for reliability spanning tree construction.
Following the discussion of Section 4.3 and the results of [48], it is obvious that there exists no strictly stabilizing protocol for this problem; that is why we consider the weaker notion of topology-aware strict stabilization. First, we show an impossibility result in order to define the best possible containment area. Then, we provide a maximum metric tree construction protocol which is (SB, f)-TA-strictly stabilizing for f ≤ n − 1 and which matches this optimal containment area. From now on, SB denotes this optimal containment area, i.e.:

SB = {v ∈ V \ B | µ(v, r) ⪯ max≺{µ(v, b), b ∈ B}} \ {r}

where ⪯ stands for "≺ or =". Intuitively, Byzantine faults may disturb only the processes that are (non-strictly) closer to a Byzantine process than to the root with respect to the metric. Figures 4.1 and 4.2 provide some examples of containment areas with respect to two maximizable metrics.
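The area SB can be computed explicitly on small instances. The sketch below (our own illustration; the correctness of the greedy µ computation relies on the metric being bounded and monotonic) evaluates µ from the root and from each Byzantine process, here on a hypothetical four-process line r - u - v - b under the flow metric:

import heapq

def mu_from(adj, met, mr, src):
    """Maximum metric value of every node when src plays the root, for a
    metric whose order ≺ is < (flow, reliability): a Dijkstra-like greedy
    that always expands the largest tentative value. adj[u] = {v: weight}."""
    best = {src: mr}
    heap = [(-mr, src)]
    while heap:
        negm, u = heapq.heappop(heap)
        if -negm != best[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            m = met(best[u], w)
            if v not in best or m > best[v]:
                best[v] = m
                heapq.heappush(heap, (-m, v))
    return best

# Flow metric, mr = 4, on the line r -4- u -2- v -4- b with b Byzantine:
adj = {'r': {'u': 4}, 'u': {'r': 4, 'v': 2},
       'v': {'u': 2, 'b': 4}, 'b': {'v': 4}}
from_r = mu_from(adj, min, 4, 'r')
from_b = mu_from(adj, min, 4, 'b')
S_B = {v for v in adj if v not in {'r', 'b'} and from_r[v] <= from_b[v]}
print(S_B)  # {'v'}: only v is (non-strictly) closer to b than to r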
We introduce here a new definition that is used in the following.
Definition 26 (Fixed point) A metric value m is a fixed point of a metric M = (M, W, met, mr, ≺) if m ∈ M and if, for every value w ∈ W, we have met(m, w) = m.
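For instance, 0 is a fixed point of the flow metric F (since met2(0, w) = min{0, w} = 0 for every w) and of the reliability metric R (since 0 ∗ w = 0), whereas SP admits no fixed point as soon as W contains a non-zero weight.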
4.4.1 Impossibility Result
In this section, we show that there exist some constraints, depending on the metric, on the containment area of any topology-aware strictly stabilizing protocol for maximum metric tree construction.
Theorem 10 Given a maximizable metric M = (M, W, met, mr, ≺), even under the central daemon, there exists no (AB, 1)-TA-strictly stabilizing protocol for maximum metric spanning tree construction with respect to M where AB ⊊ SB.
Proof 18 Let M = (M, W, met, mr, ≺) be a maximizable metric and P be a (AB, 1)-TA-strictly stabilizing protocol for maximum metric spanning tree construction with respect to M where AB ⊊ SB. We must distinguish the following cases:
Case 1: |M| = 1.
Denote by m the metric value such that M = {m}. For any system and for any process v ≠ r, we have µ(v, r) = min≺{µ(v, b), b ∈ B} = m. Consequently, SB = V \ (B ∪ {r}) for any system.

Consider the following system: V = {r, u, v, b} and E = {{r, u}, {u, v}, {v, b}} (b is a Byzantine process). As SB = {u, v} and AB ⊊ SB, we have u ∉ AB or v ∉ AB. Consider now the following configuration ρ⁰₀: prntr = prntb = ⊥, prntv = b, prntu = v, levelr = levelu = levelv = levelb = m, distr = distb = 0, distv = 1, and distu = 2 (other variables may have arbitrary values). Note that ρ⁰₀ is AB-legitimate for spec (whatever AB is).

Assume now that b behaves as a correct process with respect to P. Then, by convergence of P in a fault-free system starting from ρ⁰₀, which is not legitimate (remember that a strictly stabilizing protocol is a special case of a self-stabilizing protocol), we can deduce that the system reaches in finite time a configuration ρ⁰₁ in which: prntr = ⊥, prntu = r, prntv = u, prntb = v, levelr = levelu = levelv = levelb = m, distr = 0, distu = 1, distv = 2, and distb = 3. Note that processes u and v modify their O-variables in this execution. This contradicts the (AB, 1)-TA-strict stabilization of P (whatever AB is).
Case 2: |M| ≥ 2.
By definition of a bounded metric, we can deduce that there exist m ∈ M and w ∈ W such that m = met(mr, w) ≺ mr. We must then distinguish the following cases:
Case 2.1: m is a fixed point of M.
Consider the following system: V = {r, u, v, b}, E = {{r, u}, {u, v}, {v, b}}, wr,u = wv,b = w, and wu,v = w′ for an arbitrary w′ ∈ W (b is a Byzantine process). As met(m, w″) = m for any w″ ∈ W (by definition of a fixed point), we have SB = {u, v}. Since AB ⊊ SB, we have u ∉ AB or v ∉ AB. Consider now the following configuration ρ¹₀: prntr = prntb = ⊥, prntv = b, prntu = v, levelr = levelb = mr, levelu = levelv = m, distr = distb = 0, distv = 1, and distu = 2 (other variables may have arbitrary values). Note that ρ¹₀ is AB-legitimate for spec (whatever AB is).

Assume now that b behaves as a correct process with respect to P. Then, by convergence of P in a fault-free system starting from ρ¹₀, which is not legitimate (remember that a strictly stabilizing protocol is a special case of a self-stabilizing protocol), we can deduce that the system reaches in finite time a configuration ρ¹₁ in which: prntr = ⊥, prntu = r, prntv = u, prntb = v, levelr = mr, levelu = levelv = levelb = m (since m is a fixed point), distr = 0, distu = 1, distv = 2, and distb = 3. Note that processes u and v modify their O-variables in this execution. This contradicts the (AB, 1)-TA-strict stabilization of P (whatever AB is).
Case 2.2: m is not a fixed point of M.
This implies that there exists w′ ∈ W such that met(m, w′) ≺ m (remember that M is bounded). Consider the following system: V = {r, u, v, v′, b}, E = {{r, u}, {u, v}, {u, v′}, {v, b}, {v′, b}}, wr,u = wv,b = wv′,b = w, and wu,v = wu,v′ = w′ (b is a Byzantine process). We can see that SB = {v, v′}. Since AB ⊊ SB, we have v ∉ AB or v′ ∉ AB. Consider now the following configuration ρ²₀: prntr = prntb = ⊥, prntv = prntv′ = b, prntu = r, levelr = levelb = mr, levelu = levelv = levelv′ = m, distr = distb = 0, distv = distv′ = 1, and distu = 1 (other variables may have arbitrary values). Note that ρ²₀ is AB-legitimate for spec (whatever AB is).

Assume now that b behaves as a correct process with respect to P. Then, by convergence of P in a fault-free system starting from ρ²₀, which is not legitimate (remember that a strictly stabilizing protocol is a special case of a self-stabilizing protocol), we can deduce that the system reaches in finite time a configuration ρ²₁ in which: prntr = ⊥, prntu = r, prntv = prntv′ = u, prntb = v (or prntb = v′), levelr = mr, levelu = m, levelv = levelv′ = met(m, w′) = m′, levelb = met(m′, w) = m″, distr = 0, distu = 1, distv = distv′ = 2, and distb = 3. Note that processes v and v′ modify their O-variables in this execution. This contradicts the (AB, 1)-TA-strict stabilization of P (whatever AB is).
4.4.2 Topology-Aware Strict Stabilizing Protocol
In this section, we provide our self-stabilizing protocol that achieves the optimal containment area under permanent Byzantine failures when constructing a maximum metric tree for any maximizable metric M = (M, W, met, mr, ≺). More formally, our protocol is (SB, f)-TA-strictly stabilizing, which is optimal with respect to the result of Theorem 10. Our protocol is borrowed from the one of [40] (which is self-stabilizing). The key idea of this protocol is to use the distance variable (upper bounded by a given constant D) to detect and break cycles of processes that share the same maximum metric value. The main modifications we bring to this protocol are the following. In the initial protocol, when a process modifies its parent, it chooses arbitrarily one of the "better" neighbors (with respect to the metric). To achieve (SB, f)-TA-strict stabilization, we must ensure a fair selection among its neighbors; we obtain this fairness with a round-robin order over the set of neighbors. The second modification is to give priority to rules (R2) and (R3) over (R1) for any correct non-root process (that is, such a process which has (R1) and another rule enabled in a given configuration always executes the other rule). Our solution is presented as Algorithm 3.
In the following, we provide a sketch of the proof of the TA-strict stabilization of SSMAX; due to space constraints, formal proofs are omitted, and a full version of this work is available in the companion technical report (see [29]).
Algorithm 3 SSMAX: A TA-strictly stabilizing protocol for maximum metric tree construction.

Data:
Nv: totally ordered set of neighbors of v.
D: upper bound on the number of processes in a simple path.

Variables:
prntv: pointer to the parent of v in the tree; prntv = ⊥ if v = r, prntv ∈ Nv if v ≠ r.
levelv ∈ {m ∈ M | m ⪯ mr}: metric value of the node.
distv ∈ {0, …, D}: distance to the root.

Macro:
For any subset A ⊆ Nv, choose(A) returns the first element of A that is bigger than prntv (in a round-robin fashion).

Rules:
(Rr) :: (v = r) ∧ ((levelv ≠ mr) ∨ (distv ≠ 0))
  −→ levelv := mr; distv := 0

(R1) :: (v ≠ r) ∧ (prntv ∈ Nv) ∧ ((distv ≠ min(distprntv + 1, D)) ∨ (levelv ≠ met(levelprntv, wv,prntv)))
  −→ distv := min(distprntv + 1, D); levelv := met(levelprntv, wv,prntv)

(R2) :: (v ≠ r) ∧ (distv = D) ∧ (∃u ∈ Nv, distu < D − 1)
  −→ prntv := choose({u ∈ Nv | distu < D − 1}); distv := distprntv + 1; levelv := met(levelprntv, wv,prntv)

(R3) :: (v ≠ r) ∧ (∃u ∈ Nv, (distu < D − 1) ∧ (levelv ≺ met(levelu, wu,v)))
  −→ prntv := choose({u ∈ Nv | (distu < D − 1) ∧ (met(levelu, wu,v) = max≺{met(levelq, wq,v), q ∈ Nv, distq < D − 1})});
     levelv := met(levelprntv, wprntv,v); distv := distprntv + 1
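To fix ideas, the following sequential sketch (our own illustration; the report states SSMAX in the locally shared memory model under a daemon, which a Python loop can only approximate) encodes the guards and actions of the four rules for a metric whose order ≺ is <, together with the round-robin choose macro. When (R2) and (R3) are simultaneously enabled, the sketch arbitrarily favors (R2).

def choose(A, Nv, cur):
    """Round-robin macro: first element of A encountered after cur in the
    cyclic order of the totally ordered neighbor list Nv."""
    i = Nv.index(cur) if cur in Nv else -1
    return next(u for u in Nv[i + 1:] + Nv[:i + 1] if u in A)

def step(v, prnt, level, dist, N, w, met, mr, root, D):
    """Apply the highest-priority enabled rule at v (≺ taken as <, so (R2)
    and (R3) take priority over (R1)); return True if v executed a rule.
    N[v] is the ordered neighbor list, w[frozenset({u, v})] an edge weight."""
    if v == root:
        if level[v] != mr or dist[v] != 0:                  # rule (Rr)
            level[v], dist[v] = mr, 0
            return True
        return False
    cand = [u for u in N[v] if dist[u] < D - 1]
    offers = {u: met(level[u], w[frozenset((v, u))]) for u in cand}
    better = [u for u in cand if level[v] < offers[u]]
    if dist[v] == D and cand:                               # rule (R2)
        prnt[v] = choose(set(cand), N[v], prnt[v])
    elif better:                                            # rule (R3)
        best = max(offers[u] for u in better)
        prnt[v] = choose({u for u in better if offers[u] == best},
                         N[v], prnt[v])
    elif prnt[v] in N[v] and (dist[v] != min(dist[prnt[v]] + 1, D)
            or level[v] != met(level[prnt[v]], w[frozenset((v, prnt[v]))])):
        pass                                                # rule (R1)
    else:
        return False
    dist[v] = min(dist[prnt[v]] + 1, D)                     # common action
    level[v] = met(level[prnt[v]], w[frozenset((v, prnt[v]))])
    return True

A scheduler would then repeatedly pick enabled correct processes according to the daemon's choices, while Byzantine processes may write arbitrary values to their own variables at any step.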
Remember that the real root r cannot be a Byzantine process by hypothesis. Note also that, by boundedness of the metric, the subsystem whose set of nodes is (V \ SB) \ B is connected.
Given ρ ∈ C and m ∈ M, let us define the following predicate:

IMm(ρ) ≡ ∀v ∈ V, levelv ⪯ max≺{m, max≺{µ(v, u), u ∈ B ∪ {r}}}
If we take a configuration ρ ∈ C such that IMm(ρ) holds for a given m ∈ M, then we can prove, using the boundedness of M, that IMm(ρ′) holds for any step ρ ↦ ρ′ of SSMAX. Hence, we can deduce the following:
Lemma 14 For any metric value m ∈ M, the predicate IMm is closed by actions of SSMAX.
Given an assigned metric over a system S, observe that the set of metric values M is finite and that we can label the elements of M as m0 = mr, m1, …, mk such that ∀i ∈ {0, …, k − 1}, mi+1 ≺ mi. We introduce the following notations:

∀mi ∈ M, Pmi = {v ∈ (V \ SB) \ B | µ(v, r) = mi}

∀mi ∈ M, Vmi = Pm0 ∪ … ∪ Pmi

∀mi ∈ M, Imi = {v ∈ V | max≺{µ(v, u), u ∈ B ∪ {r}} ≺ mi}

∀mi ∈ M, LCmi = {ρ ∈ C | (∀v ∈ Vmi, spec(v)) ∧ IMmi(ρ)}

LC = LCmk
If we consider a configuration ρ ∈ LCmi for a given metric value mi and a process v ∈ Vmi, then we can show, from the closure of IMmi (established in Lemma 14), the boundedness of M, and the construction of the protocol, that v is not enabled in ρ. The closure of IMmi is then sufficient to conclude that:
Lemma 15 For any mi ∈ M, the set LCmi is closed by actions of SSMAX.
Lemma 15 applied to LC = LCmk gives us the following result:

Lemma 16 Any configuration of LC is (SB, n − 1)-TA contained for spec.
This lemma establishes the closure of SSMAX. To prove the TA-strict stabilization of SSMAX, it remains to prove its convergence. To this end, we prove that any execution starting from an arbitrary configuration of C converges to LCm0 = LCmr, then to LCm1, and so on until LCmk = LC.
Note that IMmr is satisfied by any configuration of C, and that if no process of Pmr is enabled in a configuration, then this configuration belongs to LCmr. We can then prove that any process of Pmr takes only a finite number of steps in any execution, which implies the following result:
Lemma 17 Starting from any configuration of C, any execution of SSMAX reaches in finite time a configuration of LCmr.
Given a metric value mi ∈ M and a configuration ρ0 ∈ LCmi, assume that e = ρ0, ρ1, … is an execution of SSMAX starting from ρ0. We then define the following variant function: for any configuration ρj of e, we denote by Aj the set of processes v of Imi such that levelv = mi in ρj, and we let f(ρj) = min{distv, v ∈ Aj}. We can prove that there exists an integer k such that f(ρk) = D. This implies the following lemma:

Lemma 18 For any mi ∈ M and any configuration ρ ∈ LCmi, any execution of SSMAX starting from ρ reaches in finite time a configuration such that ∀v ∈ Imi, levelv = mi ⇒ distv = D.
Given a metric value mi ∈ M, consider a configuration ρ0 ∈ LCmi such that ∀v ∈ Imi, levelv = mi ⇒ distv = D, and assume that e = ρ0, ρ1, … is an execution of SSMAX starting from ρ0. For any configuration ρj of e, we define the set Eρj = {v ∈ Imi | levelv = mi}. First, we prove that there exists an integer k such that for any integer j ≥ k, we have Eρj+1 ⊆ Eρj; in other words, there exists a point of the execution after which the set E can no longer grow. Moreover, we prove that if a process v of Eρj (j ≥ k) is activated during the step ρj ↦ ρj+1, then v ∉ Eρj+1. Finally, we observe that any process v ∈ Imi such that distv = D is activated in finite time. In conclusion, we obtain that there exists an integer j such that Eρj = ∅. In other words, we have:
Lemma 19 For any mi ∈ M and any configuration ρ ∈ LCmi such that ∀v ∈ Imi, levelv = mi ⇒ distv = D, any execution of SSMAX starting from ρ reaches in finite time a configuration such that ∀v ∈ Imi, levelv ≺ mi.
A direct consequence of Lemmas 18 and 19 is the following.
Lemma 20 For any mi ∈ M and any configuration ρ ∈ LCmi, any execution of SSMAX starting from ρ reaches in finite time a configuration ρ′ such that IMmi+1(ρ′) holds.
Given a metric value mi ∈ M, consider a configuration ρ ∈ LCmi. We know by Lemma 20 that any execution starting from ρ reaches in finite time a configuration ρ′ such that IMmi+1(ρ′) holds. Denote by e an execution starting from ρ′. We can observe that, if no process of Pmi+1 is enabled in a configuration of e, then this configuration belongs to LCmi+1. We can then prove that any process of Pmi+1 takes only a finite number of steps in any execution starting from ρ′, which implies the following result:
Lemma 21 For any mi ∈ M and any configuration ρ ∈ LCmi, any execution of SSMAX starting from ρ reaches in finite time a configuration of LCmi+1.
Let ρ be an arbitrary configuration. We know by Lemma 17 that any execution starting from ρ reaches in finite time a configuration of LCmr = LCm0. We can then apply the result of Lemma 21 at most k times to obtain that any execution starting from ρ reaches in finite time a configuration of LCmk = LC, which proves the following result:
Lemma 22 Starting from any configuration, any execution of SSMAX reaches a configuration of LC in finite time.
Lemmas 16 and 22 imply respectively the closure and the convergence of SSMAX . We can
summarize our results with the following theorem.
Theorem 11 SSMAX is a (SB, n − 1)-TA-strictly stabilizing protocol for spec.
4.5 Conclusion
We introduced a new notion of Byzantine containment in self-stabilization: topology-aware strict stabilization. This notion relaxes the constraint on the containment radius of strict stabilization to a containment area: the set of correct processes which may be disturbed infinitely often by Byzantine processes is a function of the topology of the system and of the actual location of the Byzantine processes. We illustrated the relevance of this notion by providing a topology-aware strictly stabilizing protocol for the maximum metric tree construction problem, which admits no strictly stabilizing solution. Moreover, our protocol achieves the optimal containment area with respect to topology-aware strict stabilization.
Our work raises some open questions. A number of problems do not admit any strictly stabilizing solution; do some of them admit a topology-aware strictly stabilizing solution? Is it possible to give a necessary and/or sufficient condition for a problem to admit a topology-aware strictly stabilizing solution? What happens if we consider only bounded Byzantine behavior?
Bibliography
[1] Eytan Adar and Bernardo A. Huberman. Free riding on gnutella. First Monday, 2000.
[2] Yehuda Afek and Anat Bremler. Self-stabilizing unidirectional network algorithms by power supply. In SODA '97: Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 111–120. Society for Industrial and Applied Mathematics, 1997.
[3] Yehuda Afek and Anat Bremler-Barr. Self-stabilizing unidirectional network algorithms by
power supply. Chicago J. Theor. Comput. Sci., 1998, 1998.
[4] Yehuda Afek and Shlomi Dolev. Local stabilizer. J. Parallel Distrib. Comput., 62(5):745–
765, 2002.
[5] Joffroy Beauquier, Sylvie Delaët, Shlomi Dolev, and Sébastien Tixeuil. Transient fault
detectors. Distributed Computing, 20(1):39–51, June 2007.
[6] Michael Ben-Or, Danny Dolev, and Ezra N. Hoch. Fast self-stabilizing byzantine tolerant
digital clock synchronization. In Rida A. Bazzi and Boaz Patt-Shamir, editors, PODC,
pages 385–394. ACM, 2008.
[7] Samuel Bernard, Stéphane Devismes, Katy Paroux, Maria Potop-Butucaru, and Sébastien
Tixeuil. Probabilistic self-stabilizing vertex coloring in unidirectional anonymous networks.
In Proceedings of ICDCN 2010, Lecture Notes in Computer Science, Kolkata, India, January
2010. Springer Berlin / Heidelberg.
[8] Samuel Bernard, Stéphane Devismes, Maria Gradinariu Potop-Butucaru, and Sébastien
Tixeuil. Optimal deterministic self-stabilizing vertex coloring in unidirectional anonymous
networks. In Proceedings of the IEEE International Conference on Parallel and Distributed
Processing Systems (IPDPS 2009), Rome, Italy, May 2009. IEEE Press.
[9] George Bosilca, Camille Coti, Thomas Herault, Pierre Lemarinier, and Jack Dongarra. Constructing resilient communication infrastructure for runtime environments. In Proceedings of the Abstracts of ParCo'09, to appear, 2009.
[10] B. Bourgon, A.K. Datta, and V. Natarajan. A self-stabilizing ranking algorithm for tree
structured networks. In Computers and Communications, 1995. Conference Proceedings of
the 1995 IEEE Fourteenth Annual International Phoenix Conference on, pages 23–28, Mar
1995.
[11] Darius Buntinas, George Bosilca, Richard L. Graham, Geoffroy Vallée, and Gregory R.
Watson. A scalable tools communications infrastructure. High Performance Computing
Systems and Applications, Annual International Symposium on, 0:33–39, 2008.
[12] Ralph Butler, William Gropp, and Ewing Lusk. A scalable process-management environment for parallel programs. In Euro PVM/MPI, pages 168–175. Springer-Verlag, 2000.
[13] R. H. Castain, T. S. Woodall, D. J. Daniel, J. M. Squyres, B. Barrett, and G. E. Fagg.
The open run-time environment (openrte): A transparent multicluster environment for
high-performance computing. Future Gener. Comput. Syst., 24(2):153–157, 2008.
[14] Jorge Arturo Cobb and Mohamed G. Gouda. Stabilization of routing in directed networks.
In Ajoy Kumar Datta and Ted Herman, editors, WSS, volume 2194 of Lecture Notes in
Computer Science, pages 51–66. Springer, 2001.
[15] Ariel Daliot and Danny Dolev. Self-stabilization of byzantine protocols. In Ted Herman
and Sébastien Tixeuil, editors, Self-Stabilizing Systems, volume 3764 of Lecture Notes in
Computer Science, pages 48–67. Springer, 2005.
[16] Sajal K. Das, Ajoy Kumar Datta, and Sébastien Tixeuil. Self-stabilizing algorithms in dag
structured networks. Parallel Processing Letters, 9(4):563–574, December 1999.
[17] Ajoy Kumar Datta and Maria Gradinariu, editors. Stabilization, Safety, and Security of
Distributed Systems, 8th International Symposium, SSS 2006, Dallas, TX, USA, November
17-19, 2006, Proceedings, volume 4280 of Lecture Notes in Computer Science. Springer,
2006.
[18] Sylvie Delaët, Bertrand Ducourthial, and Sébastien Tixeuil. Self-stabilization with r-operators revisited. Journal of Aerospace Computing, Information, and Communication,
2006.
[19] Sylvie Delaët and Sébastien Tixeuil. Tolerating transient and intermittent failures. Journal
of Parallel and Distributed Computing, 62(5):961–981, May 2002.
[20] Edsger W. Dijkstra. Self-stabilizing systems in spite of distributed control. Commun. ACM,
17(11):643–644, 1974.
[21] Danny Dolev and Ezra N. Hoch. On self-stabilizing synchronous actions despite byzantine
attacks. In Andrzej Pelc, editor, DISC, volume 4731 of Lecture Notes in Computer Science,
pages 193–207. Springer, 2007.
[22] Shlomi Dolev and Ted Herman. Superstabilizing protocols for dynamic distributed systems.
Chicago J. Theor. Comput. Sci., 1997, 1997.
[23] Shlomi Dolev, Amos Israeli, and Shlomo Moran. Self-stabilization of dynamic systems. In
Proceedings of the MCC Workshop on Self-Stabilizing Systems, MCC Technical Report No.
STP-379-89, 1989.
[24] Shlomi Dolev, Amos Israeli, and Shlomo Moran. Uniform dynamic self-stabilizing leader
election. IEEE Transactions on Parallel and Distributed Systems, 8(4):424–440, 1997.
[25] Shlomi Dolev and Elad Schiller. Self-stabilizing group communication in directed networks.
Acta Inf., 40(9):609–636, 2004.
[26] Shlomi Dolev and Jennifer L. Welch. Self-stabilizing clock synchronization in the presence
of byzantine faults. J. ACM, 51(5):780–799, 2004.
[27] SC Douglas. Self-stabilized gradient algorithms for blind source separation with orthogonality constraints. IEEE Transactions on Neural Networks, 11(6):1490–1497, 2000.
[28] Swan Dubois, Toshimitsu Masuzawa, and Sébastien Tixeuil. The impact of topology on
byzantine containment in stabilization. In Proceedings of DISC 2010, Lecture Notes in
Computer Science, Boston, Massachusetts, USA, September 2010. Springer Berlin / Heidelberg.
[29] Swan Dubois, Toshimitsu Masuzawa, and Sébastien Tixeuil. The impact of topology on byzantine containment in stabilization. Research report inria-00481836 (http://hal.inria.fr/inria-00481836/en/), May 2010.
[30] Swan Dubois, Toshimitsu Masuzawa, and Sébastien Tixeuil. On byzantine containment properties of the min+1 protocol. In Proceedings of SSS 2010, Lecture Notes in Computer Science, New York, NY, USA, September 2010. Springer Berlin / Heidelberg.
[31] Swan Dubois, Maria Gradinariu Potop-Butucaru, Mikhail Nesterenko, and Sébastien
Tixeuil. Self-stabilizing byzantine asynchronous unison. CoRR, abs/0912.0134, 2009.
[32] Bertrand Ducourthial and Sébastien Tixeuil. Self-stabilization with r-operators. Distributed
Computing, 14(3):147–162, July 2001.
[33] Bertrand Ducourthial and Sébastien Tixeuil. Self-stabilization with path algebra. Theoretical Computer Science, 293(1):219–236, February 2003. Extended abstract in Sirocco
2000.
[34] Vijay K. Garg and Anurag Agarwal. Self-stabilizing spanning tree algorithm with a new
design methodology. Technical Report TR-PDS-2004-001, University of Texas at Austin,
PDS Laboratory Technical Reports, 2004.
[35] Felix C. Gärtner. A survey of self-stabilizing spanning-tree construction algorithms. Technical Report IC/2003/38, École Polytechnique Fédérale de Lausanne, Technical Reports in
Computer and Communication Sciences, 2003.
[36] Mohamed G. Gouda and Marco Schneider. Maximizable routing metrics. IEEE/ACM
Trans. Netw., 11(4):663–675, 2003.
[37] Maria Gradinariu and Sébastien Tixeuil. Self-stabilizing vertex coloring of arbitrary graphs.
In International Conference on Principles of Distributed Systems (OPODIS’2000), pages
55–70, Paris, France, December 2000.
[38] Maria Gradinariu and Sébastien Tixeuil. Conflict managers for self-stabilization without
fairness assumption. In Proceedings of the International Conference on Distributed Computing Systems (ICDCS 2007), page 46. IEEE, June 2007.
[39] Sandeep K. S. Gupta and Pradip K. Srimani. Self-stabilizing multicast protocols for ad hoc
networks. J. Parallel Distrib. Comput., 63(1):87–96, 2003.
[40] SKS Gupta and PK Srimani. Mobility tolerant maintenance of multi-cast tree in mobile
multi-hop radio networks. In Proceedings of the 1999 International Conference on Parallel
Processing, pages 490–497, 1999.
[41] Thomas Herault, Pierre Lemarinier, Olivier Peres, Laurence Pilard, and Joffroy Beauquier.
A model for large scale self-stabilization. In 21st IEEE International Parallel & Distributed
Processing Symposium (IPDPS), 2007.
[42] Ezra N. Hoch, Danny Dolev, and Ariel Daliot. Self-stabilizing byzantine digital clock synchronization. In Datta and Gradinariu [17], pages 350–362.
[43] Leslie Lamport, Robert E. Shostak, and Marshall C. Pease. The byzantine generals problem.
ACM Trans. Program. Lang. Syst., 4(3):382–401, 1982.
[44] Toshimitsu Masuzawa and Sébastien Tixeuil. Bounding the impact of unbounded attacks
in stabilization. In Datta and Gradinariu [17], pages 440–453.
[45] Toshimitsu Masuzawa and Sébastien Tixeuil. Stabilizing link-coloration of arbitrary networks with unbounded byzantine faults. International Journal of Principles and Applications of Information Science and Technology (PAIST), 1(1):1–13, December 2007.
[46] Toshimitsu Masuzawa and Sébastien Tixeuil. Stabilizing locally maximizable tasks in unidirectional networks is hard. In Proceedings of ICDCS 2010. IEEE Press, June 2010.
[47] Nathalie Mitton, Bruno Séricola, Sébastien Tixeuil, Eric Fleury, and Isabelle Guérin-Lassous. Self-stabilization in self-organized multihop wireless networks. Ad Hoc and Sensor
Wireless Networks, January 2010.
[48] Mikhail Nesterenko and Anish Arora. Tolerance to unbounded byzantine faults. In 21st
Symposium on Reliable Distributed Systems, page 22. IEEE Computer Society, 2002.
[49] Mikhail Nesterenko and Sébastien Tixeuil. Discovering network topology in the presence of
byzantine nodes. IEEE Trans. Parallel Distrib. Syst., October 2009.
[50] Olivier Peres. Construction de topologies autostabilisante dans les systèmes à grande échelle.
PhD thesis, Univ. Paris Sud 11, 2008.
[51] Antony I. T. Rowstron and Peter Druschel. Pastry: Scalable, decentralized object location,
and routing for large-scale peer-to-peer systems. In Rachid Guerraoui, editor, Middleware,
volume 2218 of Lecture Notes in Computer Science, pages 329–350. Springer, 2001.
[52] Yusuke Sakurai, Fukuhito Ooshita, and Toshimitsu Masuzawa. A self-stabilizing linkcoloring protocol resilient to byzantine faults in tree networks. In Principles of Distributed
Systems, 8th International Conference, OPODIS 2004, volume 3544 of Lecture Notes in
Computer Science, pages 283–298. Springer, 2005.
[53] Sébastien Tixeuil. Algorithms and Theory of Computation Handbook, Second Edition, chapter Self-stabilizing Algorithms, pages 26.1–26.45. Chapman & Hall/CRC Applied Algorithms and Data Structures. CRC Press, Taylor & Francis Group, November 2009.
[54] Masafumi Yamashita and Tsunehiko Kameda. Computing on anonymous networks: Part I, characterizing the solvable cases. IEEE Trans. Parallel Distrib. Syst., 7(1):69–89, 1996.