Combining Techniques Application for Tree Search Structures

RAYMOND AND BEVERLY SACKLER
FACULTY OF EXACT SCIENCES
BLAVATNIK SCHOOL OF COMPUTER SCIENCE
Thesis submitted in partial fulfillment of the requirements
for the M.Sc. degree in the School of Computer Science,
Tel-Aviv University
by
Vladimir Budovsky
The research work for this thesis has been carried out
at Tel-Aviv University under the supervision of
Prof. Yehuda Afek and Prof. Nir Shavit
June 2010
CONTENTS

1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   1
   1.1 Flat Combining . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   1
   1.2 Skip Lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   2
2. The Flat Combined Skip Lists . . . . . . . . . . . . . . . . . . . . . . . .   5
   2.1 Naive Flat Combined Skip List . . . . . . . . . . . . . . . . . . . . .   7
   2.2 Flat Combined Skip List with Multiple Combiners . . . . . . . . . . . .  11
   2.3 Flat Combined Skip List with "Hints" . . . . . . . . . . . . . . . . . .  13
3. Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  16
   3.1 Performance Comparison of Flat Combined Skip Lists vs JDK
       ConcurrentSkipListSet . . . . . . . . . . . . . . . . . . . . . . . . .  17
   3.2 Flat Combining Mechanism Experimental Verifications . . . . . . . . . .  22
4. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  32
LIST OF FIGURES

1.1  Skip list of height 4. May be considered either as a collection of
     "fat" nodes or as a 2-d list . . . . . . . . . . . . . . . . . . . . . . .   2
1.2  Skip list traversal with key 12. Traversed predecessors are shown.
     start_level is 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   3
2.1  Multi-combiner skip list. Every node with height ≥ 3 is a combiner
     node . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  11
3.1  Naive FC skip list implementation vs JDK lock-free ConcurrentSkipListSet,
     uniform keys distribution . . . . . . . . . . . . . . . . . . . . . . . .  18
3.2  Naive FC skip list implementation vs JDK lock-free ConcurrentSkipListSet,
     high access locality . . . . . . . . . . . . . . . . . . . . . . . . . . .  19
3.3  Hints FC skip list implementation vs JDK lock-free ConcurrentSkipListSet,
     uniform keys distribution . . . . . . . . . . . . . . . . . . . . . . . .  20
3.4  Hints FC skip list implementation vs JDK lock-free ConcurrentSkipListSet,
     high access locality . . . . . . . . . . . . . . . . . . . . . . . . . . .  21
3.5  FC skip list implementation vs multi-lock one, naive implementations,
     uniform keys distribution . . . . . . . . . . . . . . . . . . . . . . . .  24
3.6  FC skip list implementation vs multi-lock one, naive implementations,
     high access locality . . . . . . . . . . . . . . . . . . . . . . . . . . .  25
3.7  FC skip list implementation vs multi-lock one, hints implementations,
     uniform keys distribution . . . . . . . . . . . . . . . . . . . . . . . .  26
3.8  FC skip list implementation vs multi-lock one, hints implementations,
     high access locality . . . . . . . . . . . . . . . . . . . . . . . . . . .  27
3.9  Ideal hints FC skip list implementation vs JDK lock-free
     ConcurrentSkipListSet, uniform keys distribution . . . . . . . . . . . . .  28
3.10 Ideal hints FC skip list implementation vs JDK lock-free
     ConcurrentSkipListSet, high access locality . . . . . . . . . . . . . . .  29
3.11 Hints mechanism success rate for pure update workloads . . . . . . . . . .  30
3.12 The connection between FC intensity and throughput per thread
     for pure update workloads . . . . . . . . . . . . . . . . . . . . . . . .  30
3.13 Lock-free skip list CAS per update, CAS success rate and throughput
     per thread for pure update workloads . . . . . . . . . . . . . . . . . . .  31
LISTINGS

2.1  Set of Integers Interface . . . . . . . . . . . . . . . . . . . . . . . .   5
2.2  Flat combining definitions . . . . . . . . . . . . . . . . . . . . . . .   6
2.3  Node definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   6
2.4  Wait-free contains is the same for all skip lists . . . . . . . . . . . .   7
2.5  add Naive implementation . . . . . . . . . . . . . . . . . . . . . . . .   8
2.6  scanAndCombine common implementation . . . . . . . . . . . . . . . . . .   8
2.7  Physical add and remove Naive implementation . . . . . . . . . . . . . .   9
2.8  Multi-combiner remove implementation . . . . . . . . . . . . . . . . . .  12
2.9  Optimistic (hinted) FCRequest and add implementation . . . . . . . . . .  13
2.10 Optimistic (hinted) doAdd and verify implementation . . . . . . . . . . .  14
3.1  Optimistic (hinted) multi-lock add method implementation . . . . . . . .  22
ACKNOWLEDGEMENTS
I would like to thank all those who made this thesis possible.
I am extremely grateful to my advisors, Prof. Yehuda Afek and Prof. Nir
Shavit, who introduced me to the world of multiprocessors and distributed
algorithms and whose supervision and support enabled me to deepen my
understanding of the subject.
My sincere thanks to Ms. Moran Tzafrir for teaching me what a researcher's
everyday work is about and for supplying me with an arsenal of essential
tools for my work.
Finally, I am grateful to my family, and especially to my sister Elena, for
their patience and encouragement.
ABSTRACT
Flat combining (FC) is a new synchronization paradigm that dramatically
reduces synchronization costs. As was recently shown, this technique brings
significant performance gains for several popular parallel data structures,
such as stacks, queues, and shared counters. In addition, applying the
combining paradigm keeps the code as simple as code synchronized via a
single global lock. However, the question of its applicability to other
classes of parallel data structures has not yet been answered.
This work deals with the application of the FC paradigm to binary tree-like
data structures. As is shown below, combining is hardly suitable for these
cases. The limits of FC use have been studied, and a criterion for its
applicability has been justified.
1. INTRODUCTION
Multi- and many-core computers are becoming more and more common these days.
We are witnessing the development of computer chips with tens of cores that
consume no more space and energy than a desktop processor. In light of this
trend, the development of scalable and correct concurrent data structures
becomes extremely important. The simplest and most straightforward solution
is to derive a concurrent data structure from a sequential one using a
global lock as the synchronization primitive. Unfortunately, this solution
does not scale even for a relatively small number of cores. Another approach
is to design fine-grained synchronization schemes using multiple locks or
non-blocking read-modify-write atomic operations. This method usually
requires a full algorithm redesign and reimplementation. An additional
drawback of fine-grained and, especially, lock-free synchronization is its
high complexity: it is very difficult to formally prove the correctness of
such data structures (see, for example, the proofs in [3] and [4]).
1.1 Flat Combining
The flat combining [7] programming paradigm achieves a high level of
concurrency while preserving code simplicity. The main idea behind flat
combining is to attach a public action registry to an existing sequential
data structure. Each thread, before accessing the shared data, publishes its
action request in the registry and then tries to acquire the global lock.
The winning thread becomes the "combiner": it scans the registry and
performs all the requests it finds. The other threads simply wait for the
results of their fulfilled actions, spinning on a thread-local Done flag.
There are several benefits to this strategy:
• Low synchronization cost compared to a global lock: there is only one
competition round for acquiring the shared lock, and every thread, winning
or losing, returns with its request performed.
• The combiner can use its knowledge of all pending requests and fulfill
some of them without accessing the data structure. For a stack, for example,
the combiner may match push/pop pairs and return the results directly to the
appropriate callers; this well-known technique is called elimination. For a
shared counter, the combiner can calculate the total counter change and
update the data structure only once; this technique, called combining, is
also widely used.
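As an illustration, the combining optimization for a shared counter can be sketched as follows. This is our own minimal sketch, not code from the thesis; the class and field names (CombiningCounter, delta, pending) are invented for the example.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Minimal flat-combining sketch for a shared counter: threads publish
// deltas in a registry, and the thread that wins the TTAS lock combines
// all published deltas into one update of the sequential counter.
class CombiningCounter {
    static final int MAX_THREADS = 8;
    final int[] delta = new int[MAX_THREADS];                  // published requests
    final AtomicIntegerArray pending = new AtomicIntegerArray(MAX_THREADS);
    final AtomicInteger lock = new AtomicInteger(0);
    volatile int counter = 0;                                   // the "sequential" structure

    void add(int tid, int d) {
        delta[tid] = d;
        pending.set(tid, 1);                                    // publish the request
        while (pending.get(tid) == 1) {                         // spin until fulfilled
            if (lock.get() == 0 && lock.compareAndSet(0, 1)) {  // TTAS lock
                int total = 0;
                for (int i = 0; i < MAX_THREADS; i++)           // scan the registry
                    if (pending.get(i) == 1) {
                        total += delta[i];
                        pending.set(i, 0);                      // release the waiter
                    }
                counter += total;                               // one combined update
                lock.set(0);
            }
        }
    }
}
```

The combiner touches the shared counter once per combining session, no matter how many requests it found, which is exactly the combining benefit described above.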
The variants of the FC algorithm are described in detail in Chapter 2 (The
Flat Combined Skip Lists). Flat combining has proven very efficient for data
structures with "hot spots", such as a stack head, queue ends, a priority
queue head, and so on. It also shows good results when synchronization costs
are high. For example, lock-free synchronous queues [16] demonstrate good
throughput but moderate scalability, which can be improved using elimination
or FC techniques. However, the question of FC's usefulness for data
structures without pronounced bottlenecks and high synchronization costs
remains open. This work studies the applicability of flat combining to
binary tree-like data structures: the ones with O(log n) access time that
allow range operations.

Fig. 1.1: Skip list of height 4. May be considered either as a collection of "fat"
nodes or as a 2-d list
1.2 Skip Lists
Tree search structures are probably the most popular and widespread ones. It
is hard to find an area of computer science or software engineering that
does not use them. Their practical applications start with the most popular
red-black tree [6], which is used in nearly every algorithms library,
including the C++ STL [17] and the Java SDK [18], and the AVL tree [1],
which is very popular for search-dominated workloads; continue with various
B-trees [2], which are useful for block-organized memories; and finish with
specialized suffix tries, splay trees, spatial search trees, persistent
trees, etc. Since all of the above algorithms deal with large amounts of
data, and many of them run inside operating systems or serve as search
indexes inside databases, distributed and multi-threaded designs for search
trees are in the focus of many research and commercial projects. A
comprehensive survey of concurrent binary search trees is given in [13]. The
common problem with all of the search trees mentioned above is that they are
either static (they do not allow add/remove without a full rebuild) or need
a re-balancing mechanism after updates in order to preserve logarithmic
access time. In most cases, the re-balance scope is unknown prior to the
update action, which makes the design of fine-grained synchronization for
binary search trees a very complicated task. That is why skip lists were
chosen as the basic data structure for this research. There were several
reasons for the decision:
• The skip list is simple and has no re-balancing overheads, which
simplifies measurements.
• The skip list is the only known concurrent lock-free binary search
structure.

Fig. 1.2: Skip list traversal with key 12. Traversed predecessors are shown.
start_level is 3.
The skip list was invented [15] in 1990 as a probabilistic alternative to
binary search trees. A skip list is a linked list of "fat" nodes (Figure
1.1), where each node has a randomly chosen height (number of levels). Every
node has a unique key, and the nodes appear ordered in the list. Each node
is connected at each level to its successor at the same level. The random
level is chosen using a geometric distribution: the probability that a node
has layer i, i ≥ 0, is 1/p^i, p > 1. So every node has layer 0, and if a
node has layer i, it also has layer i+1 with probability 1/p. In practice, p
is usually chosen between 2 and 4. Such a distribution gives an O(log N)
expectation for the maximal node height of the skip list, and between every
two nodes of height k, p-1 nodes of height k-1 are expected to appear. It is
useful to add two immutable nodes, head and tail, with the highest possible
level, and to maintain the actual highest level (start_level) on every add
or remove. Alternatively, the skip list may be represented as a collection
of sorted lists with unique keys L1, L2, ..., Lk, such that i > j ⇒ Li ⊆ Lj,
and all nodes with equal keys form "vertical" lists. The latter
representation is especially convenient for lock-free implementations, where
all updates are implemented through atomic read-and-update operations.
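The geometric height distribution described above can be generated as in the following sketch (our illustration with p = 2 and java.util.Random; the thesis's implementations instead use the fast generator described in [12]):

```java
import java.util.Random;

// Sketch of geometric level generation with p = 2: every node has
// level 0, and a node that has level i also has level i+1 with
// probability 1/2, capped at maxLevel.
class LevelGenerator {
    private final Random rnd = new Random();
    private final int maxLevel;

    LevelGenerator(int maxLevel) {
        this.maxLevel = maxLevel;
    }

    int randomLevel() {
        int level = 0;
        while (level < maxLevel && rnd.nextBoolean())
            level++;                 // coin flip succeeded: grow one level
        return level;                // the node spans levels 0..level
    }
}
```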
Denote the next node to node n at level l as next_l(n), and the key of n as
key(n). The simple sequential list works in the following way:
• Initially, the empty list contains head and tail with keys −∞ and +∞
correspondingly. The head node is connected to tail at every possible level,
and the actual start_level is 0.
• A list traversal with key k starts from node n = head at level
l = start_level and proceeds at this level searching for the pair of nodes
(pred, succ) such that next_l(pred) == succ and key(pred) < k ≤ key(succ).
Then it sets l = l − 1 and n = pred and repeats the search. The process
continues until level 0 is reached. Figure 1.2 illustrates the pred nodes
observed during a traversal with key 12.
• contains(k) simply calls the traversal with key k. It is unnecessary to
proceed to the bottom: once the desired key is found, the traversal is
interrupted and the found node is returned.
• add(k) starts by generating a random height h, as described above. After
that, the traversal algorithm is performed, collecting the h bottom pred and
succ nodes. If the node is not found (for a pure set implementation), a new
node of height h with key k is linked to the collected nodes.
• remove(k) starts with a traversal run. Once the node succ with
key(succ) == k is observed on its highest level h, all traversed pred nodes
are collected. After reaching the bottom level, all collected next_i(pred)
references are set to next_i(succ), and the succ node's memory is freed.
• After every update operation, start_level is verified and updated if
needed. There are two cases: when adding a node with height h > start_level,
start_level is set to h; and when removing a node of start_level height, the
highest level h such that next_h(head) ≠ tail is found, and start_level is
set to h.
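The traversal described by the steps above can be sketched as follows. This is a simplified sequential version of ours, with array-based next pointers and a null-terminated list instead of a tail sentinel; the thesis's find routine additionally takes a start node and hint containers.

```java
// Simplified sequential skip-list search: walks down from startLevel,
// recording the predecessor seen at every level, and returns the node
// holding the key (or null if the key is absent).
class SkipListSearch {
    static class Node {
        final int key;
        final Node[] next;           // next[l] = successor at level l
        Node(int key, int height) {
            this.key = key;
            this.next = new Node[height];
        }
    }

    static Node find(Node head, int startLevel, int key, Node[] preds) {
        Node pred = head;
        for (int level = startLevel; level >= 0; --level) {
            Node curr = pred.next[level];
            while (curr != null && curr.key < key) {  // advance along this level
                pred = curr;
                curr = pred.next[level];
            }
            preds[level] = pred;                      // remember the predecessor
        }
        Node candidate = preds[0].next[0];
        return (candidate != null && candidate.key == key) ? candidate : null;
    }
}
```

The preds array collected here is exactly what add and remove need: add links the new node after the recorded predecessors, and remove redirects the same references past the victim node.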
Note that the traversal algorithm performs O(1) expected steps at each
level, and that the number of levels is expected to be logarithmic in the
number of nodes; therefore, the skip list has expected logarithmic access
time. The above schema, short of some small variations, is used in most
lock-based concurrent skip lists, and our implementations use it as well.
The differences between the implementations ([14], [8], [11]) concern the
locking schemes and the state flags devised to preserve consistency,
linearizability [10], and the skip list invariants. Lock-free skip lists, in
contrast, cannot maintain the skip list invariants: this would require
multi-location read-and-update atomic operations, unsupported on most
existing platforms. The lock-free implementations ([5], [9]) use relaxed
skip list algorithms, where the question of node existence is answered only
on the bottom list level, the other levels are regarded as a sort of index
allowing one to reach the bottom level in expected logarithmic time, and the
skip list structure can be violated at particular execution moments.
2. THE FLAT COMBINED SKIP LISTS
All our FC skip list variants are implemented both in Java and C++ with
minimal differences. The C++ implementations require memory management and
explicit memory barriers, while in the Java implementations the memory
barriers are introduced implicitly through volatile flags' store/load
operations. We have chosen to present only the Java implementations in order
to avoid memory management issues and to have a clear and standard
competitor: all performance comparisons use the Java SDK lock-free
ConcurrentSkipListSet [18]. The flat combined skip list implements the
simplest set-of-integers interface:
Listing 2.1: Set of Integers Interface
public interface SimpleIntSet {
    /**
     * Add item to map
     * @param key - key to add
     * @return true if added,
     *         false if the key already exists on the map
     */
    boolean add(int key);
    /**
     * Removes item from the map
     * @param key - key to remove
     * @return true if removed,
     *         false if the key does not exist on the map
     */
    boolean remove(int key);
    /**
     * Verify if the item is on the map
     * @param key
     * @return true if item exists, false otherwise
     */
    boolean contains(int key);
}
The add and remove methods use the flat combining paradigm, while the
contains method is implemented wait-free. The coexistence of flat combining
and wait-free methods requires special treatment of the linearization
points, since the flat combining data is invisible to the lock-free
contains.
Define FCData and FCRequest:
Listing 2.2: Flat combining definitions
class FCRequest {
    int key;                     // Key
    boolean response;            // Operation result
    volatile int opcode = NONE;  // Action
}

class FCData {
    public FCRequest requests[]; // Submitted requests
    public AtomicInteger lock;   // FC node lock
}
The FCData may be attached to one or several skip list nodes. The skip list
node class is:
Listing 2.3: Node definition
class Link {
    ...
    public Link next;
    public Node node;
    public Link up;
    public Link down;
}

class Node {
    ...
    public int numLevels() {    // Node height
        return links.length;
    }
    // Node is FC when it has FC data
    public boolean isFCNode() {
        return fcdata != null;
    }
    public Link at(int index) { // Get link at level
        return links[index];
    }
    public Link bottom() {      // The bottom link
        return links[0];
    }
    public Link top() {         // The top link
        return links[links.length - 1];
    }

    public final int key;
    public volatile boolean deleted = false;
    public volatile boolean fully_connected = false;
    public FCData fcdata;
    // 2D list of links with random access
    // Link contains reference to next, up and down links
    private Link[] links;
}
Up to this point, the skip list is the regular single-threaded one, save for
two details: the deleted and fully_connected flags, and the FCData reference
(which is non-null for flat combining nodes). The contains method is also
very similar to the single-threaded implementation:
Listing 2.4: Wait-free contains is the same for all skip lists
public boolean contains(int inKey) {
    int level = start_level;     // Adaptable start level
    Link pred = head.at(level);
    Link curr = null;
    for (; level >= 0; --level, pred = pred.down) {
        curr = pred.next;
        while (inKey > curr.node.key) {
            pred = curr;
            curr = pred.next;
        }
        if (inKey == curr.node.key)
            return (!curr.node.deleted &&
                    curr.node.fully_connected);
    }
    return false;
}
The only distinguishing detail is the check of the deleted and
fully_connected flags. The difference comes with the add and remove
implementations. We will present implementations for several flat combined
list variants.
2.1 Naive Flat Combined Skip List
The first and simplest implementation is the Naive FC list. It has exactly
one combiner node (the head one). A thread performing an add or remove
action:
1. Puts its FCRequest into the head node's FCData.
2. Tries to acquire the lock.
3. If it succeeded, scans and fulfills the requests.
4. Else, the thread spins on its own request completion flag and checks the
lock state. If the request is fulfilled, the thread returns with the desired
result; otherwise, if the lock is unlocked, it continues from step 2.
Listing 2.5 presents the add method implementation.
Listing 2.5: add Naive implementation
public boolean add(int key) {
    // Put my request into the head node's fcdata
    // (the head is the only combiner node in the Naive variant)
    FCRequest my_request =
        head.fcdata.req_ary[ThreadId.getThreadId()];
    my_request.key = key;
    // Volatile write, from here the combiner sees it
    my_request.opcode = ADD;
    AtomicInteger lock = head.fcdata.lock;
    do {
        if (0 == lock.get() &&              // TTAS lock
            lock.compareAndSet(0, 0xFF)) {
            // Perform all found requests
            scanAndCombine(head);
            lock.set(0);                    // Unlock
            return my_request.response;
        } else {
            do {
                Thread.yield();             // Give up processor
                // Somebody did my work
                if (my_request.opcode == NONE)
                    return my_request.response;
            } while (0 != lock.get());
        }
    } while (true);
}
The remove method differs from the above one only by the REMOVE opcode. All
the work is performed within the scanAndCombine method, which is the same
for all of the following implementations:
Listing 2.6: scanAndCombine common implementation
protected void scanAndCombine(Node fc_node) {
    for (FCRequest curr_req : fc_node.fcdata.requests) {
        switch (curr_req.opcode) {
        case ADD:
            curr_req.response = doAdd(fc_node, curr_req.key,
                curr_req.pred_ary, curr_req.succ_ary);
            curr_req.opcode = NONE; // Release waiting thread
            break;
        case REMOVE:
            curr_req.response = doRemove(fc_node, curr_req.key,
                curr_req.pred_ary, curr_req.succ_ary);
            curr_req.opcode = NONE; // Release waiting thread
            break;
        }
    }
}
Here, the combiner thread scans all requests and performs the modifications.
Both the doAdd and doRemove methods receive containers for the predecessor
and successor nodes - a technical detail that allows reusing the memory in
the case of the Naive list, but which is used in a different way in the
other implementations. Besides this, the fc_node parameter indicates the
start node for the search; it is not relevant for the single-combiner list,
but it is important for the multi-combiner one, described below. The
doAdd/doRemove methods act exactly as in the case of a single-threaded skip
list:
Listing 2.7: Physical add and remove Naive implementation
 1  private boolean doAdd(Node fc_node, int key,
 2          RandomAccessList<Link> pred_ary,
 3          RandomAccessList<Link> succ_ary) {
 4      // New node height has to be known in advance
 5      // in order to restrict nodes' collection.
 6      int top_level = randomLevel();
 7      // Find placement and nodes to connect.
 8      Node found_node = find(fc_node, key, pred_ary,
 9              succ_ary, top_level, true);
10      if (found_node == null) { // Node not on map
11          Node new_node = new Node(key,
12                  top_level, false);
13          Link new_link = new_node.bottom();
14          RandomAccessList<Link>.BiDirIterator predIter =
15                  pred_ary.begin();
16          RandomAccessList<Link>.BiDirIterator succIter =
17                  succ_ary.begin();
18          // Connect new node
19          for (int level = 0; level < top_level; ++level,
20                  new_link = new_link.up) {
21              new_link.next = succIter.data;
22              predIter.data.next = new_link;
23              predIter = predIter.next();
24              succIter = succIter.next();
25          }
26          // Linearization point
27          new_node.fully_connected = true;
28          return true;
29      }
30      return false;
31  }
32
33  private boolean doRemove(Node fc_node, int key,
34          RandomAccessList<Link> pred_ary,
35          RandomAccessList<Link> succ_ary) {
36      // Find node to delete and its predecessors.
37      Node found_node = find(fc_node, key, pred_ary,
38              succ_ary, fc_node.numLevels(), false);
39      if (found_node != null) {
40          int top_level = found_node.numLevels();
41          // Get link on top level
42          Link lnk = found_node.top();
43          // Topmost predecessor
44          RandomAccessList<Link>.BiDirIterator predIter =
45                  pred_ary.rbegin();
46          found_node.deleted = true; // Logical delete
47          for (int level = 0; level < top_level; ++level,
48                  lnk = lnk.down, predIter = predIter.prev()) {
49              // Physical delete
50              predIter.data.next = lnk.next;
51          }
52          return true;
53      }
54      return false;
55  }
In this implementation we use the fast random number generator described in
[12]; a similar one is adopted in the JDK's lock-free list.
Consider the properties of the above skip list implementation.
Property 2.1.1. The Naive skip list is deadlock-free.
Proof. The implementation uses only one lock; therefore, a deadlock-free
implementation of the lock implies deadlock freedom of the data structure.
Property 2.1.2. Naive skip list update operations do not overlap each other
and have a strict total order.
Proof. Consider two arbitrary update operations on the list. All
modifications are performed by the combiner thread during a combining
session (Listing 2.6). The combining sessions are strictly ordered by the
single lock and do not overlap; so, if the operations belong to different
sessions, their order is defined by the lock acquisition order. Otherwise,
if the updates belong to the same session, their order is defined by the
combine algorithm: the combiner performs the updates sequentially, and no
two modifications overlap.
Proposition 2.1.1. The Naive skip list is linearizable.
Proof. Select linearization points for the skip list updates:
• For add: row 27 (Listing 2.7), where the fully_connected flag is set to
true.
• For remove: row 46 (Listing 2.7), where the deleted flag is set to true.
We use the linearizability of OptimisticSkipList, proved in [8]. Note that,
by Property 2.1.2, all updates performed on our skip list may be regarded as
performed by a single dedicated thread. Therefore, since the initial
preconditions are identical for both OptimisticSkipList and the Naive one,
and the modifications of the next references and the deleted and
fully_connected flags appear in program order exactly as in
OptimisticSkipList, the Naive skip list state may be considered exactly
equal to that of an OptimisticSkipList where all modifications on the list
are performed by a single thread. Then, for each possible concurrent run on
the Naive skip list, there is a run on OptimisticSkipList where both skip
lists' states, defined by the next references and the flags, are identical
at every point in time, and so the OptimisticSkipList linearization order is
applicable to the Naive skip list.
As expected, flat combining in this implementation exposes a sequential
bottleneck very comparable to that of the global lock. In Chapter 3
(Performance) this estimation is verified.

Fig. 2.1: Multi-combiner skip list. Every node with height ≥ 3 is a combiner node
2.2 Flat Combined Skip List with Multiple Combiners
The second attempt is the introduction of several combiners, which allows
making several modifications simultaneously and, therefore, improves
scalability. The multi-combiner skip list is implemented with statically
distributed immutable combiners. The idea is to divide the skip list into
non-intersecting parts, such that every part is managed by some combiner
node. The multi-combiner skip list is shown in Figure 2.1. Suppose that we
start from an initially filled skip list of size N and have to add c < N
combiners. We choose some height h_c such that the number of nodes with
height h ≥ h_c is at least c, and make them combiner nodes by adding FCData
to each one. In this work, only static multi-combiner skip lists are
studied. Dynamic lists may be devised by altering the h_c value - the
process requires consecutive locking of all FC node layers, converting the
needed layer to combiners/non-combiners, and re-scheduling all pending
combining requests. Since, by its essence, flat combining has to use a very
small number of combiners (otherwise it does not differ from a sort of
fine-grained synchronization), the process is rare and not expensive.
The multi-combiner skip list acts very similarly to the single-combiner one.
As mentioned earlier, the contains method is exactly the same, while the
single difference in add/remove is that the requests are placed into the
appropriate combiner nodes instead of the head one. The updating thread:
1. Finds the combiner node fc_node responsible for the modification area.
2. Puts its FCRequest into fc_node's FCData.
3. Tries to acquire the FCData lock.
4. If it succeeded, scans and fulfills the requests.
5. Else, spins on its own request completion flag and checks the lock state.
If the request is fulfilled, returns with the desired result; otherwise, if
the lock is unlocked, continues from step 3.
Listing 2.8: Multi-combiner remove implementation
public boolean remove(int key) {
    // Get responsible combiner
    Node fc_node = findCombiner(key);
    // Put my request into the node's fcdata
    FCRequest my_request =
        fc_node.fcdata.req_ary[ThreadId.getThreadId()];
    my_request.key = key;
    // Volatile write, from here the combiner sees it
    my_request.opcode = REMOVE;
    AtomicInteger lock = fc_node.fcdata.lock;
    do {
        // TTAS lock
        if (0 == lock.get() &&
            lock.compareAndSet(0, 0xFF)) {
            // Perform all found requests
            scanAndCombine(fc_node);
            // Unlock
            lock.set(0);
            return my_request.response;
        } else {
            do {
                Thread.yield();
                // Somebody did my work
                if (my_request.opcode == NONE)
                    return my_request.response;
            } while (0 != lock.get());
        }
    } while (true);
}
The findCombiner method is wait-free and is implemented similarly to
contains. It has three differences:
1. The search goes down only to the lowest combiner level and does not
proceed to the bottom.
2. The search returns the lowest combiner predecessor of the key.
3. Since combiners are immutable, there is no need to check their deleted
flag.
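Under these three rules, findCombiner can be sketched roughly as follows. This is our illustration with array-based next pointers rather than the Link objects of Listing 2.3; the class name and parameters are invented for the example.

```java
// Sketch of findCombiner: descend from startLevel only down to the
// lowest combiner level, and return the last node on that level whose
// key precedes the search key. Every node at or above the combiner
// level is an immutable combiner, so no deleted check is needed.
class CombinerFinder {
    static class Node {
        final int key;
        final Node[] next;           // next[l] = successor at level l
        Node(int key, int height) {
            this.key = key;
            this.next = new Node[height];
        }
    }

    static Node findCombiner(Node head, int startLevel,
                             int lowestCombinerLevel, int key) {
        Node pred = head;
        for (int level = startLevel; level >= lowestCombinerLevel; --level) {
            Node curr = pred.next[level];
            while (curr != null && curr.key < key) {  // advance along this level
                pred = curr;
                curr = pred.next[level];
            }
        }
        return pred;                 // lowest combiner-level predecessor of the key
    }
}
```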
The multi-combiner skip list properties are similar to those of the Naive
list.
Property 2.2.1. The multi-combiner skip list is deadlock-free.
Proof. As follows from the algorithm, no thread tries to hold more than one
lock simultaneously. Hence, deadlock is impossible.
Practically, the multi-combiner design divides the data structure into a
disjoint set of single-combiner Naive lists. Call these lists combining
clusters, and the combiner responsible for a cluster its cluster head. Then
the properties of Naive FC lists are applicable to every combining cluster.
Instead of a strict total order, all update operations of the multi-combiner
list form a strict partial order, where operations on different clusters are
commutative: the operations can be reordered without affecting the final
state of the data structure.
Proposition 2.2.1. The multi-combiner skip list is linearizable.
Proof. Follows from the linearizability of each cluster and the fact that
linearizability is compositional (Theorem 1 from [10]).
The multi-combiner skip list scales much better than the single-combiner
one, but still performs a lot of work sequentially. The next attempt is to
reduce this part of the execution via the hints mechanism.
2.3 Flat Combined Skip List with ”Hints”
The hints mechanism is inspired by the optimistic skip list [8]. The idea is
to collect, in a wait-free "optimistic" manner, the links that have to be
updated, then to acquire the lock, verify (and re-find, if needed) the
links, and then to perform the update. Listing 2.9 shows the FCRequest
structure supplemented with "hints", and the add method.
Listing 2.9: Optimistic (hinted) FCrequest and add implementation
class FCRequest {
  int key;                          // Key
  boolean response;                 // Operation result
  volatile int opcode = NONE;       // Action
  int top_level;                    // Hints size
  RandomAccessList<Link> pred_ary;  // Collected hints
  RandomAccessList<Link> succ_ary;  // Collected hints
}

public boolean add(int key) {
  // Get responsible combiner
  Node fc_node = findCombiner(key);
  FCRequest my_request =
      fc_node.fc_data.req_ary[ThreadId.getThreadId()];
  // We have to know the level prior to find in order
  // to restrict the hints size
  int top_level = randomLevel();
  Node found_node;
  do {
    // Find placement and fill hints data
    found_node = find(fc_node, key, my_request.pred_ary,
                      my_request.succ_ary, top_level, true, true);
  } while (found_node != null && found_node.deleted);
  // Node already exists
  if (found_node != null)
    return false;
  // Put my request to node's fc_data
  my_request.top_level = top_level;
  my_request.key = key;
  // Volatile write, from here the combiner sees it
  my_request.opcode = ADD;
  AtomicInteger lock = fc_node.fc_data.lock;
  do {
    // TTAS lock
    if (0 == lock.get() &&
        lock.compareAndSet(0, 0xFF)) {
      // Perform all found requests
      scanAndCombine(fc_node);
      // Unlock
      lock.set(0);
      return my_request.response;
    } else {
      do {
        Thread.yield();
        // Somebody did my work
        if (my_request.opcode == NONE)
          return my_request.response;
      } while (0 != lock.get());
    }
  } while (true);
}
The internal doAdd and doDelete methods (Listing 2.10) are also slightly modified, since we have to verify and, if needed, re-fill the collections of the predecessors and the successors. The verify method checks that all collected nodes are
correct, i.e. they are non-deleted and fully connected, each predecessor's next
reference points to the appropriate successor, and the collected nodes' keys suit
the requested key.
Listing 2.10: Optimistic (hinted) doAdd and verify implementation
private boolean doAdd(Node fc_node, int key, int top_level,
                      RandomAccessList<Link> pred_ary,
                      RandomAccessList<Link> succ_ary) {
  Node found_node = null;
  // Verify data and re-fill if needed
  if (!verify(key, pred_ary, succ_ary, top_level)) {
    found_node = find(fc_node, key, pred_ary,
                      succ_ary, top_level, true, false);
  }
  // From here, as in Naive list
  ...
}

protected boolean verify(int key,
                         RandomAccessList<Link> predAry,
                         RandomAccessList<Link> succAry,
                         int top_level) {
  RandomAccessList<Link>.BiDirIterator predIter = predAry.begin();
  RandomAccessList<Link>.BiDirIterator succIter = succAry.begin();
  for (int iLevel = 0; iLevel < top_level; ++iLevel,
       predIter = predIter.next(), succIter = succIter.next()) {
    Link pred = predIter.data;
    Link next = succIter.data;
    if (pred.node.deleted || next.node.deleted ||
        !pred.node.fully_connected ||
        !next.node.fully_connected ||
        pred.next != next ||
        pred.node.key >= key || next.node.key < key)
      return false;
  }
  return true;
}
Like its predecessors, the hinted skip list is deadlock free and linearizable.
Deadlock freedom is obvious, since this implementation uses exactly the
same locking scheme as the previous ones. Linearizability may be derived from
the fact that if verify fails, the hinted skip list algorithm is identical to the naive
one. Otherwise, a verify success guarantees that the state of all memory that has
to be updated is identical to its state when the data was collected; therefore, all
preconditions mentioned in the linearizability proof for OptimisticSkipList hold,
and the proof also applies to the hinted skip list.
The hints mechanism is applied to both single- and multi-combiner lists. As
shown in Chapter 3 (Performance), the optimistic approach is very efficient,
especially when the update rate is not high.
3. PERFORMANCE
For the performance verification, we use the skip lists described above and
several additional data structures designed to isolate the flat combining impact. The
JDK ConcurrentSkipListSet by Doug Lea is used as the main competitor: it is
currently one of the most efficient and scalable skip list implementations.
Computations were performed on a Sun SPARC Enterprise T5140 server
powered by two UltraSPARC T2 Plus processors. Each processor contains eight
cores running eight hardware threads each, which gives 128 total hardware threads
per system.
The benchmarked algorithms notation is:
FC-Naive-0 - "Naive" FC list with 0 non-head combiners.
FC-Hints-64 - "Hinted" FC list with at least 64 non-head combiners; the combiner distribution algorithm was described in Section 2.2.
JDK - JDK ConcurrentSkipListSet (based on ConcurrentSkipListMap).
ML-0, ML-64 - "Multi-lock" skip lists with 0 and 64 non-head locks respectively; a data structure designed to isolate the combining effect from the combiner distribution effect. Essentially, it is the multi-combiner skip list with
the FCData structures substituted by simple locks. The updating
thread locks the appropriate "locking" node, makes the update and releases
the lock, instead of running the whole combining algorithm.
ML-hints-0, ML-hints-64 - "Multi-lock" optimistic skip lists with 0 and 64 non-head locks respectively, using the hints mechanism exactly as the flat combining one does.
FC-Ideal-64 - An artificial FC list made from the FC list with hints. Here we
assume that hints are always successful, and the combiner's only work is
to update the next references. This data structure gives an indication
of the maximal FC skip list performance when the combiner fulfills all its
requests sequentially.
Experiments were performed on data structures with an initial size of about
20000 keys. Before selecting this size, the base skip list implementations were
roughly benchmarked over a wide range of sizes, from one hundred to a few
million keys. The relations between the run times of the different skip list implementations were very similar across sizes, and therefore any initial size was
representative enough to show the qualitative differences between the algorithms.
An access locality factor was introduced to simulate different workloads. Suppose
that the experiment is performed for the key space S = {1, 2, ..., N}. The access
locality factor k, 1 ≤ k ≤ N, is defined in the following way: the keys in the
benchmark are uniformly selected from Sk = {t, t + 1, ..., t + N/k}, where
t is selected uniformly from S at the start of the run and is changed slowly
during the execution. An access locality factor of 1 therefore corresponds to
uniformly distributed keys from S. Increasing the factor means that the keys are
selected from a smaller interval, so the contention increases.
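The key selection above can be sketched as a small generator. This is an illustration under stated assumptions: the thesis does not specify how fast t drifts, so the drift policy below (advance t once per a fixed number of draws) and all names are hypothetical.

```java
import java.util.Random;

// Sketch of benchmark key selection under an access locality factor k:
// keys are drawn uniformly from a window of size N/k whose start t
// drifts slowly. The drift rate is an assumption for illustration.
class LocalityKeys {
    final int n, window, drift;
    final Random rnd = new Random(42);
    int t, draws;

    LocalityKeys(int n, int k, int drift) {
        this.n = n;
        this.window = Math.max(1, n / k);  // k = 1 -> the whole key space S
        this.drift = drift;                // draws between window moves
        this.t = rnd.nextInt(n);           // t selected uniformly from S
    }

    int nextKey() {
        if (++draws % drift == 0)          // slow drift of the window start
            t = (t + 1) % n;
        return 1 + (t + rnd.nextInt(window)) % n;  // key in {1, ..., N}
    }
}
```

With k = 128 and N = 20000 this reproduces the "1/128 of the total key space" workload used in Figure 3.2.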
3.1 Performance Comparison of Flat Combined Skip Lists vs
JDK ConcurrentSkipListSet
The first group of benchmarks compares the throughput of the flat combining skip list implementations with that of the JDK ConcurrentSkipListSet.
Figure 3.1 presents the benchmark results for "Naive" flat combining using
uniformly distributed keys. The graphs show that the single-combiner implementation fails to compete with the JDK list even for read-dominated loads, while
the implementation with 64 combiners shows scalability even for write-only loads.
The picture changes dramatically when workload locality increases. Figure 3.2
depicts the same data structures, where all requests are selected from 1/128 of
the total key space. In this case, the naive FC skip list loses to the JDK one even
for read-dominated workloads once the number of running threads increases
enough, and multiple combiners do not help.
The next group of runs deals with the improved optimistic skip list using the "hints"
mechanism described in Chapter 2 (The Flat Combined Skip Lists). Figure 3.3 shows the benchmark results for uniformly distributed requests, while
Figure 3.4 depicts the runs with high locality access. The presented graphs show
a significant performance gain due to the optimistic approach. For read-dominated
workloads, both single- and multi-combiner lists perform better than the JDK one for
all workload localities. For higher update rates, the multi-combiner list
competes well with the JDK data structure, while the single-combiner one shows a lack
of scalability, especially for high access locality.
So far, we can conclude that at least the "hinted" variant of the combining skip
list is a simple and effective alternative to the JDK solution. It is clear that
for read-dominated workloads a lock-free list performs worse than ones with lock-protected updates and a lock-free contains. The first reason for the more effective
read is that the FC list's contains (Listing 2.4) performs only two volatile reads,
while lock-free implementations require all next references to be volatile and
therefore need log N volatile reads. The second reason is that all known lock-free skip list implementations conclude about node presence only after reaching
the bottom skip list level, while our implementation stops if a node with the desired
key is found on any level. However, it is not yet clear what impact the combiner
mechanism has on the presented results.
Fig. 3.1: Naive FC skip list implementation vs JDK lock-free ConcurrentSkipListSet,
uniform keys distribution
Fig. 3.2: Naive FC skip list implementation vs JDK lock-free ConcurrentSkipListSet,
high access locality
Fig. 3.3: Hints FC skip list implementation vs JDK lock-free ConcurrentSkipListSet,
uniform keys distribution
Fig. 3.4: Hints FC skip list implementation vs JDK lock-free ConcurrentSkipListSet,
high access locality
3.2 Flat Combining Mechanism Experimental Verifications
In this section we experimentally verify in depth the FC impact on skip list
behavior.
The first experiments compare the flat combining implementations with a specially
designed multi-lock skip list. The multi-lock skip list is derived from the flat
combining one by replacing FCData with a simple lock. It has single- and multi-lock implementations, exactly as the FC skip list has, and may be extended with the
"hints" mechanism as well. The add method of the multi-lock skip list with hints is
shown in Listing 3.1. The doAdd method called under the lock is identical to the flat
combined one presented in Listing 2.10.
Listing 3.1: Optimistic (hinted) multi-lock add method implementation
public boolean add(int key) {
  // Get responsible lock node
  Node lock_node = findLockNode(key);
  // We have to know the level prior to find in order
  // to restrict the hints size
  int top_level = randomLevel();
  // Thread-local hints lists
  int thread_id = ThreadId.getThreadId();
  RandomAccessList<fcjava.MultiLockSkipListFH.Link>
      succ_ary = this.succ_ary[thread_id];
  RandomAccessList<fcjava.MultiLockSkipListFH.Link>
      pred_ary = this.pred_ary[thread_id];
  Node found_node;
  do {
    found_node = find(lock_node, key, pred_ary,
                      succ_ary, top_level, true, true);
  } while (found_node != null && found_node.deleted);
  if (found_node != null)
    return false;
  // Acquire lock and perform modification
  AtomicInteger lock = lock_node.node_lock;
  do { // TTAS lock
    if (0 == lock.get() && lock.compareAndSet(0, 0xFF)) {
      doAdd(thread_id, lock_node, key, pred_ary,
            succ_ary, top_level);
      // Release lock
      lock.set(0);
      return true;
    } else // Give up processor
      Thread.yield();
  } while (true);
}
Instead of placing a request and running the flat combining algorithm, the
updating thread finds the appropriate lock node, acquires the lock, and performs the
change.
The following graphs compare the multi-lock and FC Naive skip lists.
We can see that for both low (Figure 3.5) and high (Figure 3.6) locality, and
for any update rate, both lists behave very similarly. The multi-lock skip list even
tends to perform slightly better than its FC counterpart for low access locality.
This may be explained by the additional overhead that flat combining introduces:
the combiner thread has to read and maintain the FC registry and to write
back the operation results. All this, if not compensated by the FC gains
described above, leads to a performance decrease.
The benchmarks of the hints versions of the multi-lock and FC skip lists are shown
in Figures 3.7 and 3.8 for low and high access locality. The introduction of the hints
mechanism improves the performance of both lists, but does not change the ratio
between the algorithms: both behave very similarly, with a slight preference for the
multi-lock skip list at low access locality.
As mentioned before, flat combining, besides removing the contention bottleneck, allows using the knowledge about all pending requests to optimize data
structure updates. For tree-like data structures, and for skip lists in particular,
the elimination and combining techniques can be applied to optimize the data
structure traversal, but it is very hard to use them to optimize the data structure
update. For the next group of experiments, we assumed that the traversal is perfectly optimized, i.e. our hints mechanism never fails. In practice, we replaced
the verify method in Listing 2.10 with one that always returns true, and supplied
every node with additional dummy next references. The combiner, instead
of writing to the real next references, updates an equal quantity of dummy
ones. These benchmarks are presented in Figures 3.9 and 3.10, and show that the
FC skip list with an ideal hints mechanism competes well with the lock-free one,
failing only for high access locality and more than 50% update rate; hence, verifying
and improving the hints mechanism makes sense.
The next graph (Figure 3.11) shows the efficiency of our hints mechanism. As
follows from the graph, the hints are very close to ideal for uniform access
and fall to about 50% failures when the number of threads grows to 64. This result
explains the scalability turning point between 16 and 32 threads for high access
locality and high update rate. Note that for the ideal hints list the turning point
also exists, but appears slightly later and is not as sharp. So the problematic
scalability of the FC list is probably caused by flat combining itself.
Fig. 3.5: FC skip list implementation vs multi-lock one, naive implementations, uniform keys distribution
Fig. 3.6: FC skip list implementation vs multi-lock one, naive implementations, high
access locality
Fig. 3.7: FC skip list implementation vs multi-lock one, hints implementations, uniform keys distribution
Fig. 3.8: FC skip list implementation vs multi-lock one, hints implementations, high
access locality
Fig. 3.9: Ideal hints FC skip list implementation vs JDK lock-free ConcurrentSkipListSet, uniform keys distribution
Fig. 3.10: Ideal hints FC skip list implementation vs JDK lock-free ConcurrentSkipListSet, high access locality
Fig. 3.11: Hints mechanism success rate for pure update workloads
Fig. 3.12: The connection between FC intensity and throughput per thread for pure
update workloads
The next two benchmarks, performed for a pure update workload, intend to
answer the question why the lock-free list scales better than the FC one. To estimate
the flat combining load we introduce the FC intensity, a factor showing the additional
combiner work. It is calculated in the following way:

<FC intensity> = (<Fulfilled requests per FC session> - 1) / <Number of threads>
This number is 0 for single-threaded execution, and tends to 1 for a large number
of threads, when one combiner fulfills the requests of all other threads.
Figure 3.12 shows the FC intensity together with the throughput per thread for different
numbers of combiners and workload localities; an FC intensity increase is
followed by a throughput decrease (note that ideal scalability is a horizontal
line). The jump of intensity between 16 and 32 threads corresponds well with the
graphs in Figures 3.3 and 3.4 for the 50% add / 50% remove workload. The jump
may be explained in the following way: starting from some number of threads, the
combiner has no time to complete all the requests during the period when a
released thread prepares its new request, so the competition for the lock never
ceases. On the other hand, for 64 combiners and low locality, the jump does not
happen and the algorithm is scalable. Figure 3.13 shows the lock-free list
statistics for a pure update workload. As follows from the graphs, the CAS
success rate never drops below 75% and the CAS count is as small as 1.5 - 2.5
CAS operations per update, which explains the good scalability of the algorithm.
Fig. 3.13: Lock-free skip list CAS per update, CAS success rate and throughput per
thread for pure update workloads
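The FC intensity defined above reduces to simple arithmetic; the sketch below computes it, with the numeric values chosen purely for illustration.

```java
// FC intensity = (fulfilled requests per FC session - 1) / number of threads.
// It is 0 when every thread combines only for itself, and tends to 1 when
// a single combiner serves all other threads per session.
class FCIntensity {
    static double intensity(double requestsPerSession, int threads) {
        return (requestsPerSession - 1.0) / threads;
    }
}
```

For example, with 64 threads, a combiner fulfilling all 64 pending requests per session gives an intensity of (64 - 1) / 64 ≈ 0.98, while one fulfilling only its own request gives 0.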
4. CONCLUSIONS
We studied several approaches for applying the flat combining technique to skip
list based maps. As shown on the skip list example, for structures allowing
concurrent updates, fine-grained and especially lock-free synchronization
is preferable to FC. This conclusion does not completely deny the usefulness of
applying FC to such structures, since for read-dominated workloads and for
several update request distributions flat combining behaves better than lock-free
synchronization. It is also possible that on different hardware the FC approach
will show better scalability. A breakthrough can also come from FC algorithm
improvements. It is possible, for example, to transform FC into some sort of job
dispatcher: having all the requests, it can form mutually non-conflicting groups,
so the waiting threads can execute them without synchronization. Such a design
faces the problem of additional FC overhead for sorting and analyzing the
requests, but may be applicable to NUMA or client-server architectures.
It is also interesting to study FC implementations for other popular data
structures, such as B-trees or Red-Black trees, where lock-free alternatives do
not exist and fine-grained locking requires complicated read-write locks. FC's
benefits of simplicity and proven linearizability may be valuable in these
cases.
Another, albeit auxiliary, data structure, the multi-lock skip list, may be interesting by itself. It showed characteristics as good as the FC skip list's, but it is
simpler, needs less memory, and gives more uniform latency for update requests.
The idea of building a small index protected by locks (locked or FC layers) over an
entirely wait-free data structure body can replace hand-over-hand fine-grained
synchronization schemes for tree-like structures.
BIBLIOGRAPHY
[1] Adelson-Velskii, G. M., and Landis, E. M. An algorithm for the
organization of information. Soviet Math. Doklady, 3 (1962), 1259–1263.
[2] Bayer, R., and McCreight, E. Organization and maintenance of large
ordered indices. In SIGFIDET ’70: Proceedings of the 1970 ACM SIGFIDET (now SIGMOD) Workshop on Data Description, Access and Control (New York, NY, USA, 1970), ACM, pp. 107–141.
[3] Colvin, R., Groves, L., Luchangco, V., and Moir, M. Formal
verification of a lazy concurrent list-based set algorithm. In CAV (2006),
pp. 475–488.
[4] Doherty, S., Groves, L., Luchangco, V., and Moir, M. Formal
verification of a practical lock-free queue algorithm. In In FORTE (2004),
Springer, pp. 97–114.
[5] Fraser, K. Practical lock freedom. PhD thesis, Cambridge University
Computer Laboratory, 2003. Also available as Technical Report UCAM-CL-TR-579.
[6] Guibas, L. J., and Sedgewick, R. A dichromatic framework for balanced trees. In SFCS ’78: Proceedings of the 19th Annual Symposium on
Foundations of Computer Science (Washington, DC, USA, 1978), IEEE
Computer Society, pp. 8–21.
[7] Hendler, D., Incze, I., Shavit, N., and Tzafrir, M. Flat combining
and the synchronization-parallelism tradeoff. In SPAA (2010), pp. 355–364.
[8] Herlihy, M., Lev, Y., Luchangco, V., and Shavit, N. A simple optimistic skiplist algorithm. In SIROCCO’07: Proceedings of the 14th international conference on Structural information and communication complexity
(Berlin, Heidelberg, 2007), Springer-Verlag, pp. 124–138.
[9] Herlihy, M., and Shavit, N. The art of multiprocessor programming.
Morgan Kaufmann, 2008.
[10] Herlihy, M. P., and Wing, J. M. Linearizability: a correctness condition for concurrent objects. ACM Transactions on Programming Languages
and Systems 12 (1990), 463–492.
[11] Lotan, I., and Shavit, N. Skiplist-based concurrent priority queues. In
Proc. of the 14th International Parallel and Distributed Processing Symposium (IPDPS) (2000), pp. 263–268.
[12] Marsaglia, G. Xorshift RNGs. Journal of Statistical Software 8, 14 (July
2003), 1–6.
[13] Moir, M., and Shavit, N. Concurrent data structures. In Handbook of
Data Structures and Applications, D. Metha and S. Sahni Editors (2007),
pp. 47–14 47–30. Chapman and Hall/CRC Press.
[14] Pugh, W. Concurrent maintenance of skip lists. Tech. rep., University of
Maryland at College Park, College Park, MD, USA, 1990.
[15] Pugh, W. Skip lists: a probabilistic alternative to balanced trees. Commun. ACM 33 (June 1990), 668–676.
[16] Scherer, III, W. N., Lea, D., and Scott, M. L. Scalable synchronous
queues. Commun. ACM 52, 5 (2009), 100–111.
[17] Stepanov, A., and Lee, M. The standard template library. Tech. rep.,
WG21/N0482, ISO Programming Language C++ Project, 1995.
[18] SUN MICROSYSTEMS, INC. JAVA PLATFORM, STANDARD EDITION, Version 6. 4150 Network Circle, Santa Clara, CA 95054, U.S.A,
2006.