Tel Aviv University
Raymond and Beverly Sackler Faculty of Exact Sciences
School of Computer Science

Automatic Fine-Grained Synchronization

by
Guy Golan Gueta

under the supervision of Prof. Mooly Sagiv and Prof. Eran Yahav
and the consultation of Dr. G. Ramalingam

A thesis submitted for the degree of Doctor of Philosophy
Submitted to the Senate of Tel Aviv University
April 2015
To my loved ones, Sophie, Yasmin, Ortal and Ariel.
Abstract
Automatic Fine-Grained Synchronization
Guy Golan Gueta
School of Computer Science
Tel Aviv University
A key challenge in writing concurrent programs is synchronization: ensuring that concurrent accesses and modifications to a shared mutable state do not interfere with each other in undesirable ways.
An important correctness criterion for synchronization is atomicity, i.e., the synchronization should ensure that a code section (a transaction) appears to execute atomically. Realizing efficient and scalable
synchronization that correctly ensures atomicity is considered a challenging task.
In this thesis, we address the problem of achieving correct and efficient atomicity by developing and
enforcing certain synchronization protocols. We present three novel synchronization approaches that
utilize program-specific information using compile-time and run-time techniques.
The first approach leverages the shape of shared memory in order to transform a sequential library
into an atomic library (i.e., into a library in which each operation appears to execute atomically). The
approach is based on domination locking, a novel fine-grained locking protocol designed specifically
for concurrency control in object-oriented software with dynamic pointer updates. We present a static
algorithm that automatically enforces domination locking in a sequential library which is implemented
using a dynamic forest. We show that our algorithm can successfully add effective fine-grained locking to
libraries where manually performing locking is challenging.
The second approach transforms atomic libraries into transactional libraries, which ensure atomicity of sequences of operations. The central idea is to create a library that exploits information (foresight) provided by its clients. The foresight restricts the cases that should be considered by the library
— thereby permitting more efficient synchronization. This approach relies on a novel synchronization protocol built around a notion of dynamic right-movers. We present a static analysis to
infer the foresight information required by the approach, allowing a compiler to automatically insert
the foresight information into the client. This relieves the client programmer of this burden and simplifies writing client code. We show a generic implementation technique to realize the approach in a
given library. We show that this approach enables enforcing atomicity of a wide selection of real-life
Java composite operations. Our experiments indicate that the approach enables realizing efficient and
scalable synchronization for real-life composite operations.
Finally, we show an approach that enables using multiple transactional libraries. This approach is
applicable to a special case of transactional libraries in which the synchronization is based on locking
that exploits semantic properties of the library operations. This approach realizes a semantic-based fine-grained locking which is based on the commutativity properties of the library operations and on the
program’s dynamic pointer updates. We show that this approach leads to effective synchronization. In
some cases, it improves the applicability and the performance of our second approach.
We formalize the above approaches and prove they guarantee atomicity and deadlock freedom. We
show that our approaches provide a variety of tools to effectively deal with common cases in concurrent
programs.
Acknowledgements
First and foremost, I would like to express my deep gratitude and appreciation to my advisors,
Mooly Sagiv and Eran Yahav. Their guidance, inspiration, knowledge, and optimism were crucial for
the completion of this thesis.
I would like to thank G. Ramalingam for his guidance, help, and support throughout the work on
this thesis; it would have been impossible without him.
I would like to thank Alex Aiken and Nathan Bronson for interesting discussions, joint work, and
for the enjoyable visits at Stanford University.
I would like to thank Mooly’s group for many fruitful discussions and for being such a wonderful
combination of research colleagues and friends: Ohad Shacham, Shachar Itzhaky, Omer Tripp, Ofri Ziv,
Oren Zomer, Ghila Castelnuovo, Ariel Jarovsky, Hila Peleg, and Or Tamir.
Contents

1 Introduction
    1.1 Automatic Fine-Grained Locking
    1.2 Transactional Libraries
    1.3 Composition of Transactional Libraries via Semantic Locking

2 Fine-Grained Locking using Shape Properties
    2.1 Overview
    2.2 Preliminaries
    2.3 Domination Locking
    2.4 Enforcing DL in Forest-Based Libraries
        2.4.1 Eager Forest-Locking
        2.4.2 Enforcing EFL
        2.4.3 Example for Dynamically Changing Forest
    2.5 Performance Evaluation
        2.5.1 General Purpose Data Structures
        2.5.2 Specialized Implementations

3 Transactional Libraries with Foresight
    3.1 Overview
        3.1.1 Serializable and Serializably-Completable Executions
        3.1.2 Serializably-Completable Execution: A Characterization
        3.1.3 Synchronization Using Foresight
        3.1.4 Realizing Foresight Based Synchronization
    3.2 Preliminaries
        3.2.1 Libraries
        3.2.2 Clients
    3.3 Foresight-Based Synchronization
        3.3.1 The Problem
        3.3.2 The Client Protocol
        3.3.3 Dynamic Right Movers
        3.3.4 Serializability
        3.3.5 B-Serializable-Completability
        3.3.6 E-Completability
        3.3.7 Special Cases
    3.4 Automatic Foresight for Clients
        3.4.1 Annotation Language
        3.4.2 Inferring Calls to mayUse Procedures
        3.4.3 Implementation for Java Programs
    3.5 Implementing Libraries with Foresight
        3.5.1 The Basic Approach
        3.5.2 Using Dynamic Information
        3.5.3 Optimistic Locking
        3.5.4 Further Extensions
        3.5.5 Java Threads and Transactions
    3.6 Experimental Evaluation
        3.6.1 Applicability and Precision Of The Static Analysis
        3.6.2 Comparison To Hand-Crafted Implementations
        3.6.3 Evaluating The Approach On Realistic Software
    3.7 Java Implementation of the Transactional Maps Library
        3.7.1 Base Library
        3.7.2 Extended Library
        3.7.3 Utilizing Dynamic Information by Handcrafted Optimization
        3.7.4 API Adapter

4 Composition of Transactional Libraries via Semantic Locking
    4.1 Semantic Locking
        4.1.1 Basics
        4.1.2 ADTs With Semantic Locking
        4.1.3 Automatic Atomicity
    4.2 Automatic Atomicity Enforcement
        4.2.1 Enforcing S2PL
        4.2.2 Lock Ordering Constraints
        4.2.3 Enforcing OS2PL on Acyclic Graphs
        4.2.4 Optimizations
        4.2.5 Handling Cycles via Coarse-Grained Locking
    4.3 Using Specialized Locking Operations
    4.4 Implementing ADTs with Semantic Locking
    4.5 Performance Evaluation
        4.5.1 Benchmarks
        4.5.2 Performance

5 Related Work
    5.1 Synchronization Protocols
    5.2 Concurrent Data Structures
    5.3 Automatic Synchronization

6 Conclusions and Future Work

Bibliography
Chapter 1
Introduction
Concurrency is widely used in software systems because it helps reduce latency, increase throughput,
and provide better utilization of multi-core machines [35, 69]. However, writing concurrent programs is
considered a difficult and error-prone process, due to the need to consider all possible interleavings of
code fragments which are executed in parallel.
Atomicity

Atomicity is a fundamental correctness property of code sections in concurrent programs.
Intuitively, a code section is said to be atomic if for every (arbitrarily interleaved) program execution,
there is an equivalent execution with the same overall behavior where the atomic code section is not
interleaved with other parts of the program. In other words, an atomic code section can be seen as a code
section which is always executed in isolation. Atomic code sections help reasoning about concurrent
programs, since one can assume that each atomic code section is never interleaved with other parts of
the program.
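To make the notion concrete, here is a small illustration of our own (the class and its names are not from the thesis): a two-step read-then-write code section whose correctness depends on atomicity.

```java
// Illustration (not from the thesis): a counter whose increment is a two-step
// read-then-write section. If two threads interleave between get() and set(),
// one increment is lost; declaring the section atomic rules such
// interleavings out.
class UnsafeCounter {
    private int value = 0;
    int get() { return value; }
    void set(int v) { value = v; }
    // The section below is the one we would like to execute atomically.
    void inc() { int c = get(); set(c + 1); }
}
```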
Several different variants of the atomicity property have been defined and used in the literature of
databases and shared memory systems, where each variant has its own semantic properties and is aimed
at a specific set of scenarios and considerations (e.g., [45, 55, 75, 86]). For example, linearizability [55]
is a variant of atomicity that is commonly used to describe implementations of shared libraries (and
shared objects): an operation of a linearizable library can be seen as if it always takes effect instantaneously at some point between its invocation and its response. The linearizability property ignores the
actual implementation details of the shared library; instead, it only considers the library behavior from
the point of view of its client.
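As a minimal sketch (ours, not from the thesis), consider a counter whose operations are protected by a single monitor: each operation takes effect at the instant the monitor is held, which lies between its invocation and its response, so the counter is linearizable.

```java
// Sketch of a linearizable counter: each operation takes effect atomically
// while the object's monitor is held, so from the client's point of view the
// operation appears to occur instantaneously between invocation and response.
class LinearizableCounter {
    private int value = 0;
    public synchronized void inc() { value++; }
    public synchronized int get() { return value; }
}
```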
The Problem
In this thesis, we address the problem of automatically ensuring atomicity of code sections by realizing efficient and scalable synchronization. One of the main challenges in this problem
is to guarantee atomicity in a scalable way, restricting parallelism only where necessary. The synchronization should not have a run-time overhead so high that it becomes worthless; i.e., it should have better
performance than simple alternatives (like a single global lock [89]).
Solutions for enforcing atomicity, which are implemented in practice, are predominantly handcrafted and tend to lead to concurrency bugs (e.g., see [33, 58, 79]). Automatic approaches (see Section 5.3) allow a programmer to declaratively specify the atomic code sections, leaving it to a compiler
and run-time to implement the necessary synchronization. However, existing automatic approaches have
not been widely adopted due to various concerns [27, 34, 36, 69, 84, 89], including high run-time overhead, poor performance and limited ability to handle irreversible operations (such as I/O operations).
Specialized Synchronization
In this thesis, we present several approaches for automatic atomicity enforcement, where each approach is designed to handle a restricted class of programs (and scenarios).
The idea is to produce synchronization that enforces atomicity by exploiting the restricted properties of
the programs. For each approach we describe a specialized synchronization protocol, and realize the
protocol by using a combination of compile-time and run-time techniques. The synchronization protocols are designed to ensure efficient and scalable atomicity, without leading to deadlocks and without
using any rollback mechanism.
The presented approaches deal with two different aspects of the synchronization problem for concurrent programs. In the first approach, we deal with the code inside the libraries (i.e., the library
implementation); whereas in the other two approaches, we deal with code that uses the libraries’ APIs.
1.1 Automatic Fine-Grained Locking
In Chapter 2, we present an approach that leverages the shape of shared memory (i.e., the shape of the heap’s object graph) to transform a sequential library into a linearizable library [55]. This approach is based on the paper “Automatic Fine-Grain Locking using Shape Properties” that was presented at OOPSLA’2011 [40].
A library encapsulates shared data with a set of procedures, which may be invoked by concurrently
executing threads. Given the code of a library, our goal is to add correct fine-grained locking that ensures
linearizability and permits a high degree of parallelism. Specifically, we are interested in locking in
which each shared object has its own lock, and locks may be released before the end of the computation.
The main insight of this approach is to use the shape of the pointer data structures to simplify reasoning
about fine-grained locking and automatically infer efficient and correct fine-grained locking.
Domination Locking
We define a new fine-grained locking protocol called Domination Locking.
Domination Locking is a set of conditions that guarantee atomicity and deadlock-freedom. Domination
Locking is designed to handle dynamically-manipulated recursive data structures by leveraging natural
domination properties of paths in dynamically-changing data structures. This protocol is a strict generalization of several related fine-grained locking protocols, such as dynamic tree locking and dynamic
DAG locking [17, 19, 28].
Automatic Fine-Grained Locking
We then present an automatic technique to enforce the conditions
of Domination Locking. The technique is applicable to libraries where the shape of the shared heap is a
forest. The technique allows the shape of the heap to change dynamically as long as the shape is a forest
between invocations of library operations.
We show that our technique adds efficient and scalable fine-grained locking in several practical data
structures where it is hard to produce similar locking manually. We demonstrate the applicability of
the method on balanced search-trees [16, 46], a self-adjusting heap [81] and specialized data structure
implementations [18, 72].
1.2 Transactional Libraries
Linearizable libraries provide operations that appear to execute atomically. However, clients often need
to perform a sequence of library operations that appears to execute atomically, referred to hereafter
as an atomic composite operation. In Chapter 3, we consider the problem of extending a linearizable
library to support arbitrary atomic composite operations by clients. We introduce a novel approach in
which the library ensures atomicity of composite operations by exploiting information provided by its
clients. We refer to such libraries as transactional libraries.
Our basic methodology requires the client code to demarcate the sequence of operations for which
atomicity is desired and provide declarative information to the library (foresight) about the library operations that the composite operation may invoke. It is the library’s responsibility to ensure the desired
atomicity, exploiting the foresight information for effective synchronization.
Example 1.2.1 The idea is demonstrated in the code fragment shown in Figure 1.1. This code uses a shared
Counter (a shared library) by invoking its Get and Inc operations. The code provides information
(foresight) about the possible future Counter operations: at line 2 it indicates that any operation may be
invoked (after line 2); at line 4 it indicates that only Inc may be invoked (after line 4); finally, at line 9 it
indicates that no more operations will be invoked (after line 9). This information is utilized by the Counter
implementation in order to efficiently ensure that this code fragment always executes atomically.
A detailed version of this example is described in Chapter 3.
Our approach is based on the paper “Concurrent Libraries with Foresight” that was presented at
PLDI’2013 [41].
 1 /* @atomic */ {
 2   @mayUseAll()
 3   c = Get();
 4   @mayUseInc()
 5   while (c > 0) {
 6     c = c-1;
 7     Inc();
 8   }
 9   @mayUseNone()
10 }
Figure 1.1: Code that provides information (foresight) about future possible operations.
Foresight-Based Synchronization

We first present a formalization of this approach. We formalize
the desired goals and present a sufficient correctness condition. As long as the clients and the library
extension satisfy the correctness condition, all composite operations are guaranteed atomicity without
deadlocks. Our sufficiency condition is broad and permits a range of implementation options and fine-grained synchronization. It is based on a notion of dynamic right-movers (Section 3.3.3), which generalizes traditional notions of static right-movers and commutativity [61, 66]. Our approach decouples
the implementation of the library from the client. Thus, the correctness of the client does not depend on
the way the foresight information is used by the library implementation. The client only needs to ensure the
correctness of the foresight information.
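For intuition, here is a highly simplified sketch of how a library might exploit foresight. The locking scheme and all names are our own illustration, not the implementation described in Chapter 3, and the sketch assumes clients follow the pattern of Figure 1.1 (mayUseAll, then mayUseInc, then mayUseNone).

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical foresight-aware counter. A transaction that may still invoke
// any operation holds the write lock; once it declares that only Inc may
// follow, it downgrades to the read lock, because concurrent Inc operations
// commute (they are right-movers with respect to each other) and therefore
// need not exclude one another.
class ForesightCounter {
    private final AtomicInteger value = new AtomicInteger();
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();

    public void mayUseAll()  { rw.writeLock().lock(); }
    public void mayUseInc()  {                 // downgrade: write -> read
        rw.readLock().lock();
        rw.writeLock().unlock();
    }
    public void mayUseNone() { rw.readLock().unlock(); }

    public int  get() { return value.get(); }          // needs exclusivity
    public void inc() { value.incrementAndGet(); }     // read lock suffices
}
```

Once a transaction has declared that only Inc may follow, it no longer blocks other Inc-only transactions; a transaction that may still call Get remains exclusive.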
Automatic Foresight for Clients

We then present a static analysis to infer the foresight information
required by our approach, allowing a compiler to automatically insert the foresight information into the
client code. This relieves the client programmer of this burden and simplifies writing atomic composite
operations.
Library Extension Realization

Our approach permits the use of customized, hand-crafted implementations of the library extension. However, we also present a generic technique for extending a linearizable
library with foresight. The technique is based on a novel variant of the tree locking protocol in which
the tree is designed according to semantic properties of the library’s operations.
We used our generic technique to implement a single general-purpose Java library for Map data
structures. Our library permits composite operations to simultaneously work with multiple instances of
Map data structures. (We focus on Maps, because Shacham [78] observed that Maps are heavily used
for implementing composite operations in real-life concurrent programs).
We use our library and the static analysis to enforce atomicity of a selection of real-life Java composite operations, including composite operations that manipulate multiple instances of Map data structures.
Our experiments indicate that our approach enables realizing efficient and scalable synchronization for
real-life composite operations.
1.3 Composition of Transactional Libraries via Semantic Locking
In Chapter 4, we present an approach for handling composite operations that use multiple transactional
libraries. Our approach is described in the short paper “Automatic Semantic Locking” that was presented
at PPOPP’2014 [42]. This approach is also used in the paper “Automatic scalable atomicity via semantic
locking” that was presented at PPOPP’2015 [43].
Our approach can be seen as a combination of the approach of Chapter 3 with approaches for automatic lock inference
(e.g., [68]). In this approach, we restrict the synchronization that can be implemented in the libraries to
synchronization that resembles locking; this synchronization is similar to the semantic-aware locking
from the database literature. We refer to such libraries as libraries with semantic locking.
We describe a static algorithm that enforces atomicity of code sections that use multiple libraries
with semantic locking. We implemented this static algorithm and show that it produces efficient and scalable
synchronization.
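The flavor of a library with semantic locking can be conveyed by a small sketch of our own (a simplified illustration, not the thesis's library): a map ADT that exposes locking at the granularity of individual keys, so that composite operations conflict only when they touch a common key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Simplified illustration of a map with semantic locking. Two composite
// operations conflict only if they lock a common key, so operations on
// disjoint keys proceed in parallel, unlike a single lock over the whole map.
class SemanticLockingMap<K, V> {
    private final Map<K, V> map = new HashMap<>();
    private final Map<K, ReentrantLock> keyLocks = new HashMap<>();

    private synchronized ReentrantLock lockFor(K key) {
        return keyLocks.computeIfAbsent(key, k -> new ReentrantLock());
    }
    // Semantic lock/unlock on a single key. A compiler enforcing strict
    // two-phase locking would insert these around a composite operation:
    // all acquisitions before any access, all releases at the end.
    public void lockKey(K key)   { lockFor(key).lock(); }
    public void unlockKey(K key) { lockFor(key).unlock(); }

    // Callers are assumed to hold the corresponding key lock.
    public V get(K key)          { return map.get(key); }
    public V put(K key, V value) { return map.put(key, value); }
}
```

To avoid deadlocks, a static algorithm would additionally order the key acquisitions, which is exactly the kind of lock-ordering constraint discussed in Chapter 4.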
Chapter 2

Fine-Grained Locking using Shape Properties
In this chapter, we consider the problem of turning a sequential library into a linearizable library [55].
Our goal is to provide a synchronization method which guarantees atomicity of the library operations in
a scalable way, restricting parallelism only where necessary. We are interested in a systematic method
that is applicable to a large family of libraries, rather than a method specific to a single library.
Fine-Grained Locking

One way to achieve scalable multi-threading is to use fine-grained locking
(e.g., [19]). In fine-grained locking, one associates, e.g., each shared object with its own lock, permitting
multiple operations to simultaneously operate on different parts of the shared state. Reasoning about
fine-grained locking is challenging and error-prone. As a result, programmers often resort to coarse-grained locking, leading to limited scalability.
The Problem

We would like to automatically add fine-grained locking to a library. A library encapsulates shared data with a set of procedures, which may be invoked by concurrently executing threads.
Given the code of a library, our goal is to add correct locking that ensures atomicity and permits a high
degree of parallelism. Specifically, we are interested in locking in which each shared object has its own
lock, and locks may be released before the end of the computation.
Our main insight is that we can use the restricted shape of pointer data structures to simplify reasoning about fine-grained locking and automatically infer efficient and correct fine-grained locking.
Domination Locking
We define a new fine-grained locking protocol called Domination Locking.
Domination Locking is a set of conditions that guarantees atomicity and deadlock-freedom. Domination Locking is designed to handle dynamically-manipulated recursive data structures by leveraging
natural domination properties of dynamic data structures.
[Figure: a Treap whose root node has key 10 and priority 99; its children are nodes with (key 5, priority 20) and (key 15, priority 72); the latter’s children are nodes with (key 12, priority 30) and (key 18, priority 50).]

Figure 2.1: An example of a Treap data structure.
Automatic Fine-Grained Locking

We present an automatic technique to enforce the conditions of
Domination Locking. The technique is applicable to libraries where the shape of the shared memory is a
forest. The technique allows the shape of the heap to change dynamically as long as the shape is a forest
between invocations of library operations. In contrast to existing lock inference techniques, which are
based on two-phase locking (see Section 5.3), our technique is able to release locks at early points of
the computation.
Finally, as we demonstrate in Section 2.4 and Section 2.5, our technique adds effective and scalable
fine-grained locking in several practical data structures where it is extremely hard to manually produce
similar locking. Our examples include balanced search-trees [16, 46], a self-adjusting heap [81] and
specialized data structure implementations [18, 72].
Motivating Example

Consider a library that implements the Treap data structure [16]. A Treap is a
search tree that is simultaneously a binary search tree (on the key field) and a heap (on the priority
field). An example is shown in Figure 2.1. If priorities are assigned randomly, the resulting structure
is equivalent to a random binary search tree, providing good asymptotic bounds for all operations. The
Treap implementation consists of three procedures: insert, remove and lookup. Manually adding
fine-grained locking to the Treap’s code is challenging, since it requires considering many subtle details
of the Treap’s code. In contrast, our technique can add fine-grained locking to the Treap’s code without
considering its exact implementation details. (In other words, our technique does not need to understand
the actual code of the Treap).
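For reference, a minimal node class consistent with the fields used by the Treap listings below (the field names match the remove code; the constructor is ours):

```java
// Node layout assumed by the Treap listings (field names match the remove
// code in Figure 2.2; the constructor is ours).
class Node {
    int key;          // binary-search-tree order
    int prio;         // heap order: a node's prio exceeds its children's
    Node left, right;
    Node(int key, int prio) { this.key = key; this.prio = prio; }
}
```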
For example, consider the Treap’s remove operation shown in Figure 2.2. To achieve concurrent
execution of its operations, we must release the lock on the root, while an operation is still in progress,
once it is safe to do so. Either of the loops (starting at Lines 4 or 12) can move the current context
to a subtree, after which the root (and, similarly, other nodes) should be unlocked. Several parts of
this procedure implement tree rotations that change the order among the Treap’s nodes, complicating
any correctness reasoning that depends on the order among nodes. Figure 2.3 shows an example of
manual fine-grained locking of the Treap remove operation. Manually adding fine-grained locking to
the code took an expert several hours and was an extremely error-prone process. In several cases, the
expert’s locking released a lock too early, resulting in an incorrect concurrent algorithm (e.g., the release
operation in Line 28).
Our technique is able to automatically produce fine-grained concurrency in the Treap’s code, by
relying on its tree shape. This is in contrast to existing alternatives, such as manually enforcing hand-over-hand locking, which require a deep understanding of code details.
Note that the dynamic tree locking protocol [17] is sufficient to ensure atomicity and deadlock freedom for the Treap example. In fact, the locking shown in Figure 2.3 satisfies the conditions of the
dynamic tree locking protocol. But in contrast to the domination locking protocol, which can be automatically enforced in the Treap’s code, none of the existing synchronization techniques (see Section 5.3)
can automatically enforce the dynamic tree locking protocol for the Treap (even though the Treap is a single
tree).
2.1 Overview
In this section, we present a brief informal description of our approach.
Domination Locking

We define a new locking protocol, called Domination Locking (abbreviated DL).
DL is a set of conditions that are designed to guarantee atomicity and deadlock freedom for operations
of a well-encapsulated library.
DL differentiates between a library’s exposed and hidden objects: exposed objects (e.g., the Treap’s
root) act as the intermediary between the library and its clients, with pointers to such objects being
passed back and forth between the library and its clients, while the clients are completely unaware of
hidden objects (e.g., the Treap’s intermediate nodes). The protocol exploits the fact that all operations
must begin with one or more exposed objects and traverse the heap-graph to reach hidden objects.
The protocol requires the exposed objects passed as parameters to an operation to be locked in a
fashion similar to two-phase locking. However, hidden objects are handled differently. A thread is
allowed to acquire a lock on a hidden object if the locks it holds dominate the hidden object. (A set S
of objects is said to dominate an object u if every path (in the heap-graph) from an exposed object to u
contains some object in S.) In particular, hidden objects can be locked even after other locks have been
released, thus enabling early release of other locked objects (hidden as well as exposed).
This simple protocol generalizes several fine-grained locking protocols defined for dynamically
changing graphs [17, 19, 28] and is applicable in more cases (i.e., the conditions of DL are weaker).
We use the DL’s conditions as the basis for our automatic technique.
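As a concrete instance (our own example, not from the thesis): hand-over-hand locking on a singly linked list satisfies these conditions. The head is the only exposed object, and a lock on a node dominates its successor, so the successor may be locked, and only then the predecessor's lock released.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hand-over-hand (lock-coupling) traversal as a small instance of Domination
// Locking: the head is the only exposed object; each node's lock dominates
// its successor, so acquiring the successor's lock before releasing the
// current node's lock satisfies the DL conditions even with early releases.
class ListNode {
    final int key;
    ListNode next;
    final ReentrantLock lock = new ReentrantLock();
    ListNode(int key, ListNode next) { this.key = key; this.next = next; }
}

class LockCoupledList {
    static boolean contains(ListNode head, int key) {
        head.lock.lock();                      // lock the exposed object first
        ListNode cur = head;
        while (cur != null) {
            if (cur.key == key) { cur.lock.unlock(); return true; }
            ListNode nxt = cur.next;
            if (nxt != null) nxt.lock.lock();  // cur's lock dominates nxt
            cur.lock.unlock();                 // early release is permitted
            cur = nxt;
        }
        return false;
    }
}
```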
 1 boolean remove(Node par, int key) {
 2   Node n = null;
 3   n = par.right; // right child has root
 4   while (n != null && key != n.key) {
 5     par = n;
 6     n = (key < n.key) ? n.left : n.right;
 7   }
 8   if (n == null)
 9     return false; // search failed, no change
10   Node nL = n.left;
11   Node nR = n.right;
12   while (true) { // n is the node to be removed
13     Node bestChild = (nL == null ||
14       (nR != null && nR.prio > nL.prio)) ? nR : nL;
15     if (n == par.left)
16       par.left = bestChild;
17     else
18       par.right = bestChild;
19     if (bestChild == null)
20       break; // n was a leaf
21     if (bestChild == nL) {
22       n.left = nL.right; // rotate nL to n's spot
23       nL.right = n;
24       nL = n.left;
25     } else {
26       n.right = nR.left; // rotate nR to n's spot
27       nR.left = n;
28       nR = n.right;
29     }
30     par = bestChild;
31   }
32   return true;
33 }
Figure 2.2: Removing an element from a treap by locating it and then rotating it into a leaf position.
(Our technique can add fine-grained locking to this code without understanding its details.)
 1 boolean remove(Node par, int key) {
 2   Node n = null;
 3   acquire(par);
 4   n = par.right;
 5   if (n != null) acquire(n);
 6   while (n != null && key != n.key) {
 7     release(par);
 8     par = n;
 9     n = (key < n.key) ? n.left : n.right;
10     if (n != null) acquire(n);
11   }
12   if (n == null) { release(par); return false; }
13   Node nL = n.left;  if (nL != null) acquire(nL);
14   Node nR = n.right; if (nR != null) acquire(nR);
15   while (true) {
16     Node bestChild = (nL == null ||
17       (nR != null && nR.prio > nL.prio)) ? nR : nL;
18     if (n == par.left)
19       par.left = bestChild;
20     else
21       par.right = bestChild;
22     release(par);
23     if (bestChild == null)
24       break;
25     if (bestChild == nL) {
26       n.left = nL.right;
27       nL.right = n;
28       // release(nL); // an erroneous release statement
29       nL = n.left;
30       if (nL != null) acquire(nL);
31     } else {
32       n.right = nR.left;
33       nR.left = n;
34       nR = n.right;
35       if (nR != null) acquire(nR);
36     }
37     par = bestChild;
38   }
39   return true;
40 }
Figure 2.3: Treap’s remove code with manual fine-grained locking.
[Figure 2.4 diagram: a treap with nodes (key 10, priority 99), (key 5, priority 20), (key 15, priority 72), (key 12, priority 30), and (key 18, priority 50); the local variables par, n, nL, nR, and bestChild point into the tree, with nR and bestChild both pointing to the same node.]
Figure 2.4: Execution of the Treap’s remove (Figure 2.2) in which the tree shape is violated. The node pointed to by nR and bestChild has two predecessors.
Automatic Locking of Forest-Based Libraries Our technique is able to automatically enforce DL,
in a way that releases locks at early points of the computation. Specifically, the technique is applicable
to libraries whose heap-graphs form a forest at the end of any complete sequential execution (of any
sequence of operations).
Note that existing shape analyses for sequential programs can be used to automatically verify whether a library satisfies this precondition (e.g., [76, 88]). In particular, we avoid the need to explicitly reason about concurrent executions.
For example, the Treap is a tree at the end of any of its operations when executed sequentially. Note that during some of its operations (insert and remove) its tree shape is violated by a node with multiple predecessors (caused by the tree rotations). An example of such a tree-shape violation (caused by the rotations in remove) is shown in Figure 2.4.
Our technique uses the following locking scheme: a procedure invocation maintains a lock on the
set of objects directly pointed to by its local variables (called the immediate scope). When an object
goes out of the immediate scope of the invocation (i.e., when the last variable pointing to that object is
assigned some other value), the object is unlocked if it has (at most) one predecessor in the heap graph
(i.e., if it does not violate the forest shape). If a locked object has multiple predecessors when it goes out of the immediate scope of the invocation, then it is eventually unlocked, once the object again has at most one predecessor. The forest condition guarantees that every lock is eventually released.
To realize this scheme, we use a pair of reference counts to track incoming references from the heap
and local variables of the current procedure. All the updates to the reference count can be done easily
by instrumenting every assignment statement, allowing a relatively simple compile-time transformation. While we defer the details of the transformation to Section 2.4, Figure 2.5 shows the transformed
implementation of remove (from Figure 2.2). ASNL and ASNF are macros that perform assignment to
a local variable and a field, respectively, update reference counts, and conditionally acquire or release
locks according to the above locking scheme.
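To make this scheme concrete, the following minimal sketch shows what an ASNL-style assignment to a local variable could do. It illustrates the reference-counting idea only; it is not the thesis’s actual generated code, and the class, field, and method names (RcNode, heapRefs, localRefs, asnl) are our assumptions. An ASNF-style field assignment would update heapRefs analogously.

```java
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch of the reference-counting scheme behind ASNL.
// All names here are assumptions; the thesis's macros are not shown verbatim.
class RcNode {
    final ReentrantLock lock = new ReentrantLock();
    int heapRefs = 0;   // incoming pointers from fields of heap objects
    int localRefs = 0;  // incoming pointers from local variables
    RcNode left, right;
}

class Asn {
    // Models ASNL(x, v): the caller writes x = Asn.asnl(x, v).
    static RcNode asnl(RcNode oldVal, RcNode newVal) {
        if (newVal != null) {
            newVal.localRefs++;                        // enters the immediate scope
            if (!newVal.lock.isHeldByCurrentThread())
                newVal.lock.lock();
        }
        if (oldVal != null) {
            oldVal.localRefs--;                        // may leave the immediate scope
            maybeRelease(oldVal);
        }
        return newVal;
    }

    // Unlock once no local variable points to u and u has at most one heap
    // predecessor (i.e., u no longer violates the forest shape).
    static void maybeRelease(RcNode u) {
        if (u.localRefs == 0 && u.heapRefs <= 1 && u.lock.isHeldByCurrentThread())
            u.lock.unlock();
    }
}
```

With such a helper, the first assignment to a local variable locks the target object, and the last assignment away from it releases the lock as soon as the forest shape permits.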
boolean remove(Node par, int key) {
  Node n = null;
  Take(par);
  ASNL(n, par.right);
  while (n != null && key != n.key) {
    ASNL(par, n);
    ASNL(n, (key < n.key) ? n.left : n.right);
  }
  if (n == null) {
    ASNL(par, null);
    ASNL(n, null);
    return false;
  }
  Node nL = null; ASNL(nL, n.left);
  Node nR = null; ASNL(nR, n.right);
  while (true) {
    Node bestCh = null; ASNL(bestCh, (nL == null ||
        (nR != null && nR.prio > nL.prio)) ? nR : nL);
    if (n == par.left)
      ASNF(par.left, bestCh);
    else
      ASNF(par.right, bestCh);
    if (bestCh == null) {
      ASNL(bestCh, null);
      break;
    }
    if (bestCh == nL) {
      ASNF(n.left, nL.right);
      ASNF(nL.right, n);
      ASNL(nL, n.left);
    } else {
      ASNF(n.right, nR.left);
      ASNF(nR.left, n);
      ASNL(nR, n.right);
    }
    ASNL(par, bestCh);
    ASNL(bestCh, null);
  }
  ASNL(par, null); ASNL(n, null); ASNL(nL, null);
  ASNL(nR, null);
  return true;
}
Figure 2.5: Augmenting remove with macros to dynamically enforce domination locking.
Main Contributions The main contributions of this chapter can be summarized as follows:
• We introduce a new locking protocol called Domination Locking. We show that domination locking can be enforced and verified by considering only sequential executions [17]: if domination locking is satisfied by all sequential executions, then atomicity and deadlock freedom are guaranteed in all executions, including non-sequential ones.
• We present an automatic technique to generate fine-grained locking by enforcing the domination
locking protocol for libraries where the heap graph is guaranteed to be a forest in between operations. Our technique can handle any temporary violation of the forest shape constraint, including
temporary cycles.
• We present a performance evaluation of our technique on several examples, including balanced
search-trees [16, 46], a self-adjusting heap [81] and specialized data structure implementations [18,
72]. The evaluation shows that our automatic locking provides good scalability and performance
comparable to hand crafted locking (for the examples where hand crafted locking solutions were
available).
• We discuss extensions and additional applications of our suggestions.
2.2 Preliminaries
Our goal is to augment a library with concurrency control that guarantees strict conflict-serializability [75]
and linearizability [55]. In this section we formally define what a library is and the notion of strict
conflict-serializability for libraries.
Syntax and Informal Semantics A library defines a set of types and a set of procedures that may
be invoked by clients of the library, potentially concurrently. A type consists of a set of fields of type
boolean, integer, or pointer to a user-defined type. The types are private to the library: an object of a type
T defined by a library M can be allocated or dereferenced only by procedures of library M . However,
pointers to objects of type T can be passed back and forth between the clients of library M and the
procedures of library M. Dually, types defined by clients are private to the client. Pointers to client-defined types may be passed back and forth between the clients and the library, but the library cannot
dereference such pointers (or allocate objects of such type). Procedures have parameters and local
variables, which are private to the invocation of the procedures. (Thus, these are thread-local variables.)
There are no static or global variables shared by different invocations of procedures. (However, our
results can be generalized to support them.)
stms = skip
| x = e(y1,...,yk)
| assume(b)
| x = new R()
| x = y.f | x.f = y
| acquire(x) | release(x)
| return(x)
Figure 2.6: Primitive instructions. b stands for a local boolean variable; e(y1,...,yk) stands for an expression over local variables.
We assume that the body of a procedure is represented by a control-flow graph. We refer to the vertices
of a control-flow graph as program points. The edges of a control-flow graph are annotated with primitive instructions, shown in Figure 2.6. Conditionals are encoded by annotating control-flow edges with
assume statements. Without loss of generality, we assume that a heap object can be dereferenced only
in a load (“x = y.f”) or store (“x.f = y”) instruction. Operations to acquire or release a lock refer
to a thread-local variable (that points to the heap object to be locked or unlocked). The other primitive
instructions reference only thread-local variables.
We present a semantics for a library independent of any specific client. We define a notion of
execution that covers all possible executions of the library that can arise with any possible client, but
restricting attention to the part of the program state “owned” by the library. (In effect, our semantics
models what is usually referred to as a “most-general-client” of the library.) For simplicity, we assume
that each procedure invocation is executed by a different thread, which allows us to identify procedure
invocations using a thread-id. We refer to each invocation of a procedure as a transaction. We model a
procedure invocation as a creation of a new thread with an appropriate thread-local state. We describe
the behavior of a library by the relation −→. A transition σ −→ σ′ represents the fact that a state σ can be transformed into a state σ′ by executing a single instruction.
Transactions share a heap consisting of an (unbounded) set of heap objects. Any object allocated
during the execution of a library procedure is said to be a library (owned) object. In fact, our semantics
models only library owned objects. Any library object that is returned by a library procedure is said
to be an exposed object. Other library objects are hidden objects. Note that an exposed object remains
exposed forever. A key idea encoded in the semantics is that at any point during execution a new
procedure invocation may occur. The only assumption made is that any library object passed as a
procedure argument is exposed; i.e., the object was returned by some earlier procedure invocation.
Each heap-allocated object serves as a lock for itself. Locks are exclusive (i.e., a lock can be held by
at most one transaction at a time). The execution of a transaction trying to acquire a lock (by an acquire
v ∈ Val = Loc ⊎ Z ⊎ {true, false, null}
ρ ∈ E = V ↪ Val
h ∈ H = Loc ↪ F ↪ Val
l ∈ L = Loc
s ∈ S = K × E × 2^L
σ ∈ Σ = H × (Loc ↪ {true, false}) × (T ↪ S)
Figure 2.7: Semantic domains
Instruction: Transition (side condition)

skip: σ −→⟨t,e⟩ ⟨h, r, ϱ[t ↦ ⟨k′, ρ, L⟩]⟩
x = e(y1, ..., yk): σ −→⟨t,e⟩ ⟨h, r, ϱ[t ↦ ⟨k′, ρ[x ↦ ⟦e⟧(ρ(y1), ..., ρ(yk))], L⟩]⟩
assume(b): σ −→⟨t,e⟩ ⟨h, r, ϱ[t ↦ ⟨k′, ρ, L⟩]⟩   (ρ(b) = true)
x = new R(): σ −→⟨t,e⟩ ⟨h[a ↦ o], r, ϱ[t ↦ ⟨k′, ρ[x ↦ a], L⟩]⟩   (a ∉ dom(h) ∧ ι(R)() = o)
x = y.f: σ −→⟨t,e⟩ ⟨h, r, ϱ[t ↦ ⟨k′, ρ[x ↦ h(ρ(y))(f)], L⟩]⟩   (ρ(y) ∈ dom(h))
x.f = y: σ −→⟨t,e⟩ ⟨h[ρ(x) ↦ (h(ρ(x))[f ↦ ρ(y)])], r, ϱ[t ↦ ⟨k′, ρ, L⟩]⟩   (ρ(x) ∈ dom(h))
acquire(x): σ −→⟨t,e⟩ ⟨h, r, ϱ[t ↦ ⟨k′, ρ, L ∪ {ρ(x)}⟩]⟩   (ρ(x) ∈ L ∨ ∀⟨k″, ρ′, L′⟩ ∈ range(ϱ): ρ(x) ∉ L′)
release(x): σ −→⟨t,e⟩ ⟨h, r, ϱ[t ↦ ⟨k′, ρ, L \ {ρ(x)}⟩]⟩   (ρ(x) ∈ L)
return(x): σ −→⟨t,e⟩ ⟨h, r[ρ(x) ↦ true], ϱ[t ↦ ⟨k′, ρ, L⟩]⟩   (ρ(x) ∈ dom(h))
return(x): σ −→⟨t,e⟩ ⟨h, r, ϱ[t ↦ ⟨k′, ρ, L⟩]⟩   (ρ(x) ∉ dom(h))

Table 2.1: The semantics of primitive instructions. For brevity, we use the shorthands σ = ⟨h, r, ϱ⟩ and ϱ(t) = ⟨k, ρ, L⟩, and omit (k, k′) = e ∈ CFGt from all side conditions.
statement) that is held by another transaction is blocked until the lock becomes available (i.e., is
not held by any transaction). Locks are reentrant; an acquire statement has no impact when it refers
to a lock that is already held by the current transaction. A transaction cannot release a lock that it does
not hold.
Whenever a new object is allocated, its boolean fields are initialized to false, its integer fields are
initialized to 0, and pointer fields are initialized to null. Local variables are initialized in the same
manner.
Semantics Figure 2.7 defines the semantic domains of a state of a library, and meta-variables ranging over them. Let t ∈ T be the domain of transaction identifiers. A state σ = ⟨h, r, ϱ⟩ ∈ Σ of a library is a triple: h assigns values to the fields of dynamically allocated objects (a value v ∈ Val can be either a location, an integer, a boolean value, or null); r maps exposed objects to true and hidden objects to false; finally, ϱ associates a transaction t with its transaction-local state ϱ(t). A transaction-local state s = ⟨k, ρ, L⟩ ∈ S consists of: k, the value of the transaction’s program counter; ρ, which records the values of its local variables; and L, the transaction’s lock set, which records the locks that the transaction holds.
The behavior of a library is described by the relations −→ and ⇒. The relation −→ is a subset of Σ × (T × (K × K)) × Σ, and is defined in Table 2.1.¹ A transition σ −→⟨t,e⟩ σ′ represents the fact that σ can be transformed into σ′ via transaction t executing the instruction annotating control-flow edge e. Invocation of a new transaction is modeled by the relation ⇒ ⊆ Σ × T × Σ; we say that ⟨h, r, ϱ⟩ ⇒t σ′ if σ′ = ⟨h, r, ϱ[t ↦ s]⟩, where t ∉ dom(ϱ) and s is any valid initial local state: i.e., s = ⟨entry, ρ, {}⟩, where entry is the entry vertex and ρ maps local variables and parameters to appropriate initial values (based on their type). In particular, ρ must map any pointer parameter of a type defined by the library to an exposed object (i.e., an object u in h such that r(u) = true). We write σ −→ σ′ if there exists t such that σ ⇒t σ′, or there exists ⟨t, e⟩ such that σ −→⟨t,e⟩ σ′.
Running Transactions
Each control-flow graph of a procedure has two distinguished control points:
an entry site from which the transaction starts, and an exit site at which the transaction ends (if a CFG edge is annotated with a return statement, then this edge points to the exit site of the procedure). We say that a transaction t is running in a state σ if t is not at its entry site or exit site. An idle state is a state in which no transaction is running.
Executions The initial state σI has an empty heap and no transactions. A sequence of states π =
σ0, ..., σk is an execution if the following hold: (i) σ0 is the initial state, and (ii) for 0 ≤ i < k, σi −→ σi+1. An execution π = σ0, ..., σk is a complete execution if σk is idle. An execution π = σ0, ..., σk is a sequential execution if for each 0 ≤ i ≤ k at most one transaction in σi is running.
An execution is non-interleaved if transitions of different transactions are not interleaved (i.e., for every
pair of transactions ti ≠ tj, either all the transitions executed by ti come before any transition executed by tj, or vice versa). Note that a sequential execution is a special case of a non-interleaved execution. In a sequential execution, a new transaction starts executing only after all previous transactions
have completed execution. In a non-interleaved execution, a new transaction can start executing before
a previous transaction completes execution, but the execution is not permitted to include transitions by
the previous transaction once the new transaction starts executing. We say that a sequential execution is
completable if it is a prefix of a complete sequential execution.
Schedules The schedule of an execution π = σ0, ..., σk is a sequence ⟨t0, e0⟩, ..., ⟨tk−1, ek−1⟩ such that for 0 ≤ i < k: σi −→⟨ti,ei⟩ σi+1, or σi ⇒ti σi+1 and ei = einit (where einit is disjoint from all edges in the CFG). We say that a sequence ξ = ⟨t0, e0⟩, ..., ⟨tk−1, ek−1⟩ is a feasible schedule, if ξ
is a schedule of an execution. The schedule of a transaction t in an execution is the (possibly non-contiguous) subsequence of the execution’s schedule consisting only of t’s transitions.

¹ For simplicity of presentation, we use an idempotent variant of acquire (i.e., acquire has no impact when the lock is already owned by the current transaction). We note that this variant is permitted by the Lock interface from the java.util.concurrent.locks package, and can easily be implemented in languages such as Java and C++.
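As the footnote notes, the idempotent acquire variant is easy to implement on top of java.util.concurrent.locks.ReentrantLock. The following is a minimal sketch; the wrapper class and method names are ours, not part of the thesis:

```java
import java.util.concurrent.locks.ReentrantLock;

// Idempotent acquire over a ReentrantLock: a second acquire by the owning
// thread is a no-op, so a single release always suffices.
// The wrapper is an illustrative assumption, not the thesis's implementation.
class IdempotentLock {
    private final ReentrantLock lock = new ReentrantLock();

    void acquire() {
        if (!lock.isHeldByCurrentThread())
            lock.lock();              // blocks if another thread holds the lock
    }

    void release() {
        lock.unlock();                // throws IllegalMonitorStateException if not held
    }

    boolean heldByMe() {
        return lock.isHeldByCurrentThread();
    }
}
```

Because acquire is a no-op when the lock is already held, a single release undoes the acquisition, matching the semantics used in Table 2.1.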
Notice that each feasible schedule uniquely defines a single execution because: (i) we assume that there exists a single initial state; and (ii) each instruction defined in Table 2.1 is a partial function (in our semantics, nondeterminism is modeled by permitting CFG nodes with several outgoing edges).
Graph-Representation The heap (shared memory) of a state identifies an edge-labelled multidigraph
(a directed graph in which multiple edges are allowed between the same pair of vertices), which we call
the heap graph. Each heap-allocated object is represented by a vertex in the graph. A pointer field f in
an object u that points to an object v is represented by an edge (u, v) labelled f . (Note that the heap
graph represents only objects owned by the library. Objects owned by the client are not represented in
the heap graph.)
We define the allocation id of an object in an execution to be the pair (t, i) if the object was allocated
by the i-th transition executed by a transaction t. An object o1 in an execution π1 corresponds to
an object o2 in an execution π2 iff their allocation ids are the same. We compare states and objects
belonging to different executions modulo this correspondence relation.
Strict Conflict-Serializability and Linearizability
Given an execution, we say that two transitions
conflict if: (i) they are executed by two different transactions, and (ii) they access some common object (i.e., read or write fields of the same object).
Executions π and π′ are said to be conflict-equivalent if they consist of the same set of transactions, the schedule of every transaction t is the same in both executions, and the executions agree on the order between conflicting transitions (i.e., the ith transition of a transaction t precedes and conflicts with the jth transition of a transaction t′ in π iff the former precedes and conflicts with the latter in π′).
Conflict-equivalent executions produce the same state [86]. An execution is conflict-serializable if it is
conflict-equivalent with a non-interleaved execution.
We say that an execution π is strict conflict-serializable if it is conflict-equivalent to a non-interleaved execution π′ in which a transaction t1 completes execution before a transaction t2 whenever t1 completes execution before t2 in π.
Assume that all sequential executions of a library satisfy a given specification Φ. In this case, a
strict conflict-serializable execution is also linearizable [56] with respect to specification Φ.² Thus,
correctness in sequential executions combined with strict conflict-serializability is sufficient to ensure
linearizability.
² Strict conflict-serializability guarantees the atomicity and the run-time order required by the linearizability property. Moreover, note that according to the linearizability property (as defined in [56]) the execution may contain transactions that will never be able to complete.
The above definitions can also be used for feasible schedules because (as explained earlier) a feasible
schedule uniquely defines a single execution.
2.3 Domination Locking
In this section we present the Domination Locking Protocol (abbreviated DL). We show that if every
sequential execution of a library satisfies DL and is completable, then every concurrent execution of the
library is strict conflict-serializable and is a prefix of a complete execution (i.e., atomicity and deadlock-freedom are guaranteed).
The locking protocol is parameterized by a total order ≤ on all heap objects, which remains fixed
over the whole execution.
Definition 2.1 Let ≤ be a total order of heap objects.
We say that an execution satisfies the Domination Locking protocol, with respect to ≤, if it satisfies the
following conditions:
1. A transaction t can access a field of an object u, only if u is currently locked by t.
2. A transaction t can acquire an exposed object u, only if t has never acquired an exposed object v
such that u ≤ v.
3. A transaction t can acquire an exposed object, only if t has never released a lock.
4. A transaction t can acquire a hidden object u, only if every path from an exposed object to u includes an object that is locked by t.
Intuitively, the protocol works as follows. Requirement (1) prevents race conditions where two
transactions try to update an object neither has locked. Conditions (2) and (3) deal with exposed objects.
Very little can be assumed about an object that has been exposed; references to it may reside anywhere
and be used at any time by other transactions that know nothing about the invariants t is maintaining.
Thus, as is standard, requirements (2) and (3) ensure all transactions acquire locks on exposed objects
in a consistent order, preventing deadlocks. The situation with hidden objects is different, and we know
more: other threads can only gain access to t’s hidden objects through some chain of references starting
at an exposed object, and so it suffices for t to guard each such potential access path with a lock. Another
way of understanding the protocol is that previous proposals (e.g., [28, 59, 63, 80]) treat all objects as
exposed, whereas domination locking also takes advantage of the information hiding of abstract data
types to impose a different, and weaker, requirement on encapsulated data. In particular, no explicit
order is imposed on the acquisition or release of locks on hidden objects, provided condition (4) is
maintained.
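A canonical special case of the protocol is hand-over-hand locking (lock coupling) over a list reachable from a single exposed head. The sketch below is illustrative, not taken from the thesis; the ListNode layout and method names are our assumptions. The head is the only exposed object ever locked, so conditions 2 and 3 hold trivially; every field access happens under the object’s lock (condition 1); and each hidden node is acquired only while its locked predecessor lies on every path from the head to it (condition 4).

```java
import java.util.concurrent.locks.ReentrantLock;

// Hand-over-hand traversal: an illustrative special case of Domination Locking.
// The node layout and names are assumptions, not code from the thesis.
class ListNode {
    final ReentrantLock lock = new ReentrantLock();
    final int key;
    ListNode next;
    ListNode(int key) { this.key = key; }
}

class LockCoupling {
    // Returns true iff key occurs in the list after the (sentinel) head.
    static boolean contains(ListNode head, int key) {
        head.lock.lock();                 // the single exposed object
        ListNode prev = head;
        ListNode cur = head.next;         // read under head's lock (condition 1)
        while (cur != null) {
            cur.lock.lock();              // prev is still locked and dominates cur
            prev.lock.unlock();           // hidden objects may be released early
            if (cur.key == key) {
                cur.lock.unlock();
                return true;
            }
            prev = cur;
            cur = cur.next;               // read under cur's lock
        }
        prev.lock.unlock();
        return false;
    }
}
```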
Theorem 2.2 Let ≤ be a total order of heap objects. If every sequential execution of the library is
completable and satisfies Domination Locking with respect to ≤, then every execution of the library is
strict conflict-serializable, and is a prefix of a complete execution.
This theorem implies that a concurrent execution cannot deadlock, since it is guaranteed to be the
prefix of a complete execution.
Domination Locking generalizes previously proposed protocols such as the Dynamic Tree Locking (DTL) and Dynamic DAG Locking (DDL) protocols [17], which themselves subsume idioms such as hand-over-hand locking. The DTL and DDL protocols were inspired by database protocols for
such as hand-over-hand locking. The DTL and DDL protocols were inspired by database protocols for
trees and DAGs ([28, 59, 63, 80]), but customized for use in programs where shape invariants may be
temporarily violated.
In particular, any execution that satisfies DTL or DDL can be shown to satisfy DL. In comparing these
protocols, it should be noted that DTL and DDL were described in a restricted setting where the exposed
objects took the form of a statically fixed set of global variables. DL generalizes this by permitting a
dynamic set of exposed objects (which can grow over time). More importantly, DL is a strict generalization of DTL and DDL: executions that satisfy DL might not satisfy either DTL or DDL. Among other
things, DL does not require the heap graph to satisfy any shape invariants. Thus, the above theorem generalizes a similar theorem established for DDL and DTL in [17]. The above theorem, like those in [17],
is important because it permits the use of sequential reasoning, e.g., to verify if a library guarantees
strict conflict-serializability via DL. More interestingly, this reduction theorem also simplifies the job of
automatically guaranteeing strict conflict-serializability via DL, as we illustrate in this chapter.
The requirement for a total order of exposed objects does not restrict applicability, since in any conventional programming environment such an order can be obtained (e.g., by using the memory addresses of objects, or by using a simple mechanism that assigns unique identifiers to objects). Furthermore, no order is needed when each transaction accesses a single exposed object.
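For instance, in a Java-like setting such an order can be obtained by stamping each object with a unique identifier at allocation time; the class and method names below are illustrative assumptions:

```java
import java.util.concurrent.atomic.AtomicLong;

// A fixed total order on objects via allocation-time identifiers.
// Names are illustrative assumptions.
class Ordered {
    private static final AtomicLong NEXT = new AtomicLong();
    final long id = NEXT.getAndIncrement();   // unique and fixed for the object's lifetime
}

class LockOrder {
    // u <= v in the protocol's total order iff u was allocated no later than v.
    static boolean leq(Ordered u, Ordered v) {
        return u.id <= v.id;
    }
}
```

Unlike raw memory addresses, such identifiers remain stable even under a compacting garbage collector.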
Proof for Theorem 2.2
We now present the proof for Theorem 2.2. We start by discussing several
basic properties of the programming model and domination locking. We then show that domination
locking permits ordering the transactions in a way that allows all transactions to complete one after
the other (assuming that all sequential executions are completable) such that the resultant complete
execution is equivalent to a non-interleaved execution.
An execution is said to be well-locked if every transaction in the execution accesses a field of an object only when it holds a lock on that object. We say that a transaction t is in phase-1 if it is still
running and has never released a lock. Otherwise, we say that t is in phase-2 (i.e., t is in phase-2 if it
has already completed, or it has released at least one lock).
Lemma 2.3 Let ξ = ξp ξt ξs be any feasible well-locked schedule, where ξt is the schedule of a transaction t. If t is in phase-1 (after ξ), then there is no conflict between ξt and ξs (in ξ).
Proof Transaction t has never released a lock in ξt; hence the transactions in ξs are not able to acquire locks acquired by t in ξt. Since ξp ξt ξs is well-locked, the transactions in ξs do not access objects that are accessed by t in ξt. Hence there is no conflict between ξt and ξs. ∎

We say that an ni-execution (non-interleaved execution) is phase-ordered if all phase-2 transactions precede phase-1 transactions.
Lemma 2.4 Any feasible well-locked ni-schedule ξ1 ξ2 · · · ξn (where each ξi is executed by ti) is conflict-equivalent to a well-locked phase-ordered ni-schedule ξi1 · · · ξin.
Proof For any 1 ≤ i ≤ n such that ti is in phase-1, the ni-schedule ξ1 ξ2 · · · ξn is conflict-equivalent
to the feasible ni-schedule ξ1 · · · ξi−1 ξi+1 · · · ξn ξi (because of Lemma 2.3). By repeatedly using this
property we can move all phase-1 transactions to the end of the schedule. Hence there exists a feasible
well-locked phase-ordered ni-schedule ξi1 · · · ξin that is conflict-equivalent to ξ1 ξ2 · · · ξn. ∎

In the following we assume that ≤h is a total order on all heap objects. We assume that ≤h has a minimal value ⊥ (i.e., if u is an object then ⊥ ≤h u). We say that u <h v if u ≠ v and u ≤h v.
We say that max(σ, t) = u, if u is the maximal exposed object that is locked by transaction t in
state σ (i.e., u is locked by t in σ, and every exposed object v that is locked by t in σ satisfies v ≤h u).
If no exposed object is locked by t in σ, then max(σ, t) = ⊥.
Let π = α1 · · · αk be a phase-ordered execution.³ Let s be the last state of π. We say that π is fully-ordered if for every αi and αj that are in phase-1 the following holds: if i < j then max(s, tj) ≤h max(s, ti).
Lemma 2.5 Any feasible well-locked ni-schedule ξ = ξ1 ξ2 · · · ξn (where each ξi is executed by ti ) is
conflict-equivalent to a well-locked fully-ordered ni-schedule ξi1 · · · ξin .
Proof Identical to Lemma 2.4, except that here we reorder the phase-1 transactions according to ≤h: let i ≠ i′ be such that ti and ti′ are in phase 1; if max(s, ti′) ≤h max(s, ti) (where s is the state after ξ), then we move ξi to the end of the schedule before moving ξi′. ∎

We say that a set S of objects dominates an object u (in a given state) if every path from an exposed object to u contains some object from S. We say that a transaction t blocks an object u (in a given state) if the set of objects locked by t dominates u.
³ When we write α1 · · · αk, β1 · · · βk, or α1 β1 · · · αk βk, we mean that each αi (and βi) is executed by transaction ti.
Lemma 2.6 Let π = π1 π2 be a sequential execution that follows domination locking. Let σ be the state
at the end of π1 . Let t be a transaction in phase 2 at the end of π1 . For any object o in state σ, t can
access o during the (remaining) execution π2 only if t blocks o in σ.
Proof Let u be an object in σ which is not blocked by t in σ. Hence σ contains a path P from an
exposed object e to u, such that none of the objects in P are locked by t. We inductively show that none
of the objects in P are locked, and hence not accessed or modified during π2 .
Let v be an object in P . We write L(v) to denote the length of the shortest path (in state σ) from e to v.
We prove by induction on L(v) that t does not lock v in π2.
If L(v) = 0, then v = e (v is an exposed object). Because of condition 3, transaction t does not lock v
in π2 .
If L(v) > 0, then from the induction hypothesis v is not blocked by t in π2 . Because of condition 4,
transaction t does not lock v in π2. ∎

A conflict-predecessor of a step (t, e) in an execution π is a step that precedes (t, e) in π and uses (i.e., accesses or locks) the same object and is executed by some t′ ≠ t.
Let π1 and π2 be two executions such that for every transaction t the schedule of t in π1 is a prefix of the
schedule of t in π2 . π2 is said to be a conflict-equivalent extension of π1 if every step (t, e) in π1 has the
same conflict-predecessors as the corresponding step in π2 . π2 is said to be an equivalent completion of
π1 if it is a complete execution and is a conflict-equivalent extension of π1 .
Note that if an execution α1 β1 · · · αn βn is a conflict-equivalent extension of π = α1 · · · αn , then the
execution α1 · · · αn β1 · · · βn is also a conflict-equivalent extension of π.
Lemma 2.7 Let πni be a well-locked ni-execution with a schedule α1 · · · αk. Let πe be a conflict-equivalent extension of πni with a schedule α1 β1 · · · αk βk. Assume that ti blocks an object u at the end of αi in πe. Then the execution of αi+1 · · · αk in πni does not access u. (Note that in this case ti might not actually block object u at the end of αi in πni.)
Proof Let σ denote the state at the end of αi in πni . For any object x in σ accessed by the execution
of αi+1 · · · αk in πni we define the path Px inductively as follows. If x is an exposed object in σ, then
Px is defined to be the sequence [x]. If x is a hidden object in σ, then the execution of αi+1 · · · αk must
have dereferenced some field of some object that pointed to x. Consider the first field y.f dereferenced
by αi+1 · · · αk that pointed to x, where y represents an object. We define Px to consist of the sequence
Py followed by x.
Assume that u is accessed during the execution of αi+1 · · · αk in πni . Hence Pu exists at the end of
αi in πni . By the definition of a conflict-equivalent extension, Pu also exists at the end of execution of
αi in πe (in particular, for 1 ≤ j ≤ i the execution of βj in πe does not access any object in Pu). Hence,
ti must hold a lock on some object y in this path (at the end of αi in both πni as well as πe ). Since πni is
well-locked, the execution of αi+1 · · · αk in πni could not have locked y, which is a contradiction. Hence u is not accessed during the execution of αi+1 · · · αk in πni. ∎

Lemma 2.8 Let π = α1 · · · αn be a well-locked fully-ordered execution with at least one incomplete
transaction. Let tk be the first incomplete transaction in π (i.e., k is the minimal number such that tk is
incomplete). If every sequential execution of a library follows domination locking and is completable,
then π has an equivalent extension α1 · · · αk βk αk+1 · · · αn in which transaction tk is completed.
Proof Since α1 · · · αk represents a sequential execution, it has a completion α1 · · · αk βk that follows
domination locking. We consider the following cases.
Case 1: After π transaction tk is in phase-2. Let σ represent the state produced by the execution of
α1 · · · αk . From Lemma 2.6, all objects in σ accessed during the execution of βk (in α1 · · · αk βk ) must
be blocked by tk in σ. From Lemma 2.7 the execution of αk+1 · · · αn (in α1 · · · αn ) cannot access any
object blocked by tk in σ. Hence the schedule α1 · · · αk βk αk+1 · · · αn is feasible and is a conflict-equivalent extension of α1 · · · αn.
Case 2: k = n. Here α1 · · · αk βk is the equivalent extension.
Case 3: k < n, and after π transaction tk is in phase-1. Let σ represent the state produced by the
execution of α1 · · · αk−1 . Let k < m ≤ n. Because of Lemma 2.3, no conflict-dependence can exist
between the running transactions (because they are all in phase-1), hence α1 · · · αk−1 αm represents a
feasible sequential execution that follows domination locking.
Let u be an exposed object in σ that is accessed by tk in α1 · · · αk−1 αk βk; we will show that u is
not accessed by tm in α1 · · · αk−1 αm . If u is accessed or locked by αk in α1 · · · αk−1 αk βk , then u
is not accessed or locked by αm in α1 · · · αk−1 αm (because tk and tm have no conflict in π). Otherwise, u is locked by βk in α1 · · · αk−1 αk βk. Let σ′ denote the state produced by the execution of π. Then max(σ′, tm) <h max(σ′, tk) (because after the fully-ordered execution π, tk and tm are in phase-1 and tk precedes tm), and max(σ′, tk) <h u (because of condition 2). Hence max(σ′, tm) <h u, and therefore u is not accessed or locked by tm in α1 · · · αk−1 αm.
Let v be a hidden object in σ that is accessed by tk in α1 · · · αk−1 αk βk . We will show that v is not accessed by tm in α1 · · · αk−1 αm . v is necessarily reachable from exposed objects in σ, hence there exists a path P (in σ) from an exposed object w to v, such that w is the only exposed object in P . tk accesses w in α1 · · · αk−1 αk βk (because of conditions 4 and 1). Assume that v is accessed by tm in α1 · · · αk−1 αm ; then tm accesses w in α1 · · · αk−1 αm (conditions 4 and 1). But we have shown that this is not possible for exposed objects. Therefore v is not accessed by tm in α1 · · · αk−1 αm .
We have shown that, for every k < m ≤ n, tk does not access (in α1 · · · αk−1 αk βk ) any object that is accessed by tm (in α1 · · · αk−1 αm ). Hence, α1 · · · αk βk αk+1 · · · αn is an equivalent extension of α1 · · · αn .

Lemma 2.9 Let π = α1 · · · αn be a well-locked fully-ordered execution. If every sequential execution of a library follows domination locking and is completable, then π has an equivalent completion
α1 β1 · · · αn βn .
Proof If π is not a complete execution, we construct an equivalent completion α1 β1 · · · αn βn by repeatedly applying Lemma 2.8 (i.e., if π contains k > 0 incomplete transactions, Lemma 2.8 is used to produce an equivalent extension of π that contains k − 1 incomplete transactions).

Lemma 2.10 Let ξ = ξp ξt ξs be any feasible well-locked ni-schedule, where ξt is the schedule of a
transaction t. If ξ · (t, e) is feasible, then ξp ξt · (t, e) is also feasible.
Proof Assume that ξ · (t, e) is feasible. We show that ξp ξt · (t, e) is feasible. The only sources of
infeasibility are when the step (t, e) involves a conditional branch (i.e., an assume statement) or an
attempt to acquire a lock. We make the simplifying assumption that an assume statement refers to only
thread-local variables. (Note that there is no loss of generality here since any statement “assume e”
can be rewritten as “x = e; assume x” where x is a thread-local variable.) As a result, ξp ξt · (t, e)
must be feasible if (t, e) involves a conditional branch. Now, consider the case where (t, e) involves an
“acquire x” instruction where x is a thread-local variable. If the object x points to is unlocked at the
end of ξp ξt ξs , it must be unlocked at the end of ξp ξt as well. Hence, feasibility follows in this case as
well.

Lemma 2.11 If every sequential execution of a library follows domination locking and is completable,
then every ni-execution is well-locked.
Proof We prove by induction on the length of the executions. Let ξ be a schedule of a well-locked
ni-execution. We will prove that if ξ · (t, e) is feasible, then it is a schedule of a well-locked execution.
Assume that after ξ, the step (t, e) accesses an object u. From Lemma 2.5, ξ is conflict-equivalent to a
fully-ordered ni-schedule ξ′ = α1 · · · αn .
We consider the following cases.
Case 1: there exists i such that t = ti and 1 ≤ i < n.
From Lemma 2.10, α1 · · · αi · (ti , e) is a feasible schedule. From the induction hypothesis, ti holds a
lock on u after α1 · · · αi . Hence, ti holds a lock on u after ξ′ = α1 · · · αn . Hence, ti holds a lock on u
after ξ.
Case 2: t = tn .
From Lemma 2.9, ξ′ has an equivalent completion with the schedule α1 β1 · · · αn βn .
We define ξ″ = α1 β1 · · · αn−1 βn−1 αn (this is a prefix of α1 β1 · · · αn βn ).
The step (tn , e) accesses u after ξ″ (because tn has the same local state after ξ′ and ξ″). Since ξ″ · (tn , e)
represents a sequential execution, u is locked by tn after ξ″. Hence, tn holds a lock on u after α1 · · · αn .
Hence, tn holds a lock on u after ξ.
Case 3: t does not appear in ξ.
According to the definition of a schedule, the first step of a transaction does not access an object.

Lemma 2.12 If every sequential execution of a library follows domination locking and is completable, then every execution π is conflict-equivalent to a fully-ordered execution π′ such that a transaction t completes before a transaction t′ begins in π′ if t completes before t′ begins in π.
Proof We prove this by induction on the length of the execution. Consider any execution with a schedule ξ · (ti , e). By the inductive hypothesis, the execution of ξ is conflict-equivalent to a fully-ordered execution with the schedule ξ′ = α1 · · · αk (see footnote 4) such that a transaction t completes before a transaction t′ begins in ξ′ if t completes before t′ begins in ξ.
From Lemma 2.11, ξ′ is well-locked. We consider the following cases:
Case 1: After α1 · · · αi , transaction ti is in phase 2, and (ti , e) does not access a heap object.
In this case, α1 · · · αk · (ti , e) is conflict-equivalent to α1 · · · αi · (ti , e) · αi+1 · · · αk .
Case 2: After α1 · · · αi , transaction ti is in phase 2, and (ti , e) accesses a heap object u.
According to Lemma 2.9, ξ′ = α1 · · · αk has an equivalent completion ξ″ = α1 β1 · · · αk βk .
Let ξ‴ = α1 β1 · · · αi−1 βi−1 αi (ξ‴ is a prefix of ξ″).
ti has the same local state after α1 · · · αi and ξ‴ (according to the definition of conflict-equivalent extension).
According to Lemma 2.10, α1 · · · αi · (ti , e) is a feasible schedule, so ξ‴ · (ti , e) is also a feasible schedule.
Also, ξ‴ · (ti , e) represents a sequential execution (which follows domination locking).
Hence, according to Lemma 2.6, ti blocks u after ξ‴.
Hence, ti blocks u after αi in ξ″.
Hence, according to Lemma 2.7, αi+1 · · · αk does not access u in ξ′ = α1 · · · αk .
Therefore, α1 · · · αk · (ti , e) is conflict-equivalent to α1 · · · αi · (ti , e) · αi+1 · · · αk .
Case 3: Transaction ti is in phase 1 after α1 · · · αi .
Because of Lemma 2.3, we can reorder all phase-1 transactions and ti (even if ti is in phase-2 after
α1 · · · αk · (ti , e)).
Footnote 4: Note that 1 ≤ i ≤ k and αi may be empty.
If ti is in phase-2 after α1 · · · αk · (ti , e), then we can construct the fully-ordered equivalent execution
by moving αi · (ti , e) just before all the phase-1 transactions.
Otherwise (ti is still in phase-1 after α1 · · · αk · (ti , e)), we can construct the fully-ordered equivalent
execution by moving αi · (ti , e) to the right place according to the max values (between the phase-1
transactions).

Proof [for Theorem 2.2] From Lemma 2.12, we know that every execution of the library is strict conflict-serializable.
We now want to show that every execution is also a prefix of a complete execution. Consider any execution π. According to Lemma 2.12, there exists a fully-ordered execution π′ = α1 · · · αn which is conflict-equivalent to π. According to Lemma 2.9, π′ has an equivalent completion α1 β1 · · · αn βn . According to the definition of conflict-equivalent extension, there exists an execution α1 · · · αn β1 · · · βn . Hence, π′ is a prefix of a complete execution. Since π and π′ end with the same state, π is also a prefix of a complete execution.

2.4 Enforcing DL in Forest-Based Libraries
In this section, we describe our technique for automatically adding fine-grained locking to a library
when the library operates on heaps of restricted shape. Specifically, the technique is applicable to
libraries that manipulate data structures with a forest shape, even with intra-transaction violations of
forestness. For example, the Treap data structure (mentioned at the beginning of this chapter) has a
tree shape which is temporarily violated by tree-rotations (during tree-rotations a node may have two
parents). Our technique places no limit on the number of violations or their effect on the shape of the data structures, as long as they are eliminated before the end of the transaction.
In Section 2.4.1, we describe the shape restrictions required by our technique, and present dynamic
conditions that are enforced by our source transformation. We refer to these conditions as the Eager
Forest-Locking protocol (EFL).
In Section 2.4.2, we show how to automatically enforce EFL by a source-to-source transformation
of the original library code.
2.4.1 Eager Forest-Locking
When the shape of the heap manipulated by the library is known to be a forest (possibly with temporary
violations), we can enforce domination locking by dynamically enforcing the conditions outlined below.
First, we define what it means for a library to be forest-based. We say that a hidden object u is consistent in a state σ, if u has at most one incoming edge in σ (see footnote 5). We say that an exposed object u is consistent in a state σ, if it does not have any incoming edges in σ.
Definition 2.13 A library M is a forest-based library, if in every sequential execution, all objects in idle
states are consistent.
For a forest-based library, we define the following Eager Forest-Locking conditions, and show that
they guarantee that the library satisfies the domination locking conditions.
Eager Forest-Locking Requirements Given a transaction t, we define t’s immediate scope as the
set of objects which are directly pointed to by local variables of t. Intuitively, eager forest-locking is
a simple protocol: a transaction t should acquire a lock on an object whenever it enters t’s immediate
scope and t should release a lock on an object whenever the object is out of t’s immediate scope and is
consistent. The protocol description below is a bit complicated because the abovementioned invariant
will be temporarily violated while an object is being locked or unlocked. (In particular, conditions 1, 2,
and 4 restrict the extent to which the invariant can be violated.)
Definition 2.14 Let ≤ be a total order of heap objects. We say that an execution satisfies Eager Forest-Locking (EFL) with respect to ≤, if it satisfies the following conditions:
1. A transaction t can access a field of an object, only if all objects in t’s immediate scope are locked by t (see footnote 6).
2. A transaction t can release an object, only if all objects in t’s immediate scope are locked by t.
3. A transaction t can release a lock of an object u, only if u is consistent.
4. Immediately after a transaction t releases a lock of an object u, t removes u from its immediate
scope (i.e., the next instruction of t removes u from immediate scope by writing to a local variable
that points to u).
5. A transaction t can acquire an exposed object u, only if t has never acquired an exposed object v
such that u ≤ v.
In contrast to the DL conditions, the EFL conditions can directly be enforced by instrumenting the
code of a given library because all its dynamic conditions can be seen as conditions on its immediate
scope and local memory. Such code instrumentation is allowed to only consider sequential executions,
as stated by the following theorems:
Footnote 5: In the graph representation of the heap. Recall that the heap-graph contains only library-owned objects. In particular, this definition does not consider pointers to exposed objects that may be stored in client objects.
Footnote 6: Notice that t can access a field of an object o only by executing x.f=y or y=x.f when x points to o. Furthermore, an unlocked object may be inside t’s immediate scope (because the programming model permits pointing to an object before locking this object).
Theorem 2.15 Let ≤ be a total order of heap objects. Let π be a sequential execution of a forest-based
library. If π satisfies EFL with respect to ≤, then π satisfies DL with respect to ≤.
From Theorem 2.2 and Theorem 2.15 we conclude the following.
Theorem 2.16 Let ≤ be a total order of heap objects.
If every sequential execution of a forest-based library is completable and satisfies EFL with respect to ≤,
then every execution of this library is strict conflict-serializable, and is a prefix of a complete-execution.
Proof for Theorem 2.15 We now present the proof for Theorem 2.15. We write DLi to denote condition (i) of the Domination Locking protocol (DL). We write EFLi to denote condition (i) of the Eager
Forest-Locking protocol (EFL).
Lemma 2.17 Let M be a forest-based library. Let π = σ0 , . . . , σk be a sequential execution of M that
satisfies the EFL protocol. If a hidden object u is not consistent at σk , then u is locked at σk .
Proof Let ξ = ⟨t0 , e0 ⟩, . . . , ⟨tk−1 , ek−1 ⟩ be the schedule of π. Let t be the last transaction in π. At the
beginning of t, the object u is consistent (because M is a forest-based library). Let i be the maximal
number such that t = ti , ei is annotated with x.f=y and y points to u at σi (such i exists because u is
not consistent at σk ). Because of EFL1 , u is locked in σi . Because of EFL3 , u is not released after σi .
Hence, u is locked at σk .

Lemma 2.18 Let π be a sequential execution of a forest-based library. If π satisfies EFL, then π satisfies DL3 .
Proof Let t be a transaction in π. Because of EFL1 and EFL2 , all exposed objects (which can be reached by t) are locked before t accesses or releases objects.
Let s be the first state in which all exposed objects are locked by t. Because of EFL5 , no exposed object can be locked by t after s. Hence, π satisfies DL3 .

Lemma 2.19 Let π = σ0 , . . . , σk be a sequential execution of a forest-based library. If π satisfies EFL, then π satisfies DL4 .
Proof We assume that π satisfies EFL. Let ξ = ⟨t0 , e0 ⟩, . . . , ⟨tk−1 , ek−1 ⟩ be the schedule of π. Let i be a number such that ei is annotated with acquire(x), x points to a hidden object u at σi , and u is not locked in σi . We consider the following cases.
Case 1: At σi , u has no predecessors. In this case, there is no path from an exposed object to u.
Case 2: At σi , u has at least two predecessors. Because of Lemma 2.17, u is locked in σi . Hence, this
case will never happen.
Case 3: At σi , u has one predecessor. Let p be the predecessor of u at σi . Let j be the maximal number such that j ≤ i, u is in ti ’s immediate scope in state σj , and u is not in ti ’s immediate scope in state σj−1 .
Because of EFL1 , ti cannot use x.f=y between σj and σi . Hence, ej is annotated with x=y.f and y points to p at σj . Because of EFL2 , p is not released by ti between σj and σi . Hence, p is locked by ti at σi .

Proof [for Theorem 2.15] Because of EFL1 , π satisfies DL1 . Because of EFL5 , π satisfies DL2 . Because of Lemma 2.18, π satisfies DL3 . Because of Lemma 2.19, π satisfies DL4 .

2.4.2 Enforcing EFL
In this section, we present a source-to-source transformation that enforces EFL in a forest-based library.
The idea is to instrument the library such that it counts stack and heap references to objects, and uses these reference counts to determine when to acquire and release locks. Since the EFL conditions are
defined over sequential executions, reasoning about the required instrumentation is fairly simple.
Run-Time Information The instrumented library tracks objects in the immediate scope of the current transaction (see footnote 7) by using stack-reference counters; the stack-reference counter of an object u tracks the number of references from local variables to u; hence u is in the immediate scope of the current transaction whenever its stack-reference counter is greater than 0. To determine consistency of objects, it uses a heap-reference counter; the heap-reference counter of an object u tracks the number of references in heap objects that point to u; a hidden object is consistent whenever its heap-counter equals 0 or 1, and an exposed object is consistent whenever its heap-counter equals 0. To determine whether an object has been exposed, it uses a boolean field; whenever an object is exposed (returned) by the library, this field is set to true (in that object).
Locking Strategy The instrumented code uses a strategy that follows the EFL conditions. At the beginning of the procedure, the instrumented library acquires all objects that are pointed to by parameters (and are thus exposed objects). The order in which these objects are locked is determined by using a special function, unique, that returns a unique identifier for each object (see footnote 8). After locking all exposed objects,
Footnote 7: Note that we consider sequential executions, so we can assume a single current transaction.
Footnote 8: Note that only exposed objects are pointed to by the procedure parameters, and according to Definition 2.13 these are the only exposed objects the transaction will see.
Take(x):
    if(x!=null) {
      acquire(x);
      x.stackRef++;
    }

Drop(x):
    if(x!=null) {
      x.stackRef--;
      if(x.stackRef==0 && IsConsistent(x))
        release(x);
    }

IsConsistent(x):
    if(x.isExposed)
      return (x.heapRef == 0);
    else
      return (x.heapRef <= 1);

MarkExposed(x):
    if(x!=null) x.isExposed=true;

Table 2.2: Primitive operations used in the EFL transformation.
TakeArgs2(x,y) {
  if(unique(x) < unique(y))
    { Take(x); Take(y); }
  else
    { Take(y); Take(x); }
}

Figure 2.8: Acquiring two procedure arguments in a unique locking order.
the instrumented library acts as follows: (i) it acquires object u whenever its stack-reference-counter
becomes 1; (ii) it releases object u whenever u is consistent, and its stack-reference-counter becomes 0.
This strategy releases all locks before completion of a transaction (since every object becomes consistent before that point), so it cannot create incompletable sequential executions.
Source-to-Source Transformation
Our transformation instruments each object with three additional
fields: stackRef and heapRef to maintain the stack and heap reference counts (respectively), and
isExposed to indicate whether the object has been exposed. The transformation is based on the primitive operations of Table 2.2.
The procedures Take and Drop maintain stack reference counters and perform the actual locking.
Take(x) locks the object referenced by x and increments the value of its stack reference counter.
Drop(x) decreases the stack reference count of the object referenced by x, and releases its lock if it is
safe to do so according to the EFL protocol, i.e., if the reference from x was the only reference to the
object, and the object is consistent. Drop uses the function IsConsistent which indicates whether an
object is consistent or not (according to its heap-counter and the isExposed field).
ASNL(x,ptrExp) {    // replaces: x = ptrExp
  temp=ptrExp;
  Take(temp);
  Drop(x);
  x=temp;
}

ASNF(x.f,ptrExp) {  // replaces: x.f = ptrExp
  temp=x.f;
  Take(temp);
  if(temp!=null) temp.heapRef--;
  Drop(temp);
  temp=ptrExp;
  Take(temp);
  if(temp!=null) temp.heapRef++;
  x.f = temp;
  Drop(temp);
}

Table 2.3: The macros ASNL and ASNF for pointer assignments enforcing EFL.
For each procedure of the library, our transformation is performed as follows:
1. At the beginning of the procedure, add code that acquires all objects pointed to by arguments according to a fixed order; in the case of a single pointer argument l, this can be done by adding Take(l) (as in line 3 of Figure 2.5); the code of Figure 2.8 demonstrates the case of two pointer arguments; in the general case, objects are sorted to obtain the proper order.
2. Replace every assignment of a pointer expression with the corresponding code macro from Table 2.3. The macro ASNL(x,ptrExp) replaces an assignment of a pointer expression ptrExp to a local pointer x; it performs this assignment while maintaining stack-counters and following the required locking strategy. The macro ASNF(x.f,ptrExp) replaces an assignment of a pointer expression to a field of an object; it additionally maintains the heap-counters in objects (its implementation follows the required locking strategy).
3. Whenever a local variable l reaches the end of its scope, add ASNL(l,null); this releases the object pointed to by l. If this is the end of the procedure and l is about to be returned (i.e., by the statement return(l)), then instead of adding ASNL(l,null), add the block {MarkExposed(l);Drop(l);}.
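The general case mentioned in step 1 (sorting the argument objects before locking them) can be sketched as follows. This is an illustrative sketch only: the Obj class, its id field, and the placeholder acquire are our own assumptions standing in for the library's objects, its locking primitive, and the unique function of the text.

```java
import java.util.Arrays;
import java.util.Comparator;

class Obj {
    final long id;            // identifier returned by unique()
    boolean locked = false;
    int stackRef = 0;
    Obj(long id) { this.id = id; }
}

class TakeArgsN {
    static long unique(Obj x) { return x.id; }

    static void acquire(Obj x) { x.locked = true; } // placeholder lock

    // Take(x) of Table 2.2: lock the object and bump its stack counter.
    static void take(Obj x) {
        if (x != null) { acquire(x); x.stackRef++; }
    }

    // Sort the non-null arguments by unique() and lock them in that
    // order, so that all transactions use the same global locking order.
    static void takeArgs(Obj... args) {
        Obj[] sorted = Arrays.stream(args)
                .filter(o -> o != null)
                .sorted(Comparator.comparingLong(TakeArgsN::unique))
                .toArray(Obj[]::new);
        for (Obj o : sorted) take(o);
    }

    public static void main(String[] a) {
        Obj x = new Obj(7), y = new Obj(3), z = new Obj(5);
        takeArgs(x, y, z);
        if (!(x.locked && y.locked && z.locked)) throw new AssertionError();
        System.out.println("all arguments locked in increasing unique() order");
    }
}
```

TakeArgs2 of Figure 2.8 is the two-argument special case of this sketch, with the sort unrolled into a single comparison.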
void AddValues(Node x, Node y) {
  while(x!=null && y!=null) {
    x.value+=y.value;
    x=x.next;
    y=y.next;
  }
}

Figure 2.9: Example procedure adding the values from one linked-list into another.
void AddValues(Node x, Node y) {
  TakeArgs2(x,y);
  while(x!=null && y!=null) {
    x.value+=y.value;
    ASNL(x,x.next);
    ASNL(y,y.next);
  }
  ASNL(x,null); ASNL(y,null);
}

Figure 2.10: Transformed code enforcing EFL for the procedure AddValues of Figure 2.9.
Example The procedure of Figure 2.9 takes a pair of pointers to singly-linked lists, and adds values
of one list to the values of the other. Figure 2.10 shows the code transformed to enforce EFL. The
transformed procedure starts with an invocation of TakeArgs2 (shown in Figure 2.8) to lock exposed
objects in a fixed order. In the body of AddValues, the assignment x=x.next is replaced by the macro
ASNL(x,x.next), which assigns x.next to x while maintaining EFL requirements. The assignment
y=y.next is handled in a similar way. At the end of AddValues, local variables go out of scope and
locks are released by adding ASNL(x,null) and ASNL(y,null).
Practical Considerations In some cases, some of our instrumentation code can be avoided. For example, instead of replacing x=null with ASNL(x,null), we could just add Drop(x) before the assignment; or, whenever it is known that a variable will not have a null value, we could avoid the if
In libraries where the forestness condition is never violated, even temporarily, the heap reference counter is not needed (since all objects remain consistent during every sequential execution).
In many cases, exposed objects can be identified by the types of objects (e.g., List is a type of exposed objects, and Node is a type of hidden objects); in such cases, type information can be used instead of the isExposed field.
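As an illustration of this type-based variant, the following sketch replaces the isExposed test of IsConsistent (Table 2.2) with an instanceof test. The class hierarchy and the placement of the heapRef field are our own assumptions, not the thesis's implementation.

```java
// Common base for all library-owned objects; carries the heap counter.
class HeapObject { int heapRef = 0; }

class List extends HeapObject { }   // exposed: returned to clients
class Node extends HeapObject { }   // hidden: internal nodes

class Consistency {
    // Type-based IsConsistent: exposed objects must have no incoming
    // heap edges; hidden objects may have at most one.
    static boolean isConsistent(HeapObject x) {
        if (x instanceof List)
            return x.heapRef == 0;
        else
            return x.heapRef <= 1;
    }

    public static void main(String[] args) {
        Node inner = new Node(); inner.heapRef = 1;   // normal tree node
        List header = new List();                     // exposed header
        Node shared = new Node(); shared.heapRef = 2; // temporary violation
        if (!isConsistent(inner) || !isConsistent(header) || isConsistent(shared))
            throw new AssertionError();
        System.out.println("consistency checks passed");
    }
}
```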
void move(SkewHeap src, SkewHeap dest) {
  Node t1, t3, t2;
  t1=dest.root;
  t2=src.root;
  if(t1.key > t2.key) { // assume both heaps are not empty
    t3=t1; t1=t2; t2=t3;
  }
  dest.root=t1;
  src.root=null;
  t3=t1.right;
  while(t3 != null && t2 != null) {
    t1.right=t1.left;
    if(t3.key < t2.key) {
      t1.left=t3; t1=t3; t3=t3.right;
    }
    else {
      t1.left=t2; t1=t2; t2=t2.right;
    }
  }
  if(t3 == null) t1.right=t2;
  else t1.right=t3;
}

Figure 2.11: Moving the content of one Skew-Heap to another Skew-Heap.
Using Static Analysis The instrumented code can be optimized by using various static techniques. It is sufficient for such static techniques to consider only sequential executions of the library.
A live-variables analysis [15] can detect local pointers with unused values. Assigning null to such
pointers will eliminate unused pointers, and as a result will release locks earlier.
Some static tools (e.g. [64]) can help avoid some of the instrumentation code. For example, if a tool
can detect that a local variable l is always null at some point of the CFG, our instrumentation code can
avoid calling Take(l) in this case.
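To make the live-variables idea concrete, here is a minimal backward liveness pass over straight-line code. This is our own sketch (not the analysis of [15] or the tools of [64]); it finds the point after which a local pointer is dead, i.e., where an ASNL(l,null) could be inserted to release its lock earlier.

```java
import java.util.*;

class Liveness {
    // One statement of straight-line code: the variable it defines
    // (null for a pure use) and the variables it uses.
    record Stmt(String def, List<String> use) {}

    // Standard backward pass:
    // live-before(i) = (live-after(i) \ {def(i)}) ∪ use(i).
    static List<Set<String>> liveBefore(List<Stmt> code) {
        List<Set<String>> res = new ArrayList<>();
        for (int i = 0; i < code.size(); i++) res.add(null);
        Set<String> live = new HashSet<>();
        for (int i = code.size() - 1; i >= 0; i--) {
            Stmt s = code.get(i);
            if (s.def() != null) live.remove(s.def());
            live.addAll(s.use());
            res.set(i, new HashSet<>(live));
        }
        return res;
    }

    public static void main(String[] args) {
        // Models:  x = x.next;  y = y.next;  use(x);
        List<Stmt> code = List.of(
                new Stmt("x", List.of("x")),
                new Stmt("y", List.of("y")),
                new Stmt(null, List.of("x")));
        List<Set<String>> lb = liveBefore(code);
        // y is not live before the last statement, so its lock could be
        // released (via ASNL(y,null)) right after the second statement.
        if (lb.get(2).contains("y")) throw new AssertionError();
        System.out.println("y is dead after its last use");
    }
}
```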
2.4.3 Example of a Dynamically Changing Forest
As an example of a dynamically changing forest, consider the procedure shown in Figure 2.11. This
procedure operates on two Skew-Heaps [81] (a self-adjusting minimum-heap implemented as a binary
tree). The procedure moves the content of one Skew Heap (pointed to by src) to another one (pointed to by dest), by simultaneously traversing the heaps; during its operation, nodes are dynamically moved from one data structure to the other. Figure 2.12 shows its code after the source transformation.
void move(SkewHeap src, SkewHeap dest) {
  Node t1, t3, t2;
  TakeArgs2(src,dest);
  ASNL(t1, dest.root);
  ASNL(t2, src.root);
  if(t1.key > t2.key) {
    ASNL(t3,t1); ASNL(t1,t2); ASNL(t2,t3);
  }
  ASNF(dest.root, t1);
  ASNL(dest, null); // dest becomes dead
  ASNF(src.root, null);
  ASNL(src, null);  // src becomes dead
  ASNL(t3, t1.right);
  while(t3 != null && t2 != null) {
    ASNF(t1.right, t1.left);
    if(t3.key < t2.key) {
      ASNF(t1.left, t3); ASNL(t1, t3); ASNL(t3, t3.right);
    }
    else {
      ASNF(t1.left, t2); ASNL(t1, t2); ASNL(t2, t2.right);
    }
  }
  if(t3 == null) ASNF(t1.right, t2);
  else ASNF(t1.right, t3);
  ASNL(t1, null); ASNL(t2, null); ASNL(t3, null);
}

Figure 2.12: Moving Skew Heaps with automatic fine-grained locking.
2.5 Performance Evaluation
We evaluate the performance of our technique on several benchmarks. For each benchmark, we compare
the performance of the benchmark using fine-grained locking automatically generated using our technique to the performance of the benchmark using a single coarse-grained lock. We also compare some
of the benchmarks to versions with hand-crafted fine-grained locking. For some benchmarks, manually
adding fine-grained locking turned out to be too difficult even for concurrency experts.
In our experiments, we consider five different benchmarks: two balanced search-tree data structures, a self-adjusting heap data structure, and two specialized tree structures (which are tailored to their applications).
Two different machines have been used for our experiments. The first machine is an Intel i7 machine
with 8 hardware threads (one quad-core i7 CPU, each core with two hardware threads). The second is a
Sun SPARC enterprise T5140 machine with 64 hardware threads (two eight-core CPUs, each core with
four hardware threads).
2.5.1 General Purpose Data Structures
Balanced Search-Trees
We consider two Java implementations of balanced search trees: a Treap [16], and a Red-Black Tree
with a top-down balancing [23, 46]. For both balanced trees, we consider the common operations of
insert, remove and lookup.
Methodology We follow the evaluation methodology of Herlihy et al. [50], and consider the data
structures under a workload of 20% inserts, 10% removes, and 70% lookups. The keys are generated
from a random uniform distribution between 1 and 2 × 10^6. To ensure consistent and accurate results, each experiment consists of five passes; the first pass warms up the VM (Java virtual machine) and the four other passes are
timed. Each experiment was run four times and the arithmetic average of the throughput is reported as
the final result.
Every pass of the test program consists of each thread performing one million randomly chosen
operations on a shared data structure; a new data structure is used for each pass.
Evaluation For both search trees, we compare the results of our automatic locking to a coarse-grained
global lock. For the Treap, we also consider a version with manual hand-over-hand locking. Enforcing
hand-over-hand locking for the Treap is challenging because after a rotation, the next thread to traverse
a path will acquire a different sequence of locks. Assuring the absence of deadlock under different
acquisition orders is challenging.
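To illustrate what the manual scheme involves, the following is a minimal single-threaded sketch of hand-over-hand locking (lock coupling) on a sorted linked list; the class and field names are our own assumptions, not the benchmark's code. The list case is simple because every traversal follows the same path; in a Treap, rotations change the traversal paths, and hence the lock-acquisition sequences, which is what makes the manual deadlock-freedom argument hard.

```java
import java.util.concurrent.locks.ReentrantLock;

class HOHList {
    static class Node {
        final int key;
        Node next;
        final ReentrantLock lock = new ReentrantLock();
        Node(int key, Node next) { this.key = key; this.next = next; }
    }

    final Node head = new Node(Integer.MIN_VALUE, null); // sentinel

    // Lock coupling: hold the current node's lock while acquiring the
    // next node's lock, then release the previous one.
    boolean contains(int key) {
        Node prev = head;
        prev.lock.lock();
        Node cur = prev.next;
        while (cur != null) {
            cur.lock.lock();       // acquire next before releasing prev
            prev.lock.unlock();
            if (cur.key == key) { cur.lock.unlock(); return true; }
            prev = cur;
            cur = cur.next;
        }
        prev.lock.unlock();
        return false;
    }

    public static void main(String[] args) {
        HOHList l = new HOHList();
        l.head.next = new Node(1, new Node(3, new Node(5, null)));
        if (!l.contains(3) || l.contains(4)) throw new AssertionError();
        System.out.println("hand-over-hand traversal ok");
    }
}
```

Because each thread acquires locks strictly along list order, two traversals can never acquire locks in opposite orders, which is exactly the invariant a rotation can break in a tree.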
[Figure: throughput (ops/msec) vs. number of threads for Single, Manual hand-over-hand, and Automatic locking.]
Figure 2.13: Throughput for a Treap on the Intel machine with 70% lookups, 20% inserts and 10% removes.
[Figure: throughput (ops/msec) vs. number of threads for Single, Manual hand-over-hand, and Automatic locking.]
Figure 2.14: Throughput for a Treap on the SPARC machine with 70% lookups, 20% inserts and 10% removes.
For the Red-Black Tree, the task of manually adding fine-grained locks proved to be too challenging
and error-prone. Rotations and deletions are much more complicated than in a Treap. Previous work on
fine-grained locking for these trees alters the tree invariants and algorithm, as in [74].
Figure 2.13 and Figure 2.14 show results for the Treap. On the Intel machine (Figure 2.13), our
automatic locking scales as well as the manual hand-over-hand locking. On the SPARC machine (Figure 2.14), the manual hand-over-hand locking is more efficient than our locking; they both scale up to
32 threads. The degradation in the performance of the SPARC machine for 64 threads can be explained by cross-chip latency and cache invalidations, since only the 64-thread experiment spans more than one
chip. In both machines, starting from 2 threads, the fine-grained approaches outperform the single-lock
synchronization.
Figure 2.15 and Figure 2.16 show results for the Red-Black Tree. On the Intel machine (Figure 2.15),
our automatic locking scales up to 8 threads. On the SPARC machine (Figure 2.16) it scales up to 16
threads. In both machines, starting from 4 threads, our automatic locking outperforms the single-lock
synchronization.
[Figure: throughput (ops/msec) vs. number of threads for Single and Automatic locking.]
Figure 2.15: Throughput for a Red-Black Tree on the Intel machine with 70% lookups, 20% inserts and 10% removes.
[Figure: throughput (ops/msec) vs. number of threads for Single and Automatic locking.]
Figure 2.16: Throughput for a Red-Black Tree on the SPARC machine with 70% lookups, 20% inserts and 10% removes.
[Figure: throughput (ops/msec) vs. number of threads for Single and Automatic locking.]
Figure 2.17: Throughput for a Skew Heap on the Intel machine with 50% inserts and 50% removeMin.
[Figure: throughput (ops/msec) vs. number of threads for Single and Automatic locking.]
Figure 2.18: Throughput for a Skew Heap on the SPARC machine with 50% inserts and 50% removeMin.
Self-Adjusting Heap
We consider a Java implementation of a Skew Heap [23, 81], which is a self-adjusting heap data structure. We consider the operations of insert and removeMin.
We use the same evaluation methodology we used for the search trees. Here we consider a workload
of 50% inserts and 50% removes on a heap initialized with one million elements. We compare the
results of our automatic locking to a coarse-grained global lock.
The results are shown in Figure 2.17 and Figure 2.18. On the Intel machine (Figure 2.17), our
automatic locking scales up to 6 threads. On the SPARC machine (Figure 2.18) it scales up to 16
threads. Here, in both machines, starting from 4 threads, our automatic locking is faster than the single-lock approach.
2.5.2 Specialized Implementations
To illustrate the applicability of our technique to specialized data structures (which are tailored to their application), we consider a Java implementation of the Barnes-Hut algorithm [18], and a C++ implementation
[Figure: normalized time vs. number of threads for Single, Original hand-over-hand (manual), and Automatic locking.]
Figure 2.19: Apriori (on the Intel machine): normalized time of hash-tree construction.
of the Apriori Data-Mining algorithm [14] from [72].
Apriori In this application, a number of threads concurrently build a Hash-Tree data structure (a tree
data structure in which each node is either a linked-list or a hash-table). The original application uses
customized hand-over-hand locking tailored to this application. We evaluate the performance of our
locking relative to this specialized manual locking and to a single global lock. We show that our locking
performs as well as the specialized manual locking scheme in the original application.
In the experiments, we measured the time required for the threads to build the Hash-Tree. Figure 2.19 shows the speedup of the original hand-crafted locking and of our locking over a single lock (see footnote 10).
For 2 and 4 threads, the speedup of our locking is almost as good as the original manual locking. In
the case of 8 threads it performs better than the original locking (around 30% faster). Both have a small
overhead in the case of a single thread (around 4% slower).
Barnes-Hut The Barnes-Hut algorithm simulates the interaction of a system of bodies (such as galaxies or particles) and is built from several phases. Its main data structure is an OCT-Tree. We parallelized the Construction-Phase, in which the OCT-Tree is built, and used our technique for synchronization. We measured the benefit gained by our locking.
In the experiments, we measured the time required for the threads to build the OCT-Tree (i.e., the Construction-Phase). Figure 2.20 and Figure 2.21 show the results. On both machines, from 4 threads onwards both fine-grained locking approaches are fully scalable. However, as in the previous results, for small numbers of threads the fine-grained approaches are still slower than the sequential version (probably because of their synchronization overhead).
(This benchmark was performed only on the Intel machine because of compatibility problems with the environment installed on the SPARC machine.)
[Figure 2.20 here: bar chart of normalized time (0%–250%) vs. number of threads (1, 2, 4, 6, 8); series: Sequential, Manual hand-over-hand, Automatic.]
Figure 2.20: Barnes-Hut (on the Intel machine): normalized time of OCT-Tree construction.
[Figure 2.21 here: bar chart of normalized time (0%–160%) vs. number of threads (1, 2, 4, 8, 16, 32, 64); series: Sequential, Manual hand-over-hand, Automatic.]
Figure 2.21: Barnes-Hut (on the SPARC machine): normalized time of OCT-Tree construction.
Chapter 3
Transactional Libraries with Foresight
Linearizable libraries (such as the ones produced by the approach in Chapter 2) shield programmers
from the complexity of concurrency. Indeed, modern programming languages such as Java, Scala,
and C# provide a large collection of linearizable libraries — these libraries provide operations that are
guaranteed to be atomic, while hiding the complexity of the implementation from clients. Unfortunately,
clients often need to perform a sequence of library operations that appears to execute atomically. In the
sequel, we refer to such a sequence as an atomic composite operation.
The problem of realizing atomic composite operations is an important and widespread one [24, 78].
Atomic composite operations are a restricted form of software transactions [48]. However, general-purpose software transaction implementations have not gained acceptance [27, 34, 36, 69, 84, 89] due to
high runtime overhead, poor performance, and limited ability to handle irreversible operations (such as
I/O operations). Programmers typically realize such composite operations using ad-hoc synchronization
leading to many concurrency bugs in practice (e.g., [78]).
Transactional Libraries with Foresight In this chapter, we address the problem of extending a linearizable library [55] to allow clients to execute an arbitrary composite operation atomically. Our basic
methodology requires the client code to demarcate the sequence of operations for which atomicity is
desired and provide declarative information to the library (foresight) about the library operations that
the composite operation may invoke (as illustrated later). It is the library’s responsibility to ensure the
desired atomicity, exploiting the foresight information for effective synchronization.
We first present a formalization of this approach. We formalize the desired goals and present a
sufficient correctness condition. As long as the clients and the library extension satisfy the correctness
condition, all composite operations are guaranteed atomicity without deadlocks. Furthermore, our condition does not require the use of rollbacks. Our sufficiency condition is broad and permits a range
of implementation options and fine-grained synchronization. It is based on a notion of dynamic right-movers, which generalizes traditional notions of static right-movers and commutativity [61, 66].
Our formulation decouples the implementation of the library from the client. Thus, the correctness
of the client does not depend on the way the foresight information is used by the library implementation.
The client only needs to ensure the correctness of the foresight information.
Automatic Foresight for Clients We then present a simple static analysis to infer calls (in the client
code) to the API used to pass the foresight information. Given a description of a library’s API, our
algorithm conservatively infers the required calls. This relieves the client programmer of this burden
and simplifies writing atomic composite operations.
Library Extension Realization Our approach permits the use of customized, hand-crafted implementations of the library extension. However, we also present a generic technique for extending a linearizable
library with foresight. The technique is based on a novel variant of the tree locking protocol in which
the tree is designed according to semantic properties of the library’s operations.
We used our generic technique to implement a general-purpose Java library for Map data structures.
Our library permits composite operations to simultaneously work with multiple instances of Map data
structures.
Experimental Evaluation We use the Maps library and the static analysis to enforce atomicity of a
selection of real-life Java composite operations, including composite operations that manipulate multiple
instances of Map data structures. Our experiments indicate that our approach enables realizing efficient
and scalable synchronization for real-life composite operations.
Main Contributions Of This Chapter We develop the concept of transactional libraries with foresight
along several dimensions, providing the theoretical foundations, an implementation methodology, and
an empirical evaluation. Our main contributions are:
• We introduce the concept of transactional libraries with foresight, in which the library ensures
atomicity of composite operations by exploiting information (foresight) provided by its clients.
The main idea is to shift the responsibility for synchronizing composite operations from the clients to the library, and to have the client provide useful foresight information to make efficient library-side synchronization possible.
• We define a sufficient correctness condition for clients and the library extension. Satisfying this
condition guarantees atomicity and deadlock-freedom of composite operations (Section 3.3).
• We show how to realize both the client-side (Section 3.4) and the library-side (Section 3.5) for
leveraging foresight. Specifically, we present a static analysis algorithm that provides foresight
information to the library (Section 3.4), and show a generic technique for implementing a family
of transactional libraries with foresight (Section 3.5).
• We realized our approach and evaluated it on a number of real-world composite operations. We
show that our approach provides efficient and scalable synchronization (Section 3.6).
int value = I;
void Inc() { atomic { value=value+1; } }
void Dec() { atomic { if (value > 0) then value=value-1; } }
int Get() { atomic { return value; } }
Figure 3.1: Specification of the Counter library. I denotes the initial value of the counter.
/* Thread T1 */
/* @atomic */ {
  @mayUseInc()
  Inc();
  Inc();
  @mayUseNone()
}

/* Thread T2 */
/* @atomic */ {
  @mayUseDec()
  Dec();
  Dec();
  @mayUseNone()
}
Figure 3.2: Simple compositions of counter operations.
3.1 Overview
We now present an informal overview of our approach for extending a linearizable library into a transactional library with foresight, using a toy example. Figure 3.1 presents the specification of a single Counter library. The counter can be incremented (via the Inc() operation), decremented (via the
Dec() operation), or read (via the Get() operation). The counter’s value is always nonnegative: the
execution of Dec() has an effect only when the counter’s value is positive. All the counter’s procedures
are atomic.
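For concreteness, the Counter specification can be realized as a small linearizable Java class. This is our own sketch, using synchronized methods to stand in for the atomic blocks of Figure 3.1:

```java
// A sketch of the Counter of Figure 3.1 as a linearizable Java class;
// synchronized stands in for the atomic blocks of the specification.
class Counter {
    private int value;

    Counter(int initialValue) { value = initialValue; } // I in Figure 3.1

    synchronized void inc() { value = value + 1; }

    // Dec has an effect only when the value is positive,
    // so the counter's value is always nonnegative.
    synchronized void dec() { if (value > 0) value = value - 1; }

    synchronized int get() { return value; }
}
```

Running T1 and T2 of Figure 3.2 against such a class makes each individual operation atomic, but not the composite operations; closing that gap is exactly what the foresight mechanism is for.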
Figure 3.2 shows an example of two threads each executing a composite operation: a code fragment
consisting of multiple counter operations. (The mayUse annotations will be explained later.) Our goal
is to execute these composite operations atomically: a serializable execution of these two threads is one
that is equivalent to either thread T1 executing completely before T2 executes or vice versa. Assume that
the counter value is initially zero. If T2 executes first, then neither decrement operation will change the
counter value, and the subsequent execution of T1 will produce a counter value of 2. If T1 executes first
and then T2 executes, the final value of the counter will be 0. Figure 3.3 shows a slightly more complex
example.
3.1.1 Serializable and Serializably-Completable Executions
Figure 3.4 shows prefixes of various interleaved executions of the code shown in Figure 3.2 for an initial
counter value of 0. Nodes are annotated with the values of the counter. Bold double circles depict
non-serializable nodes: these nodes denote execution prefixes that are not serializable (and thus need
to be avoided by proper synchronization). E.g., node #18 is a non-serializable node since it represents
/* Thread T1 */
/* @atomic */ {
  @mayUseAll()
  c = Get();
  @mayUseInc()
  while (c > 0) {
    c = c-1;
    Inc();
  }
  @mayUseNone()
}

/* Thread T2 */
/* @atomic */ {
  @mayUseDec()
  Dec();
  Dec();
  @mayUseNone()
}
Figure 3.3: Compositions of counter dependent operations.
[Figure 3.4 here: a tree of execution prefixes rooted at node #1 (value = 0); each edge is labeled with the executing thread (T1 or T2), and each of the nodes #1–#19 is annotated with the resulting counter value.]
Figure 3.4: Execution prefixes of the code shown in Figure 3.2, for a counter with I = 0. Each node
represents a prefix of an execution; a leaf node represents a complete execution.
the non-serializable execution T2.Dec(); T1.Inc(); T2.Dec(); T1.Inc() (which produces a final counter value of 1).
Bold single circles depict doomed nodes: once we reach a doomed node, there is no way to order the
remaining operations in a way that achieves serializability. E.g., node #6 is a doomed node since it only
leads to non-serializable complete executions (represented by nodes #17 and #18). Finally, dashed
circles depict safe nodes, which represent serializably-completable executions. We formalize this notion
later, but safe nodes guarantee that the execution can make progress, while ensuring serializability.
Our goal is to ensure that execution stays within safe nodes. Even in this simple example, the set
of safe nodes and, hence, the potential for parallelism depends on the initial value I of the counter. For
I ≥ 1 all nodes are safe and thus no further synchronization is necessary. Using our approach enables
realizing all available parallelism in this example (for every I ≥ 0), while avoiding the need for any
backtracking (i.e., rollbacks).
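For two transactions, the classification of Figure 3.4 can be checked by brute force: a complete interleaved run is serializable iff it agrees with one of the two non-interleaved orders. The following sketch is our own simplified check, comparing final counter values only (events are strings such as "T1.Inc"):

```java
import java.util.ArrayList;
import java.util.List;

// Brute-force serializability check for two counter transactions: compare the
// final counter value of the interleaved run with the two non-interleaved
// orders (T1 before T2, and T2 before T1).
class SerializabilityDemo {
    static int replay(int initial, List<String> events) {
        int v = initial;
        for (String e : events) {
            if (e.endsWith("Inc")) v = v + 1;
            else if (v > 0) v = v - 1; // Dec has an effect only when v > 0
        }
        return v;
    }

    static List<String> project(List<String> events, String thread) {
        List<String> r = new ArrayList<>();
        for (String e : events) if (e.startsWith(thread)) r.add(e);
        return r;
    }

    static boolean serializable(int initial, List<String> events) {
        List<String> t1 = project(events, "T1"), t2 = project(events, "T2");
        List<String> t1First = new ArrayList<>(t1); t1First.addAll(t2);
        List<String> t2First = new ArrayList<>(t2); t2First.addAll(t1);
        int v = replay(initial, events);
        return v == replay(initial, t1First) || v == replay(initial, t2First);
    }
}
```

With I = 0, the interleaving of node #18 fails this check, while for I = 1 the same interleaving passes, matching the observation that for I ≥ 1 all nodes are safe.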
3.1.2 Serializably-Completable Execution: A Characterization
We now present a characterization of serializably-completable executions based on a generalization of
the notion of static right movers [66]. We restrict ourselves to executions of two threads here, but our
later formalization considers the general case.
We define an operation o by a thread T to be a dynamic right-mover with respect to thread T′ after an execution p, iff for any sequence of operations s executed by T′, if p; T.o; s is feasible, then p; s; T.o is feasible and equivalent to the first execution. Given an execution ξ, we define a relation @ on the threads as follows: T @ T′ if ξ contains a prefix p; T.o such that o is not a dynamic right-mover with respect to T′ after p. As shown later, if @ is acyclic, then ξ is a serializably-completable execution (as long as every sequential execution of the threads terminates).
In Figure 3.4, node #2 represents the execution prefix T1.Inc(), for which T1 @ T2. This is because T2.Dec() is a possible suffix executed by T2, and T1.Inc(); T2.Dec() is not equivalent to T2.Dec(); T1.Inc(). On the other hand, node #5 represents the execution prefix T1.Inc(); T2.Dec(), for which T2 @ T1 does not hold. This execution has one possible suffix executed by T1 (i.e., T1.Inc()), and the execution T1.Inc(); T2.Dec(); T1.Inc() is equivalent to the execution T1.Inc(); T1.Inc(); T2.Dec().
Observe that the relation @ corresponding to any non-serializable or doomed node has a cycle, while
it is acyclic for all safe nodes.
Note that the use of a dynamic (i.e., state-dependent) right-mover relation is critical to a precise
characterization above. E.g., Inc and Dec are not static right-movers with respect to each other.
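This state dependence is easy to observe directly. The following sketch (our own illustration) replays the orders Dec;Inc and Inc;Dec from different initial counter values:

```java
// Replaying both orders shows when Dec is a dynamic right mover with respect
// to Inc: from any positive value the two orders end in the same state, but
// from 0 they differ (the Dec is lost if it runs first).
class DynamicMoverDemo {
    static int replay(int initial, String ops) {
        int v = initial;
        for (char op : ops.toCharArray()) {
            if (op == 'I') v = v + 1;       // Inc
            else if (v > 0) v = v - 1;      // Dec: no effect at 0
        }
        return v;
    }
}
```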
46
C HAPTER 3. T RANSACTIONAL L IBRARIES WITH F ORESIGHT
3.1.3 Synchronization Using Foresight
We now show how we exploit the above characterization to ensure that an interleaved execution stays
within safe nodes.
Foresight. A key aspect of our approach is to exploit knowledge about the possible future behavior of
composite operations for more effective concurrency control. We enrich the interface of the library with
operations that allow the composite operations to assert temporal properties of their future behavior. In
the Counter example, assume that we add the following operations:
• mayUseAll(): indicates that the transaction may execute arbitrary operations in the future.
• mayUseNone(): indicates that the transaction will execute no more operations.
• mayUseDec(): indicates that the transaction will invoke only Dec operations in the future.
• mayUseInc(): indicates that the transaction will invoke only Inc operations in the future.
The code in Figure 3.2 is annotated with calls to these operations in a straightforward manner. The code
shown in Figure 3.3 is conservatively annotated with a call to mayUseAll() since the interface does not
provide a way to indicate that the transaction will only invoke Get and Inc operations.
Utilizing Foresight. We utilize a suitably modified definition of the dynamic right mover relation, where
we check for the right mover condition only with respect to the set of all sequences of operations the
other threads are allowed to invoke (as per their foresight assertions). To utilize foresight information,
a library implementation maintains a conservative over-approximation @′ of the @ relation. The implementation permits an operation to proceed iff it will not cause the relation @′ to become cyclic (and
blocks the operation otherwise until it is safe to execute it). This is sufficient to guarantee that the
composite operations appear to execute atomically, without any deadlocks.
We have created an ad-hoc implementation of the counter that (implicitly) maintains a conservative over-approximation @′ (see Figure 3.5). Our implementation permits all serializably-completable execution prefixes for the example shown in Figure 3.2 (for every I ≥ 0). Our implementation also provides a high degree of parallelism for the example shown in Figure 3.3: for this example, the loop of T1 can be executed in parallel with the execution of T2.
Fine-grained Foresight. We define a library operation to be a tuple that identifies a procedure as
well as the values of the procedure’s arguments. For example, removeKey(1) and removeKey(2)
are two different operations of a library with the procedure removeKey(int k). In order to distinguish between different operations which are invoked using the same procedure, a mayUse procedure (which is used to pass the foresight information) can have parameters. For example, a library
that represents a single Map data structure can have a mayUse procedure mayUseKey(int k), where
mayUseKey(1) is defined to refer to all operations on key 1 (including, for example, removeKey(1)),
and mayUseKey(2) is defined to refer to all operations on key 2 (including, for example, removeKey(2)).
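Such per-key mayUse procedures could be implemented, for instance, with one lock per key. The following is a deliberately simplified sketch under our own naming (the thesis's actual Map library uses the tree-locking technique of Section 3.5, not this scheme):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Per-key foresight for a toy Map library: mayUseKey(k) acquires a lock
// guarding exactly the operations on key k, so composite operations that
// declare disjoint key sets never block each other.
class PerKeyMap {
    private final Map<Integer, ReentrantLock> keyLocks = new ConcurrentHashMap<>();
    private final Map<Integer, Integer> data = new ConcurrentHashMap<>();

    private ReentrantLock lockOf(int k) {
        return keyLocks.computeIfAbsent(k, x -> new ReentrantLock());
    }

    void mayUseKey(int k) { lockOf(k).lock(); }          // foresight: will touch key k

    void mayUseNoneForKey(int k) { lockOf(k).unlock(); } // done with key k

    void put(int k, int v) {
        assert lockOf(k).isHeldByCurrentThread();        // client protocol check
        data.put(k, v);
    }

    Integer removeKey(int k) {
        assert lockOf(k).isHeldByCurrentThread();
        return data.remove(k);
    }

    Integer get(int k) {
        assert lockOf(k).isHeldByCurrentThread();
        return data.get(k);
    }
}
```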
Special Cases. Our approach generalizes several ideas that have been proposed before.
Counter c; // an internal counter
ReadWriteLock zeroLock; ReadWriteLock getLock; // two read-write locks

void mayUseAll() {
  if (!holdLock(getLock) && !holdLock(zeroLock)) {
    acquireWrite(getLock); acquireWrite(zeroLock);
  }
}
void mayUseInc() {
  if (!holdLock(getLock) && !holdLock(zeroLock)) {
    acquireRead(getLock); acquireRead(zeroLock);
  } else if (holdLockInWriteMode(getLock) && holdLockInWriteMode(zeroLock)) {
    downgrade(getLock); downgrade(zeroLock);
  }
}
void mayUseDec() {
  if (!holdLock(getLock) && !holdLock(zeroLock)) acquireRead(getLock);
}
void mayUseNone() {
  if (holdLock(getLock)) releaseLock(getLock);
  if (holdLock(zeroLock)) releaseLock(zeroLock);
}
void Inc() {
  assert(holdLock(getLock) && holdLock(zeroLock));
  c.Inc();
}
void Dec() {
  assert(holdLock(getLock) && !holdLockInReadMode(zeroLock));
  if (c.Get() < numOfDecs && !holdLockInWriteMode(zeroLock)) acquireWrite(zeroLock);
  c.Dec();
}
int Get() {
  assert(holdLockInWriteMode(getLock) && holdLockInWriteMode(zeroLock));
  return c.Get();
}
Figure 3.5: Pseudo-code of an ad-hoc implementation of a transactional Counter with foresight. This implementation ensures acyclicity of the @ relation as long as the mayUse (foresight) information is correct. It is based on an internal atomic Counter and read-write locks. In the code, acquireWrite(l) acquires l in write-mode, and acquireRead(l) acquires l in read-mode; if lock l is already held by the current thread in write-mode, then downgrade(l) downgrades l from write-mode to read-mode. holdLockInWriteMode(l) returns true iff l is held by the current thread in write-mode. holdLockInReadMode(l) returns true iff l is held by the current thread in read-mode. holdLock(l) returns true iff either holdLockInWriteMode(l)==true or holdLockInReadMode(l)==true. The value of numOfDecs is equal to the number of threads that are currently executing the procedure Dec() (for brevity we omit the code that updates numOfDecs).
One example, from databases, is locking that is based on operation commutativity (e.g., see [20, chapter 3.8]). Such locking provides several lock modes, where each mode corresponds to a set of operations; two threads are allowed to execute in parallel as long as they do not hold lock modes that correspond to non-commutative operations. A simple common instance is a read-write lock [30], in which threads are allowed to simultaneously hold locks in read-mode (which corresponds to read-only operations, which are commutative with each other). Interestingly, the common lock-acquire and lock-release operations used for locking can be seen as special cases of the procedures used to pass the foresight information.
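A minimal encoding of this mode-based idea for the Counter might use one lock mode per class of mutually commutative operations. This is our own toy sketch; the mode names are hypothetical:

```java
// Toy mode-based locking for the Counter: one lock mode per operation class;
// two threads may hold modes simultaneously iff every operation in one class
// commutes with every operation in the other. Inc commutes with Inc, Dec with
// Dec, and Get with Get (read-only), but Inc and Dec do not commute statically
// (they disagree when the counter is 0), and Get commutes with neither.
enum Mode { INC, DEC, GET }

class CommutativityModes {
    static boolean compatible(Mode a, Mode b) {
        return a == b; // for this Counter, only same-class operations commute
    }
}
```

A classic read-write lock is the two-mode instance of the same idea, with all read-only operations sharing a single compatible mode.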
Another example is shared-ordered locking [13]. This locking allows threads to simultaneously hold lock modes that correspond to non-commutative operations. Its implementation ensures atomicity by guaranteeing that the actual execution is equivalent to a non-interleaved execution in which the same threads acquire the locks in the same order.
3.1.4 Realizing Foresight-Based Synchronization
What we have described so far is a methodology for foresight-based concurrency control. This prescribes the conditions that must be satisfied by the clients and library implementations to ensure atomicity for composite operations.
Automating Foresight For Clients. One can argue that adding calls to mayUse operations is an error-prone process. Therefore, in Section 3.4 we show a simple static analysis algorithm which conservatively infers calls to mayUse operations (given a description of the mayUse operations supported by the library). Our experience indicates that this simple algorithm can handle real-life programs.
Library Implementation. We permit creating customized, hand-crafted implementations of the library
extension (e.g., Figure 3.5). However, in order to simplify creating such libraries, we present a generic
technique for implementing a family of libraries with foresight (Section 3.5). The technique is based on
a novel variant of the tree locking protocol in which the tree is designed according to semantic properties
of the library’s operations. We have utilized the technique to implement a general purpose Java library
for Map data structures.
3.2 Preliminaries

3.2.1 Libraries
A library A exposes a set of procedures PROCSA . We define a library operation to be a tuple (p, v1 , · · · , vk )
consisting of a procedure name p and a sequence of values (representing actual values of the procedure
arguments). The set of operations of a library A is denoted by OPA . Library operations are invoked by
client threads (defined later). Let T denote the set of all thread identifiers. An event is a tuple (t, m, r),
[Figure 3.6 here: the library designer provides a library specification; the programmer writes composite operations; a static analysis combines the two to produce composite operations with mayUse operations, which execute as atomic composite operations.]
Figure 3.6: Overview of our approach for foresight based synchronization.
where t is a thread identifier, m is a library operation, and r is a return value. An event captures both an
operation invocation as well as its return value.
A history is defined to be a finite sequence of events. The semantics of a library A is captured by a
set of histories HA — if h ∈ HA , then we say that h is feasible for A. Histories capture the interaction
between a library and its client (a set of threads). Though multiple threads may concurrently invoke
operations, this simple formalism suffices in our setting, since we assume the library to be linearizable.
An empty history is an empty sequence of events.
Let h ◦ h′ denote the concatenation of history h′ to the end of history h. Note that the set HA captures multiple aspects of the library's specification. If h is feasible, but h ◦ (t, m, r) is not, this could mean one of three different things: r may not be a valid return value in this context, or t is not allowed to invoke m in this context, or t is allowed to invoke m in this context, but the library will block and not return until some other event has happened.
A library A is said to be total if for any thread t, operation m ∈ OPA and h ∈ HA , there exists r
such that h ◦ (t, m, r) ∈ HA .
A library A is said to be deterministic if the following is satisfied:
∀h, t, m, r, r′ : h ◦ (t, m, r) ∈ HA ∧ h ◦ (t, m, r′) ∈ HA =⇒ r = r′
3.2.2 Clients
Syntax and Informal Semantics
A client t1 || t2 || · · · || tn consists of the parallel composition
of a set of sequential client programs ti (also referred to as threads). Each thread ti is represented
by the control-flow graph CFGti . The edges of a control-flow graph are annotated with client instructions, shown in Figure 3.7. Conditionals are encoded by annotating control-flow edges with assume
statements. A library operation is used by invoking a procedure (by using the client instruction “x =
p(x1,...,xk)”). All variables referenced in a thread ti are private to ti (i.e., they are thread-local
stms ::= skip
| x = exp
| assume(x)
| x = p(x1,...,xk)
Figure 3.7: Client instructions. x, x1,...,xk stand for local variables, exp stands for an expression over local variables, and p stands for a procedure name.
Client Instruction      Transition                                              Side Condition
skip                    ⟨k, v⟩ ⇒i ⟨k′, v⟩
x = exp(x1, ..., xn)    ⟨k, v⟩ ⇒i ⟨k′, v[x ↦ [[exp]](v(x1), ..., v(xn))]⟩
assume(x)               ⟨k, v⟩ ⇒i ⟨k′, v⟩                                       v(x) = true
x = p(x1, ..., xn)      ⟨k, v⟩ ⇒i ⟨k′, v[x ↦ r]⟩, labeled (ti, m, r)            m = (p, v(x1), ..., v(xn))

Table 3.1: The relation ⇒i. exp(x1, ..., xn) stands for an expression over local variables x1, ..., xn. In all cases we assume that the edge (k, k′) ∈ CFGti is annotated with the client instruction.
variables). Thus, the threads have no shared state except the (internal) state of the library, which is
accessed or modified only via library operations.
A CFG-node may have several outgoing edges (this enables writing programs with conditional
branches, and programs with nondeterministic choices). For simplicity, we assume that if node u has
two (or more) outgoing edges, then each one of these edges is annotated with either a skip or an
assume instruction.
Each control-flow graph has two distinguished nodes: an entry site from which the thread starts,
and an exit site in which the thread ends. The entry site has no incoming edges, and the exit site has no
outgoing edges.
Semantics The semantics [[ti]] of a single thread ti is defined to be a labelled transition system (Σi, ⇒i) over a set of thread-local states Σi.
A local state s = ⟨k, v⟩ ∈ Σi of a thread ti is a pair: k is the value of ti's program counter (a control-flow graph node), and v is a function from local variables to values.
Let EVi be the set of all events executed by thread ti (i.e., EVi = {(t, m, r) | t = ti}). The behavior of ti is described by using the relation ⇒i ⊆ Σi × (EVi ∪ {ε}) × Σi. This relation is defined in Table 3.1.
The execution of any instruction other than a library operation invocation is represented by a (thread-local) transition σ ⇒i σ′ with label ε. The execution of a library operation invocation is represented by a transition σ ⇒i σ′ labeled with an event e, where e captures both the invocation as well as the return value. Note that
this semantics captures the semantics of the “open” program ti . When ti is “closed” by combining it
with a library A, the semantics of the resulting closed program is obtained by combining [[ti ]] with the
semantics of A, as illustrated later.
An initial state is a state in which the thread location is at its entry site. A final state is a state in which the thread location is at its exit site. Let s0 be an initial state of ti; a ti-execution is defined to be a sequence of ti-transitions s0 ⇒i s1, s1 ⇒i s2, ..., sk−1 ⇒i sk, where the j-th transition is labeled with aj, such that every aj is either ε or an event. Such an execution is said to be complete if sk is a final state of ti.
The semantics of a client C = t1 || · · · || tn is obtained by simply composing the semantics of the
individual threads, permitting any arbitrary interleaving of the executions of the threads.
We define the set of transitions of C to be the disjoint union of the set of transitions of the individual
threads.
A C-execution is defined to be a sequence ξ of C-transitions such that each ξ | ti is a ti -execution,
where ξ | ti is the subsequence of ξ consisting of all ti -transitions.
We now define the semantics of the composition of a client C with a library A. Given a C-execution
ξ, we define φ(ξ) to be the sequence of event labels in ξ. The set of (C, A)-executions is defined to be
the set of all C-executions ξ such that φ(ξ) ∈ HA . We abbreviate “(C, A)-execution” to execution if no
confusion is likely.
Threads as Transactions Our goal is to enable threads to execute code fragments containing multiple
library operations as atomic transactions (i.e., in isolation). For notational simplicity, we assume that
we wish to execute each thread as a single transaction. (Our results can be generalized to the case where
each thread may wish to perform a sequence of transactions.) In the sequel, we may think of threads
and transactions interchangeably. This motivates the following definitions.
Non-Interleaved and Sequential Executions An execution ξ is said to be a non-interleaved execution
if for every thread t all t-transitions in ξ appear contiguously. Thus, a non-interleaved execution ξ is of
the form ξ1 , · · · , ξk , where each ξi represents a different thread’s (possibly incomplete) execution. Such
a non-interleaved execution is said to be a sequential execution if for each 1 ≤ i < k, ξi represents a
complete thread execution.
Serializability Two executions ξ and ξ′ are said to be equivalent iff for every thread t, ξ | t = ξ′ | t. An execution ξ is said to be serializable iff it is equivalent to some non-interleaved execution.
Serializably Completable Executions For any execution ξ, let W(ξ) denote the set of all threads that
have at least one transition in ξ. An execution ξ is said to be a complete execution iff ξ | t is complete
for every thread t ∈ W(ξ). A client execution ξ is completable if ξ is a prefix of a complete execution
ξc such that W(ξ) = W(ξc ). An execution ξ is said to be serializably completable iff ξ is a prefix of
a complete serializable execution ξc such that W(ξ) = W(ξc ). Otherwise, we say that ξ is a doomed
execution.
An execution may be incompletable due to problems in a client thread (e.g., a non-terminating loop)
or due to problems in the library (e.g., blocking by a library procedure leading to deadlocks).
3.3 Foresight-Based Synchronization
We now formalize our goal of extending a base library B into a foresight-based library E that permits
clients to execute arbitrary composite operations atomically.
3.3.1 The Problem
Let B be a given library. (Note that B can also be considered to be a specification.) We say that a library E is an extension of B if (i) PROCSE ⊃ PROCSB, (ii) {h ↓ B | h ∈ HE} ⊆ HB, where h ↓ B is the subsequence of events in h that represent calls of operations in OPB, and (iii) the procedures in PROCSE \ PROCSB do not have a return value.1 We are interested in extensions where the extension procedures (PROCSE \ PROCSB) are used for synchronization to ensure that each thread appears to execute in isolation.
Given a client C of the extended library E, let C ↓ B denote the program obtained by replacing every
extension procedure invocation in C by the skip statement. Similarly, for any execution ξ of (C, E),
we define ξ ↓ B to be the sequence obtained from ξ by omitting transitions representing extension
procedures. We say that an execution ξ of (C, E) is B-serializable if ξ ↓ B is a serializable execution of
(C ↓ B, B). We say that ξ is B-serializably-completable if ξ ↓ B is a serializably completable execution
of (C ↓ B, B). We say that E is a transactional extension of B if for any (correct) client C of E, every
(C, E)-execution is B-serializably-completable. Our goal is to build transactional extensions of a given
library.
3.3.2 The Client Protocol
In our approach, the extension procedures are used by transactions (threads) to provide information to the library about the future operations they may perform. We refer to procedures in PROCSE \ PROCSB as mayUse procedures, and to operations in MUE = OPE \ OPB as mayUse operations. We now formalize the client protocol, which captures the preconditions the client must satisfy, namely that the foresight information provided via the mayUse operations must be correct.
The semantics of mayUse operations is specified by a function mayE : MUE → P(OPB) that maps every mayUse operation to a set of base library operations. In Section 3.4 we show simple procedure annotations that can be used to define the set MUE and the function mayE.
The mayUse operations define an intention function IE : HE × T → P(OPB), where IE(h, t)
1 This means that they always return the value "void"; the value "void" is never used in expressions and never passed as a procedure argument.
represents the set of (base library) operations thread t is allowed to invoke after the execution of h. For every thread t ∈ T and history h ∈ HE, the value of IE(h, t) is defined as follows. Let M denote the set of all mayUse operations invoked by t in h. (i) If M is empty, then IE(h, t) = OPB. (ii) If M is non-empty, then IE(h, t) = ⋂m∈M mayE(m). We extend the notation and define IE(h, T), for any set of threads T, to be ⋃t∈T IE(h, t).
Note that the intention set IE(h, t) can only shrink as the execution proceeds. The mayUse operations cannot be used to increase the intention set of a thread.
Definition 3.1 (Client Protocol) Let h be a history of library E. We say that h follows the client protocol if for any prefix h′ ◦ (t, m, r) of h, we have m ∈ IE(h′, t) ∪ MUE.
We say that an execution ξ follows the client protocol if φ(ξ) follows the client protocol.
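The intention set itself is straightforward to compute from the mayUse operations a thread has invoked so far. The following sketch uses our own names and models operations as strings:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// A sketch of the intention function I_E(h, t): with no mayUse operations
// invoked by t, the thread may invoke any base operation (OP_B); otherwise its
// intention set is the intersection of the sets denoted by its mayUse
// operations, so the set can only shrink as the execution proceeds.
class Intention {
    static Set<String> intention(Set<String> baseOps, List<Set<String>> mayUseSets) {
        Set<String> result = new HashSet<>(baseOps);
        for (Set<String> m : mayUseSets) result.retainAll(m); // intersection
        return result;
    }
}
```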
3.3.3 Dynamic Right Movers
We now consider how the library extension can exploit the foresight information provided by the client
to ensure that the interleaved execution of multiple threads is restricted to safe nodes (as described in
Section 3.1). First, we formalize the notion of a dynamic right mover.
Given a history h of a library A, we define the set EA[h] to be {h′ | h ◦ h′ ∈ HA}. (Note that if h is not feasible for A, then EA[h] = ∅.) Note that if EA[h1] = EA[h2], then the concrete library states produced by h1 and h2 cannot be distinguished by any client (using any sequence of operations). Dually, if the concrete states produced by histories h1 and h2 are equal, then EA[h1] = EA[h2].
Definition 3.2 (Dynamic Right Movers) Given a library A, a history h1 is said to be a dynamic right mover with respect to a history h2 in the context of a history h, denoted h : h1 ▷A h2, iff
EA[h ◦ h1 ◦ h2] ⊆ EA[h ◦ h2 ◦ h1].
An operation m is said to be a dynamic right mover with respect to a set of operations Ms in the context of a history h, denoted h : m ▷A Ms, iff for any event (t, m, r) and any history hs consisting of operations in Ms, we have h : (t, m, r) ▷A hs.
Properties of Dynamic Right Movers The following example shows that an operation m can be a
dynamic right mover with respect to a set M after some histories but not after some other histories.
Example 3.3.1 Consider the Counter described in Section 3.1. Let hp be a history that ends with a
counter value of p > 0. The operation Dec is a dynamic right mover with respect to the set {Inc} in
the context of hp since for every n the histories hp ◦ (t, Dec, r) ◦ (t1 , Inc, r1 ), . . . , (tn , Inc, rn ) and
hp ◦ (t1 , Inc, r1 ), . . . , (tn , Inc, rn ) ◦ (t, Dec, r) have the same set of suffixes (since the counter value is
p − 1 + n after both histories).
Let h0 be a history that ends with a counter value of 0. The operation Dec is not a dynamic right mover with respect to the set {Inc} in the context of h0, since after a history h0 ◦ (t, Dec, r) ◦ (t′, Inc, r′) the counter's value is 1, and after h0 ◦ (t′, Inc, r′) ◦ (t, Dec, r) the counter's value is 0. Thus, (t, Get, 1) is a feasible suffix after the first history but not the second.
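The state-dependence in this example can be checked by direct simulation. The following sketch is our own; following the Counter of Section 3.1 we assume the value never goes below 0, so Dec at 0 has no effect.

```java
// Compare the final counter value of "Dec then n Incs" with
// "n Incs then Dec": they agree exactly when the starting value is positive.
class CounterSim {
    static int dec(int v) { return v > 0 ? v - 1 : v; } // no effect at 0
    static int inc(int v) { return v + 1; }

    // Final value after Dec followed by n Incs, starting from p.
    static int decThenIncs(int p, int n) {
        int v = dec(p);
        for (int i = 0; i < n; i++) v = inc(v);
        return v;
    }

    // Final value after n Incs followed by Dec, starting from p.
    static int incsThenDec(int p, int n) {
        int v = p;
        for (int i = 0; i < n; i++) v = inc(v);
        return dec(v);
    }
}
```

For p > 0 both orders yield p − 1 + n, matching the first half of the example; for p = 0 the two orders disagree (1 versus 0), matching the second half.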
The following example shows that the dynamic right mover is not a symmetric property.
Example 3.3.2 Let hi be a history that ends with a counter value of i > 0. The operation Inc is
not a dynamic right mover with respect to the set {Dec} in the context of hi since after a history
hi ◦ (t, Inc, r) ◦ (t1 , Dec, r1 ), . . . , (ti+1 , Dec, ri+1 ) the Counter’s value is 0, and after
hi ◦ (t1 , Dec, r1 ), . . . , (ti+1 , Dec, ri+1 ) ◦ (t, Inc, r) the Counter’s value is 1.
One important aspect of the definition of dynamic right movers is the following: it is possible to have h : m ▷A {m1} and h : m ▷A {m2} but not h : m ▷A {m1, m2}.
Static Right Movers and Commutativity The notion of dynamic right mover can be used to define static right movers and commutativity. We say that an operation m is a static right mover with respect to an operation m′ if every feasible history h satisfies h : m ▷A {m′}. We say that m and m′ are statically-commutative if m is a static right mover with respect to m′ and vice versa.
3.3.4 Serializability
It follows from the preceding discussion that an incomplete history h may already reflect some execution-order constraints among the threads that must be satisfied by any other history that is equivalent to h. These execution-order constraints can be captured as a partial ordering on thread-ids.
Definition 3.3 (Safe Ordering) Let h be a history of E, and let Th be the set of threads that appear in h. A partial ordering ⊑ ⊆ Th × Th is said to be safe for h iff for any prefix h′ ◦ (t, m, r) of h, where m ∈ OPB, we have h′ ↓ B : m ▷B IE(h′, P), where P = {t′ ∈ Th | t ⋢ t′}.
A safe ordering represents a conservative over-approximation of the execution-order constraints
among thread-ids (required for serializability). Note that in the above definition, the right-mover property is checked only with respect to the base library B.
Example 3.3.3 Assume that the Counter is initialized with a value I > 0. Consider the history (return values omitted for brevity):
h = (t, mayUseDec), (t′, mayUseInc), (t, Dec), (t′, Inc).
If ⊑ is a safe partial order for h, then t′ ⊑ t, because after the third event Inc is not a dynamic right mover with respect to the operations allowed for t (i.e., {Dec}). Dually, the total order defined by t′ ⊑′ t is safe for h, since after the second event the operation Dec is a dynamic right mover with respect to the operations allowed for t′ (i.e., {Inc}) because the Counter's value is larger than 0.
Definition 3.4 (Safe Extension) We say that library E is a safe extension of B if for every h ∈ HE that follows the client protocol there exists a partial ordering ⊑h on threads that is safe for h.
The above definition prescribes the synchronization (specifically, blocking) that a safe extension must enforce. In particular, assume that h is a feasible history allowed by the library. If the history h ◦ (t, m, r) has no safe partial ordering, then the library must block the call to m by t rather than return the value r.
Theorem 3.5 (Serializability) Let E be a safe extension of a library B. Let C be a client of E. Any
execution ξ of (C, E) that follows the client protocol is B-serializable.
Proof Let ⊑ be a safe partial ordering for φ(ξ). Let tz1, tz2, · · · , tzp denote any total ordering of the threads that execute in ξ that is consistent with ⊑, i.e., tzi ⊑ tzj ⇒ i ≤ j. Let ξb be ξ ↓ B. Let ξi denote ξb | tzi. We can inductively show that ξni = ξ1, ξ2, · · · , ξp is a valid non-interleaved execution of (C ↓ B, B), using the right-mover property. □

3.3.5 B-Serializable-Completability
We saw in Section 3.1 and Figure 3.4 that some serializable (incomplete) executions may be doomed:
i.e., there may be no way of completing the execution in a serializable way. Safe extensions, however,
ensure that all executions avoid doomed nodes and are serializably completable. However, we cannot
guarantee completability if a client thread contains a non-terminating loop or violates the client protocol.
This leads us to the following conditional theorem.
Theorem 3.6 (B-Serializable-Completability) Let B be a total and deterministic library. Let E be a
safe extension of B. Let C be a client of E. If every sequential execution of (C, E) follows the client
protocol and is completable, then the following are satisfied:
1. every execution of (C, E) is B-serializably-completable
2. every execution of (C, E) follows the client protocol
The precondition in Theorem 3.6 is worth noting. We require client threads to follow the client
protocol and terminate. However, it is sufficient to check that clients satisfy these requirements in
sequential executions. This simplifies reasoning about the clients.
Proof for Theorem 3.6 We prove the theorem by using the following lemmas.
Lemma 3.7 Let B be a deterministic library. Let E be an extension of B. Let C be a client of E such that
every sequential execution of (C, E) is completable. If π is a sequential execution of C such that π ↓ B is
an execution of (C ↓ B, B), then π is an execution of (C, E). (i.e., π is an execution of the composition
of client C with library E.)
Proof We use induction on the length of the executions. Assume that π = π′ ◦ α where α is a single transition. From the induction hypothesis, π′ is an execution of (C, E). Hence, φ(π′) ∈ HE. If α does not execute an event then φ(π) ∈ HE, and therefore π is an execution of (C, E). If α executes an event, we assume that α executes the event (t, m, r). There exists a sequential execution of (C, E) in which after π′ thread t invokes operation m (this operation is not blocked since all sequential executions of (C, E) are completable). Hence, there exists r′ such that φ(π′) ◦ (t, m, r′) ∈ HE.
We consider the following cases.
Case 1: m is a mayUse operation.
All mayUse operations always return "void", hence r = r′. Therefore π is an execution of (C, E).
Case 2: m is not a mayUse operation.
In this case, m ∈ OPB, and φ(π′ ↓ B) ◦ (t, m, r) ∈ HB (because π ↓ B is an execution of (C ↓ B, B)). Since B is deterministic, r = r′. Therefore π is an execution of (C, E). □

Lemma 3.8 Let B be a total and deterministic library. Let E be a safe extension of B. Let C be a client
of E such that every sequential execution of (C, E) follows the client protocol and is completable. Let π be an incomplete execution such that π ↓ B is an execution of (C ↓ B, B), and ⊑ is a safe total order for φ(π). There exists a sequence of transitions π′ such that ππ′ ↓ B is a complete execution of (C ↓ B, B), ⊑ is a safe total order for φ(ππ′), and each thread that appears in π′ also appears in π.
Proof Intuitively, we wish to show that π ↓ B can be completed (in (C ↓ B, B)) such that ⊑ remains a safe order for the execution.
We write tz1, tz2, · · · , tzn to denote the threads in π where i ≤ j ⇒ tzi ⊑ tzj. Let πi denote π | tzi. From Definition 3.3, we know that π1, · · · , πn ↓ B is an execution of (C ↓ B, B) such that
EB[φ(π ↓ B)] ⊆ EB[φ(π1, · · · , πn ↓ B)].
Consider the maximal k such that π1 , · · · , πk is a sequential execution. We show below how we can let
tzk complete execution. Applying the same argument inductively gives us the lemma.
From Lemma 3.7, we know that π1, · · · , πk is a sequential execution of (C, E). All sequential executions of (C, E) are completable, hence there exists α such that α is a transition of tzk and π1, · · · , πk, α is a sequential execution of (C, E).
We want to show that πα ↓ B is an execution of (C ↓ B, B) and that ⊑ is safe for φ(πα). We assume that α
invokes an event (tzk, m, r) where m ∈ OPB (otherwise, the proof is trivial).
Thread tzk invokes operation m after π1, · · · , πk, therefore it invokes operation m after π (it has the same local state after both executions). m is in the intention set of tzk after π1, · · · , πk, therefore m is in the intention set of tzk after π.
Since B is total, φ(π ↓ B) ◦ (tzk, m, r′) ∈ HB.
The order ⊑ is safe for φ(π) ◦ (tzk, m, r′), therefore:
φ(π1, · · · , πk ↓ B) ◦ (tzk, m, r′) ◦ φ(πk+1, · · · , πn ↓ B) ∈ HB.
Since π1, · · · , πk, α ↓ B is an execution of (C ↓ B, B), we know that φ(π1, · · · , πk ↓ B) ◦ (tzk, m, r) ∈ HB. Since B is deterministic, r = r′.
Therefore, πα ↓ B is an execution of (C ↓ B, B) and ⊑ is safe for φ(πα). □

Lemma 3.9 Let B be a total and deterministic library. Let E be a safe extension of B. Let C be a client
of E such that every sequential execution of (C, E) follows the client protocol and is completable. If π is
an execution of (C, E) that follows the client protocol, then π is B-serializably-completable.
Proof Since E is a safe extension and φ(π) follows the client protocol, there exists a total order ⊑ that is safe for φ(π). From Lemma 3.8, there exists a sequence of transitions π′ such that ππ′ ↓ B is a complete execution of (C ↓ B, B), ⊑ is a safe order for φ(ππ′), and each thread that appears in π′ also appears in π. Hence ππ′ ↓ B is a complete serializable execution of (C ↓ B, B). □

Lemma 3.10 Let B be a total and deterministic library. Let E be a safe extension of B. Let C be a client
of E such that every sequential execution of (C, E) follows the client protocol and is completable. Every
execution of (C, E) follows the client protocol.
Proof We use induction on the length of the executions. Assume that π = π′ ◦ α where α is a single transition. From the induction hypothesis, π′ follows the client protocol.
If α does not invoke a base operation, then π follows the client protocol. Otherwise, we assume that α invokes (t, m, r).
From Lemma 3.9, there exists π″ such that π′π″ ↓ B is a complete serializable execution. Let πs be a sequential execution such that πs ↓ B is equivalent to π′π″ ↓ B. From Lemma 3.7, πs is an execution of (C, E). Hence, πs follows the client protocol.
Hence, m is in the intention set of t after π′ | t. Hence, m is in the intention set of t after π′. Therefore, π follows the client protocol. □

Theorem 3.6 follows from Lemma 3.9 and Lemma 3.10.
3.3.6 E-Completability
The preceding theorem about B-Serializable-Completability has a subtle point: it indicates that it is
possible to complete any execution of (C, E) in a serializable fashion in B. The extended library E,
however, could choose to block operations unnecessarily and prevent progress. This is undesirable. We
now formulate a desirable progress condition that the extended library must satisfy.
In the sequel we assume that every terminated thread has an empty intention set, i.e., if thread t has terminated in execution ξ then IE(φ(ξ), t) = ∅. This can be realized (for example) by ensuring that every thread always executes a mayUse operation m such that mayE(m) = {} before it terminates (such an operation can be seen as an "end-transaction" operation).
Given a history h and a thread t, we say that t is incomplete after h iff IE(h, t) ≠ ∅. We say that history h is incomplete if there exists some incomplete thread after h.
We say that a thread t is enabled after history h, if for all events (t, m, r) such that h ◦ (t, m, r)
satisfies the client protocol and h ◦ (t, m, r) ↓ B ∈ HB , we have h ◦ (t, m, r) ∈ HE . Note that this
essentially means that E will not block t from performing any legal operation.
Definition 3.11 (Progress Condition) We say that a library E satisfies the progress condition iff for
every history h ∈ HE that follows the client protocol the following conditions hold:
• If h is incomplete, then at least one of the incomplete threads t is enabled after h.
• If h is complete, then every thread t that does not appear in h is enabled after h.
Theorem 3.12 (E-Completability) Let B be a total and deterministic library. Let E be a safe extension
of B that satisfies the progress condition. Let C be a client of E. If every sequential execution of
(C, E) follows the client protocol and is completable, then every execution of (C, E) is completable and
serializable.
Proof From Theorem 3.6 and the progress condition.
3.3.7 Special Cases
In this subsection we describe two special cases of safe extension.
Eager-Ordering Library Our notion of safe-ordering permits ⊑ to be a partial order. In effect, this allows the system to determine the execution-ordering between transactions lazily, only when forced to do so (e.g., when one of the transactions executes a non-right-mover operation). One special case of this approach is to use a total order on threads, eagerly ordering threads in the order in which they execute their first operations. The idea of shared-ordered locking [13] in databases is similar to this. Using such an approach guarantees strict-serializability [75], which preserves the runtime order of the threads.
Definition 3.13 Given a history h we define an order ≤h of the threads in h such that: t ≤h t′ iff t = t′ or the first event of t precedes the first event of t′ (in h).
Definition 3.14 (Eager-Ordering Library) We say that library E is eager-ordering if for every h ∈ HE
that follows the client protocol, ≤h is safe for h.
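The order ≤h can be computed directly from the sequence of events. The following sketch is our own; a history is abbreviated to the sequence of thread ids of its events, in history order.

```java
import java.util.*;

// The eager order of Definition 3.13: threads are ranked by the position
// of their first event in the history.
class EagerOrder {
    static List<Long> firstEventOrder(List<Long> history) {
        List<Long> order = new ArrayList<>();
        Set<Long> seen = new HashSet<>();
        for (Long t : history)
            if (seen.add(t)) order.add(t); // first occurrence fixes t's rank
        return order;
    }

    // t <=_h t2 iff t == t2, or t's first event precedes t2's first event.
    static boolean leq(List<Long> history, long t, long t2) {
        if (t == t2) return true;
        List<Long> order = firstEventOrder(history);
        int i = order.indexOf(t), j = order.indexOf(t2);
        return i >= 0 && j >= 0 && i < j;
    }
}
```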
Library with Semantic-Locking A special case of an eager-ordering library is a library with semantic locking. (This special case appears in the database literature; e.g., see [20, chapter 3.8] and [86, chapters 6–7].) The idea here is to ensure that two threads are allowed to execute concurrently only if any operations they can invoke commute with each other. This is achieved by treating each mayUse operation as a lock acquisition (on the set of operations it denotes). A mayUse operation m by any thread t, after a history h, will be blocked if h contains a thread t′ ≠ t such that some operation in mayE(m) does not statically commute with some operation in IE(h, t′).
Definition 3.15 (Library with Semantic-Locking) We say that E is a library with semantic locking if for every h ∈ HE that follows the client protocol: if t, t′ are two different threads that appear in h, and m ∈ IE(h, t) and m′ ∈ IE(h, t′), then m and m′ are statically-commutative.
Note that for the examples shown in Section 3.1, such a library will not allow the threads to run concurrently, because the operations Inc and Dec are not statically-commutative.
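The blocking rule can be illustrated with a small lock-manager sketch. This is our own illustration, not an implementation from this thesis: it keeps a conflict table of statically non-commuting operation pairs, returns false where a real implementation would block, and simply replaces a thread's recorded intention set where a faithful implementation would intersect.

```java
import java.util.*;

// Semantic locking in the spirit of Definition 3.15: a mayUse request is
// granted only if every operation it names statically commutes with every
// operation in the intention set of every other active thread.
class SemanticLockManager {
    private final Map<Long, Set<String>> intentions = new HashMap<>();
    private final Set<String> conflicts; // pairs "a|b" that do NOT commute

    SemanticLockManager(Set<String> nonCommutingPairs) {
        this.conflicts = nonCommutingPairs;
    }

    private boolean commute(String a, String b) {
        return !conflicts.contains(a + "|" + b)
            && !conflicts.contains(b + "|" + a);
    }

    synchronized boolean tryMayUse(long t, Set<String> ops) {
        for (Map.Entry<Long, Set<String>> e : intentions.entrySet()) {
            if (e.getKey() == t) continue;
            for (String m : ops)
                for (String m2 : e.getValue())
                    if (!commute(m, m2)) return false; // would have to block
        }
        intentions.put(t, new HashSet<>(ops)); // record t's intention set
        return true;
    }

    synchronized void endTransaction(long t) { intentions.remove(t); }
}
```

With the Counter, declaring the pair Inc/Dec non-commuting makes one thread's mayUseDec exclude another thread's mayUseInc, exactly the behavior noted above.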
3.4 Automatic Foresight for Clients
In this section, we present our static analysis to infer calls (in the client code) to the API used to pass
the foresight information. The static analysis works for the general case covered by our formalism, and
does not depend on the specific implementation of the extended library.
We assume that we are given the interface of a library E that extends a base library B, along with a specification of the semantic function mayE using a simple annotation language. We use a static algorithm for analyzing a client C of B and instrumenting it by inserting calls to mayUse operations that guarantee that all sequential executions of C follow the client protocol.
Example Library. In this section, we use a library of Maps as an example. The base procedures of the
library are shown in Figure 3.8 (their semantics will be described later). The mayUse procedures are
shown in Figure 3.9 — their semantic function is specified using the annotations that are shown in this
figure (the annotation language is described in Section 3.4.1).
Figure 3.10 shows an example of a code section with calls to the base library procedures. The calls
to mayUse procedures shown in bold are inferred by our algorithm (described in Section 3.4.2).
int createNewMap();
int put(int mapId,int k,int v);
int get(int mapId,int k);
int remove(int mapId,int k);
bool isEmpty(int mapId);
int size(int mapId);
Figure 3.8: Base procedures of the example Maps library.
void mayUseAll();@{(createNewMap),(put,*,*,*),(get,*,*),(remove,*,*),(isEmpty,*),(size,*)}
void mayUseMap(int m);@{(put,m,*,*),(get,m,*),(remove,m,*),(isEmpty,m),(size,m)}
void mayUseKey(int m,int k);@{(put,m,k,*),(get,m,k),(remove,m,k)}
void mayUseNone();@{}
Figure 3.9: Annotated mayUse procedures of the example library.
mayUseMap(m);
if (get(m,x) == get(m,y)) {
    mayUseKey(m,x); remove(m,x); mayUseNone();
} else {
    remove(m,x); mayUseKey(m,y); remove(m,y); mayUseNone();
}
Figure 3.10: Code section with inferred calls to mayUse procedures.
3.4.1 Annotation Language
The semantic function mayE is specified using annotations. These annotations are described by symbolic
operations and symbolic sets.
Let PVar be a set of variables, and let ∗ be a symbol such that ∗ ∉ PVar. A symbolic operation (over PVar) is a tuple of the form (p, a1, · · · , an), where p is a base library procedure name, and each ai ∈ PVar ∪ {∗}. A symbolic set is a set of symbolic operations.
Example 3.4.1 Here are four symbolic sets for the example library (we assume that m, k ∈ PVar):
SY1 = {(createNewMap), (put, ∗, ∗, ∗), (get, ∗, ∗), (remove, ∗, ∗), (isEmpty, ∗), (size, ∗)}
SY2 = {(put, m, ∗, ∗), (get, m, ∗), (remove, m, ∗), (isEmpty, m), (size, m)}
SY3 = {(put, m, k , ∗), (get, m, k ), (remove, m, k )}.
SY4 = {}
Let Value be the set of possible values (of parameters of base library procedures). Given a function asn : PVar → Value and a symbolic set SY, we define the set of operations SY(asn) to be
⋃_{(p, a1, ..., an) ∈ SY} { (p, v1, . . . , vn) | ∀i. (ai ≠ ∗) ⇒ (vi = asn(ai)) }.
Example 3.4.2 Consider the symbolic sets from Example 3.4.1. The set SY3 (asn) contains all operations with the procedures put, get, and remove in which the first parameter is equal to asn(m) and the
second parameter is equal to asn(k). The sets SY1 (asn) and SY4 (asn) are not dependent on asn. The set
SY1 (asn) contains all operations with the procedures createNewMap, put, get, remove, isEmpty
and size. The set SY4 (asn) is empty.
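The definition of SY(asn) can be turned into a small membership check. The sketch below is illustrative only: the representation of a symbolic operation as a string array [procName, a1, ..., an] and all names are ours, and asn is assumed to be defined on every variable that occurs in SY.

```java
import java.util.*;

// Membership in SY(asn): a symbolic operation (p, a1,...,an) matches a
// concrete operation (p, v1,...,vn) if every argument is "*" or asn maps
// the variable to the corresponding concrete value.
class SymbolicSets {
    // Does the concrete operation proc(values) belong to SY(asn)?
    static boolean inSet(List<String[]> sy, Map<String, Integer> asn,
                         String proc, int[] values) {
        for (String[] symOp : sy) {
            if (!symOp[0].equals(proc) || symOp.length - 1 != values.length)
                continue;
            boolean match = true;
            for (int i = 0; i < values.length; i++) {
                String a = symOp[i + 1];
                if (!a.equals("*") && asn.get(a) != values[i]) {
                    match = false;
                    break;
                }
            }
            if (match) return true;
        }
        return false;
    }
}
```

For example, with SY3 and asn = {m ↦ 0, k ↦ 7}, the operation get(0,7) is in SY3(asn) while get(0,8) and size(0) are not.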
The Annotations Every mayUse procedure p is annotated with a symbolic set over the set of formal parameters of p. For example, in Figure 3.9, the procedure mayUseAll is annotated with SY1, mayUseMap is annotated with SY2, mayUseKey is annotated with SY3, and mayUseNone is annotated with SY4.
Let p be a mayUse procedure with parameters x1 , . . . , xn which is annotated with SYp . An invocation of p with the values v1 , . . . , vn is a mayUse operation that refers to the set defined by SYp and a
function that maps xi to vi (for every 1 ≤ i ≤ n).
Example 3.4.3 In Figure 3.9, the procedure mayUseAll() is annotated with SY1, hence its invocation is a mayUse operation that refers to all the base library operations. The procedure mayUseKey(int m, int k) is annotated with SY3, hence mayUseKey(0,7) refers to all operations with the procedures put, get, and remove in which the first parameter is 0 and the second parameter is 7.
3.4.2 Inferring Calls to mayUse Procedures
We use a simple abstract interpretation algorithm ([73]) to infer calls to mayUse procedures. Given a
client C of B and annotated mayUse procedures, our algorithm conservatively infers calls to the mayUse
procedures such that the client protocol is satisfied in all sequential executions of C.
Assumptions. The algorithm assumes that there exists a mayUse procedure (with no parameters) that refers to the set of all base library operations (the client protocol can always be enforced by adding a call to this procedure at the beginning of each code section). It also assumes that there exists a mayUse procedure (with no parameters) that refers to an empty set; the algorithm adds a call to this procedure at the end of each code section.
Correct Symbolic Sets Every thread-local state σ defines a function asnσ : PVar → Value such that asnσ(x) = v iff v is the value of x in state σ. We say that a symbolic set SY is correct for a program point ℓ if every sequential execution ξ, thread t, and local state σ in which thread t is at location ℓ satisfy: if σ is a state of ξ | t, then all operations invoked by thread t after σ are in SY(asnσ).
The Relation ⊇. We say that symbolic set SY is a superset of a symbolic set SY′, denoted SY ⊇ SY′, if every total function asn : PVar → Value satisfies SY(asn) ⊇ SY′(asn). For example, the symbolic sets defined in Example 3.4.1 satisfy SY1 ⊇ SY2 ⊇ SY3 ⊇ SY4.
Computing Correct Symbolic Sets Our algorithm uses a simple static analysis that computes a correct symbolic set for every program point. We phrase this analysis as a simple backwards abstract interpretation ([73]).
Our analysis computes symbolic sets in which each procedure has at most 2 symbolic operations (i.e., in every symbolic set there are no 3 symbolic operations with the same procedure name)². We write US to denote the set of such symbolic sets.
We use the set US and the relation ⊇ to define the lattice L = ⟨US, ⊇⟩. Note that its minimum element is the empty set; its maximum element is the set in which every procedure p has the symbolic operation of the form (p, ∗, · · · , ∗) (where the number of ∗ instances equals the number of p's arguments).
We use a function R : P(US) × PVar → US which is defined as follows: for every S ⊆ US and x ∈ PVar, the value of R(S, x) is obtained from S by replacing every instance of x with ∗. For example, the value of R({{(get, ∗, x), (put, ∗, x, y)}, {(contains, ∗, z)}}, x) is {{(get, ∗, ∗), (put, ∗, ∗, y)}, {(contains, ∗, z)}}.
² Any constant number can be used; in our implementation, we have used 2.
[[skip]](S) = S
[[x = exp]](S) = R(S, x)
[[assume(x)]](S) = S
[[x = p(x1, . . . , xn)]](S) = R(S ∪ {(p, x1, . . . , xn)}, x)
Figure 3.11: Abstract transformers for computing correct symbolic sets.
For each program point we compute an element of L by using the abstract transformers shown in
Figure 3.11.
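The transformers and the function R can be sketched as follows. This is our own representation, not code from this thesis: a symbolic operation is a list [p, a1, . . . , an], and the sketch ignores the bound on the number of symbolic operations per procedure.

```java
import java.util.*;

// A sketch of the backward transformers of Figure 3.11 on a symbolic set.
// R(S, x) replaces every instance of variable x with "*"; the transformer
// for a call x = p(x1,...,xn) first adds the symbolic operation
// (p, x1,...,xn) and then kills x.
class BackwardAnalysis {
    static Set<List<String>> r(Set<List<String>> s, String x) {
        Set<List<String>> out = new HashSet<>();
        for (List<String> op : s) {
            List<String> cp = new ArrayList<>(op);
            for (int i = 1; i < cp.size(); i++)   // skip the procedure name
                if (cp.get(i).equals(x)) cp.set(i, "*");
            out.add(cp);
        }
        return out;
    }

    // Transformer for "x = p(x1,...,xn)": add the call, then kill x.
    static Set<List<String>> call(Set<List<String>> s, String x,
                                  List<String> op) {
        Set<List<String>> withCall = new HashSet<>(s);
        withCall.add(op);
        return r(withCall, x);
    }
}
```

Running these transformers backwards over a code section yields, at each program point, a symbolic set covering all base operations the thread may still invoke, which is then rounded up to a mayUse instruction as described below.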
mayUse Invocations. Every possible client instruction (as defined in Figure 3.7) that invokes a mayUse procedure corresponds to a symbolic set (as described in Section 3.4.1). For example, according to Figure 3.9, the instruction mayUseKey(x,y) corresponds to the symbolic set {(put, x, y, ∗), (get, x, y), (remove, x, y)}.
For every program label ℓ with a computed symbolic set SYℓ, we find a minimal symbolic set SY′ℓ ⊇ SYℓ that corresponds to a client instruction that invokes a mayUse procedure. We add this instruction to the code at ℓ. The assumption that the library has a mayUse procedure that corresponds to all operations ensures that for every program label we will find an instruction. The added instructions guarantee that the transformed code sections follow the client protocol (in all possible sequential executions).
The assumption that the library has a mayUse procedure that corresponds to an empty set ensures that the algorithm adds a call to this procedure at the end (i.e., exit site) of each code section.
Identifying Redundant mayUse Operations The algorithm identifies and removes redundant mayUse operations by inspecting the CFG of the code sections. The algorithm repeatedly applies the following heuristics:
• If the CFG has two nodes n1, n2 with mayUse operations, every path from the entry site to n2 contains n1, and no path from n1 to n2 contains a call to a procedure (of the base library), then the mayUse operation in n1 is redundant.
• If the CFG has two nodes n1, n2 with an identical call to a mayUse operation p(x1, . . . , xn), every path from the entry site to n2 contains n1, and the variables x1, . . . , xn are not assigned between n1 and n2, then the mayUse operation in n2 is redundant.
3.4.3 Implementation for Java Programs
We have implemented the algorithm for Java programs in which the relevant code sections (that should appear to execute atomically) are annotated as atomic sections (an example is shown in Figure 3.12).
Unprotected Accesses and Invocations. Our implementation may fail to enforce atomicity of the Java code sections because: (i) Java code can access shared memory which is not part of the extended library (e.g., by accessing a global variable); (ii) our simple implementation does not analyze the procedures which are invoked by the annotated code sections. The implementation reports warnings about suspected accesses (to shared memory) and about invocations of procedures that do not belong to the extended library. These reports should be handled by a programmer or by a static algorithm (e.g., purity analysis [82]) that verifies that they will not be used for inter-thread communication (in our formal model, they can be seen as thread-local operations). In order to avoid superfluous warnings, the implementation uses a list of common pure procedures from the Java standard libraries (part of its configuration); these procedures are not reported by our implementation. For example, the procedure invoked in Figure 3.12 at line 7 (the constructor of Integer) will not be reported, because this procedure is known to be pure.
Java Exceptions The implementation of our algorithm has a mode in which it considers Java exceptions. This is realized by considering a CFG that contains edges which are used when exceptions are thrown (our implementation obtains such a CFG by using utilities from [1]). For simplicity, in most of this work we ignore exceptions. In the experiments described in Section 3.6, we have considered exceptions, because we cannot assume an absence of exceptions and we do not know the intended impact of exceptions in the mentioned applications (e.g., in Figure 3.12 the exception NullPointerException is thrown at line 5; this exception may have a meaning which is used by the application code).
When considering exceptions, the mayUse operations that should be added to the exit site are added using a Java try-finally statement (to make sure that these operations will be executed). This is demonstrated in Figure 3.13.
3.5 Implementing Libraries with Foresight
In this section we present a generic technique for realizing an eager-ordering safe extension (see Definition 3.14) of a given base library B. Our technique exploits a variant of the tree locking protocol over a tree that is designed according to semantic properties of the library's operations. The approach can be used by a library designer for the implementation of an extended (transactional) library.
1 public int getStateIndex(String state, boolean add) { @atomic {
2   Integer index = stepStateIndex.get(state);
3   if ((index == null) && add) {
4     if (state == null)
5       throw new NullPointerException("String state");
6     int t = stepStateIndex.size();
7     index = new Integer(t);
8     Integer check = stepStateIndex.putIfAbsent(state,index);
9     if (check != null) {
10      index = check;
11    }
12  }
13  return index != null ? index : -1;
14 }}
Figure 3.12: Composite operation from Tammi [2].
1 public int getStateIndex(String state, boolean add) { @atomic {
2   try {
3     TLibrary.mayUseAll();
4     Integer index = stepStateIndex.get(state);
5     if ((index == null) && add) {
6       if (state == null)
7         throw new NullPointerException("String state");
8       int t = stepStateIndex.size();
9       index = new Integer(t);
10      TLibrary.mayUseKey(state);
11      Integer check = stepStateIndex.putIfAbsent(state,index);
12      TLibrary.mayUseNone();
13      if (check != null) {
14        index = check;
15      }
16    }
17    return index != null ? index : -1;
18  } finally { TLibrary.mayUseNone(); }
19 } }
Figure 3.13: Composite operation from Figure 3.12 with added mayUse operations. (In this figure, the mayUse operations are invoked by calling static methods of the class "TLibrary"; see details in Section 3.7.4.)
Assumptions In this section we assume that: (i) the first operation invoked by a thread is a mayUse operation; (ii) the last operation invoked by a thread is a mayUse operation that refers to an empty set. Note that both assumptions are satisfied by the calls inferred by the algorithm of Section 3.4.
3.5.1 The Basic Approach
Example Library. Here, we use the example from Section 3.4. The procedures of the base library are shown in Figure 3.8. The procedure createNewMap creates a new Map and returns a unique identifier corresponding to this Map. The other procedures have the standard meaning (e.g., as in java.util.Map), and identify the Map to be operated on using the unique mapId identifier. In all procedures, k is a key and v is a value.
We now describe the mayUse procedures we use to extend the library interface (formally defined by the annotations in Figure 3.9):
(1) mayUseAll(): indicates that the transaction may invoke any library operations.
(2) mayUseMap(int mapId): indicates that the transaction will invoke operations only on Map mapId.
(3) mayUseKey(int mapId, int k): indicates that the transaction will invoke operations only on Map mapId and key k.
(4) mayUseNone(): indicates the end of the transaction (it will invoke no more operations).
In the following, we write mayE(m) to denote the set of operations associated with the mayUse operation m.
Implementation Parameters Our technique is parameterized and permits the creation of different instantiations offering tradeoffs between concurrency granularity and overheads. The parameters to our extension are a locking-tree and a lock-mapping.
A locking-tree is a directed static tree where each node n represents a (potentially unbounded) set of library operations On and satisfies the following requirements: (i) the root of the locking-tree represents all operations of the base library; (ii) if n′ is a child of n, then On′ ⊆ On; (iii) if n and n′ are roots of disjoint sub-trees, then every m ∈ On and m′ ∈ On′ are statically-commutative.
Example 3.5.1 Figure 3.14 shows a possible locking-tree for the Map library. The root A represents all (library) operations. Each M^i (i = 0, 1) represents all operations with an argument mapId that satisfies i = mapId % 2.³ Each K^i_j (i = 0, 1 and j = 0, 1, 2) represents all operations with arguments mapId and k that satisfy: i = mapId % 2 ∧ j = k % 3.

³ We write % to denote the modulus operator. Note that we can use a hash function (before applying the modulus operator).
3.5. IMPLEMENTING LIBRARIES WITH FORESIGHT
Figure 3.14: The locking-tree used in the example. (The root A has two children, M^0 and M^1; each M^i has three children, K^i_0, K^i_1 and K^i_2.)
The lock-mapping is a function P from mayUse operations to tree nodes and a special value ⊥. For a mayUse operation m, P(m) is the node which is associated with m. For each mayUse operation m, P should satisfy: if mayE(m) ≠ ∅, then mayE(m) ⊆ O_P(m); otherwise P(m) = ⊥.
Example 3.5.2 Here is a possible lock-mapping for our example. mayUseAll() is mapped to the root A. mayUseMap(mapId) is mapped to M^i where i = mapId % 2. mayUseKey(mapId,k) is mapped to K^i_j where i = mapId % 2 ∧ j = k % 3. mayUseNone() is mapped to ⊥.
Implementation. We associate a lock with each node of the locking-tree. The mayUse operations are implemented as follows:
• The first invocation of a mayUse operation m by a thread (that has not previously invoked any mayUse operation) acquires the lock on P(m) as follows. The thread follows the path in the tree from the root to P(m), locking each node n in the path before accessing n's child. Once P(m) has been locked, the locks on all nodes except P(m) are released.⁴
• An invocation of a mayUse operation m′ by a thread that holds the lock on P(m) locks all nodes in the path from P(m) to P(m′) (in the same tree order), and then releases all locks except P(m′). If P(m′) = P(m) or P(m′) is not reachable from P(m),⁵ then the execution of m′ has no impact.
• If a mayUse operation m is invoked by t and P(m) = ⊥, then t releases all its owned locks.
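The first-invocation step above can be sketched in plain Java. This is an illustrative sketch only; Node, parent, and acquireFirst are hypothetical names, and the thesis implementation may differ (e.g., it may use hand-over-hand locking instead).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch of the first-mayUse acquisition step: lock every node
// on the root-to-P(m) path in tree order, then keep only the target lock.
class Node {
    final ReentrantLock lock = new ReentrantLock();
    final Node parent; // null for the root

    Node(Node parent) { this.parent = parent; }
}

class TreeLocking {
    // Acquire the lock on 'target', locking the path from the root downwards
    // and then releasing every lock except the target's.
    static void acquireFirst(Node target) {
        Deque<Node> path = new ArrayDeque<>();
        for (Node n = target; n != null; n = n.parent)
            path.addFirst(n);                 // path is now root ... target
        for (Node n : path)
            n.lock.lock();                    // lock in tree order
        for (Node n : path)
            if (n != target)
                n.lock.unlock();              // keep only the target lock
    }
}
```

Locking the path in tree order is what prevents two threads from acquiring overlapping paths in opposite orders, which is why the protocol cannot deadlock.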
Furthermore, our extension adds a wrapper around every base library procedure, which works as
follows. When a non-mayUse operation m is invoked, the current thread t must hold a lock on some
node n (otherwise, the client protocol is violated). Conceptually, this operation performs the following
steps: (1) wait until all the nodes reachable from n are unlocked; (2) invoke m of the base library and
return its return value. Here is a possible pseudo-code for isEmpty:
⁴ This is a simplified version. Other variants, such as hand-over-hand locking, will work as well.
⁵ This may happen, for example, when O_P(m′) ⊃ O_P(m).
CHAPTER 3. TRANSACTIONAL LIBRARIES WITH FORESIGHT
bool isEmpty(int mapId) {
n := the node locked by the current thread
wait until all nodes reachable from n are unlocked
return baseLibrary.isEmpty(mapId);
}
Correctness. The implementation satisfies the progress condition because, if there exist threads that hold locks, then at least one of them will never wait for other threads (because of the tree structure, and because we assume that the base library is total).
We say that t is smaller than t′ if the lock held by t is reachable from the lock held by t′. The following property is guaranteed: if t ≤h t′ (see Definition 3.13), then either t is smaller than t′ or all operations allowed for t and t′ are statically-commutative. In the implementation, a non-mayUse operation waits until all smaller threads are completed; hence the extended library is a safe extension.
Note that we have ignored cases in which the first operation invoked by a thread is a non-mayUse operation (because of our assumption, and because such cases will never occur with the algorithm described in Section 3.4). Nevertheless, if needed, a library implementation can dynamically ensure that the first operation is a mayUse operation by automatically invoking such an operation just before executing the first non-mayUse operation. This can be implemented, for example, by adding the following code to the beginning of each base procedure:
if( no node is locked by the current thread ) mayUseAll();
... // the remaining code of the procedure
3.5.2 Using Dynamic Information
The dynamic information utilized by the basic approach is limited. In this section we show two ways to avoid blocking (in some cases) by utilizing dynamic information.
Utilizing the State of the Locks
In the basic approach, a non-mayUse operation, invoked by thread t, waits until all the reachable nodes
(i.e., reachable from the node which is locked by t) are unlocked — this ensures that the operation is a
right-mover with respect to the preceding threads. In some cases this is too conservative; for example:
Example 3.5.3 Consider the example from Section 3.5.1, and a case in which thread t holds a lock on M^0 (assume t is allowed to use all operations of a single Map). If t invokes remove(0,6), it will wait until K^0_0, K^0_1 and K^0_2 are unlocked. But waiting for K^0_1 and K^0_2 is not needed, because threads that hold
locks on these nodes are only allowed to invoke operations that are commutative with remove(0,6). In this case it is sufficient to wait until K^0_0 is unlocked.
So, if a non-mayUse operation m is invoked, then it is sufficient to wait until all reachable nodes in the following set are unlocked:
{n | ∃m′ ∈ O_n : m is not a static-right-mover with respect to m′}
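This refined wait set can be sketched as a filter over the reachable nodes. The conflict predicate below (notRightMoverWith) is an assumed parameter supplied by the library designer; all names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Illustrative sketch: instead of waiting on every reachable node, wait only
// on nodes n whose operation set O_n contains some m' for which the invoked
// operation m is not a static-right-mover. The conflict predicate is an
// assumed parameter, not part of the thesis implementation.
class WaitSet {
    static <N> List<N> nodesToWaitFor(List<N> reachable, String invokedOp,
                                      BiPredicate<String, N> notRightMoverWith) {
        List<N> result = new ArrayList<>();
        for (N n : reachable)
            if (notRightMoverWith.test(invokedOp, n)) // potential conflict
                result.add(n);
        return result;
    }
}
```

With the predicate of Example 3.5.3, remove(0,6) would wait only on K^0_0 out of the three reachable key nodes.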
Utilizing the State of the Base Library
In some cases, the state of the base library can be used to avoid waiting. For example:
Example 3.5.4 Consider the example from Section 3.5.1, and a case in which thread t holds a lock on M^0 (assume t is allowed to use all operations of a single Map), and other threads hold locks on K^0_0, K^0_1 and K^0_2. If t invokes isEmpty, it will have to wait until all the other threads unlock K^0_0, K^0_1 and K^0_2. This is not always needed; for example, if the Map manipulated by t has 4 elements, then the other threads will never be able to make the Map empty (because, according to the Map semantics, they can only affect 3 keys, so they cannot remove more than 3 elements). Hence, the execution of isEmpty by t is a dynamic-right-mover.
A library designer can add code that observes the library’s state and checks that the operation is a
dynamic-right-mover; in such a case, it executes the operation of the base library (without waiting). For
example, the following code lines can be added to the beginning of isEmpty(int mapId):
bool c1 = M^0 or M^1 is held by the current thread;
bool c2 = baseLibrary.size(mapId) > 3;
if(c1 and c2) return baseLibrary.isEmpty(mapId);
... // the remaining code of isEmpty
This code verifies that the actual Map cannot be made empty by the preceding threads; in such a case we know that the operation is a dynamic-right-mover. Note that writing code that dynamically verifies right-moverness may be challenging, because it may observe an inconsistent state of the library (i.e., the library may be concurrently changed by the other threads).
3.5.3 Optimistic Locking
According to the above description, the threads are required to lock the root of the locking-tree. This
may create contention (because of several threads trying to lock the root at the same time) and potentially
degrade performance [54].
To avoid contention, we use the following technique. For each lock we add a counter — the counter
is incremented whenever the lock is acquired. When a mayUse operation m is invoked (by a thread that
has not invoked a mayUse operation) it performs the following steps: (1) go over all nodes from the
root to P (m) and read the counter values; (2) lock P (m); (3) go over all nodes from the root to P (m)
(again), if one node is locked or its counter has been modified then unlock P (m) and restart (i.e., go to
step 1).
The idea is to simulate hand-over-hand locking while avoiding writes to shared memory. This is done by locking only the node P(m) (and only reading the state of the other nodes). When we do not restart in step 3, we know that the execution is equivalent to one in which the thread performs hand-over-hand locking from the root to P(m).
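A minimal sketch of this optimistic acquisition, assuming each node carries a lock and an acquisition counter (CNode, acquire, and acquireOptimistic are illustrative names, not the thesis implementation):

```java
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch of the optimistic variant: read the counters on the
// root-to-P(m) path, lock only P(m), then validate that no node on the path
// is locked or was re-acquired in the meantime; otherwise unlock and retry.
class CNode {
    final ReentrantLock lock = new ReentrantLock();
    volatile long counter = 0; // incremented on every acquisition

    void acquire() { lock.lock(); counter++; }
}

class OptimisticTreeLocking {
    // 'path' is root ... P(m), inclusive; returns once P(m) is held.
    static void acquireOptimistic(List<CNode> path) {
        CNode target = path.get(path.size() - 1);
        while (true) {
            long[] seen = new long[path.size() - 1];
            for (int i = 0; i < seen.length; i++)      // step 1: read counters
                seen[i] = path.get(i).counter;
            target.acquire();                          // step 2: lock P(m)
            boolean valid = true;                      // step 3: validate
            for (int i = 0; i < seen.length; i++) {
                CNode n = path.get(i);
                if (n.lock.isLocked() || n.counter != seen[i]) {
                    valid = false;
                    break;
                }
            }
            if (valid) return;
            target.lock.unlock();                      // restart from step 1
        }
    }
}
```

A successful validation means no ancestor was held or re-acquired between the counter snapshot and the check, which is what makes the run equivalent to a pessimistic hand-over-hand descent.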
3.5.4 Further Extensions
In this section, we describe additional extensions of the basic approach. We first describe how to utilize read-write locks — this enables situations in which several threads hold the same lock (node). We also describe how to associate several nodes with the same mayUse operation — this enables, for example, a mayUse operation that is associated with operations on several different keys.
Utilizing Read-Write Locks
The basic approach does not permit situations in which several threads simultaneously hold a lock on the same node. This prevents situations in which several threads are simultaneously allowed to invoke commutative operations which are represented by the same node (this is sometimes desirable; for example, one may want to allow several threads to simultaneously invoke all read-only operations, which are represented by the root).
In order to extend the basic approach, we represent an implementation by using a locking-tree, a lock-mapping P (as defined in Section 3.5.1), and a set of operations R. The set R should satisfy the following: every m, m′ ∈ R are statically-commutative.
Example 3.5.5 For the example presented in Section 3.5.1, we can use a set R with the following
operations: all read-only operations (i.e., invocations of isEmpty, size and get) and invocations of
createNewMap. Any pair of operations from R are statically-commutative.
Here the implementation is similar to Section 3.5.1 with the following differences:
• For each mayUse operation m (such that mayE(m) ≠ ∅) we create a mayUse operation m_R such that mayE(m_R) = mayE(m) ∩ R. The lock-mapping P maps m and m_R to the same node (i.e., P(m) = P(m_R)). m is called a W-mayUse operation; m_R is called an R-mayUse operation. For the example from Section 3.5.1, such operations can be added via the following mayUse procedures: mayUseAllReadOnly(), mayUseMapReadOnly(int mapId) and mayUseKeyReadOnly(int mapId, int k).
• The locks are read-write locks — they can be acquired in a read-mode or a write-mode, and a write-mode can be downgraded to a read-mode (e.g., see Java's ReentrantReadWriteLock).
• The R-mayUse operations always acquire nodes in a read-mode; the W-mayUse operations always acquire in a write-mode.
• If an R-mayUse operation m_R is invoked, and P(m_R) is currently held by the current thread in a write-mode, then the locking mode of P(m_R) is changed (downgraded) from write-mode to read-mode.
• If a W-mayUse operation is invoked after an R-mayUse operation (by the same thread), the invocation has no impact.
• When a non-mayUse operation is invoked: if the thread currently holds its lock in a write-mode, then it waits until all reachable nodes are unlocked; otherwise (it holds it in a read-mode), it waits until all reachable nodes are either unlocked or locked in a read-mode. (Note that the technique from Section 3.5.2 can still be used in order to avoid considering all reachable nodes.)
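The downgrade step relies on the standard lock-downgrading idiom of java.util.concurrent.locks.ReentrantReadWriteLock; a minimal sketch (RWNode is an illustrative name):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch of the write-to-read downgrade performed when an
// R-mayUse operation finds its node already held in write-mode.
class RWNode {
    final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();

    // Standard downgrade idiom: acquire the read lock while still holding
    // the write lock, then release the write lock. Writers stay excluded
    // throughout, so no window opens between the two modes.
    void downgrade() {
        rw.readLock().lock();
        rw.writeLock().unlock();
    }
}
```

Note that ReentrantReadWriteLock supports downgrading but not upgrading (read to write), which is consistent with the rule above that a W-mayUse operation after an R-mayUse operation has no impact.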
Associating Several Nodes with One mayUse operation
In some cases it is desirable to associate several nodes with one mayUse operation. This enables, for example, a mayUse operation that is associated with the nodes K^0_0 and K^0_1 from Section 3.5.1 (these nodes represent operations on different keys).
In order to handle this, the lock-mapping P is a function from mayUse operations to (non-empty) sets of nodes and ⊥. For each mayUse operation m, P should satisfy: if mayE(m) ≠ ∅, then mayE(m) ⊆ ⋃_{n∈P(m)} O_n; otherwise P(m) = ⊥.
Here, a mayUse operation m needs to consider all the paths to the nodes in P(m) — i.e., the approach remains the same, but instead of considering one node, we consider several nodes.
• An invocation of a mayUse operation m (by a thread that has not invoked a mayUse operation) locks all nodes in the paths from the root to the nodes in P(m). The nodes are locked in a tree order — if n is a parent of n′, then n is locked before n′. After all nodes in the paths are locked, all the nodes except those in P(m) are released.
• An invocation of a mayUse operation m′ by a thread that holds locks on the nodes in a set S locks all nodes in the paths from S to P(m′) (in the same tree order), and then releases all nodes except the nodes in P(m′). Note that nodes which are not reachable from S are ignored.
3.5.5 Java Threads and Transactions
In the Java environment (and in other modern programming environments) the same thread may need to execute several transactions (one after the other). This does not restrict our technique: whenever a thread t invokes a mayUse operation associated with an empty set, the library forgets the identity of t (by releasing all locks owned by t), and therefore thread t can subsequently be seen as a brand new transaction.
3.6 Experimental Evaluation
In this section we present an experimental evaluation of our approach. The goals of the evaluation are:
(i) to measure the precision and applicability of the static algorithm presented in Section 3.4; (ii) to
compare the performance of our approach to a synchronization implemented by experts; and (iii) to
determine if our approach can be used to perform synchronization in realistic software with reasonable
performance.
Towards these goals, we implemented a general-purpose Java library for Map data structures using the technique presented in Section 3.5 (see also Section 3.7). In all cases in which our library is used, the calls to the mayUse operations have been automatically inferred by our static algorithm.
Methodology. For all performance benchmarks (except the GossipRouter), we follow the performance evaluation methodology of Herlihy et al. [50], and consider the clients under different workloads. To ensure consistent and accurate results, each experiment consists of eight passes; the first three passes warm up the VM and the other five passes are timed. Each experiment was run four times and the arithmetic average of the throughput is reported as the final result. Every pass of the test program consists of each thread performing one million randomly chosen operations on a client code.⁶
We used a Sun SPARC enterprise T5140 server machine running Solaris 10 — this is a 2-chip
Niagara system in which each chip has 8 cores (the machine’s hyperthreading was disabled).
3.6.1 Applicability and Precision of the Static Analysis
We applied our static analysis (from Section 3.4) to 58 Java code sections (composite operations)
from [79] that manipulate Maps (taken from open-source projects). Each composite operation has (at
least) two calls to Map procedures.
For 18 composite operations, our implementation reported warnings about procedure invocations that did not belong to the library (see Section 3.4.3) — we manually verified that these invocations are pure⁷ (so they can be seen as thread-local operations). A summary of this experiment is shown in Figure 3.15.
⁶ Note that in this section a thread is a Java thread. This is in contrast to our formal model, in which a thread is a composite operation. See Section 3.5.5.
⁷ We have found that the purity of the invoked procedures is obvious, and can be verified by existing static algorithms such as [82].
 #  Source Application          Nodes mayUse |  #  Source Application  Nodes mayUse
 1  Adobe BlazeDS                 30    2    | 30  Hudson                27    2
 2  Adobe BlazeDS                 30    2    | 31  ifw2                  29    2
 3  Apache Cassandra*             45    2    | 32  ifw2                  29    2
 4  Apache MyFaces Trinidad       20    2    | 33  Jack4J*               61    2
 5  Apache ServiceMix             27    2    | 34  JBoss                 29    2
 6  Apache Tomcat*                66    6    | 35  Jetty                 29    2
 7  Apache Tomcat*                63    6    | 36  Jetty                 26    2
 8  Apache Tomcat                 17    4    | 37  Jetty                 26    2
 9  Apache Tomcat                 22    2    | 38  Jexin                 26    2
10  Apache Tomcat                 21    2    | 39  Keyczar               30    2
11  autoandroid*                  56    2    | 40  memcache-client       22    4
12  Beanlib*                      62    3    | 41  OpenEJB*              44    2
13  Clojure                       30    2    | 42  OpenJDK               30    2
14  Cometdim                      26    2    | 43  Tammi*                30    2
15  DWR                           27    2    | 44  Tammi                 33    2
16  dyuproject                    29    4    | 45  Tammi                 46    4
17  ehcache-spring-annotation*    27    2    | 46  Tammi                 32    5
18  Ektorp                        43    5    | 47  ProjectTrack          26    2
19  Flexive*                      23    2    | 48  ProjectTrack          28    2
20  Flexive                       29    4    | 49  RestEasy              27    4
21  Flexive*                      19    2    | 50  RestEasy*             29    2
22  GlassFish*                    30    2    | 51  RestEasy              27    2
23  GlassFish*                    34    2    | 52  RestEasy              25    2
24  Granite*                      48    4    | 53  RestEasy              26    2
25  Gridkit*                      37    2    | 54  retrotranslator*      34    4
26  Gridkit                       37    2    | 55  Torque-spring         23    2
27  GWTEventService               14    2    | 56  Yasca                 27    2
28  GWTEventService*              19    2    | 57  OpenJDK               26    4
29  Hazelcast                     21    2    | 58  OpenJDK               48    4
Figure 3.15: Composite operations from [79]. For each composite operation we mention its source application, the number of CFG nodes, and the number of mayUse operations inferred by our algorithm. We use * to denote cases in which we manually verified that non-library operations are pure.
Surprisingly, in spite of its simplicity, the algorithm inferred "optimal" mayUse operations in the following sense: in no case were we able to correctly add mayUse operations such that, for some program locations, the set of allowed operations would be smaller (during an execution).
3.6.2 Comparison to Hand-Crafted Implementations
We selected several composite operations over a single Map: the computeIfAbsent pattern [3], and a few
other common composite Map operations (that are supported by ConcurrentHashMap [4]). For these
composite operations, we compare the performance of our approach to a synchronization implemented
by experts in the field.
Figure 3.16: ComputeIfAbsent: throughput (operations/millisecond) as a function of the number of threads (1-16), comparing Global Lock, Ours, Manual, and ConcurrentHashMapV8.
ComputeIfAbsent The ComputeIfAbsent pattern appears in many Java applications. Many bugs in Java
programs are caused by non-atomic realizations of this simple pattern (see [79]). It can be described
with the following pseudo-code:
if(!map.containsKey(key)) {
value = ... // pure computation
map.put(key, value);
}
The idea is to compute a value and store it in a Map, if and only if, the given key is not already
present in the Map. We chose this benchmark because there exists a new version of Java Map, called
ConcurrentHashMapV8, with a procedure that gets the computation as a parameter (i.e., a function is
passed as a parameter), and atomically executes the pattern’s code [3].
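For comparison, the atomic variant can be used as follows. This sketch uses the computeIfAbsent method as it appears in java.util.concurrent.ConcurrentHashMap since Java 8 (which absorbed ConcurrentHashMapV8); the ~128-byte value mirrors the emulated computation in our benchmark.

```java
import java.util.concurrent.ConcurrentHashMap;

// The atomic variant: the (pure) computation is passed into the map, which
// runs the check-then-insert as a single atomic per-key operation, so the
// value is computed at most once for each absent key.
class ComputeIfAbsentDemo {
    static final ConcurrentHashMap<String, byte[]> map = new ConcurrentHashMap<>();

    static byte[] getOrCreate(String key) {
        return map.computeIfAbsent(key, k -> new byte[128]); // ~128-byte value
    }
}
```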
We compare four implementations of this pattern: (i) an implementation which is based on a global lock; (ii) an implementation which is based on our approach; (iii) an implementation which is based on ConcurrentHashMapV8; (iv) an implementation which is based on hand-crafted fine-grained locking.⁸ The computation was emulated by allocating a relatively-large Java object (∼128 bytes).
The results are shown in Figure 3.16. As expected, the global lock implementation does not scale with the number of threads. Note that ConcurrentHashMapV8 performs well for 1 and 2 threads; this can be explained by the fact that the pattern is directly implemented inside the data structure (and it only scans the data structure once). We are encouraged by the fact that our approach provides better performance than ConcurrentHashMapV8 for 8 threads or more; also, it is (at most) 25% slower than the hand-crafted fine-grained locking.
Common Composite Map Operations. We study common composite operations over a single Map. The composite operations we consider here are implemented in Java's ConcurrentHashMap [4].
⁸ We used lock stripping, similar to [49], with 32 locks; this is an attempt to estimate the benefits of manual hand-crafted synchronization without changing the underlying library.
Figure 3.17: Composite Map Operations: throughput (operations/millisecond) as a function of the number of threads (1-16), comparing Global Lock, Ours, and ConcurrentHashMap, under three workloads: 40% putIfAbsent, 30% remove, 30% replace; 70% contains, 10% putIfAbsent, 10% remove, 10% replace; and 50% contains, 50% putIfAbsent.
Java’s ConcurrentHashMap[4] implementation provides the following composite operations:
• putIfAbsent(K key, V value) — adds a (key,value) into the map only if the key is not
already present. Implemented using the basic operations contains and put.
• replace(K key, V value) — replaces the value for a key only if it is currently mapped to
some value. Implemented using the basic operations contains and put.
• remove(K key, V value) — remove entry for key from the map only if it is currently mapped
to the given value. Implemented using the basic operations get and remove(K key).
The ConcurrentHashMap documentation for each of the above procedures provides a pseudo-code
in which the procedure is expressed in terms of basic Map operations. We based our implementation of
composite operations directly on these pseudo-code fragments.
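As a concrete illustration, two of these composite operations can be written over the basic Map operations roughly as in the documentation's pseudo-code. This is a sketch: it is not itself thread-safe, and atomicity is assumed to be provided externally (e.g., by our approach or by a lock).

```java
import java.util.HashMap;
import java.util.Map;

// Composite operations written over basic Map operations, following the
// pseudo-code style of the ConcurrentMap documentation. Atomicity is
// assumed to be provided externally.
class CompositeOps {
    // Add (key, value) only if the key is absent; returns the previous value.
    static <K, V> V putIfAbsent(Map<K, V> map, K key, V value) {
        if (!map.containsKey(key))
            return map.put(key, value); // returns null: key was absent
        else
            return map.get(key);        // key present: leave it unchanged
    }

    // Remove the entry only if the key is currently mapped to 'value'.
    static <K, V> boolean remove(Map<K, V> map, K key, V value) {
        if (map.containsKey(key) && map.get(key).equals(value)) {
            map.remove(key);
            return true;
        }
        return false;
    }
}
```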
We compare three implementations of these operations: (i) an implementation which is based on a global lock; (ii) an implementation which is based on our approach; (iii) the actual implementation of ConcurrentHashMap [4]. The results, for three workloads, are shown in Figure 3.17.
3.6.3 Evaluating the Approach on Realistic Software
We applied our approach to three benchmarks with multiple Maps — in these benchmarks, several Maps are simultaneously manipulated by the composite operations. We used the Graph benchmark [49], Tomcat's Cache [5], and a multi-threaded application, GossipRouter [10]. In these benchmarks, we compare the performance to coarse-grained locking.
Graph. This benchmark is based on a Java implementation of the Graph that has been used for the evaluation of [49]. The Graph consists of four composite operations: find successors, find predecessors, insert edge, and remove edge. Its implementation uses two Map data structures in which several different values can be associated with the same key (this type of Map is supported by our library; [8] also contains an example of such a Map type).
Figure 3.18: Graph and Cache: throughput (operations/millisecond) as a function of the number of threads (1-16), comparing Global Lock, Ours, and Ours (w/o optimization). Panels (a)-(d) show four Graph workloads: 70% find successors, 20% insert, 10% remove; 45% find successors, 45% find predecessors, 9% insert, 1% remove; 35% find successors, 35% find predecessors, 20% insert, 10% remove; and 50% insert, 50% remove. Panels (e)-(f) show the Cache workload (90% Get, 10% Put) with Size=50K and Size=5000K, respectively.
We compare a synchronization which is based on a global lock and a synchronization which is based on our approach. We use the workloads from [49]. The results are shown in Figure 3.18(a)-(d).
For some of the workloads, we see a drop in performance between 8 and 16 threads. This can be explained by the fact that each chip of the machine has 8 cores, so using 16 threads requires using both chips (this creates more overhead).
Tomcat’s Cache This benchmark is based on a Java implementation of Tomcat’s cache [5]. This cache
uses two types of Maps which are supported by our library: a standard Map, and a weak Map (see [6]).
The cache consists of two composite operations Put and Get which manipulate the internal Maps. In
this cache, the Get is not a read-only operation (in some cases, it copies an element from one Map
to another). The cache gets a parameter (size) which is used by its algorithm. Figure 3.18(e) and
Figure 3.18(f) show results for two workloads.
GossipRouter. The GossipRouter is a Java multi-threaded routing service from [10]. Its main state is a routing table which consists of several Map data structures (the exact number of Maps is dynamically determined).
We use a version of the router ("3.1.0.Alpha3") with several bugs that are caused by inadequate synchronization in the code that accesses the routing table. We have manually identified all code sections that access the routing table as atomic sections, and verified (manually) that whenever these code sections are executed atomically, the known bugs do not occur.
We compare two ways to enforce atomicity of the code sections: a synchronization which is based on
a global lock, and a synchronization which is based on our approach. We used a performance tester from
Figure 3.19: GossipRouter: throughput (messages/second) as a function of the number of active cores, comparing Global Lock and Ours.
[10] (called MPerf) to simulate 16 clients, where each client sends 5000 messages. In this experiment the number of threads cannot be controlled from the outside (because the threads are autonomously managed by the router). Therefore, instead of changing the number of threads, we changed the number of active cores. The results are shown in Figure 3.19.
For the router’s code, our static analysis has reported (warnings) about invocations of procedures
that do not belong to the Maps library. Interestingly, these procedures perform I/O and do not violate
atomicity of the code sections. Specifically, they perform two types of I/O operations: logging operation
(used to print debug messages), and operations that send network messages (to the router’s clients). They
do not violate the atomicity of the atomic sections, because they are not used to communicate between
the threads (from the perspective of our formal model, they can be seen as thread-local operations).
3.7 Java Implementation of the Transactional Maps Library
For our evaluation, we have implemented a single Java library of Maps by using the technique described
in Section 3.5.
Compatible API. In order to simplify the evaluation (on existing Java code), we have created an API which is similar to the existing Java Maps (see Section 3.7.4). The (existing) code sections were converted to use our library by changing the Java types of the Maps (for example, "ConcurrentHashMap<K,V>" was changed to "CLIBConcurrentHashMap<K,V>").
3.7.1 Base Library
We have implemented the base library⁹ by wrapping several existing implementations of concurrent Maps. All procedures of the base library are linearizable. We have used the following Map types:
• Standard Map (similar to [4], the implementation was taken from [8]).
• Weak Map (this Map is an efficient variant of [6], the implementation was also taken from [8]).
⁹ This library does not have mayUse procedures.
• Multi Map — in this Map, each key may be associated with multiple values.¹⁰ We have manually created a linearizable implementation of this Map by using Java locks and a standard Map.
For each Map type T, we have created a procedure that allocates Maps of type T.
3.7.2 Extended Library
MayUse Procedures. The extended library has the following mayUse procedures:
void mayUseAll();
void mayUseKey(Object key);
void mayUseKey(Object key1, Object key2);
void mayUseMap(Map map);
void mayUseNone();
void mayUseAllRO();
void mayUseKeyRO(Object key);
void mayUseKeyRO(Object key1, Object key2);
void mayUseMapRO(Map map);
An invocation of mayUseAll() is associated with all the library operations. An invocation of mayUseKey(k) is associated with all operations on key k (e.g., an operation that puts a value in an element with key = k).¹¹ An invocation of mayUseKey(k1,k2) is associated with all operations on key k1 or key k2. An invocation of mayUseMap(m) is associated with all operations on map m. An invocation of mayUseNone() is associated with an empty set of operations.
The other mayUse operations are called read-only (their procedure names end with "RO") — for each mayUse operation m (except mayUseNone()) we have a read-only mayUse operation m_R which is associated with m's operations that are read-only. (If the name of m is p(...), then the name of m_R is pRO(...).)
Parameters. We have used a locking-tree which is similar to the example shown in Section 3.5.1 — in our implementation the structure of the locking-tree is configurable. In our experiments we have used a locking-tree in which the root has 2 children (denoted by M^0 and M^1), and each M^i has 32 children. The lock-mapping P is defined according to the above description of the mayUse operations. The set R (discussed in Section 3.5.4) is the set of all read-only operations.
Hash-Function. In contrast to Section 3.5.1, our Maps are represented by Java objects (and not by integers), so we use a hash-function that maps Java objects to integers (we apply the hash-function before applying the modulus operator).
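For instance, a child index for an object-identified Map can be computed as follows (an illustrative sketch; NodeIndex and childIndex are hypothetical names):

```java
// Map a Java object to a child index: hash first, then take the modulus.
// Math.floorMod keeps the result non-negative even when hashCode() is
// negative, which a plain % would not guarantee.
class NodeIndex {
    static int childIndex(Object mapObject, int fanout) {
        return Math.floorMod(mapObject.hashCode(), fanout);
    }
}
```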
¹⁰ Similar to com.google.common.collect.Multimap from [8].
¹¹ We are using the fact that operations on different keys are always commutative.
3.7.3 Utilizing Dynamic Information by Handcrafted Optimization
In order to use the optimization "Utilizing the State of the Base Library" (Section 3.5.2), we have manually changed the following procedures:
• boolean isEmpty(Map m): returns true iff Map m is empty
• boolean sizeLargerThan(Map m, int n): returns true iff Map m contains more than n elements
• boolean sizeSmallerThan(Map m, int n): returns true iff Map m contains fewer than n elements
The procedure isEmpty was changed as demonstrated in Section 3.5.2. The following pseudo-code
describes the optimization for sizeLargerThan (sizeSmallerThan is similar):
bool sizeLargerThan(Map m, int n) {
  wait until M^i is not held by another thread
  int s = baseLibrary.size(m);
  if(n < s-C) return false; // C is the number of nodes reachable from M^i
  if(n > s+C) return true;  // at most C threads may precede the current thread
  ... // the remaining code of sizeLargerThan
}
In our performance evaluation (Section 3.6), this optimization affects the performance of only two benchmarks: the Cache and the GossipRouter. For the Cache benchmark the average throughput improvement is 27%, and the maximal throughput improvement is 60%. For the GossipRouter benchmark the average throughput improvement is 3%, and the maximal throughput improvement is 9%.
3.7.4 API Adapter
Our library is realized as a Java class ("TLibrary") — its "procedures" are static methods of this class. In order to create an API which is similar to Java's API, we added a way to represent Maps as Java objects. We use the Adapter design pattern [39].
For each Map type T, we created a Java class T such that: (1) the constructor of T creates a Map (using a procedure of our library) and stores a reference to this Map; (2) the other methods of T call the corresponding procedures of the library.
For example, a constructor of class Map is implemented by {this.myMap=TLibrary.createMap();}, and the method isEmpty() is implemented by {return TLibrary.isEmpty(this.myMap);}.
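The adapter pattern described above can be sketched as follows. TLibraryStub is a stub standing in for the real TLibrary (which additionally performs the foresight-based synchronization); the class and method names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stub of the library facade: "procedures" are static methods
// keyed by an opaque map identifier. (The real TLibrary also performs the
// foresight-based synchronization, which is omitted here.)
class TLibraryStub {
    private static final Map<Integer, Map<Object, Object>> maps = new HashMap<>();
    private static int nextId = 0;

    static synchronized int createMap() {
        maps.put(nextId, new HashMap<>());
        return nextId++;
    }

    static synchronized boolean isEmpty(int mapId) {
        return maps.get(mapId).isEmpty();
    }
}

// The adapter: a Java object whose constructor allocates a library Map and
// whose methods forward to the corresponding library procedures.
class CLIBMap {
    private final int myMap;

    CLIBMap() { this.myMap = TLibraryStub.createMap(); }

    boolean isEmpty() { return TLibraryStub.isEmpty(this.myMap); }
}
```

The adapter lets existing client code keep its object-oriented Map usage while every call is routed through the library's static procedures.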
Chapter 4
Composition of Transactional Libraries via Semantic Locking
The approach presented in Chapter 3 is designed to handle programs in which the composite operations
use a single transactional library; the transactional library has to support all abstract data types (ADT)
that may be concurrently used by the program’s threads. In this chapter, we present an approach for
composite operations that simultaneously use multiple transactional libraries.
We focus on a special case of transactional libraries: libraries with semantic-locking (see Section 3.3.7). In this chapter, the mayUse operations of these libraries are called locking operations, since their meaning resembles the meaning of the common locking operations (lock-acquire and lock-release).
In order to handle multiple libraries, we assume that each ADT instance is a single independent
library (with semantic-locking). The locking operations of the ADTs are used to enforce atomicity of
the composite operations while exploiting semantic properties of the ADTs.
The Problem We consider a Java multi-threaded program (also referred to as a client), which makes
use of several ADTs (libraries) with semantic-locking. We assume that the only mutable state shared by multiple threads consists of instances of these ADTs. We permit atomic sections as a language construct:
a block of code may be marked as an atomic section (as demonstrated in Figure 4.1). An execution
of an atomic section is called a transaction. Our goal is to ensure that transactions appear to execute
atomically and make progress (avoiding deadlocks), while exploiting the semantic-locking provided by
the ADTs.
Automatic Atomicity We present a compiler that automatically infers calls to semantic-locking operations. Given a client program (without calls to semantic-locking operations) and a specification of the semantic-locking operations of the ADTs used by the client, the compiler inserts invocations of semantic-locking operations into the atomic sections in the client program to guarantee atomicity and
atomic {
  set=map.get(id);
  if(set==null) {
    set=new Set(); map.put(id, set);
  }
  set.add(x);
  if(flag) {
    queue.enqueue(set);
    map.remove(id);
  }
}
Figure 4.1: An atomic section that manipulates several ADTs (a Map, a Set, and a Queue). This example
is inspired by the code of Intruder [26], further discussed in Section 4.5.
atomic {
  map.lockKey(id); set=map.get(id);
  if(set==null) {
    set=new Set(); map.put(id, set);
  }
  set.lockAdd(); set.add(x);
  if(flag) {
    queue.lockAll(); queue.enqueue(set); queue.unlockAll();
    map.remove(id);
  }
  map.unlockAll(); set.unlockAll();
}
Figure 4.2: The atomic section of Figure 4.1 with calls to semantic locking operations automatically
inserted by our compiler.
deadlock-freedom of these atomic sections. The synchronization generated by our compiler follows a
semantics-aware variant of the two-phase locking protocol (see [20, chapter 3.8] and [86, chapters 6–7]).
Example 4.0.1 Figure 4.1 shows an example of an atomic section that is given to our compiler. This code section manipulates several ADTs (a Map, a Set, and a Queue), and does not contain calls to their
semantic-locking operations. Figure 4.2 shows the result of applying our compiler to the atomic section
of Figure 4.1.
Pointers and Limitations Our compiler handles programs in which pointers to ADT instances are dynamically manipulated; these programs are allowed to work with an unbounded number of ADT objects. For some of these programs, our compiler is unable to ensure deadlock-freedom by using only the semantic-locking operations of the ADTs. Such programs are handled using an additional specialized coarse-grained synchronization. However, our experimental evaluation (Section 4.5) shows that our compiler creates effective synchronization that benefits from semantic locking even in a program in which coarse-grained synchronization is used.
The main differences between this chapter and Chapter 3 can be described as follows.
• Chapter 3 focuses on a more restrictive version of the problem addressed here: namely, atomic
sections in which the shared state consists of ADTs belonging to a single ADT library (with a
single centralized foresight-based synchronization). In contrast, in this chapter we permit the
ADTs to belong to different libraries: this enables using libraries that have been implemented
independently.
• The protocol in Chapter 3 is incomparable to the protocol used in this chapter. In some ways the protocol in Chapter 3 is more permissive, because it is able to exploit properties like dynamic-commutativity and dynamic-right-movers (in Section 3.1 we show an example in which dynamic-right-movers are essential to achieve parallelism). In other ways the protocol in Chapter 3 is less permissive, because it requires all the semantic locks to be obtained at the beginning of the atomic section; as a result, the implementation technique presented in Section 3.5 is not able to provide parallelism for transactions that may use operations which are not right-movers with each other (the limited performance of such transactions is demonstrated in Section 4.5 by the Intruder example).
Moreover, the approach of Chapter 3 is implemented as a semantic-based variant of the tree
locking protocol (see Section 3.5), whereas in this chapter we utilize a semantic-based variant of
the two-phase locking protocol.
• We believe that in this chapter the required internal library synchronization is simpler than the one used in Chapter 3, mainly because it is based on commutativity rather than on right-movers. [43] shows that realizing the synchronization discussed in this chapter is relatively simple (by using a semi-automatic static algorithm).
• In this chapter, the compiler aims to statically avoid synchronization between transactions that access unrelated libraries/ADTs (by using points-to analysis). This is in contrast to Chapter 3, in which the single shared library is responsible for dynamically handling all transactions.
4.1 Semantic Locking
In this section we describe the terminology that is used in this chapter, present our methodology for
realizing atomic sections using semantic locking, and formalize the problem addressed in the subsequent
sections.
4.1.1 Basics
Clients A client is a concurrent program that satisfies the following restrictions. All state shared by
multiple threads is encapsulated as a collection of ADT instances. (The notion of an ADT is formalized
later.) The shared mutable state is accessed only via ADT operations. The language provides support
for atomic sections: an atomic section is simply a block of sequential code prefixed by the keyword
atomic. Shared state can be accessed or modified only within an atomic section. We use the term
transaction to refer to the execution of an atomic section within an execution.
In the simpler setting, a client is a whole program (excluding the ADT implementations). More
generally, a client can be a module or simply a set of atomic sections. However, we assume that all
atomic sections accessing the shared state are available.
ADTs An abstract data type (ADT) encapsulates state accessed and modified via a set of methods.
Statically, it consists of an interface (also referred to as its API) and a class that implements the interface.
The implementation is assumed to be linearizable [55] with respect to the sequential specification of the
ADT. We also assume that its object constructor is a pure method.
We will use the term ADT instance to refer to a runtime object that is an instance of the ADT class.
We will abbreviate "ADT instance" to just ADT if no confusion is likely. Two different ADT instances do not share any state. Every ADT instance is assumed to have a unique identifier (such as its memory address).
As in Chapter 3, we use the term operation to denote a tuple consisting of an ADT method name
m and runtime values v1 , · · · , vn for m’s arguments (not including the ADT instance on which the
operation is performed), written as (m, v1 , . . . , vn ) . An operation represents an invocation of a method
on an ADT instance (at runtime). For example, we write (add,7) to denote the operation that represents
an invocation of the method add with the argument 7.
Methodology Our goal is to realize an implementation of the atomic sections in clients. Our approach
decomposes the responsibility for this task into two parts: one to be realized by the ADT (library)
implementations and one to be realized by the compiler (on behalf of the client). This lets us exploit
ADT-specific semantic locking that can take advantage of the semantics of ADT operations (to achieve
more parallelism and fine-grained concurrency).
4.1.2 ADTs With Semantic Locking
Synchronization API The basic idea is to utilize locks on operations (as opposed to locks on data).
Specifically, every ADT must provide a synchronization API, in addition to its standard API, that allows
a transaction to acquire (and release) operations (on ADT instances).
We refer to an operation (m, v1, · · · , vn) as a locking operation if method m belongs to the synchronization API. We refer to it as a standard operation if method m belongs to the standard API. A locking operation l is meant to be used (by a client transaction) to acquire (permission to invoke) certain standard operations. Thus, we may think of l as corresponding to a lock on a set of standard operations (on the corresponding ADT instance). (Notice that a locking operation is essentially a mayUse operation as defined in Chapter 3.)
Example 4.1.1 Figure 4.3 presents the API for a Set-of-integers ADT. The figure presents both the
standard API and the synchronization API. The semantics of the standard API operations is the usual
one. The locking operations are used to lock (and unlock) standard operations. Consider the following
4 sets of standard operations (where we write N to denote the set of integers):
La = {(add, v), (remove, v), (contains, v), (size), (clear) | v ∈ N }
Lb = {(size), (contains, v) | v ∈ N }
Lc = {(add, v) | v ∈ N }
Ld = {(add, 7), (remove, 7), (contains, 7)}
The set La can be locked by calling lockAll(); the set Lb can be locked by calling lockReadOnly(); the set Lc can be locked by calling lockAdd(); and the set Ld can be locked by calling lockValue(7).
Note that the methods lockAll(), lockReadOnly() and lockAdd() do not have arguments; each one of them is used to lock a constant set of operations. The lockValue method has an argument which is used to determine the set locked by its invocation. For any integer v, the set {(add, v), (remove, v), (contains, v)} can be locked by calling lockValue(v).
The locking operations are not meant to be called directly by the client code. Instead, our compiler
will insert calls to these operations while compiling atomic sections. To enable this, we require the ADT
// Standard API              // Synchronization API
void add(int i);             void lockAll();
void remove(int i);          void lockReadOnly();
boolean contains(int i);     void lockAdd();
int size();                  void lockValue(int i);
void clear();                void unlockAll();
Figure 4.3: API of a Set with semantic locking.
interface to declare for each locking method the set of operations it corresponds to using the annotation
language from Section 3.4.1. We also assume that each ADT has a method (without arguments) that locks all its standard operations (this method is called "lockAll"), and a method (without arguments) that unlocks all the ADT operations that are held by the current transaction (this method is called "unlockAll").
Requirements From ADTs We now describe the semantic guarantees that the ADT (implementation) is required to satisfy, specifically with regard to the synchronization API. We first formalize the notion of
commutativity of operations. Two operations are said to be commutative if applying them to the same
ADT instance in either order leads to the same final ADT state (and returns the same response). For
example, the operations (add,7) and (remove,7) are not commutative; in contrast, the operations
(add,7) and (remove,10) are commutative. (Since there is no shared state between different ADT
instances, operations on different ADT instances always commute.)
Every ADT implementation is required to satisfy the following guarantee: no two threads are allowed to concurrently hold locks on non-commuting operations (on the same ADT instance). Specifically, if a thread t holds locks on the operations in the set Ot for an ADT instance A and, at the same time, a different thread t′ holds locks on the operations in the set Ot′ for the same ADT instance A, then every operation in Ot must commute with every operation in Ot′.
This means that the implementation (of the ADT's synchronization methods) must block, whenever necessary, to ensure the above requirement. That is, if a thread t holds locks on the operations in Ot, and a thread t′ tries to lock the operations in Ot′ where some operation in Ot does not commute with some operation in Ot′, then t′ waits (blocked) until it is legal (for t′) to hold locks on all operations in Ot′.
Furthermore, the only role of the locking operations is to enforce concurrency control as described
above. In particular, the locking operations are required to not have any effect on the standard ADT
state (i.e., the specification of the standard operations).
Finally, in order to ensure progress we require that: (i) locking operations on an ADT A are never
blocked when no thread holds locks on A’s operations; and (ii) standard operations and unlockAll()
are never blocked.
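As an illustration of these requirements, the following sketch implements a Set whose synchronization API covers only lockAll, lockValue and unlockAll (lockAdd and lockReadOnly would need extra machinery). The locking scheme, a read-write lock combined with per-value locks, and all names are our assumptions, not the thesis implementation: lockValue(v) acquires a per-value lock plus a shared permit, lockAll() acquires the exclusive permit, so two lockValue calls on different values may be held concurrently (their operations commute) while lockAll() excludes every other locking operation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class SemanticSet {
    private final Set<Integer> data = new HashSet<>();
    private final ReentrantReadWriteLock global = new ReentrantReadWriteLock();
    private final ConcurrentHashMap<Integer, ReentrantLock> valueLocks = new ConcurrentHashMap<>();
    private final ThreadLocal<List<ReentrantLock>> held = ThreadLocal.withInitial(ArrayList::new);
    private final ThreadLocal<Boolean> holdsGlobal = ThreadLocal.withInitial(() -> false);

    void lockAll() {                       // excludes all other locking operations
        global.writeLock().lock();
        holdsGlobal.set(true);
    }

    void lockValue(int v) {                // commutes with lockValue(w) for w != v
        global.readLock().lock();
        ReentrantLock l = valueLocks.computeIfAbsent(v, k -> new ReentrantLock());
        l.lock();
        held.get().add(l);
    }

    void unlockAll() {                     // releases everything held by this thread
        for (ReentrantLock l : held.get()) {
            l.unlock();
            global.readLock().unlock();    // one shared permit per lockValue call
        }
        held.get().clear();
        if (holdsGlobal.get()) {
            global.writeLock().unlock();
            holdsGlobal.set(false);
        }
    }

    // Standard operations; they never block on the semantic locks
    // (the short synchronized sections only guard internal state).
    void add(int v)         { synchronized (data) { data.add(v); } }
    boolean contains(int v) { synchronized (data) { return data.contains(v); } }
}
```

Note that the locking operations have no effect on the standard Set state, and a locking operation can block only while another thread holds locks on the instance, matching the progress requirements above.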
void f(Set x, Set y) {
  atomic {
    x.lockReadOnly();
    int i = x.size();
    y.lockAdd(); x.unlockAll();
    y.add(i);
    y.unlockAll();
  }
}
Figure 4.4: Code that follows the S2PL protocol.
Example 4.1.2 Consider Example 4.1.1. Here, a thread should not be allowed to hold a lock on the set Lb while another thread holds a lock on Lc, because (for example) size does not commute with the add operations. However, it is legal to permit a thread to hold a lock on Lc while another thread holds a lock on Lc, because add operations commute with each other. Similarly, it is legal to permit Ld
and {(add, 1), (remove, 1), (contains, 1)} to be simultaneously locked by different threads, because,
according to the Set semantics, operations on different values commute.
Notice that an ADT instance that satisfies the above requirements also satisfies Definition 3.15 (Section 3.3.7) and the progress condition (Section 3.3.6). Therefore, each ADT instance can be seen as a single transactional library.
4.1.3 Automatic Atomicity
We now describe how atomic sections in a client can be automatically realized by a compiler using the
semantic locking capabilities provided by the underlying ADTs.
The S2PL Protocol Our synchronization is based on a semantics-aware two-phase locking protocol [20]
(S2PL). We say that an execution π follows S2PL if the following conditions are satisfied by each
transaction t in π:
(C1) t invokes a standard operation p of an ADT instance A, only if t currently holds a lock on operation
p of A.
(C2) t locks operations only if t has never unlocked operations.
An execution that satisfies S2PL is a serializable execution [20] — i.e., it is equivalent to an execution in
which no two transactions are interleaved. Therefore, a serializable execution in which all transactions
are completed can be seen as an execution in which all transactions are executed atomically.
Example 4.1.3 Consider a transaction t that executes f(s1, s2), where f is the function shown in Figure 4.4, and s1, s2 are two different Sets. This transaction follows the S2PL rules.
The S2PL protocol enables substantial parallelism. Consider two transactions t and t′ that execute f(s1, s2). In this case, all operations locked by t commute with all operations locked by t′ (even
though they work on the same ADT instances). Hence, it is possible for the two transactions to run in
parallel, while guaranteeing serializability.
Notice that the first condition of the S2PL protocol is equivalent to the client protocol from Chapter 3
(when considering a single ADT instance).
The OS2PL Protocol The S2PL protocol does not guarantee deadlock-freedom — in order to avoid
deadlocks we use the Ordered S2PL Protocol (OS2PL). We say that an execution follows the OS2PL
protocol if the execution follows S2PL and satisfies the following additional condition:
(C3) There exists an irreflexive and transitive relation ⊏ on ADT instances such that if a transaction t locks operations of ADT instance A after it locks operations of ADT instance A′, then A′ ⊏ A.
The rule requires that ADT operations be locked according to a consistent order on the ADT instances. Notice that A and A′ may represent the same ADT instance in the above rule. Hence, the rule implies that a transaction should not invoke multiple locking operations on the same ADT instance. Following this rule ensures that an execution cannot reach a deadlock caused by the locking provided by the ADTs.
4.2 Automatic Atomicity Enforcement
In this section, we present an algorithm for compiling atomic sections. The algorithm inserts semantic
locking operations into the atomic section to ensure that every transaction follows Ordered S2PL. This
algorithm uses only the locking operations lockAll() and unlockAll() of each ADT. Figure 4.15
shows the code produced by our algorithm for the atomic section presented in Figure 4.1. In essence, this
algorithm uses a locking granularity at the ADT instance level: two transactions cannot concurrently
invoke operations on the same ADT. In Section 4.3, we present a refined algorithm that exploits the
specialized locking operations of the ADTs (such as lockAdd() and lockValue(7)), permitting more
fine-grained concurrency.
The algorithm we describe here is a simple one, whose correctness is easy to establish. We improve
the results produced by this simple algorithm using a series of optimizations.
In the sequel, we say that an ADT instance A is locked by transaction t if the operations of A are
locked by t.
4.2.1 Enforcing S2PL
Ensuring that all transactions follow S2PL is relatively straightforward. For every statement x.f(...)
in the atomic section that invokes an ADT method, we insert code, just before the statement, to lock the
ADT instance that x points to, unless it has already been locked by the current transaction. At the end of
LV(x) {
  if(x!=null && !LOCAL_SET.contains(x)) {
    x.lockAll();
    LOCAL_SET.add(x);
}}
Figure 4.5: Code macro with the locking code.
void f(Set x, Set y) { atomic {
  LOCAL_SET.init(); // prologue
  LV(x); int i = x.size();
  LV(y); y.add(i);
  foreach(t : LOCAL_SET) t.unlockAll(); // epilogue
}}
Figure 4.6: Atomic section that follows the S2PL protocol.
the atomic section, we insert code to unlock all ADT instances that have been locked by the transaction.
We achieve this as follows.
Locked Objects Each transaction uses a private set, denoted LOCAL_SET, to keep track of all ADT instances that it has currently locked. This set is used to avoid locking the same ADT multiple times and to make sure that all ADTs are eventually unlocked.
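One possible realization of the per-transaction LOCAL_SET is a thread-local set of locked ADT instances; the name and representation below are assumptions (any transaction-private container would do). ADT instances are tracked by reference (identity), so two distinct instances are never conflated even if they compare equal():

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

class LocalSet {
    private static final ThreadLocal<Set<Object>> LOCAL_SET =
        ThreadLocal.withInitial(() ->
            Collections.newSetFromMap(new IdentityHashMap<>()));

    static void init()                { LOCAL_SET.get().clear(); }   // prologue
    static boolean contains(Object x) { return LOCAL_SET.get().contains(x); }
    static void add(Object x)         { LOCAL_SET.get().add(x); }
    static Iterable<Object> all()     { return LOCAL_SET.get(); }    // for the epilogue
}
```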
Prologue and Epilogue At the beginning of each atomic section, we add code that initializes LOCAL_SET to be empty. At the end of each atomic section, we add code that iterates over all ADTs in LOCAL_SET and invokes their unlockAll operations.
Locking Code We utilize the macro LV(x) shown in Figure 4.5 to lock the ADT instance pointed to by a variable x. The macro locks the object pointed to by x and adds it to LOCAL_SET. It has no impact when x is null or points to an object that has already been locked by the current transaction.
Figure 4.6 shows an example of an atomic section with inserted locking code that ensures the S2PL protocol.
4.2.2 Lock Ordering Constraints
The basic idea sketched above does not ensure that all transactions lock ADTs in a consistent order.
Hence, it is possible for the transactions to deadlock. We now describe an extension of the algorithm
that statically identifies a suitable ordering on ADT instances and then inserts locking code to ensure
that ADT instances are locked in this order.
We first describe the restrictions-graph, a data structure that captures constraints on the order in
which ADT instances can be locked. We utilize this graph to determine the order in which the locking
void g(Map m, int key1, int key2, Queue q) {
  atomic {
    Set s1 = m.get(key1);
    Set s2 = m.get(key2);
    if(s1!=null && s2!=null) {
      s1.add(1);
      s2.add(2);
      q.enqueue(s1);
    }
}}
Figure 4.7: Atomic section that manipulates a Map, a Queue, and two Sets.
operations are invoked.
A Static Finite Abstraction At runtime, an execution of the client program can create an unbounded
number of ADT instances. Our algorithm is parameterized by a static finite abstraction of the set of ADT
instances that the client program can create at runtime. Let PVar denote the set of pointer variables that
appear in the atomic code sections. The abstraction consists of an equivalence relation on PVar. For any
x ∈ PVar, let [x] denote the equivalence class that x belongs to. The semantic guarantees provided by
the abstraction are as follows. Any ADT instance created by any execution corresponds to exactly one
of the equivalence classes. Furthermore, at any point during an execution, any variable x is guaranteed
to be either null or point to an object represented by the equivalence class [x].
Note that the abstraction required above can be computed using any pointer analysis (e.g., see [65])
or simply using the static types of pointer variables. In our compiler, we utilize the points-to analysis
of [12] to compute this information. Note that even though various pointer analyses give more precise information than that captured by the above abstraction, our implementation requires only this information.
Example 4.2.1 The atomic section in Figure 4.7 has 4 pointer variables (m, q, s1 and s2). The equivalence relation consisting of the three equivalence classes {m}, {q} and {s1, s2} is a correct abstraction for this atomic section. This abstraction can be produced using static type information, without the need for a whole-program analysis.
The Restrictions-Graph Each node of the restrictions-graph represents an equivalence class in PVar
(and, hence, is a static representation of a set of ADT instances that may be created at runtime). An
edge u → v in the restrictions-graph conservatively indicates the possibility of an execution path along
which an ADT instance belonging to u must be locked before an ADT instance belonging to v (within
the same transaction). We identify these constraint edges as follows.
Figure 4.8: Restrictions-graph for the atomic section in Figure 4.7. The graph has three nodes, {m}, {q} and {s1,s2}, and a single edge from {m} to {s1,s2}.
atomic {
  sum=0;
  for(int i=0;i<n;i++) {
    set = map.get(i);
    if(set!=null) sum += set.size();
}}
Figure 4.9: Atomic section for which the restrictions-graph has a cycle.
We write l: x.f(...) to denote a call to a method f via the variable x ∈ PVar at the program location l. Consider an atomic section with two calls l: x.f(...) and l': x'.f'(...) such that location l' is reachable from location l (in the CFG of the atomic section). Obviously, we need to lock the object pointed to by x before location l and to lock the object pointed to by x' before location l'. Clearly, we can lock the object pointed to by x before we lock the object pointed to by x'. However, when can we lock these two objects the other way around? If the value of x' is never changed along the path between l and l', then the object pointed to by x' can be locked before l. However, if x' is assigned a value along the path between l and l', then, in general, we may not be able to lock the object pointed to by x' (at location l') before the object pointed to by x, as we may not know the identity of the object to be locked. In such a case, we conservatively add an edge [x] → [x'] to the restrictions-graph.
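For a single straight-line section (no branches or loops), this edge computation can be sketched as follows. The event-list model of the atomic section, a sequence of method calls "call x" and assignments "assign x" in program order, is an illustrative simplification of the CFG-based rule above; eq maps each variable to its equivalence class:

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RestrictionEdges {
    // events: each element is {"call", var} or {"assign", var}, in program order.
    // Returns edges "[x]->[x']" for every call via x followed by a call via a
    // variable x' that was assigned in between.
    static Set<String> edges(List<String[]> events, Map<String, String> eq) {
        Set<String> out = new LinkedHashSet<>();
        for (int i = 0; i < events.size(); i++) {
            if (!events.get(i)[0].equals("call")) continue;
            String x = events.get(i)[1];
            Set<String> assigned = new HashSet<>();
            for (int j = i + 1; j < events.size(); j++) {
                String[] e = events.get(j);
                if (e[0].equals("assign")) assigned.add(e[1]);
                // a later call via a variable assigned in between
                // forces an ordering edge [x] -> [x']
                else if (assigned.contains(e[1]))
                    out.add(eq.get(x) + "->" + eq.get(e[1]));
            }
        }
        return out;
    }
}
```

Running this on the events of Figure 4.7's atomic section yields the single edge from {m} to {s1,s2} described in Example 4.2.3.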
Example 4.2.2 In Figure 4.7, the object pointed to by m should be locked before the object pointed to by s1, because the call s1.add(1) can only be executed after the call m.get(key1), and the value of s1 is changed between these calls.
Example 4.2.3 Figure 4.8 shows a restrictions-graph for the atomic section in Figure 4.7. According to this graph, the objects pointed to by m should be locked before objects pointed to by s1 or s2. This is the only restriction in the graph. For example, the graph does not restrict the order between the objects pointed to by m and the objects pointed to by q. Moreover, it does not restrict the order between objects pointed to by s1 and the objects pointed to by s2 (even though they are represented by the same node).
The calls in l and l’ can be the same call (i.e., l = l’). This is demonstrated in the atomic
section of Figure 4.9: the call set.size() is reachable from itself (because of the loop), and set can
be changed between two invocations of this call. A possible restrictions-graph is shown in Figure 4.10.
Figure 4.10: Restrictions-graph for the atomic section in Figure 4.9. The graph has nodes {map} and {set}, with an edge from {map} to {set} and a self-loop on {set} (the self-loop makes the graph cyclic).
Figure 4.11: Restrictions-graph for two atomic sections: the section in Figure 4.1 and the section in Figure 4.7. The graph has nodes {m}, {map}, {q,queue} and {s1,s2,set}, with edges from {m} and from {map} to {s1,s2,set}.
The restrictions-graph is computed for a set of atomic sections. We write G(S) to denote the restrictions-graph for a set S of atomic sections. Figure 4.11 shows a restrictions-graph for the atomic sections from Figure 4.1 and Figure 4.7.
4.2.3 Enforcing OS2PL on Acyclic Graphs
We now describe an algorithm to insert locking code into a set of atomic sections S to ensure that all transactions (from S) follow the OS2PL protocol. This technique is applicable as long as the restrictions-graph G(S) is acyclic. In Section 4.2.5, we show how we handle programs with cyclic restrictions-graphs.
We sort the nodes in G(S) using a topological sort. This determines a total order ≤ts on the equivalence classes. We define the relations < and ≤ on the variables in PVar as follows. We say that x < y iff [x] <ts [y], and that x ≤ y iff [x] ≤ts [y]. Note that ≤ is only a preorder on PVar and not a total order. Variables belonging to different equivalence classes are always ordered by <, whereas variables belonging to the same equivalence class are never ordered by <.
The relation < is used to statically determine the order in which ADT instances belonging to different equivalence classes are to be locked. However, we cannot do the same for ADT instances belonging to the same equivalence class. Instead, we dynamically determine the order in which ADT instances belonging to the same equivalence class are locked, as described below.
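The topological sort that produces ≤ts can be sketched with Kahn's algorithm; the string-keyed graph representation below is an illustrative assumption (each node stands for an equivalence class, and each edge points from a class that must be locked earlier to one that must be locked later):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TopoOrder {
    // Maps each node of an acyclic restrictions-graph to its position
    // in a total order consistent with all edges.
    static Map<String, Integer> order(Map<String, List<String>> edges) {
        Map<String, Integer> indeg = new HashMap<>();
        for (String u : edges.keySet()) indeg.putIfAbsent(u, 0);
        for (List<String> vs : edges.values())
            for (String v : vs) indeg.merge(v, 1, Integer::sum);
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : indeg.entrySet())
            if (e.getValue() == 0) ready.add(e.getKey());
        Map<String, Integer> pos = new LinkedHashMap<>();
        while (!ready.isEmpty()) {
            String u = ready.remove();
            pos.put(u, pos.size());
            for (String v : edges.getOrDefault(u, List.of()))
                if (indeg.merge(v, -1, Integer::sum) == 0) ready.add(v);
        }
        if (pos.size() != indeg.size())          // some node was never freed
            throw new IllegalStateException("cyclic restrictions-graph");
        return pos;
    }
}
```

Applied to a graph shaped like Figure 4.11, the resulting positions place {m} and {map} before {s1,s2,set}, while the unconstrained node {q,queue} may land anywhere.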
Locking Code Insertion Consider any statement l: x.f(...) in an atomic section that invokes an ADT method. We define the set LS(l) to be the set of variables y such that
1. y ≤ x, and
2. there exists a (feasible) path, within the same atomic section, from l to some statement of the form l': y.g(...), i.e., a statement that invokes an ADT method using y.
LV2(x,y) {
  if(unique(x)<unique(y)) {
    LV(x); LV(y);
  } else {
    LV(y); LV(x);
}}
Figure 4.12: Locking two equivalent variables in a unique order.
void g(Map m, int key1, int key2, Queue q) {
  atomic {
    LOCAL_SET.init(); // prologue
    LV(m); Set s1 = m.get(key1);
    LV(m); Set s2 = m.get(key2);
    if(s1!=null && s2!=null) {
      LV2(s1,s2); s1.add(1);
      LV(s2); s2.add(2);
      LV(q); q.enqueue(s1);
    }
    foreach(t : LOCAL_SET) t.unlockAll(); // epilogue
}}
Figure 4.13: The atomic section from Figure 4.7 with the non-optimized locking code. The locking was created by using the order: m < s1,s2 < q.
The set LS(l) identifies the variables that we wish to lock before statement l. We insert locking code
for all variables in this set as follows:
• If y < y', then the locking code for y is inserted before the locking code for y'.
• If y and y' are in the same equivalence class, the locking order is determined dynamically (since we do not calculate a static order for such variables). This is done by using unique object identifiers (e.g., their memory addresses). Let unique(y) denote the unique identifier of the object pointed to by y. These identifiers are used by the inserted code to (dynamically) determine the order in which the variables are handled. Figure 4.12 demonstrates the case of two variables; in the general case, objects are sorted to obtain the proper order. (For simplicity, we have omitted the handling of null pointers, which is straightforward.)
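The general case, sorting objects of the same equivalence class by a unique identifier before locking them, can be sketched as follows. System.identityHashCode stands in for the unique identifier (the thesis mentions memory addresses; any total order on instances works, and in practice a tie-breaker would be needed for hash collisions), and the skipping of already-locked objects via LOCAL_SET is omitted:

```java
import java.util.Arrays;
import java.util.Comparator;

// Assumed interface of a lockable ADT instance (cf. lockAll above).
interface Lockable { void lockAll(); }

class Locking {
    // Lock all non-null objects in a globally consistent order, so
    // that two transactions locking the same objects cannot deadlock.
    static void lvn(Lockable... xs) {
        Lockable[] sorted = Arrays.stream(xs)
            .filter(x -> x != null)
            .sorted(Comparator.comparingInt(System::identityHashCode))
            .toArray(Lockable[]::new);
        for (Lockable x : sorted) x.lockAll();
    }
}
```

Calling lvn(a, b) and lvn(b, a) from two transactions acquires the locks in the same object order in both, which is exactly what rule (C3) demands within an equivalence class.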
Figure 4.13 shows the atomic section of Figure 4.7 with the inserted code, and Figure 4.14 shows the
atomic section of Figure 4.1 with the inserted code. For both atomic sections, we used the graph in
Figure 4.11.
1  atomic {
2    LOCAL_SET.init(); // prologue
3    LV(map); set=map.get(id);
4    if(set==null) {
5      set=new Set(); LV(map); map.put(id, set);
6    }
7    LV(map); LV(set); set.add(x);
8    if(flag) {
9      LV(map); LV(queue); queue.enqueue(set);
10     LV(map); map.remove(id);
11   }
12   foreach(t : LOCAL_SET) t.unlockAll(); // epilogue
13 }
Figure 4.14: The atomic section from Figure 4.1 with the non-optimized locking code. The locking was created by using the order: map < set < queue.
4.2.4 Optimizations
The algorithm described above is simplistic in some respects. In this section we present a sequence of
code transformations whose goal is to reduce the overhead of the synthesized code and to increase the
parallelism permitted by the concurrency control. In particular, we present transformations that remove
inserted code that can be shown to be redundant, and transformations that move calls to unlockAll so
as to release locks on objects as early as possible (as determined by a static analysis). Figure 4.15 shows
the optimized version of Figure 4.14.
In the sequel, we use the term “path” to denote feasible execution paths within a single atomic
section.
Removing Redundant LV(x) In some cases, the code LV(x) inserted at a location l may be redundant.
Our compiler removes redundant instances of LV(x) by repeatedly using the following rules:
• If the object pointed to by x at l is locked along all (feasible) paths from the beginning of the
atomic section to l, then the code LV(x) has no effect at l and can be removed. For example, in
Figure 4.14, LV(map) can be removed from line 9 because the Map has already been locked.
• If the object pointed to by x at l is never used along any feasible path from l to the end of the
atomic section, then the code LV(x) is not required at l and can be removed.
Figure 4.16 shows the code from Figure 4.14 after removing redundant instances of LV(x).
Removing Redundant LOCAL_SET Usage Our algorithm uses LOCAL_SET to avoid locking the same object multiple times and to ensure that all objects are unlocked before the end of the atomic section.
atomic {
  map.lockAll(); set=map.get(id);
  if(set==null) {
    set=new Set(); map.put(id, set);
  }
  set.lockAll(); set.add(x);
  if(flag) {
    queue.lockAll(); queue.enqueue(set); queue.unlockAll();
    map.remove(id);
  }
  map.unlockAll(); set.unlockAll();
}
Figure 4.15: The optimized version of Figure 4.14. Note that a large portion of the locking code is removed, and the set LOCAL_SET is not explicitly used. Also, the object pointed to by queue is unlocked before the end of the section.
atomic {
  LOCAL_SET.init(); // prologue
  LV(map); set=map.get(id);
  if(set==null) {
    set=new Set(); map.put(id, set);
  }
  LV(set); set.add(x);
  if(flag) {
    LV(queue); queue.enqueue(set);
    map.remove(id);
  }
  foreach(t : LOCAL_SET) t.unlockAll(); // epilogue
}
Figure 4.16: The code from Figure 4.14 after removing redundant instances of LV(x).
1  atomic {
2    if(map!=null) map.lockAll(); set=map.get(id);
3    if(set==null) {
4      set=new Set(); map.put(id, set);
5    }
6    if(set!=null) set.lockAll(); set.add(x);
7    if(flag) {
8      if(queue!=null) queue.lockAll(); queue.enqueue(set);
9      map.remove(id);
10   }
11   if(map!=null) map.unlockAll();
12   if(set!=null) set.unlockAll();
13   if(queue!=null) queue.unlockAll();
14 }
Figure 4.17: The code from Figure 4.16 after removing the code that uses LOCAL SET.
Often, we can achieve these goals without using LOCAL SET. LOCAL SET is not needed for a pointer
variable x if the following conditions hold:
(1) There is no path containing an occurrence of LV(x) and another occurrence of LV(y) where x
and y may point to the same object.
(2) The value of x is never modified along any path from an occurrence of LV(x) to the end of the
atomic section.
(3) The value of x is null at the end of any path to the end of the atomic section that contains no
occurrence of LV(x).
Because of (1), we know that re-locking is not possible; and because of (2) and (3), we know that we can release all the objects used via x by executing "if(x!=null) x.unlockAll()" at the end of the atomic section.
If the conditions are satisfied for x, we replace all instances of LV(x) with
if(x!=null) x.lockAll()
and, at the end of the section we insert the code
if(x!=null) x.unlockAll()
If, after all applications of the above transformation, the set LOCAL SET is not used for any variable
in an atomic section, we remove the set, and the corresponding prologue and epilogue. Figure 4.17
shows the code from Figure 4.16 after applying this optimization.
1  atomic {
2    if(map!=null) map.lockAll(); set=map.get(id);
3    if(set==null) {
4      set=new Set(); map.put(id, set);
5    }
6    if(set!=null) set.lockAll(); set.add(x);
7    if(flag) {
8      if(queue!=null) queue.lockAll(); queue.enqueue(set);
9      if(queue!=null) queue.unlockAll(); // this line was moved
10     map.remove(id);
11   }
12   if(map!=null) map.unlockAll();
13   if(set!=null) set.unlockAll();
14 }
Figure 4.18: The code from Figure 4.17 after moving an unlockAll operation.
Early Lock Release Our basic algorithm unlocks all objects at the end of the atomic section. In some
cases, it is possible to unlock some objects at an earlier program point (before the end of the atomic
section) without violating the locking protocol. We now describe the conditions under which we perform
such an early lock release.
It is safe to move any instance of “if(x!=null) x.unlockAll()” occurring at the end of the
atomic section to a program point l if the following conditions are satisfied:
(1) The object pointed to by x is not used between l and the end of the atomic section.
(2) No object is locked between l and the end of the atomic section.
(3) Every path to the end of the atomic section passes through l or ends with a null value for x.
The compiler tries to find the earliest program point l (as measured by the length of the shortest path
from the beginning of the atomic section to l) that satisfies these conditions. If such a point l is found,
we move the code “if(x!=null) x.unlockAll()” to location l. Because of (1) and (2), we know
that the protocol rules are not violated. Because of (3), we know that the relevant object will eventually
be unlocked.
Figure 4.18 shows the code from Figure 4.17 after applying this optimization. Note that the unlocking code of "queue" has been moved to line 9 of Figure 4.18.
Removing Redundant If-Statements In some cases, the inserted if-condition "if(x!=null)" is not needed. For any location l and variable x for which we can prove that x is never null at l, we remove the condition "if(x!=null)" from l.
Figure 4.15 shows the code from Figure 4.18 after applying this optimization.
Choosing a Good Locking-Order When sorting the acyclic restrictions-graph using a topological sort (Section 4.2.3), several orders are possible. The chosen order may affect the generated locking. For example, for Figure 4.1's code, if "queue" precedes "map" then the Queue will be locked before the Map — as a result, it may act as a global lock. In order to find a good ordering, we consider several possible orders of the restrictions-graph (using a variant of the topological sort algorithm in [83]).
Given an order, we count the number of variables y for which we insert locking code for y (e.g., LV(y)) before a statement x.f(...) where y ≢ x (in this case y < x). Intuitively, this number represents the number of "early lockings". We choose a locking-order for which this number is minimal.
Note that this is only a heuristic. User hints or dynamic analysis (as in Autotuner [49]) can be
potentially used to find better orderings.
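The "early locking" count described above can be sketched as follows. This is an illustrative model, not the thesis compiler: it assumes a single straight-line sequence of receiver uses and counts, for a candidate locking order, the variables that must be locked before their own first operation because a later-ordered variable is used first. The names scoreOrder and uses are ours.

```java
import java.util.*;

// Hedged sketch of the locking-order heuristic: lower scores mean fewer
// "early lockings", i.e., less lock-before-use imposed by the chosen order.
public class LockingOrderHeuristic {

    // order: candidate locking order (ascending); uses: receivers in program order.
    static int scoreOrder(List<String> order, List<String> uses) {
        Map<String, Integer> rank = new HashMap<>();
        for (int i = 0; i < order.size(); i++) rank.put(order.get(i), i);

        Set<String> locked = new LinkedHashSet<>();
        int earlyLockings = 0;
        for (int i = 0; i < uses.size(); i++) {
            String x = uses.get(i);
            if (locked.contains(x)) continue;
            // Before locking x, every not-yet-locked y with y < x that is
            // still needed later must be locked now ("early locking").
            for (String y : new HashSet<>(uses.subList(i + 1, uses.size()))) {
                if (!locked.contains(y) && rank.get(y) < rank.get(x)) {
                    locked.add(y);
                    earlyLockings++;
                }
            }
            locked.add(x);
        }
        return earlyLockings;
    }

    public static void main(String[] args) {
        List<String> uses = Arrays.asList("map", "set", "queue");
        // "queue" before "map": queue and set must be locked before map's first use.
        int bad = scoreOrder(Arrays.asList("queue", "set", "map"), uses);
        // An order matching the program's first-use order needs no early locking.
        int good = scoreOrder(Arrays.asList("map", "set", "queue"), uses);
        System.out.println(bad + " " + good); // 2 0
    }
}
```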
4.2.5 Handling Cycles via Coarse-Grained Locking
We now show how we can enforce atomicity and deadlock-freedom when the restrictions-graph G(S)
has cycles.
We say that an atomic section a is a serial-section if G({a}) is cyclic; Figure 4.9 shows an example of a serial-section.
The Idea The idea is to find n + 1 disjoint sets of atomic sections S1, ..., Sn and B such that: S = S1 ∪ ... ∪ Sn ∪ B, and for every 1 ≤ i ≤ n the graph G(Si) is acyclic. (Note that any serial-section must be included in the set B.)
For each set Si we enforce the locking protocol by using the technique in the previous sections — hence, the protocol is independently enforced for every set Si. We further add coarse-grained synchronization to make sure that transactions executing atomic sections a and a′ are allowed to execute concurrently iff there exists Si such that a, a′ ∈ Si. This ensures that all concurrent transactions always lock ADT instances in the same order and, hence, follow OS2PL. Any atomic section in B is always executed in isolation (hence, all the serial-sections are executed in isolation).
Simplified Version via Global Read/Write Lock In our compiler we have implemented a simplified
version in which we only find two sets S1 and B (i.e., n = 1), and use the technique in Section 4.2.3
on the atomic sections in S1 . If B is not empty, we use a global read/write lock (denoted by RW )
as follows. At the beginning of an atomic section in S1 we acquire RW in a read-mode; and at the
beginning of an atomic section in B we acquire RW in a write-mode. In both cases, we release RW
at the end of the atomic section. This ensures that atomic sections from S1 can be executed in parallel
(while following the protocol), and atomic sections from B cannot be executed in parallel.
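The global read/write lock scheme can be sketched as follows, assuming Java's ReentrantReadWriteLock; the class and method names here are illustrative, not part of the thesis implementation.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of the coarse-grained fallback: atomic sections in S1 take the global
// lock RW in read mode (so they may run in parallel, each still following the
// fine-grained protocol); sections in B take RW in write mode and thus run in
// isolation, which is what serial-sections require.
public class GlobalRWGate {
    private static final ReentrantReadWriteLock RW = new ReentrantReadWriteLock();

    // Prologue/epilogue for an atomic section in S1.
    static void enterS1() { RW.readLock().lock(); }
    static void exitS1()  { RW.readLock().unlock(); }

    // Prologue/epilogue for an atomic section in B (e.g., a serial-section).
    static void enterB()  { RW.writeLock().lock(); }
    static void exitB()   { RW.writeLock().unlock(); }

    static boolean serialInProgress() { return RW.isWriteLocked(); }

    public static void main(String[] args) {
        enterS1();
        // ... fine-grained semantic locking generated for S1 runs here ...
        exitS1();

        enterB();
        // ... a serial-section runs here, isolated from all other sections ...
        exitB();
        System.out.println("ok");
    }
}
```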
Discussion Note that with the above modification, the generated code does not follow OS2PL. Instead,
it follows a generalization of the OS2PL protocol that is sufficient to guarantee both serializability as
well as deadlock-freedom.
void lockReadOnly(); @{(size),(contains,*)}
void lockAdd(); @{(add,*)}
void lockValue(int i); @{(add,i),(remove,i),(contains,i)}
Figure 4.19: Locking methods from Figure 4.3 and their locking specifications. Each method is annotated with a symbolic set that describes the operations locked by the method.
4.3 Using Specialized Locking Operations
The algorithm presented in Section 4.2 utilizes only the simple locking operations lockAll() and
unlockAll(). In this section, we describe how to extend the earlier algorithm to utilize more fine-
grained locking operations, such as lockAdd() and lockValue(1), which can be used to lock a subset
of an ADT’s operations. For example, this algorithm generates the locking code shown in Figure 4.2
(instead of the locking code in Figure 4.15).
Annotation Language We use the annotation language from Section 3.4.1. Each locking method
p (except lockAll and unlockAll) is annotated with a symbolic set. This symbolic set defines the
operations locked by an invocation of p (their meaning is identical to Section 3.4.1).
Example 4.3.1 The methods in Figure 4.19 are annotated with the following symbolic sets:
SY1 = {(size), (contains, ∗)}
SY2 = {(add, ∗)}
SY3 = {(add, i), (remove, i), (contains, i)}.
The method lockReadOnly() is annotated with SY1; hence, the locking operation (lockReadOnly) locks all the operations with the methods size and contains. The method lockAdd() is annotated with SY2; hence, the locking operation (lockAdd) locks all the operations with the method add. The method lockValue(int i) is annotated with SY3; hence, for any integer v, the locking operation (lockValue, v) locks the operations (add, v), (remove, v), and (contains, v).
Inferring Calls to Locking Operations Our algorithm for inferring the specialized locking operations
to be inserted into an atomic section consists of multiple steps.
In the first step, we analyze the atomic sections, to infer for every pointer variable x and code
location l, a symbolic set SYx,l that conservatively describes the set of future ADT operations that
may be invoked on the ADT instance that x points to. We use a simple backward analysis (abstract
interpretation) to compute this information. As in Section 4.2, we do not distinguish between pointer
variables belonging to the same equivalence class (i.e., the information computed is the same for all
variables in the same equivalence class).
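The backward computation of SYx,l can be sketched for a single straight-line path as follows. This is a simplified illustrative model, not the thesis implementation: statements are modeled as (receiver, operation) pairs, there is no branching, no path merging, and no equivalence-class handling; the names analyze and futureOps are ours.

```java
import java.util.*;

// Hedged sketch of the first inference step: a backward pass that computes,
// for every location i and variable x, the set of ADT operations that may
// still be invoked on x at or after i. A real implementation would run this
// as an abstract interpretation over all feasible paths.
public class FutureOpsAnalysis {

    // stmts.get(i) = {receiverVariable, operationName}.
    static List<Map<String, Set<String>>> analyze(List<String[]> stmts) {
        List<Map<String, Set<String>>> futureOps = new ArrayList<>();
        Map<String, Set<String>> after = new HashMap<>();
        for (int i = stmts.size() - 1; i >= 0; i--) {
            String[] s = stmts.get(i);
            after = deepCopy(after); // keep each location's map independent
            after.computeIfAbsent(s[0], k -> new TreeSet<>()).add(s[1]);
            futureOps.add(0, after);
        }
        return futureOps;
    }

    private static Map<String, Set<String>> deepCopy(Map<String, Set<String>> m) {
        Map<String, Set<String>> c = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : m.entrySet())
            c.put(e.getKey(), new TreeSet<>(e.getValue()));
        return c;
    }

    public static void main(String[] args) {
        // A straight-line model of one feasible path of Figure 4.20's section.
        List<String[]> stmts = Arrays.asList(
            new String[]{"m", "get"}, new String[]{"m", "get"},
            new String[]{"s", "add"}, new String[]{"s", "add"},
            new String[]{"q", "enqueue"});
        List<Map<String, Set<String>>> r = analyze(stmts);
        // At the entry, the future operations on s are {add}; so any locking
        // operation whose symbolic set covers (add, *) -- e.g. lockAdd() -- fits.
        System.out.println(r.get(0).get("s")); // [add]
    }
}
```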
1  void g(Map m, int key1, int key2, Queue q) {
2    atomic { //{lockAdd()}
3      Set s1 = m.get(key1); //{lockAdd()}
4      Set s2 = m.get(key2); //{lockAdd()}
5      if(s1!=null && s2!=null) { //{lockAdd()}
6        s1.add(1); //{lockAdd(), lockValue(2)}
7        s2.add(2);
8        q.enqueue(s1);
9      }
10 }}
Figure 4.20: The code section from Figure 4.7 annotated with the calls inferred for the variables s1 and s2.
In the second step, we use this information to identify a suitable semantic locking operation opx,l .
The essential requirement is that the set of operations locked by operation opx,l should be a superset
of the operations denoted by SYx,l . In general, there may be many locking operations that satisfy this
constraint. Hence, in general, the algorithm may infer several candidates for opx,l , any one of which
may be used.
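The superset constraint on candidate locking operations can be sketched as follows. The string encoding of operation tuples (e.g., "add,1", with "*" as the wildcard) and the method names covers/coversOp are illustrative assumptions, not the thesis representation.

```java
import java.util.*;

// Hedged sketch of candidate selection: a locking method is a candidate for
// (x, l) if its annotated symbolic set covers every operation in SY(x,l).
public class CandidateLockingOps {

    // Does a single annotation tuple (possibly with wildcards) cover one op?
    static boolean coversOp(String pattern, String op) {
        String[] p = pattern.split(","), o = op.split(",");
        if (p.length != o.length) return false;
        for (int i = 0; i < p.length; i++)
            if (!p[i].equals("*") && !p[i].equals(o[i])) return false;
        return true;
    }

    // true iff every required operation is covered by some tuple in annotation.
    static boolean covers(Set<String> annotation, Set<String> required) {
        for (String op : required) {
            boolean ok = false;
            for (String pat : annotation) ok |= coversOp(pat, op);
            if (!ok) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> lockAdd = new HashSet<>(Arrays.asList("add,*"));    // SY2
        Set<String> lockValue1 = new HashSet<>(
            Arrays.asList("add,1", "remove,1", "contains,1"));          // SY3, i=1
        Set<String> required = new HashSet<>(Arrays.asList("add,1"));
        // Both lockAdd() and lockValue(1) cover {(add,1)}, so both are candidates.
        System.out.println(covers(lockAdd, required) + " " + covers(lockValue1, required)); // true true
    }
}
```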
These two steps are equivalent to the static algorithm used in Section 3.4.2. But, in contrast to Section 3.4.2, in which all shared memory is treated as a single library, we repeat these steps separately for each equivalence class.
Figure 4.20 illustrates the inferred candidate locking operations for the code section from Figure 4.7
for the single equivalence class consisting of variables s1 and s2. Note that two possible candidates are
inferred for the location between line 6 and line 7.
Another example is shown in Figure 4.21: this figure illustrates the inferred candidate locking operations for the code section from Figure 4.7 for the equivalence class consisting of the variable m.
Finally, recall that the algorithm presented in Section 4.2 identifies a set of pairs (x, l) such that
we insert a locking operation on variable x at program location l. We use the same algorithm now,
except that we insert an invocation of the locking operation opx,l instead of lockAll(). Specifically,
the locking code macro (Figure 4.5) is modified to take the locking operation to be invoked as an
extra parameter. At every place where a call to this macro is inserted, for variable x at location l
(by Section 4.2’s algorithm), we use the inferred call opx,l .
For example, in Figure 4.7, for s1 at the location before line 6, we infer the call lockAdd(). Hence, we insert a (conditional) call to s1.lockAdd() at this point (instead of a call to s1.lockAll()).
1  void g(Map m, int key1, int key2, Queue q) {
2    atomic { //{lockReadOnly()}
3      Set s1 = m.get(key1); //{lockReadOnly(),lockKey(key2)}
4      Set s2 = m.get(key2);
5      if(s1!=null && s2!=null) {
6        s1.add(1);
7        s2.add(2);
8        q.enqueue(s1);
9      }
10 }}
Figure 4.21: The code section from Figure 4.7 annotated with the calls inferred for the variable m. The Map ADT contains a locking method lockReadOnly() (locks its read-only operations), and a locking method lockKey(int k) (locks the Map operations on key k).
4.4 Implementing ADTs with Semantic Locking
An ADT with semantic locking can be implemented in several different ways. For example, an ADT with the API lockAll(), lockReadOnly(), unlockAll() can be implemented by using a single read-write lock in a straightforward manner¹.
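A sketch of this straightforward read-write-lock implementation, assuming Java's ReentrantReadWriteLock. The counter ADT wrapped here is only a placeholder for illustration; the thesis ADTs are of course richer.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch: lockAll() takes the lock in write mode, lockReadOnly() in read mode,
// and unlockAll() releases whichever mode the current thread holds.
public class RWLockedCounter {
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    private int value = 0;

    // Semantic locking API.
    public void lockAll()      { rw.writeLock().lock(); }
    public void lockReadOnly() { rw.readLock().lock(); }
    public void unlockAll() {
        if (rw.isWriteLockedByCurrentThread()) rw.writeLock().unlock();
        else rw.readLock().unlock();
    }

    // Base operations (assumed to be called while the proper lock is held).
    public void increment() { value++; }      // requires lockAll()
    public int  get()       { return value; } // lockAll() or lockReadOnly()

    public static void main(String[] args) {
        RWLockedCounter c = new RWLockedCounter();
        c.lockAll(); c.increment(); c.unlockAll();
        c.lockReadOnly(); System.out.println(c.get()); c.unlockAll(); // 1
    }
}
```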
Using the Technique Presented in Section 3.5. The technique from Section 3.5 can be used to implement ADTs with semantic locking. In order to use this technique, we need to avoid cases in which two
threads are simultaneously allowed to invoke non-commutative operations. So, we change the technique
from Section 3.5 as follows:
• At the end of a locking operation m (i.e., at the end of a mayUse operation m), we insert code
that sometimes blocks the current thread. Let n be the node which is owned by the current thread.
We insert code (at the end of m) that waits until all nodes in the following set are unlocked:
{n′ | n′ ≠ n ∧ n′ is reachable from n}
This ensures that different threads are never simultaneously allowed to invoke non-commutative
operations.
• Since different threads are never simultaneously allowed to invoke non-commutative operations,
the optimizations from Section 3.5.2 will not be useful. Hence, it is not required to insert code at
the beginning of the standard (base) operations (as described in Section 3.5.2).
• Notice that the extensions from Section 3.5.3 and Section 3.5.4 can be used in a straightforward manner.
¹Let RW be the read-write lock. lockAll() will lock RW in a write-mode, lockReadOnly() will lock RW in a read-mode, and unlockAll() will unlock RW.
[Figure: six panels (a)–(f) of throughput plots; series: Semantic-2PL, Global, 2PL, ForesightLib. Graph workloads: 70% Find successors, 20% Insert, 10% Remove; 50% Insert, 50% Remove; 45% Find successors, 45% Find predecessors, 9% Insert; 35% Find successors, 35% Find predecessors, 20% Insert, 10% Remove. Cache workloads: 90% Get, 10% Put, Size=50K; 90% Get, 10% Put, Size=5000K.]
Figure 4.22: Graph and Cache: throughput (operations/millisecond) as a function of the number of threads (1-16).
4.5 Performance Evaluation
In this section we present an experimental evaluation of the approach presented in this chapter. We
compare the performance of the following synchronization approaches:
1. the approach presented in this chapter (denoted by Semantic-2PL)
2. a single global lock (denoted by Global)
3. a standard deadlock-free two-phase locking which is created by only using the algorithm in Section 4.2 (denoted by 2PL)²
4. the approach (and the implementation) presented in Chapter 3 (denoted by ForesightLib)
4.5.1 Benchmarks
We use benchmarks in which the atomic sections (composite operations) manipulate several ADTs. We
use the 3 benchmarks from Section 3.6.3 (Graph, Cache, and GossipRouter).
We also use a Java version of the Intruder benchmark [7, 26]. This is a multi-threaded application
that emulates an algorithm for signature-based network intrusion detection [47]. For our study we use
its Java implementation from [7] in which atomic sections are already annotated.
²This represents an implementation of a common two-phase locking in which each ADT is associated with a single standard lock.
ADT Implementations For the evaluation, we have created 3 types of Maps (a Standard Map, a Weak Map and a Multi Map³) by changing the implementation discussed in Section 3.7 (as described in Section 4.4).
Using the same technique, we have created a Set ADT (its synchronization API is similar to Figure 4.3), and a Queue ADT (which only supports lockAll and unlockAll). The Set and the Queue
ADTs are only used in the Intruder benchmark.
For each ADT, we have also created a simpler version that only supports lockAll and unlockAll
(by using a simple Java lock) — this version is used for the realizations of the standard deadlock-free
two-phase locking (2PL).
Methodology We use the evaluation methodology and workloads described in Section 3.6.
4.5.2 Performance
Graph The results for the graph benchmark are shown in Figure 4.22(a)–(d). For all workloads,
Semantic-2PL is faster than ForesightLib. Both Semantic-2PL and ForesightLib outperform Global and
2PL.
Interestingly, in this benchmark (and all benchmarks below) 2PL is even slower than Global; this can be explained by the overhead of the multiple locks used by the two-phase locking realization.
Tomcat’s Cache The results for the Cache benchmark are shown in Figure 4.22(e)–(f) In this benchmark, ForesightLib is much faster than the other synchronization approaches. ForesightLib is faster than
Semantic-2PL because of the optimization in Section 3.5.2 (see Section 3.7.3) — without this optimization, ForesightLib is almost identical to Semantic-2PL.
GossipRouter The results for the GossipRouter benchmark are shown in Figure 4.23. In these results,
Semantic-2PL is slightly faster than ForesightLib. Both Semantic-2PL and ForesightLib outperform
Global and 2PL.
Interestingly, in this benchmark Semantic-2PL is realized by (also) using coarse-grained synchronization (as discussed in Section 4.2.5). In fact, some of the atomic sections are never run in parallel (our compiler identifies them as serial-sections). According to the results, our approach still provides scalable performance. This can be explained by the fact that the serial-sections are rarely executed.
Intruder In the Intruder benchmark, we have used the workload which is represented by the configuration "-a 10 -l 256 -n 16384 -s 1" (see [7]). The results for this benchmark are shown in Figure 4.24.
These results show that Semantic-2PL is much faster than the other synchronization approaches.
³See Section 3.7.1.
[Figure: speedup plot; series: Semantic-2PL, Global, 2PL, ForesightLib; workload: 5000 messages per client, 16 clients; x-axis: cores (1, 2, 4, 8, 16); y-axis: speedup (0%–600%).]
Figure 4.23: GossipRouter. Speedup over a single-core execution.
[Figure: speedup plot; series: Semantic-2PL, Global, 2PL, ForesightLib; configuration: -a 10 -l 246 -n 16384 -s 1; x-axis: threads (1, 2, 4, 8, 16); y-axis: speedup (0%–600%).]
Figure 4.24: Intruder. Speedup over a single-threaded execution.
Since the Intruder benchmark uses Sets and Queues, the implementation from Section 3.7 is not
applicable for this benchmark. Hence, in order to compare to ForesightLib, we have extended the
implementation from Section 3.7 to support Sets and Queues. As shown in the results, this extension
does not provide scalable performance. We have not found a way to extend Section 3.7’s implementation
in a way that provides better performance. This is because in most of the transaction’s execution time,
they may call to operations of a global Queue which are not right-mover with each other (this is similar
to the Queue shown in Figure 4.1) — as a result, in most of the transaction’s execution time, they are
not able to run in parallel.
Chapter 5
Related Work
5.1 Synchronization Protocols
Synchronization protocols are used in databases and shared memory systems to guarantee correctness
of concurrently executing transactions [21, 86]. Many of them are based on locks (and therefore are
sometimes called locking protocols).
Two-Phase Locking Protocol A widely used locking protocol is the two-phase locking (2PL) protocol [38] which guarantees atomicity of transactions, but does not guarantee deadlock-freedom. In the
2PL protocol, locking is done in two phases, in the first phase locks are only allowed to be acquired
(releasing locks is forbidden); in the second phase locks are only allowed to be released. These restrictions require that locks are held until the final lock is obtained, thus preventing early release of a lock
even when locking it is no longer required. This limits parallelism, especially in the presence of long transactions (e.g., a tree traversal must hold the lock on the root until the final node is reached).
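The two-phase discipline can be illustrated with a small sketch of ours (not taken from the cited works): a transaction object that tracks its phase and rejects lock acquisitions once any lock has been released.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch of 2PL's two phases: once a transaction releases any
// lock, it enters the shrinking phase and may not acquire further locks.
public class TwoPhaseTxn {
    private final Deque<ReentrantLock> held = new ArrayDeque<>();
    private boolean shrinking = false;

    public void acquire(ReentrantLock l) {
        if (shrinking)
            throw new IllegalStateException("2PL violation: acquire after release");
        l.lock();
        held.push(l);
    }

    public void releaseAll() { // begin (and finish) the shrinking phase
        shrinking = true;
        while (!held.isEmpty()) held.pop().unlock();
    }

    public static void main(String[] args) {
        ReentrantLock root = new ReentrantLock(), leaf = new ReentrantLock();
        TwoPhaseTxn t = new TwoPhaseTxn();
        t.acquire(root);   // e.g., the root of a tree traversal
        t.acquire(leaf);   // the root is still held, even if no longer needed
        t.releaseAll();
        try {
            t.acquire(root); // forbidden once the shrinking phase has begun
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```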
Semantic-aware variants of the 2PL protocol are also described in the literature (e.g., see [21, 86]):
these variants are designed to exploit semantic properties of the shared state for the sake of improved
performance. The approach described in Chapter 4 is based on such a variant of 2PL — in particular,
the compiler described in Chapter 4 is the first general-purpose compiler (which does not rely on rollback mechanisms) that enforces a semantic-aware variant of the 2PL protocol. Several other existing
approaches [51, 61, 62] are also based on similar variants of the 2PL protocols, but these approaches are
based on speculative executions and rollback mechanisms.
Non-Two-Phase Locking Protocol Other (non-2PL) locking protocols rely on the shape of the shared objects graph — each node of this graph represents a disjoint data item which may be concurrently accessed by several threads (or processes). Most non-two-phase protocols (e.g., [59, 80]) were designed for databases in which the shape of the shared objects graph does not change during a transaction, and thus
are not suitable for more general cases with dynamically changing graphs.
[17, 28, 63] show non-2PL protocols that can handle dynamically changing graphs, but each one
of these protocols is applicable to either a tree or a DAG. In contrast, our domination locking protocol
(Chapter 2) is more general and it does not explicitly require any specific shape.
Interestingly, even when the shape of the shared graph is a tree or a DAG, the technique in Chapter 2 does not guarantee any protocol from [17, 28, 63]. Therefore the domination locking protocol is still
required to automatically handle libraries in which the shared memory is a single tree. Moreover, none
of the existing techniques are able to enforce any protocol from [17, 28, 63] on dynamic data structures
(like the data structures mentioned in Chapter 2). Hence, we believe that the technique in Chapter 2
is the first automatic technique to add non-speculative fine-grain synchronization to such dynamic data
structures.
Commutativity and Movers The approach in Chapter 3 is based on right-movers, whereas most synchronization approaches are based on commutativity. Indeed, [61] shows that many synchronization schemes can be based on either right-movers or left-movers — in [61], they use a variant of a "static" right-mover which is a special case of our definition of dynamic-right-mover.
Locking Mechanisms Locking mechanisms are widely used for synchronization; some of them utilize semantic properties of shared operations (e.g., [30, 60]). Usually these mechanisms do not allow several threads to hold locks which correspond to non-commutative operations. An interesting locking mechanism is shared-ordered locking [13], which allows several threads to hold locks which correspond to non-commutative operations. Such locking mechanisms can be seen as special cases of libraries with foresight-based synchronization.
5.2 Concurrent Data Structures
Many sophisticated concurrent data structures (e.g., [4, 54, 70]) were developed and integrated into
modern software libraries (e.g., see [8, 9, 11, 35]). These data structures ensure atomicity of their
basic operations, while hiding the complexity of synchronization inside their libraries. Unfortunately, as shown in [79], employing concurrent data structures in client code is error-prone. The problem stems
from the inability of concurrent data structures to ensure atomicity of client operations composed from
several data structure operations.
In Chapter 3 and Chapter 4 we focus on enabling efficient and automatic atomicity of client operations composed from several data structure operations. This prevents the errors reported in [79] without
the need for the library to directly support composite operations as suggested in [3].
5.3 Automatic Synchronization
Automatic Lock Inference There has been a lot of work on inferring locks for implementing atomic sections. Most of the algorithms in the literature infer locks for following the 2PL locking protocol [29, 31, 37, 44, 57, 68]. The algorithms in [37, 57, 68] employ a 2PL variant in which all locks are released
at the end of a transaction. In these algorithms, deadlock is prevented by statically ordering locks and
rejecting certain programs. The algorithms in [29, 44] use a 2PL variant in which all locks are acquired
at the beginning of transactions and released at the end of transactions. In these algorithms, deadlock
is prevented by using a customized locking protocol at the beginning of atomic sections. As described
above, 2PL limits parallelism as all locks must be held until the final lock is acquired.
Our algorithms for inferring mayUse operations (Chapter 3 and Chapter 4) are similar to these algorithms, with the following differences: (i) we deal with mayUse operations, which can be seen as generalizations of lock-acquire and lock-release operations — this enables utilizing semantic properties of shared operations; (ii) lock inference algorithms usually need to consider the structure of a dynamically manipulated state; in Chapter 3 we avoid this by considering a single shared library that can be statically identified.
Deadlock Avoidance Wang et al. [85] describe a static analysis and accompanying runtime instrumentation that eliminates the possibility of deadlock from multi-threaded programs using locks. Their
tool adds additional locks that dominate any potential locking cycle, but it requires as a starting point a
program that already has the locks necessary for atomicity.
Ownership Types Boyapati et al. [22] describe an ownership type system that guarantees data race freedom and deadlock freedom, but not atomicity. Their approach can prevent deadlocks by relying on a partial order of objects, and also permits dynamically changing this partial order. Interestingly, the domination locking protocol also relies on the intuition of dynamic ownership, where exposed objects dominate hidden objects.
Semantic Conflict Detection In data-based approaches to conflict detection, a dependence is inferred
between two transactions if they both access the same location, and at least one of the accesses is a
write — such approaches are often imprecise and can lead to spurious conflicts/dependences [51]. A
semantics-based approach (e.g., one that identifies two high-level operations as commuting even though
they access and modify the same data) to identifying dependences/conflicts between transactions can
enable greater parallelism. This idea is quite old and was proposed early on for database transaction
implementations (e.g., see [21, 77, 86]). Similar ideas have also motivated the development of various
software synchronization techniques (e.g., [51, 61, 62]) — all these approaches require the use of a rollback mechanism. The approaches in Chapter 3 and Chapter 4 are both semantics-based approaches that
do not use any rollback mechanism.
The approach in [51] is a semantics-based approach which utilizes a semantics-aware variant of the 2PL protocol. This approach is similar to the approach presented in Chapter 4. A notable difference between them is the way deadlocks are handled: in [51] deadlocks are dynamically detected and resolved by aborting transactions (and using a rollback mechanism), whereas in Chapter 4 the compiler statically ensures deadlock-freedom and never uses a rollback mechanism. In [51], an invocation of an operation p implicitly locks a set of operations (that contains p and an operation that cancels the effect of p), whereas in Chapter 4 a transaction is able to lock operations regardless of the operations that have already been invoked by this transaction — this enables a more versatile interaction between the automatic synchronization algorithm and the locking that is manually implemented inside the data structure; moreover, the data structure designer does not have to make sure that the effect of each operation can be canceled by invoking a single inverse operation. Additionally, in [51] all locks are released at the end of the transactions, while the algorithm in Chapter 4 permits unlocking operations at an earlier point in the transactions.
In contrast to [51] and Chapter 4, which permit using several independent libraries (and data structures), in Chapter 3 the shared state has to be represented as a single data structure. Moreover, the synchronization in Chapter 3 is practically implemented as a variant of the tree locking protocol [86]; this is in contrast to the existing semantics-based approaches, which are typically based on simple variants of the 2PL protocol.
Transactional Memory Transactional memory approaches (TMs) dynamically resolve inconsistencies and deadlocks by rolling back partially completed transactions. The TM programming model can be implemented as an extension to the cache coherence protocol [53] or as a code transformation [52].
Preserving the ability to roll back requires that transactions be isolated from the rest of the system,
which prohibits them from performing irrevocable actions such as I/O and operating-system calls. Software transactions are also prohibited from calling libraries that have not been transformed by the TM.
Ad-hoc proposals for specific forms of I/O are present in many TMs [71], but in the general case at
most one transaction at a time can safely perform an irrevocable action [87]. Rollback-free concurrency
control schemes such as ours, in contrast, do not limit concurrent I/O (and other irrevocable actions).
A rollback-free TM was recently proposed by [67]. But this TM does not permit concurrent execution of write transactions — write transactions are always executed sequentially, in a manner similar to [87].
Unfortunately, in spite of a lot of effort and many TM implementations (e.g., see [48]), existing TMs
have not been widely adopted for practical concurrent programming [27, 34, 36, 69, 89].
Chapter 6
Conclusions and Future Work
In this thesis, we show three novel approaches for enforcing atomicity by using fine-grained synchronization. We show that our approaches are applicable to a selection of real-life concurrent programs.
The approaches enable efficient and scalable synchronization by combining compilation techniques,
run-time techniques, and libraries with specialized synchronization.
All our approaches utilize properties of the programs: in Chapter 2 the shape of shared memory is
utilized; whereas the approaches in Chapter 3 and Chapter 4 utilize information about future invocations
of library operations (foresight). Particularly, we show that such preliminary information is useful for
effective synchronization.
Generic Locks In many cases, the combination of a lock (e.g., a read-write lock) and the memory
protected by the lock can be seen as a library with foresight-based synchronization. Hence, in a sense,
the synchronization presented in Chapter 3 generalizes the standard notion of a lock. In the future, it
might be interesting to develop additional synchronization techniques that are based on such generic
forms of locks.
Using Speculations We showed approaches that avoid any speculation (i.e., rollbacks and aborts are
never used). This indicates that non-speculative synchronization can be effective (despite the fact that
many modern synchronization techniques are based on speculation). In the future, it might be interesting
to investigate the applicability of our approaches to synchronization that utilizes speculation.
One potential method to use speculation is to combine domination locking with optimistic synchronization for read-only transactions. Such a method can, for example, add version numbers to shared objects. Read-write transactions would synchronize between themselves via domination locking, without any rollbacks, whereas read-only transactions would use the version numbers to ensure consistent reads. Version numbers could either be managed locally, by incrementing them on each commit (e.g., as in [25]), or globally using some timestamp scheme (e.g., as in [32]). The local scheme would provide better scalability for writers, while the global scheme admits very efficient read-only transactions.
Contention management in such an approach would be easier than in purely optimistic schemes such as transactional memory [48], because read-only transactions can fall back to domination locking after experiencing too many rollbacks.
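The local versioning variant can be sketched as follows. This is a simplified illustration we wrote for this discussion (the class and its fields are hypothetical, and a real implementation would integrate the retry fallback with the domination-locking protocol): writers, already serialized per object by domination locking, bump a per-object version on every update, and read-only transactions validate by rereading the version:

```java
import java.util.concurrent.atomic.AtomicLong;

// Per-object version counter: a writer (assumed to hold this object's
// lock under domination locking) makes the version odd while an update
// is in progress and even again when it completes; a read-only
// transaction reads the version before and after, retrying on a change.
class VersionedNode {
    final AtomicLong version = new AtomicLong(0);
    volatile int value;

    // Called by a writer that holds this object's lock.
    void write(int v) {
        version.incrementAndGet();  // odd: update in progress
        value = v;
        version.incrementAndGet();  // even: update complete
    }

    // Optimistic read: succeeds only if no writer intervened.
    int readConsistent() {
        while (true) {
            long v1 = version.get();
            int snapshot = value;
            if (v1 % 2 == 0 && version.get() == v1) return snapshot;
            // Otherwise retry; after too many retries a real
            // implementation could fall back to acquiring the lock.
        }
    }
}
```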
New Synchronization Protocols The existence of novel synchronization protocols (e.g., in Chapter 2 and Chapter 3) indicates that, in spite of extensive research on synchronization protocols (see, for example, [20, 86]), there is still room for new ones.
Automatic Libraries with Foresight In Chapter 3 and Chapter 4, internal library synchronization is
implemented manually. It might be useful to develop static algorithms that automatically extend a given
library implementation with internal synchronization. In [43] we show an example of a semi-automatic
algorithm that produces the semantic locking discussed in Chapter 4.
Software Verification Novel synchronization protocols can provide a basis for software verification.
For example, [17] describes a verification technique based on special cases of domination locking (dynamic DAG and tree locking); by using domination locking, their analysis may be simplified and extended thanks to the weaker conditions of domination locking.
Our approaches and their realizations can be seen as a set of tools to deal with common programming scenarios in which effective synchronization is required. We believe that similar synchronization
approaches are a promising research direction, in particular semi-automatic approaches that are based on combinations of compile-time and run-time techniques.
Bibliography
[1] http://wala.sourceforge.net.
[2] http://sourceforge.net/projects/tammi.
[3] http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166edocs/jsr166e/ConcurrentHashMapV8.html.
[4] http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ConcurrentHashMap.html.
[5] http://www.devdaily.com/java/jwarehouse/apache-tomcat-6.0.16/java/org/apache/el/util/ConcurrentCache.java.shtml.
[6] http://docs.oracle.com/javase/6/docs/api/java/util/WeakHashMap.html.
[7] http://sites.google.com/site/deucestm/.
[8] guava-libraries. http://code.google.com/p/guava-libraries/.
[9] Java API specification. http://docs.oracle.com/javase/7/docs/api/.
[10] JGroups toolkit. http://www.jgroups.org/index.html.
[11] libcds, concurrent data structure library. http://libcds.sourceforge.net/.
[12] WALA. http://wala.sourceforge.net.
[13] D. Agrawal and A. El Abbadi. Constrained shared locks for increasing concurrency in databases.
In Selected papers of the ACM SIGMOD symposium on Principles of database systems, 1995.
[14] Rakesh Agrawal, Heikki Mannila, Ramakrishnan Srikant, Hannu Toivonen, and Inkeri Verkamo.
Advances in knowledge discovery and data mining. chapter Fast discovery of association rules,
pages 307–328. American Association for Artificial Intelligence, Menlo Park, CA, USA, 1996.
[15] Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools (2nd Edition). Addison Wesley, 2006.
[16] C.R. Aragon and R.G. Seidel. Randomized search trees. In Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 540–545, 1989.
[17] H. Attiya, G. Ramalingam, and N. Rinetzky. Sequential verification of serializability. In POPL
’10: Proceedings of the 37th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 31–42, New York, NY, USA, 2010. ACM.
[18] J. Barnes and P. Hut. A hierarchical O(N log N) force-calculation algorithm. Nature, 324:446–449,
December 1986.
[19] R. Bayer and M. Schkolnick. Concurrency of operations on B-Trees. Acta Informatica, 9:1–21,
1977.
[20] Philip A. Bernstein, Vassos Hadzilacos, and Nathan Goodman. Concurrency Control and Recovery
in Database Systems. Addison-Wesley, 1987.
[21] Philip A. Bernstein, Vassos Hadzilacos, and Nathan Goodman. Concurrency Control and Recovery
in Database Systems. Addison-Wesley, 1987.
[22] Chandrasekhar Boyapati, Robert Lee, and Martin Rinard. Ownership types for safe programming:
preventing data races and deadlocks. In Proceedings of the 17th ACM SIGPLAN conference on
Object-oriented programming, systems, languages, and applications, OOPSLA ’02, pages 211–
230, New York, NY, USA, 2002. ACM.
[23] Peter Brass. Advanced Data Structures. Cambridge University Press, New York, NY, USA, 2008.
[24] Nathan Bronson. Composable Operations on High-Performance Concurrent Collections. PhD
thesis, Stanford University, December 2011.
[25] Nathan Grasso Bronson, Jared Casper, Hassan Chafi, and Kunle Olukotun. A practical concurrent
binary search tree. In PPOPP, pages 257–268, 2010.
[26] Chi Cao Minh, JaeWoong Chung, Christos Kozyrakis, and Kunle Olukotun. STAMP: Stanford
transactional applications for multi-processing. In IISWC, 2008.
[27] Calin Cascaval, Colin Blundell, Maged Michael, Harold W. Cain, Peng Wu, Stefanie Chiras, and
Siddhartha Chatterjee. Software transactional memory: Why is it only a research toy? Queue,
6(5):46–58, September 2008.
[28] Vinay K. Chaudhri and Vassos Hadzilacos. Safe locking policies for dynamic databases. In PODS
’95: Proceedings of the fourteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of
database systems, pages 233–244, New York, NY, USA, 1995. ACM.
[29] Sigmund Cherem, Trishul Chilimbi, and Sumit Gulwani. Inferring locks for atomic sections. In
PLDI, 2008.
[30] P. J. Courtois, F. Heymans, and D. L. Parnas. Concurrent control with readers and writers. Commun. ACM, 14(10), October 1971.
[31] Dave Cunningham, Khilan Gudka, and Susan Eisenbach. Keep off the grass: Locking the right
path for atomicity. In CC, pages 276–290. 2008.
[32] David Dice, Ori Shalev, and Nir Shavit. Transactional locking II. In DISC, pages 194–208, 2006.
[33] Simon Doherty, David L. Detlefs, Lindsay Groves, Christine H. Flood, Victor Luchangco, Paul A.
Martin, Mark Moir, Nir Shavit, and Guy L. Steele, Jr. DCAS is not a silver bullet for nonblocking algorithm design. In SPAA '04: Proceedings of the sixteenth annual ACM symposium on Parallelism
in algorithms and architectures, pages 216–224, New York, NY, USA, 2004. ACM.
[34] Aleksandar Dragojević, Pascal Felber, Vincent Gramoli, and Rachid Guerraoui. Why STM can be more than a research toy. Commun. ACM, 54(4):70–77, April 2011.
[35] Joe Duffy. Concurrent Programming on Windows. Addison-Wesley, 2008.
[36] Joe Duffy. A (brief) retrospective on transactional memory. 2010. http://joeduffyblog.com/2010/01/03/a-brief-retrospective-on-transactional-memory/.
[37] Michael Emmi, Jeffrey S. Fischer, Ranjit Jhala, and Rupak Majumdar. Lock allocation. In POPL,
pages 291–296, 2007.
[38] K. P. Eswaran, J. N. Gray, R. A. Lorie, and I. L. Traiger. The notions of consistency and predicate
locks in a database system. Commun. ACM, 19:624–633, November 1976.
[39] Erich Gamma, Richard Helm, Ralph E. Johnson, and John Vlissides. Design Patterns: Elements
of Reusable Object-Oriented Software. Addison-Wesley, Reading, MA, 1995.
[40] Guy Golan-Gueta, Nathan Bronson, Alex Aiken, G. Ramalingam, Mooly Sagiv, and Eran Yahav.
Automatic fine-grain locking using shape properties. In OOPSLA, 2011.
[41] Guy Golan-Gueta, G. Ramalingam, Mooly Sagiv, and Eran Yahav. Concurrent libraries with
foresight. In PLDI, 2013.
[42] Guy Golan-Gueta, G. Ramalingam, Mooly Sagiv, and Eran Yahav. Automatic semantic locking.
In PPOPP, 2014.
[43] Guy Golan-Gueta, G. Ramalingam, Mooly Sagiv, and Eran Yahav. Automatic scalable atomicity
via semantic locking. In PPOPP, 2015.
[44] Khilan Gudka, Tim Harris, and Susan Eisenbach. Lock inference in the presence of large libraries.
In ECOOP 2012–Object-Oriented Programming, pages 308–332. Springer, 2012.
[45] Rachid Guerraoui and Michal Kapalka. Principles of Transactional Memory. Synthesis Lectures
on Distributed Computing Theory. Morgan & Claypool Publishers, 2010.
[46] Leo J. Guibas and Robert Sedgewick. A dichromatic framework for balanced trees. In Proceedings
of the 19th Annual Symposium on Foundations of Computer Science, pages 8–21, Washington, DC,
USA, 1978. IEEE Computer Society.
[47] Bart Haagdorens, Tim Vermeiren, and Marnix Goossens. Improving the performance of signature-based network intrusion detection sensors by multi-threading. Information Security Applications,
pages 188–203, 2005.
[48] Tim Harris, James Larus, and Ravi Rajwar. Transactional memory, 2nd edition. Synthesis Lectures
on Computer Architecture, 5(1), 2010.
[49] Peter Hawkins, Alex Aiken, Kathleen Fisher, Martin Rinard, and Mooly Sagiv. Concurrent data
representation synthesis. In PLDI, 2012.
[50] M. Herlihy, Y. Lev, V. Luchangco, and N. Shavit. A provably correct scalable concurrent skip list.
In OPODIS, 2006.
[51] Maurice Herlihy and Eric Koskinen. Transactional boosting: a methodology for highly-concurrent
transactional objects. In PPoPP, pages 207–216, 2008.
[52] Maurice Herlihy, Victor Luchangco, Mark Moir, and William N. Scherer III. Software transactional memory for dynamic-sized data structures. In PODC, pages 92–101, 2003.
[53] Maurice Herlihy and J. Eliot B. Moss. Transactional memory: Architectural support for lock-free
data structures. In ISCA, pages 289–300, 1993.
[54] Maurice Herlihy and Nir Shavit. The Art of Multiprocessor Programming. Morgan Kauffman,
February 2008.
[55] Maurice P. Herlihy and Jeannette M. Wing. Linearizability: a correctness condition for concurrent
objects. ACM Trans. Program. Lang. Syst., 12, July 1990.
[56] Maurice P. Herlihy and Jeannette M. Wing. Linearizability: a correctness condition for concurrent
objects. Proc. of ACM TOPLAS, 12(3):463–492, 1990.
[57] Michael Hicks, Jeffrey S. Foster, and Polyvios Prattikakis. Lock inference for atomic sections.
In Proceedings of the First ACM SIGPLAN Workshop on Languages, Compilers, and Hardware
Support for Transactional Computing, June 2006.
[58] Guoliang Jin, Wei Zhang, Dongdong Deng, Ben Liblit, and Shan Lu. Automated concurrency-bug
fixing. In OSDI, 2012.
[59] Zvi M. Kedem and Abraham Silberschatz. A characterization of database graphs admitting a
simple locking protocol. Acta Inf., 16:1–13, 1981.
[60] Henry F. Korth. Locking primitives in a database system. J. ACM, 30:55–79, January 1983.
[61] Eric Koskinen, Matthew Parkinson, and Maurice Herlihy. Coarse-grained transactions. In POPL,
pages 19–30, 2010.
[62] Milind Kulkarni, Keshav Pingali, Bruce Walter, Ganesh Ramanarayanan, Kavita Bala, and L. Paul
Chew. Optimistic parallelism requires abstractions. In PLDI, 2007.
[63] Vladimir Lanin and Dennis Shasha. Tree locking on changing trees. Technical report, 1990.
[64] T. Lev-Ami and M. Sagiv. TVLA: A framework for Kleene based static analysis. In Static Analysis Symposium (SAS), volume 1824 of Lecture Notes in Computer Science, pages 280–301. Springer-Verlag, 2000.
[65] Ondřej Lhoták and Laurie Hendren. Scaling Java points-to analysis using Spark. In Proceedings of
the 12th international conference on Compiler construction, CC’03, 2003.
[66] Richard J. Lipton. Reduction: a method of proving properties of parallel programs. Commun.
ACM, 18(12):717–721, December 1975.
[67] Alexander Matveev and Nir Shavit. Towards a fully pessimistic STM model. In TRANSACT 2012
Workshop, 2012.
[68] Bill McCloskey, Feng Zhou, David Gay, and Eric Brewer. Autolocker: synchronization inference
for atomic sections. In POPL, pages 346–358, 2006.
[69] Paul E McKenney. Is parallel programming hard, and, if so, what can you do about it? Linux
Technology Center, IBM Beaverton, August 2012.
[70] Maged M. Michael and Michael L. Scott. Simple, fast, and practical non-blocking and blocking
concurrent queue algorithms. In PODC, pages 267–275, 1996.
[71] J. Eliot B. Moss. Open Nested Transactions: Semantics and Support. In Poster at the 4th Workshop
on Memory Performance Issues (WMPI-2006). February 2006.
[72] Ramanathan Narayanan, Berkin Özışıkyılmaz, Joseph Zambreno, Gokhan Memik, and Alok Choudhary. MineBench: A benchmark suite for data mining workloads. In 2006 IEEE International Symposium on Workload Characterization, pages 182–188, 2006.
[73] Flemming Nielson, Hanne R. Nielson, and Chris Hankin. Principles of Program Analysis. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1999.
[74] Otto Nurmi and Eljas Soisalon-Soininen. Uncoupling updating and rebalancing in chromatic binary search trees. In Proceedings of the tenth ACM SIGACT-SIGMOD-SIGART symposium on
Principles of database systems, PODS ’91, pages 192–198, New York, NY, USA, 1991. ACM.
[75] Christos H. Papadimitriou. The serializability of concurrent database updates. J. ACM, 26(4):631–
653, 1979.
[76] M. Sagiv, T. Reps, and R. Wilhelm. Parametric Shape Analysis via 3-valued Logic. ACM Trans.
on Prog. Lang. and Systems (TOPLAS), 24(3):217–298, 2002.
[77] Peter M. Schwarz and Alfred Z. Spector. Synchronizing shared abstract types. ACM Trans. Comput. Syst., 2(3):223–250, August 1984.
[78] Ohad Shacham. Verifying Atomicity of Composed Concurrent Operations. PhD thesis, Tel Aviv
University, 2012.
[79] Ohad Shacham, Nathan Bronson, Alex Aiken, Mooly Sagiv, Martin Vechev, and Eran Yahav.
Testing atomicity of composed concurrent operations. In OOPSLA, 2011.
[80] A. Silberschatz and Z.M. Kedem. A family of locking protocols for database systems that are
modeled by directed graphs. Software Engineering, IEEE Transactions on, SE-8(6):558 – 562,
November 1982.
[81] Daniel Dominic Sleator and Robert Endre Tarjan. Self-adjusting heaps. SIAM J. Comput., 15:52–
69, February 1986.
[82] Alexandru Sălcianu and Martin Rinard. Purity and side effect analysis for Java programs. In
VMCAI, pages 199–215, 2005.
[83] Yaakov L. Varol and Doron Rotem. An algorithm to generate all topological sorting arrangements.
The Computer Journal, 24(1):83–84, 1981.
[84] Jons-Tobias Wamhoff, Christof Fetzer, Pascal Felber, Etienne Rivière, and Gilles Muller. Fastlane:
Improving performance of software transactional memory for low thread counts. In PPoPP, 2013.
[85] Yin Wang, Stéphane Lafortune, Terence Kelly, Manjunath Kudlur, and Scott A. Mahlke. The
theory of deadlock avoidance via discrete control. In POPL, pages 252–263, 2009.
[86] Gerhard Weikum and Gottfried Vossen. Transactional information systems: theory, algorithms,
and the practice of concurrency control and recovery. Morgan Kaufmann Publishers Inc., San
Francisco, CA, USA, 2001.
[87] Adam Welc, Bratin Saha, and Ali-Reza Adl-Tabatabai. Irrevocable transactions and their applications. In SPAA, pages 285–296, 2008.
[88] Hongseok Yang, Oukseh Lee, Josh Berdine, Cristiano Calcagno, Byron Cook, and Dino Distefano.
Scalable shape analysis for systems code. In In CAV, 2008.
[89] Richard M. Yoo, Yang Ni, Adam Welc, Bratin Saha, Ali-Reza Adl-Tabatabai, and Hsien-Hsin S.
Lee. Kicking the tires of software transactional memory: Why the going gets tough. In SPAA,
2008.
Tel Aviv University
The Raymond and Beverly Sackler Faculty of Exact Sciences
School of Computer Science
Automatic Fine-Grained Synchronization of Concurrent Programs
Guy Golan Gueta
under the supervision of Prof. Mooly Sagiv and Prof. Eran Yahav
and the consultation of Dr. G. Ramalingam
A thesis submitted for the degree of
Doctor of Philosophy
Submitted to the Senate of Tel Aviv University
April 2015
Abstract

One of the central challenges in writing concurrent software is synchronization: ensuring that concurrent accesses and updates to shared data do not conflict with each other in ways that lead to undesirable results. Atomicity is a key correctness property of synchronization. The atomicity property refers to code sections and states that, in every execution of the program, these sections appear to execute atomically. In many cases, implementing synchronization that guarantees atomicity in an efficient and scalable manner is considered difficult.

In this thesis we address the problem of enforcing atomicity by constructing several synchronization protocols and developing methods to enforce them on code sections. The techniques we employ use static and dynamic information about the programs, and are based on combining compilation techniques and run-time techniques. Specifically, we present three synchronization approaches: (1) An approach that exploits the shape of the shared memory in order to transform a sequential library (one that contains no synchronization) into an atomic library (a library in which, from the point of view of its client, each operation appears to execute atomically). This approach is based on the Domination Locking protocol, a protocol that enables fine-grained synchronization and is designed for synchronizing object-oriented software that manipulates pointers to objects. We show a way to enforce Domination Locking in libraries whose shared memory has the shape of a dynamic forest. (2) An approach that transforms an atomic library into a transactional library. A transactional library is a library that makes it possible to guarantee the atomicity of a sequence of operations (in contrast to an atomic library, which only guarantees the atomicity of individual operations). The idea in this approach is to build a library that exploits information, received from its clients, about the way they are going to use it. This information restricts the cases the library has to take into account, and thus enables effective synchronization of the sequences of operations. This approach is based on a new synchronization protocol built around the notion of a dynamic right-mover. In addition, we show that in many cases occurring in practical Java programs, a compiler can automatically compute the information that the client needs to pass to the library. (3) An approach that makes it possible to work with several transactional libraries together. This approach applies to a special case of transactional libraries in which the synchronization is based on locks that exploit semantic properties of the library operations.

In the thesis we formulate the approaches formally and show that they guarantee atomicity without creating synchronization errors such as deadlock. In addition, we implement the approaches and show that they lead to efficient and scalable synchronization.
Extended Summary

Chapter 1 – Introduction

Concurrency is a common property of software systems because it makes it possible to shorten the response times of operations, increase the throughput of operations, and provide higher utilization of multi-core machines. Nevertheless, writing effective concurrent software is considered a difficult and error-prone task, due to the need to take into account many subtle interactions among the parts of the program that run in parallel.

Atomicity. Atomicity is a fundamental correctness property of code sections in concurrent programs. Intuitively, a code section S is atomic in a program if, for every execution of the program, there exists an (identical or different) execution of the program with equivalent behavior in which the code section S executes contiguously, without being interleaved with other parts of the program. In other words, an atomic code section is one that can be viewed as executing in isolation from the rest of the program. Atomic sections improve the ability to understand concurrent programs, because (for the purpose of program correctness) one can assume that these sections always execute contiguously (as if they run to completion without interruption).

The relevant literature (of database systems and shared-memory systems) defines and studies several different variants of atomicity, where each variant has its own semantic properties and is intended for a particular set of behaviors and considerations. For example, linearizability is a variant of atomicity used to describe implementations of shared libraries (and, equivalently, shared objects): intuitively, an operation is linearizable if it can be viewed as taking effect instantaneously at some point between the operation's invocation and its response. Linearizability does not (directly) refer to the way the library is implemented; instead, it refers to the behavior of the library from the point of view of its client.

The problem. In this thesis we address the enforcement of atomicity of code sections by automatically creating efficient and scalable synchronization. One of the central challenges in this problem is to guarantee atomicity in a scalable way that restricts concurrency only where necessary. The synchronization must have low enough overhead to be worth using; that is, it must be more efficient than simple synchronization alternatives (such as a simple global lock that enforces atomicity by preventing concurrent execution of the atomic sections).

Synchronization solutions for enforcing atomicity that are implemented in practice are usually written manually and specifically for particular programs; the problem with such solutions is that they tend to contain many bugs. Automatic synchronization methods let the programmer mark the sections that should be atomic, and then the compiler (with the cooperation of the run-time environment) enforces their atomicity automatically. Despite the potential convenience of the existing automatic methods, these methods are not in widespread use, for various reasons including: high run-time overhead, poor performance, and limited ability to support irreversible operations (such as input/output operations).

Specialized synchronization. In this thesis we present several approaches for automatic enforcement of atomicity, where each approach is intended for a particular class of programs. The idea is to produce synchronization that enforces atomicity by exploiting specific properties of the programs. For each approach we describe a dedicated synchronization protocol, and show a way to guarantee it by combining compilation techniques and run-time techniques. The synchronization protocols are designed to guarantee atomicity in an efficient and scalable manner without leading to synchronization errors such as deadlock, all without using speculation mechanisms or rollback mechanisms.
Chapter 2 – Automatic Addition of Fine-Grained Locks

In Chapter 2 we present an approach that exploits the shape of the shared memory in order to transform a sequential library (one that contains no synchronization) into an atomic library (in particular, a linearizable library). This approach is based on the paper "Automatic Fine-Grain Locking using Shape Properties", presented at OOPSLA 2011.

A library is a module that encapsulates shared memory together with a set of procedures that can be called by threads executing in parallel. Given the code of a library implementation, our goal is to automatically add fine-grained locking that guarantees that the code of each procedure is atomic, while permitting a high degree of concurrency in the execution of the procedures. In particular, we are interested in synchronization in which every shared object has its own lock, and locks may be released before the procedures finish executing (this ability is sometimes called early lock release). The central idea of this approach is to use the restricted shape of the object graph in order to enable the automatic creation of the fine-grained locking.

The Domination Locking protocol. The approach is based on a new fine-grained locking protocol called Domination Locking (DL for short). The DL protocol is a set of conditions that guarantee atomicity and deadlock freedom. The DL protocol is designed to handle recursive data structures (with dynamic manipulations) by using properties of paths in such data structures.

The DL protocol distinguishes between two kinds of objects: exposed objects and hidden objects. Exposed objects serve as the interface between the library code and its client code (the code that uses the library); pointers to these objects pass between the library code and the client code. A common example of such an object is the root object of a data structure. In contrast, hidden objects are objects managed by the library whose existence is not exposed to the library's client; a common example is the objects below the root of the data structure. The protocol exploits the fact that every procedure has to start from exposed objects (one or more) in order to traverse the object graph and reach hidden objects.

The protocol requires that the exposed objects passed as parameters to a procedure be locked in a manner similar to the two-phase locking protocol. Hidden objects, in contrast, are handled differently: a thread t may lock a hidden object u only if the objects whose locks t holds dominate u (we say that a set S of objects dominates an object u if every path in the object graph from an exposed object to u contains an object in S). In particular, hidden objects may be locked by a procedure even after it has released locks on other objects (in this way the protocol permits early release of locks).

This protocol is a strict generalization of several related fine-grained locking protocols, such as dynamic tree locking and DAG locking.

Automatic enforcement of Domination Locking. In this chapter we present an automatic technique for enforcing the DL protocol on the code of a given sequential library. The technique is applicable to libraries in which the shared memory is an object graph that has the shape of a dynamic forest. The technique permits the shape of the object graph to change dynamically, as long as its shape is a forest between different invocations of the procedures (it suffices that this holds for the original library in executions without concurrency).

The technique implements the following locking scheme (described informally): at run time, a procedure holds a lock on each of the objects pointed to directly by its local variables (this set of objects is called the immediate scope). When an object leaves the immediate scope, the procedure releases the lock on the object if the object has at most one predecessor in the object graph (that is, if it does not violate the forest property of the object graph). If such an object has several predecessors, then we know that at some later point of the execution this situation will change, and therefore the object's lock will eventually be released (because the library has the shape of a forest at the end of a sequential execution of a procedure). This locking scheme is implemented by a simple code transformation that adds reference counters to the library code.
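The locking scheme described above can be illustrated by a small sketch (our own simplification; the actual transformation in the thesis instruments general library code): each node carries its own lock and a count of its predecessors in the object graph, and the count decides whether the lock may be released when the node leaves the immediate scope:

```java
import java.util.concurrent.locks.ReentrantLock;

// A forest node as instrumented by the transformation sketched above:
// each node has its own lock plus a predecessor (reference) count.
class Node {
    final ReentrantLock lock = new ReentrantLock();
    int refCount = 0;     // number of predecessors in the object graph
    Node child;

    // Pointer updates maintain the reference counts.
    void setChild(Node c) {
        if (child != null) child.refCount--;
        if (c != null) c.refCount++;
        child = c;
    }

    // Called when this node leaves a procedure's immediate scope:
    // release the lock only if the forest property allows it
    // (at most one predecessor).
    void maybeRelease() {
        if (refCount <= 1 && lock.isHeldByCurrentThread()) lock.unlock();
    }
}
```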
Evaluation. We show that this technique adds fine-grained locking that leads to efficient and scalable synchronization, for several data structures in which it is very hard to create such locking manually. We demonstrate its applicability on two balanced trees (a Treap and a red-black tree), a skew heap, and two implementations of data structures that are specialized for their applications.
Chapter 3 – Transactional Libraries

Linearizable libraries guarantee that their operations appear to execute atomically. Clients of libraries often need a sequence of several operations to appear to execute atomically (code that produces such a sequence of operations is called a composite operation). In Chapter 3 we address the extension of linearizable libraries into libraries that support atomicity of arbitrary composite operations (the code that determines the sequence of operations belongs to the library's client). We present a new approach in which a library guarantees atomicity of composite operations by exploiting information received from the client code. We call such libraries transactional libraries.

Our basic methodology requires the client to mark the code section of the composite operation for which atomicity is required, and to give the library declarative information about the library operations (procedures) it is going to use at the various points of the composite operation's code; this is information about the potential future operations that may be performed as part of the composite operation. The library is responsible for guaranteeing the atomicity of the composite operation, and it may exploit the information about future operations for effective synchronization.

The approach presented in this chapter is based on the paper "Concurrent Libraries with Foresight", presented at PLDI 2013.

Synchronization based on information about the future. In this chapter we formalize the approach. We formally state the goals of the approach and give sufficient correctness conditions for it. As long as the client and the library satisfy the conditions, all composite operations are guaranteed to execute atomically without leading to deadlock. Our correctness conditions are general and permit a large space of implementations. These conditions are based on the notion of a dynamic right-mover, which generalizes accepted notions such as the static right-mover and commutativity. Our approach separates the library implementation from its client; hence, the correctness of the client does not depend on the way the library implementation uses the information about the future. The client only has to make sure that the information passed to the transactional library is correct.

Automatic inference. In addition, we present a static analysis that infers the information required by our approach. This analysis enables a compiler to automatically add to the client code calls that pass to the library the required information about future operations. This simplifies the task of writing the composite operations.

Implementation technique. In this chapter we also present a general technique for extending existing linearizable libraries into transactional libraries. The technique is based on a new variant of the tree locking protocol, in which the tree structure is determined by the semantics of the library operations.

We use this implementation technique to build a general-purpose Java library that supports several Map data structures. Our library enables composite operations that work simultaneously with several Map instances (at this stage we focus on Maps because previous work has shown that composite operations on Maps are very common in the code of software systems).

Evaluation. We use the Maps library and the static analysis to enforce atomicity of composite operations, including ones that work with several Map instances. We show that our approach is applicable to a variety of composite operations found in Java programs (belonging to open-source projects). In addition, we demonstrate the performance potential of the approach by showing that it provides efficient and scalable synchronization.
Chapter 4 – Composing Transactional Libraries via Semantic Locking

In Chapter 4 we present an approach for handling composite operations that use several transactional libraries together. Part of this chapter is based on the short paper "Automatic Semantic Locking", presented at PPOPP 2014.

The presented approach combines the approach of Chapter 3 with known approaches for static inference of locks. In this approach, we restrict the synchronization that may be implemented inside the transactional libraries to synchronization with lock-like characteristics; this synchronization is similar to synchronization based on semantic locks, as described in the database literature.

In this chapter, the information about the potential future operations is expressed by locking the set of operations that may be invoked (in this context, an operation is identified by a procedure together with the specific values passed to it as parameters). The synchronization in the transactional library has to ensure that two composite operations never simultaneously hold locks on non-commutative operations.

In this chapter we describe a static algorithm that enforces atomicity of code sections that use several such transactional libraries. This algorithm is based on a semantic version of the two-phase locking protocol. We implement the static algorithm and show that it produces efficient and scalable synchronization.
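To illustrate the flavor of such semantic locking, consider a map in which operations on different keys commute. This is our own simplified example, not the output of the algorithm: a composite operation locks, in two-phase style, the set of keys on which it may invoke operations, acquiring them in a fixed order to avoid deadlock:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Semantic locking for a map, simplified: put/get on different keys
// commute, so a composite operation locks only the keys it may touch
// and holds those locks until it finishes (semantic two-phase locking).
class SemanticLockedMap {
    private final Map<String, Integer> data = new HashMap<>();
    private final ConcurrentHashMap<String, ReentrantLock> keyLocks =
            new ConcurrentHashMap<>();

    private ReentrantLock lockFor(String key) {
        return keyLocks.computeIfAbsent(key, k -> new ReentrantLock());
    }

    // Growing phase: acquire a lock for every key the composite
    // operation may touch, in a fixed (sorted) order to avoid deadlock.
    public void acquire(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(null);  // natural String ordering
        for (String k : sorted) lockFor(k).lock();
    }

    // Shrinking phase: release everything at the end of the section.
    public void release(List<String> keys) {
        for (String k : keys) lockFor(k).unlock();
    }

    public void put(String k, int v) { data.put(k, v); }
    public Integer get(String k) { return data.get(k); }
}
```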
Chapter 5 – Related Work

This chapter surveys relevant previous work and explains its connections to the material presented in this thesis.
Chapter 6 – Conclusions

In the thesis we present three new approaches for automatic enforcement of atomicity by adding fine-grained synchronization to code sections. We show that the approaches are applicable to a variety of real-world concurrent programs. Our approaches produce efficient and scalable synchronization by combining compilation techniques, run-time techniques, and libraries with specialized synchronization.

All the approaches we present exploit semantic properties of programs: the approach in Chapter 2 exploits the shape of the shared memory, and the approaches in Chapters 3 and 4 exploit information about the potential future operations of the library invoked from its client code. In particular, we show that using preliminary semantic information about programs is useful and enables effective synchronization.

In many cases, the combination of a lock (for example, a single-writer multiple-readers lock) and the shared memory it protects can be viewed as a transactional library. Hence, in a certain sense, the synchronization presented in Chapter 3 generalizes the standard notion of a lock.

In addition, our approaches are not based on speculative synchronization (in particular, no rollback mechanisms are used). This shows that synchronization without speculation can be effective (even though many modern synchronization methods are based on speculation). In the future, it may be interesting to understand the applicability and benefit of our ideas combined with the use of speculation.