Efficient and Consistent Transaction Processing in Wireless Data Broadcast Environments
Dissertation submitted in fulfillment of the requirements for the academic degree of Doctor of Natural Sciences (Doktor der Naturwissenschaften) at the Universität Konstanz, Department of Computer and Information Science (Fachbereich Informatik und Informationswissenschaft)

Submitted by
André Seifert

Reviewed by:
1st referee: Prof. Dr. Marc H. Scholl, Universität Konstanz
2nd referee: Prof. Dr. Daniel A. Keim, Universität Konstanz

Date of submission: 05.01.2005
Date of oral examination: 27.04.2005
“It is with words as with sunbeams — the more they are condensed, the deeper they burn.”
– Robert Southey
Zusammenfassung

Hybrid, i.e., push- and pull-based, data delivery is likely to establish itself as the primary approach for distributing bulk data to large user groups in mobile environments. A central task within hybrid data delivery networks is to provide clients with a consistent and current view of the data that the server supplies either over a broadband broadcast channel or over several dedicated narrowband unicast channels. An equally important research area within hybrid data delivery systems is the cache management of the mobile devices, which attempts to resolve the mismatch between the structure and content of the broadcast program on the one hand and the client-specific information needs and data access patterns on the other. The client cache additionally has the task of largely hiding the sequential access characteristics of the broadcast channel, and it can furthermore serve as a place to retain outdated, but still useful, data objects that are about to be physically deleted from the server or have already been deleted there.

This dissertation first introduces the various wireless network types currently available for mobile data communication and then shows that the majority of today's wireless networks exhibit an asymmetry in bandwidth capacity, data volume, and service load. It demonstrates that hybrid data delivery, which combines the traditional pull technique with the relatively new push technique, is an attractive communication mode for building scalable and flexible mobile data services. A brief overview of the various environmental and system-inherent constraints to which mobile computing systems are exposed follows, and we conclude that achieving good performance results together with strong semantic consistency guarantees for transactions is considerably more difficult in mobile wireless environments than in traditional stationary networks. In the same vein, possible techniques for avoiding or reducing the number of data conflicts between concurrently running transactions are presented, along with ways of detecting and resolving such conflicts. Designing and building hybrid data delivery networks with high performance, scalability, and reliability, while additionally imposing strict requirements on the data consistency and currency of the system, requires that various other performance- and mission-critical aspects besides concurrency control be considered. To meet this requirement, the thesis also covers broadcast scheduling and broadcast indexing, presenting and evaluating approaches proposed in the literature.

After establishing the practical need for, and the growing interest in, a timely and consistent delivery of bulk information over mobile broadband broadcast channels, the associated challenges and manifold problems are discussed. In this context, it is argued that the currently available definitions of isolation levels are unsuitable for implementing concurrency control protocols designed for read-only transactions, since they may permit unwanted, though correct, data accesses due to missing data currency guarantees. To remedy this problem, four new isolation levels are defined that provide numerous useful data consistency and currency guarantees to read-only transactions, and suitable implementations of these isolation levels for hybrid data delivery networks are presented. To determine the performance differences among the newly defined isolation levels and their protocols, numerous empirical experiments were conducted; they show that the Strict Forward BOT View Consistency isolation level and its implementation, named MVCC-SFBVC, achieve the best performance results among the compared concurrency control protocols.

For shortening the response times of mobile applications and achieving a high scalability of hybrid data delivery systems, the cache management of the mobile clients (i.e., end devices) plays an essential, if not the decisive, role. Since existing cache management strategies offer only insufficient support for multi-version concurrency control protocols, this thesis presents a new cache replacement and prefetching strategy named MICP. The acronym MICP stands for Multi-version Integrated Caching and Prefetching; it denotes a hybrid cache management method that can manage both data pages and data objects. While data pages are replaced according to the traditional LRU scheme, MICP makes object replacement and prefetching decisions on the basis of numerous performance-critical factors, including the recency and frequency of previous object accesses, the predicted update probability of the cached data objects, and their re-acquisition costs. To be able to react to storage-relevant events, for example when cached object versions have become useless for the execution of the currently running transaction(s), the MICP cache manager is tightly coupled to the transaction manager. To avoid useful, but non-re-cacheable, object versions having to compete with re-cacheable object versions for available cache resources, MICP divides the available space of the client cache into two differently sized segments: the so-called REC and NON-REC partitions. To assess the efficiency of the MICP cache management strategy, extensive simulation studies were conducted; they show that mobile clients that do not employ MICP for executing read-only transactions suffer an average performance loss of about 19%.

Finally, the thesis addresses the problem of achieving serializability for read-write transactions together with good response times and a low transaction abort rate in hybrid wireless data delivery networks. To realize these goals, the thesis presents a family of five multi-version concurrency control protocols named MVCC-*. The individual protocols of the MVCC-* family differ with respect to their scheduling performance, the data currency guarantees given to the read operations of transactions, and their space and time complexity. The performance deviations among the protocols of the MVCC-* family that arise from different data currency guarantees and scheduling decisions are quantified, and the performance results are additionally compared with those obtained for the well-known Snapshot Isolation protocol. Since the MVCC-* protocol family uses only plain read and write operations at runtime for its scheduling decisions and exploits no semantic information about the underlying transactions, the thesis outlines and evaluates several ways of extending the proposed protocols in order to avoid or reduce data conflicts. These include, among others, the specification of alternative write operations for original update operations, the reduction of the data granularity on which concurrency control is based, and an increase in the number of object versions retained by the system. Subsequently, it is shown that the MICP cache management strategy can also outperform the LRFU scheme in terms of cache performance when employed in conjunction with the execution of read-write transactions.
“It is with words as with sunbeams — the more they are condensed, the deeper they burn.”
– Robert Southey
Abstract
Hybrid, i.e., push- and pull-based, data delivery is likely to become a method of choice for the distribution of information to a large user population in many new mobile and stationary applications.
One of the major issues in hybrid data delivery networks is to provide clients with a consistent and
current view on the data delivered by the server through a broadband broadcast channel and a set of
dedicated narrowband unicast channels while minimizing the users’ response times. Another complementary problem in hybrid data delivery is the caching policy of the clients that needs to solve
the mismatch between the server’s broadcast schedule and the individual user’s access pattern, to
compensate for the sequential access characteristics of the air-cache, and to function as a last-resort
source for non-current object versions that have been physically evicted from the server storage
facilities.
In this doctoral thesis, we first discuss the various wireless network types currently available
to provide data communication services and show that the majority of them exhibit asymmetry in
the bandwidth capacity (i.e., a significantly higher bandwidth is available from servers to clients than in the reverse direction), data volume, and service load. We then argue
that hybrid data delivery which integrates the traditional pull and the rather novel push techniques
is an attractive communication mode to create highly scalable and flexible data services. A brief
overview of the environmental and system-immanent constraints of mobile computing systems follows, and we reason that providing good system throughput results along with strong semantic guarantees for transactions is more challenging in mobile, portable environments than in conventional
fixed networks. In the same vein, we present possible techniques to avoid and reduce the number
of data conflicts that may arise and discuss ways to detect and, more importantly, resolve them
once identified. To design and deploy high-performance, scalable, and reliable hybrid delivery networks which provide strong semantic guarantees w.r.t. data consistency and currency to their users, various other performance- and mission-critical issues besides concurrency control (CC) need to be addressed. To take account of that, we present and evaluate several strategies proposed in the literature on major topics, such as broadcast scheduling and broadcast channel indexing, that are not covered in separate chapters in later parts of the thesis.
After motivating the practical need for timely and consistent data delivery to thousands of information consumers, we discuss the challenges and problems involved in supporting appropriate consistency and currency guarantees to dissemination-based applications. We then argue that
current definitions of isolation levels (ILs) are inappropriate for implementations of CC protocols
suitable for read-only transactions as they allow unwanted, though consistent, data access patterns
due to lack of data currency guarantees. To rectify the problem, we define four new ILs providing various useful data consistency and currency guarantees to read-only transactions and present
suitable implementations of the proposed ILs for hybrid data delivery networks. To evaluate the
performance trade-offs among the newly defined ILs and their respective protocols, extensive numerical experiments are conducted, demonstrating that the Strict Forward BOT View Consistency level and its implementation, termed MVCC-SFBVC, provide the best performance results among the CC protocols studied.
Client caching is one of the most fundamental techniques, if not the most fundamental one, for shortening response times and achieving high scalability. In this thesis, we introduce a novel
client cache replacement and prefetching strategy, called MICP, that makes eviction and prefetching
decisions sensitive to various performance-critical factors including the objects’ access recency and
frequency in the recent past, their update likelihood in the near future, their re-acquisition costs, etc.
On top of that, MICP is tightly coupled to the transaction manager in order to obtain, and instantly react to, information indicating that locally stored non-current object versions have become useless
from the CC perspective and can therefore be evicted from the client cache. To prevent useful, but non-re-cacheable, object versions from competing with re-cacheable object versions for available cache
resources, MICP logically divides the cache into two variable-sized segments, dubbed REC and
NON-REC. To evaluate MICP’s cache management efficiency, we report on extensive experiments
showing that MICP is able to improve transaction throughput results achieved by state-of-the-art
online cache replacement and prefetching policies by about 19% if used for executing read-only
transactions.
Finally, we consider the challenging problem of providing serializability along with good performance and strong semantic data currency and consistency guarantees to mobile applications
issuing read-write transactions. To achieve this goal, we present a suite of five multi-version concurrency control (MVCC) protocols, denoted MVCC-*, that differ from each other in terms of their
scheduling performance, data currency guarantees, and space and time complexity. We quantify
the performance deviations among the protocols of the MVCC-* suite due to applying different read
rules for servicing read requests and, additionally, compare their results with those measured for
the well-known Snapshot Isolation scheme. As the MVCC-* suite is based only on analyzing read and write operations at runtime and does not exploit semantic information about its constituent transactions, we outline and partially evaluate (by means of simulation) possibilities of extending the proposed protocols by conflict-reducing and conflict-avoiding measures such as specifying alternative write operations, reducing the data granularity at which CC is applied, increasing the number of versions managed in the system, etc. Last, but not least, we provide evidence that MICP is superior to LRFU, the best cache replacement policy known so far, when used to improve the response
times of read-write transactions.
“If you wish your merit to be known, acknowledge that of other people.”
– Oriental Proverb
Acknowledgments
First of all, I’m indebted to my dissertation supervisor Marc H. Scholl who — using patience,
motivation, ingenuity, and a laissez-faire management style — managed to get me through to my
PhD. Thanks go to my parents Christine and Frank and my sister Bianca for their unflagging support
throughout the many years of my academic work and their warm and cosy welcome whenever I visited them during holidays or on other occasions. My thanks go also to my current and ex-flatmates Denise, Lisa, Stefan, and Tilman, who made living and spending leisure time in Konstanz
a pleasure to me. Finally, thanks to the other members of the Database Research Group and the
anonymous referees of my submitted research papers, which paved the way to this dissertation, for their
insightful, critical, and valuable comments on my research work. A special thanks goes to my PhD
colleague Svetlana for proof-reading large parts of the dissertation manuscript.
Contents
Zusammenfassung . . . iii
Abstract . . . vii
Acknowledgments . . . xi
List of Figures . . . xix
List of Tables . . . xx
List of Algorithms . . . xxi
List of Acronyms . . . xxii
List of Symbols . . . xxviii
1 Introduction . . . 1
1.1 Problem Statement . . . 1
1.2 Contribution . . . 10
1.3 Publications . . . 13
1.4 Outline . . . 14
2 Background . . . 17
2.1 Basics of Wireless Communication Systems . . . 18
2.2 Wireless Network Types . . . 20
2.3 Limitations of Mobile Computing . . . 30
2.3.1 Techniques to Avoid or Reduce Reconciliation Conflicts . . . 32
2.3.2 Techniques to Detect and Resolve Reconciliation Conflicts . . . 35
3 Hybrid Data Delivery . . . 37
3.1 Why to Use Hybrid Data Delivery . . . 37
3.2 Hybrid Data Delivery Networks . . . 39
3.2.1 Organizing the Broadcast Program . . . 41
3.2.2 Indexing the Broadcast Program . . . 47
4 Processing Read-only Transactions Efficiently and Correctly . . . 61
4.1 Introduction . . . 61
4.1.1 Motivation . . . 62
4.1.2 Contribution and Outline . . . 64
4.2 Preliminaries . . . 64
4.3 New Isolation Levels Suitable for Read-only Transactions . . . 70
4.3.1 Why Serializability may be Insufficient . . . 70
4.3.2 BOT Serializability . . . 71
4.3.3 Strict Forward BOT Serializability . . . 76
4.3.4 Update Serializability . . . 82
4.3.5 Strict Forward BOT Update Serializability . . . 83
4.3.6 View Consistency . . . 89
4.4 Implementation Issues . . . 94
4.4.1 MVCC-BS . . . 97
4.4.2 MVCC-SFBS . . . 101
4.4.3 MVCC-SFBUS . . . 101
4.4.4 MVCC-SFBVC . . . 103
4.5 Performance Results . . . 104
4.5.1 System Model . . . 104
4.5.2 Workload Model . . . 108
4.5.3 Experimental Results of the Proposed CC Protocols . . . 110
4.5.4 Comparison to Existing CC Protocols . . . 114
4.6 Conclusion and Summary . . . 119
5 Client Caching and Prefetching Strategies to Accelerate Read-only Transactions . . . 121
5.1 Introduction and Motivation . . . 121
5.1.1 Multi-Version Client Caching . . . 122
5.1.2 Multi-Version Client Prefetching . . . 124
5.1.3 Outline . . . 125
5.2 System Design and General Assumptions . . . 125
5.2.1 Data Delivery Model . . . 126
5.2.2 Client and Server Cache Management . . . 127
5.2.2.1 Data Versioning . . . 129
5.2.2.2 Client Cache Synchronization . . . 129
5.3 MICP: A New Multi-Version Integrated Caching and Prefetching Algorithm . . . 130
5.3.1 PCC: A Probabilistic Cost-based Caching Algorithm . . . 130
5.3.2 PCP: A Probabilistic Cost-based Prefetching Algorithm . . . 135
5.3.3 Maintaining Historical Reference Information . . . 138
5.3.4 Implementation and Performance Issues . . . 141
5.4 Performance Evaluation . . . 142
5.4.1 System Model . . . 142
5.4.2 Workload Model . . . 145
5.4.3 Other Replacement Policies Studied . . . 147
5.4.4 Basic Experimental Results . . . 149
5.4.5 Additional Experiments . . . 150
5.4.5.1 Effects of the Version Management Policy on MICP-L . . . 152
5.4.5.2 Effects of the History Size on MICP-L . . . 153
5.5 Conclusion . . . 155
6 Processing Read-Write Transactions Efficiently and Correctly . . . 157
6.1 Introduction . . . 157
6.1.1 Motivation . . . 158
6.1.2 Contribution and Outline . . . 159
6.2 System Design and Assumptions . . . 160
6.2.1 Data Delivery Model . . . 160
6.2.2 Database and Transaction Model . . . 163
6.3 A New Suite of MVCC Protocols . . . 165
6.3.1 MVCC-BOT Scheme . . . 165
6.3.2 Optimizing the MVCC-BOT Scheme . . . 175
6.3.3 MVCC-IBOT Scheme . . . 183
6.3.4 Optimizing the MVCC-IBOT Scheme . . . 195
6.3.5 MVCC-EOT Scheme . . . 204
6.4 Performance-related Issues . . . 208
6.4.1 Caching . . . 208
6.4.2 Disconnections . . . 211
6.4.3 Conflict Reducing Techniques . . . 213
6.5 Performance Evaluation . . . 218
6.5.1 Simulator Model . . . 219
6.5.2 Workload Model . . . 223
6.5.3 Comparison with other CC Protocols . . . 223
6.5.4 Basic Experimental Results . . . 225
6.5.5 Results of the Sensitivity Analysis . . . 226
6.5.5.1 Effects of Varying the Data Contention Level . . . 227
6.5.5.2 Effects of Specifying Alternative Write Operations . . . 228
6.5.5.3 Effects of Intermittent Connectivity . . . 229
6.5.5.4 Effects of Using Various Caching and Prefetching Policies . . . 232
7 Conclusion and Future Work . . . 237
7.1 Summary and Conclusion . . . 237
7.2 Future Work . . . 242
Bibliography . . . 247
List of Figures
2.1 Demand-assign based multiple access techniques . . . 19
3.1 Various possible organization structures of the broadcast program . . . 48
3.2 Example illustrating the signature comparison process . . . 50
3.3 An example illustrating the Hashing A data access protocol . . . 54
3.4 Tree-indexed broadcasting . . . 56
4.1 Multi-version serialization graph of MVH1 . . . 71
4.2 Multi-version serialization graph of MVH3 . . . 85
4.3 Organization structure of the broadcast program . . . 96
4.4 An overview of the simulation model used to generate the performance statistics . . . 111
4.5 Throughput achieved by the protocols implementing the newly defined ILs . . . 113
4.6 Wasted work performed by the protocols implementing the newly defined ILs . . . 114
4.7 Protocols studied with their respective data consistency and currency guarantees . . . 116
4.8 Throughput results of various CC protocols compared to MVCC-SFBVC . . . 117
4.9 Wasted work performed by various CC protocols compared to MVCC-SFBVC . . . 118
5.1 An example illustrating the peculiarities of multi-version client caching . . . 124
5.2 Organization of the client cache . . . 128
5.3 Performance of MICP-L and its competitors under various transaction sizes . . . 151
5.4 Client cache hit rate of MICP-L and its competitors under various transaction sizes . . . 152
5.5 Performance of MICP-L and its competitors under various versioning strategies . . . 153
5.6 Performance of MICP-L under various cache sizes when HCR is varied . . . 154
6.1 Structure of the broadcast program . . . 162
6.2 Multi-version serialization graph of MVH4 and MVH5 . . . 167
6.3 Multi-version serialization graph of MVH6 . . . 176
6.4 Multi-version serializability graph of MVH8 . . . 195
6.5 Multi-version serialization graph of MVH9 . . . 212
6.6 Two-level history showing lower and higher order operations of two transactions . . . 217
6.7 Performance results of the MVCC-* suite and SI under various transaction sizes . . . 226
6.8 Performance of MVCC-BOT and MVCC-IBOT and their optimized variants . . . 227
6.9 Performance of various CC protocols by varying the number of data updates . . . 228
6.10 Performance gain by providing alternative write operations . . . 229
6.11 Performance degradation when increasing the client disconnection probability – I . . . 230
6.12 Performance degradation when increasing the client disconnection probability – II . . . 231
6.13 Performance deviation between various client caching and prefetching policies – I . . . 233
6.14 Performance deviation between various client caching and prefetching policies – II . . . 233
List of Tables
2.1 Various characteristics of current and emerging wireless network technologies . . . 28
4.1 Newly defined ILs and their core characteristics . . . 95
4.2 Summary of the system parameter settings – I (Read-only transaction experiments) . . . 107
4.3 Summary of the system parameter settings – II (Read-only transaction experiments) . . . 109
4.4 Summary of the workload parameter settings (Read-only transaction experiments) . . . 111
5.1 Summary of the system parameter settings – I . . . 144
5.2 Summary of the system parameter settings – II . . . 146
5.3 Summary of the workload parameter settings . . . 147
6.1 Definitions of possible conflicts between transactions . . . 164
6.2 Summary of the system parameter settings – I (Read-write transaction experiments) . . . 220
6.3 Summary of the system parameter settings – II (Read-write transaction experiments) . . . 222
6.4 Summary of the workload parameter settings (Read-write transaction experiments) . . . 224
6.5 The MVCC-* suite at a glance . . . 236
List of Algorithms
3.1 Multi-disk broadcast generation algorithm . . . 45
3.2 Access protocol for retrieving data objects by using the integrated signature scheme . . . 51
3.3 Data access protocol of the Hashing A scheme . . . 53
3.4 Access protocol for retrieving data objects by using the (1,m) indexing scheme . . . 57
4.1 Algorithm used by MVCC-SFBS to map read operations to object version reads . . . 101
4.2 Algorithm used by MVCC-SFBUS to map read operations to object version reads . . . 103
5.1 Probabilistic Cost-based Caching (PCC) Algorithm . . . 136
5.1 Probabilistic Cost-based Caching (PCC) Algorithm (cont'd) . . . 137
5.2 Probabilistic Cost-based Prefetching (PCP) Algorithm . . . 139
6.1 MVCC-BOT's scheduling algorithm . . . 168
6.2 CCR processing and transaction validation under MVCC-BOT . . . 168
6.3 CCR processing and transaction validation under MVCC-BOTO . . . 177
6.4 MVCC-BOTO's scheduling algorithm . . . 178
6.5 CCR processing and transaction validation under MVCC-IBOT . . . 185
6.6 MVCC-IBOT's scheduling algorithm . . . 186
6.7 CCR processing and transaction validation under MVCC-IBOTO . . . 196
6.8 MVCC-IBOTO's scheduling algorithm . . . 197
6.9 CCR processing and transaction validation under MVCC-EOT . . . 205
6.10 MVCC-EOT's scheduling algorithm . . . 206
List of Acronyms
2G – Second Generation
2.5G – Second-and-a-half Generation
3G – Third Generation
4G – Fourth Generation
ACL – Asynchronous Connection-oriented
ADAT – Air-Cache Data Access Time
ALOHA – The earliest packet radio network, developed at the University of Hawaii at Mānoa
AIPT – Air-Cache Index Probe Time
ATT – Air-Cache Tuning Time
AWT – Air-Cache Wait Time
BID – Bucket Identifier
BOT – Begin-Of-Transaction
CC – Concurrency Control
CCR – Concurrency Control Report
CCSize – Client Cache Size
CD-SFR-SQ-MVSG – Causal Dependency Strict Forward Read Single Query Multi-Version Serialization Graph
CDMA – Code Division Multiple Access
CPU – Central Processing Unit
CRF – Combined Recency and Frequency Value
CRM – Customer Relationship Management
CSMA – Carrier Sense Multiple Access
DBSize – Database Size
DBS – Direct Broadcast Satellite
EDGE – Enhanced Data Rates for GSM Evolution
EOT – End-Of-Transaction
ERP – Enterprise Resource Planning
FCV – Forward Consistent View
FDD – Frequency Division Duplexing
FDMA – Frequency Division Multiple Access
FIFO – First-In-First-Out
FIR – Fast Infrared
GEO – Geostationary Orbit
GPRS – General Packet Radio Service
GPS – Global Positioning System
GSM – Global System for Mobile Communications
HCR – History Size / Cache Size Ratio
HSCSD – High Speed Circuit Switched Data
IBOT – In-Between-Of-Transaction
ID – Identifier
IEEE – Institute of Electrical and Electronics Engineers
IL – Isolation Level
IR – Infrared
IrDA – Infrared Data Association
ISDN – Integrated Services Digital Network
IS – Index Segment
IS-136 – Interim Standard 136
IS-95 – Interim Standard 95
IS-95B – Second Generation of the IS-95 Standard
ISM – Industrial, Scientific and Medical
KIWI – Kill It With Iron
LEO – Low-altitude Earth Orbits
LFU – Least Frequently Used
LOS – Line-of-Sight
LRFU – Least Recently Frequently Used
LRFU-P – Prefetch-based Variant of the LRFU Algorithm
LRU – Least Recently Used
MAC – Media Access Control
MBC – Major Broadcast Cycle
MIBC – Minor Broadcast Cycle
MIPS – Million Instructions Per Second
MICP – Multi-Version Integrated Caching and Prefetching Algorithm
MICP-L – Lightweight Multi-Version Integrated Caching and Prefetching Algorithm
MEO – Medium-altitude Earth Orbits
MVSG – Multi-Version Serialization Graph
MVCC – Multi-Version Concurrency Control
MVCC-BOT – Multi-Version Concurrency Control Protocol with BOT data currency guarantees
MVCC-BOTO – Optimized Multi-Version Concurrency Control Protocol with BOT data currency guarantees
MVCC-BS – Multi-Version Concurrency Control Protocol with BOT Serializability Guarantees
MVCC-EOT – Multi-Version Concurrency Control Protocol with EOT data currency guarantees
MVCC-IBOT – Multi-Version Concurrency Control Protocol with IBOT data currency guarantees
MVCC-IBOTO – Optimized Multi-Version Concurrency Control Protocol with IBOT data currency guarantees
MVCC-SFBS – Multi-Version Concurrency Control Protocol with Strict Forward BOT Serializability Guarantees
MVCC-SFBUS – Multi-Version Concurrency Control Protocol with Strict Forward BOT Update Serializability Guarantees
MVCC-SFBVC – Multi-Version Concurrency Control Protocol with Strict Forward BOT View Consistency Guarantees
MIR – Medium Infrared
MMDS – Multi-channel Multi-point Distribution Service
MMS – Multi-Media Messaging
MOB – Modified Object Buffer
NLOS – Non-Line-of-Sight
OBSize – Object Size
OCC – Optimistic Concurrency Control
OCSize – Client Object Cache Size
OID – Object Identifier
OIL – Object Invalidation List
OFDM – Orthogonal Frequency Division Multiplexing
OVRPL – Object Version Read Prohibition List
OVWPL – Object Version Write Prohibition List
P – Probability-based Cache Replacement Algorithm
P-P – Prefetch-based Variant of the P Algorithm
PDC – Personal Digital Cellular
PGSize – Page Size
PLE – Prohibition List Entry
PSTN – Public Switched Telephone Network
QoS – Quality of Service
RC – Read Consistency
RF – Radio Frequency
RFP – Read Forward Phase
RFF – Read Forward Flag
RFSTS – Read Forward Stop Timestamp
RLC – Radio Link Control
RPP – Relative Performance Penalty
RTT – Round-Trip Time
SBSize – Server Buffer Size
SCO – Synchronous Connection-oriented
SI – Snapshot Isolation
SIR – Serial Infrared
SMS – Short Message Service
SPOT – Smart Personal Objects Technology
ST-MVSG – Start Time Multi-Version Serialization Graph
SFR-MVSG – Strict Forward Read Multi-Version Serialization Graph
SFR-SQ-MVSG – Strict Forward Read Single Query Multi-Version Serialization Graph
STS – Start Timestamp
TID – Transaction Identifier
TDD – Time Division Duplexing
TDMA – Time Division Multiple Access
TOB – Temporary Object Cache
US – Update Serializability
VFIR – Very Fast Infrared
W2R – Cache Replacement and Prefetching Algorithm
W2R-B – Broadcast-based Variant of the W2R Algorithm
WPAN – Wireless Personal Area Networks
WLAN – Wireless Local Area Network
WMAN – Wireless Metropolitan Area Network
WWAN – Wireless Wide Area Network
List of Symbols
BDi – Broadcast disk i
Bcurr – Bucket currently being broadcast
CPre,i – Pre-condition of transaction Ti
CPost,i – Post-condition of transaction Ti
Chit – Average client cache hit rate
CSi – Client object cache segment i
CT(Ti) – Set of transactions that conflict with Ti
CTS(x) – Commit timestamp of data object or data page x
CTSOIL(x) – Commit timestamp of object x maintained in the object invalidation list
CRFn(x) – Combined recency and frequency value of object x over the last n references
D – Database
DS – Database state
DSRFSTS(Ti) – Database state as it existed at RFSTS(Ti)
DSEOT(Ti) – Database state as it existed at EOT(Ti)
E – Edge of a multi-version serialization graph
FVM(Ti) – Final validation message of transaction Ti
IDCCR,c – Identifier of the current CCR
IDCCR,l – Identifier of the latest CCR received by the client
IDref,c – Reference identifier associated with the current object reference
IDref,l(x) – Reference identifier assigned to object x when it was last accessed
IDupdate,c – Update identifier associated with the current object update
IDupdate,l(x) – Update identifier assigned to object x when it was last accessed
ISi,j – j-th index segment of data channel i
LMBC – Average MBC length
N – Node of a multi-version serialization graph
Nver(xi) – Number of versions of object x with CTSs equal to or older than xi currently being kept by the server
Nop,j – Number of operations executed so far by transaction Tj
Nobj – Number of objects for which the client retains historical reference information
PT(Ti) – Set of transactions that precede Ti in any valid serialization order
ReadSet(Ti) – Read set of the read-only or read-write transaction Ti
RCt(xi) – Re-acquisition costs of object version xi at time t
RFF(Ti) – Read forward flag of transaction Ti
RFSTS(Ti) – Read forward stop timestamp of transaction Ti
Sbcast – Signature file containing the signatures of all objects scheduled for broadcasting
Si – Signature of object i
Si,j – j-th atomic statement of transaction Ti
Smem – Client memory size
Squery – Query signature
ST(Ti) – Set of transactions that succeed Ti in any valid serialization order
STS(Ti) – Start timestamp of transaction Ti
Tactive(Ti) – Set of read-write transactions that were active during Ti's execution time, but committed before Ti
T(xi) – Estimated weighted time to service a fetch request for object version xi
Ti – Read-only or read-write transaction with the index i
Tidata – Time when the i-th data bucket is periodically broadcast in the broadcast cycle
Ti,jindex – Time at which the index bucket at position p(i, j) in the broadcast cycle is periodically broadcast
Thit(xi) – Amount of time it takes to service a request for object version xi
Tmiss(xi) – Weighted approximation of the amount of time it takes the client to restore the current state of some active transaction Tj in case Tj has to be aborted due to a fetch miss of xi
Tre-exe,j – Estimated time it takes to restore the current state of an active read-only transaction Tj in case Tj has to be aborted due to a fetch miss
Ts – Time at which the server begins broadcasting
Tupdate – Set of read-write transactions
TSC(CCRlast) – Timestamp of the last CCR that has been successfully processed by client C
UPn(x) – Update probability of object x based on its n previous updates
WriteSet(Ti) – Write set of read-write transaction Ti
α – Aging factor
bucket_id(B) – Function returning the BID of bucket B
dindex – Number of times the index segment is broadcast per MBC
dom(xi) – Domain of data object i
f – Version function
h – Hash function
h(k) – Hash function returning the hash value of key k
height – Height of the index tree
ji – j-th index bucket at the i-th index level of the index tree
k – Hash key
lCCR – Length of the CCR segment
ldata – Length of the data segment
lindex – Length of the index segment
lobject – Length of an object in terms of broadcast ticks
level(i) – i-th level of the index tree
nbcast – Number of data, index, and CCR buckets contained in one broadcast cycle
ndata – Number of data buckets required to accommodate all data objects scheduled for broadcasting
nnon-index – Number of non-index buckets disseminated between two consecutive index files
ni – Number of index buckets at the i-th level of the index tree
nCCR – Number of CCR buckets reserved for CC-related information recorded during the last MIBC
oMBC – Offset to the next MBC
pi – Probability of accessing the i-th object of the database
p(i, j) – Position of the j-th index bucket of the i-th index level in the broadcast cycle
ri[x] – Read operation on data object x by transaction Ti
ri[x, v] – Read operation on data object x by transaction Ti, where v is the value read from x
s – Bucket shift value
scurr – Shift value of the bucket currently broadcast
w – Number of previous MBCs for which the broadcast server maintains a data update history
wi[x] – Write operation on data object x by transaction Ti
wi[x, v] – Write operation on data object x by transaction Ti, where v is the value written into x
“A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it.”
– Max Planck
Chapter 1
Introduction
This chapter briefly enumerates the reasons why we are concerned about providing well-defined
data consistency and currency guarantees to read-only and read-write transactions along with good
performance within the context of hybrid data delivery networks, lists the major contributions of
the PhD thesis, and presents the dissertation’s outline.
1.1 Problem Statement
Technological advances in the computer, software, and communications industry over the last two
decades have provided users with the possibility to access, produce, and update information without
the constraint of being connected to a fixed (wireline) network. Nowadays, mobile users experience
instant access to various information-dissemination and business-enabling services nearly anytime,
and anywhere on the globe. The rapid evolution of mobile computing is fostered by the following trends and already existing or forthcoming applications: (a) Due to higher global industrial and agricultural productivity along with falling average annual working hours [56], people's leisure time has increased, being partially filled by intensive use of wireless infotainment applications such as Short Message Services (SMS), Multi-Media Messaging (MMS), wireless games, etc.
(b) In today's financial and economic world, a lot of business happens outside the corporate walls at trade fairs, exhibitions, event centers, customers' premises, and the like. In preparation for and
during business meetings remote wireless access to corporate and non-corporate data via mobile
handheld devices is highly desirable and useful on such occasions. (c) Besides the widening appeal
of mobile computing for the industry, a new class of applications, called location-based services
(LBS), attracts the attention of the public and will certainly contribute to the excitement and flourishing of mobile computing in the years to come. Service examples promoting the use of LBSs
are finding a hassle-free route through jammed roads, booking a hotel with room rates lower than
$100 within a 10 km radius of the current position or getting the closest parking space available. As
location-based information (free rooms and parking lots in a city, travel information and devices for
a region, etc.) is typically of interest to a larger user population, data dissemination technology is
often applied in order to deliver the information to the consumers. (d) Besides the application areas
named above, mobile computing will be promoted by the migration of conventional application
domains onto the wireless platform. Prominent examples include enterprise applications such as
Enterprise Resource Planning (ERP), Sales Force Automation or Customer Relationship Management (CRM), financial services like online banking and stock trading, or E-Commerce systems to
name just a few.
This new, fast evolving computing and communications environment presents several challenges to the deployment of mobile data services such as bursty and unpredictable workloads [51, 97], data volume asymmetry, service load asymmetry, network bandwidth asymmetry [3], etc. To address these challenges, different channel operators (e.g., Hughes Network [70]
or StarBand [79]) have started providing new types of information services, namely broadcast or
dissemination-based services, having the potential to adapt and scale to bursty and unexpected
user demands, and to exploit the asymmetric bandwidth property prevalent in many mobile networks (see Section 2.2 for more details). In particular, the recent announcement of the Smart
Personal Objects Technology (SPOT) by Microsoft [108] once more highlights the industrial interest in and feasibility of utilizing data broadcasting for wireless data services. If dissemination-based database systems are to become as widespread as traditional distributed database systems, they have to provide the same level of database support as traditional OLTP systems. Thus,
an interesting challenge, and at the same time the main theme of this thesis, is to study the issue of
providing transaction support in the framework of a dissemination-based database system [74,122].
Traditional databases use transactions to ensure that programs transform the system from one
consistent state to another in spite of concurrency and failures. The most common correctness criterion adopted by traditional transaction systems is based on the notion of serializability, which guarantees
that even though transactions run concurrently, it seems to the user as if they execute in serial order.
We believe that in mobile data broadcasting environments, even though transaction processing may
be hindered by the various limitations immanent to mobile systems such as frequent disconnection,
limited battery life, low-bandwidth communication (at least in the direction from the client to the
server), reduced storage capacity, etc., transactions that modify the database (i.e., read-write transactions) should, in general, not execute at consistency levels below serializability. Weaker consistency levels than serializability have the potential to support transactions more efficiently, but they have the important drawback that the application programmer needs to be fully aware of conflicts of his or her transactions with other transactions in the system, and needs to specify compatibility sets [49, 52] (which group together transactions that can freely interleave), (state-independent or state-dependent) commutativity tables [111, 160], or pre- and postconditions of atomic operation steps of the constituent transactions [11, 25] in order to ensure that data consistency is never violated. As information such as commutativity of operations or interstep assertions cannot be determined automatically and is anything but trivial to specify, semantics-based CC complicates application programming greatly and is inherently error-prone. Thus, we initially chose to
abstract from the semantic details of the operations of each transaction, and concentrate only on
the sequence of read and write operations that result from the transaction execution. By following this approach, we are able to devise CC algorithms that are straightforward to implement and
provide transaction support for any dissemination-based application without the need to analyze its
inherent semantics. Clearly, by using the simple read/write model [161] to facilitate CC, we may lose a great deal of potential to enhance concurrency and ultimately performance; however, we partially compensate for this by employing two well-known performance-improving techniques: (a)
multi-versioning and (b) fine-grained object-level rather than coarse-grained page-level CC.
To see how both concepts may contribute to the goal of improving concurrency in a
dissemination-based system, consider the following two examples:
Example 1.
Suppose that a mono-version database contains three objects (i.e., bank accounts) x, y, and z such
that x = 0, y = 0, and z = 0. Suppose further that two invariants need to be maintained by the
transactions, namely x + y = 0 and x + y = z. Now suppose that transactions T1 and T2 are executed
concurrently by the scheduler and the following history is produced (note that as H1 is a mono-version history, we ignore the version subscripts on objects x, y, and z):

H1 = r1[x, 0] r2[x, 0] r2[y, 0] w2[x, −10] w2[y, 10] w2[z, 0] c2 r1[y, 10] w1[z, 10] a1
In this mono-version history, transaction T1 reads objects x and y before modifying the value of z
and transaction T2 transfers money ($10) from account x to y and subsequently updates summary
account z. In history H1 transaction T1 observes an inconsistent view of the database since it sees
the partial effects of T2 and, consequently, T1 needs to be aborted by the mono-version scheduler in
order to maintain database consistency.
Now let us consider a multi-version system. In this case, after transaction T2 is committed, there
are two distinct snapshots in the database: {x0 = 0, y0 = 0, z0 = 0} and {x2 = −10, y2 = 10, z2 = 0}.
The first snapshot is in the past and is consistent with the current read set of T1 . Thus, T1 can now
be serialized on this snapshot by reading the old version of object y and the resulting history is as
follows:
H2 = r1[x0, 0] r2[x0, 0] r2[y0, 0] w2[x2, −10] w2[y2, 10] w2[z2, 0] c2 r1[y0, 0] w1[z1, 0] c1

Note that in contrast to history H1, H2 is serializable in the order T1 < T2 and the database system selects the version order x0 ≪ x2, y0 ≪ y2, z0 ≪ z1 ≪ z2.
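To make the version-selection step in H2 concrete, the following minimal sketch (the class and names are our own illustration, not part of the thesis) implements the read rule applied above: a read-only transaction with start timestamp STS always reads the youngest committed version whose commit timestamp does not exceed STS.

import bisect

class MVStore:
    """Minimal multi-version store: each object maps to a list of
    (commit_timestamp, value) pairs kept in commit order."""

    def __init__(self):
        self.versions = {}  # object id -> [(cts, value), ...]

    def write(self, obj, value, cts):
        # Install a new committed version of obj with commit timestamp cts.
        self.versions.setdefault(obj, []).append((cts, value))

    def read(self, obj, sts):
        # BOT read rule: return the youngest version of obj whose commit
        # timestamp does not exceed the reading transaction's start timestamp.
        vlist = self.versions[obj]
        i = bisect.bisect_right(vlist, (sts, float("inf"))) - 1
        if i < 0:
            raise KeyError("no version of %s visible at timestamp %s" % (obj, sts))
        return vlist[i][1]

# Re-playing Example 1: the initial snapshot is installed at timestamp 0,
# T2 commits its transfer at timestamp 2, and T1 (started at timestamp 1)
# still observes the consistent snapshot {x = 0, y = 0, z = 0}.
db = MVStore()
for obj in ("x", "y", "z"):
    db.write(obj, 0, cts=0)
db.write("x", -10, cts=2)
db.write("y", 10, cts=2)
db.write("z", 0, cts=2)

assert db.read("x", sts=1) == 0  # r1[x0, 0]
assert db.read("y", sts=1) == 0  # r1[y0, 0] rather than T2's y2 = 10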
Example 2.
Another way to improve the concurrency of a shared resource is to implement CC on a fine (record- or object-level) rather than a coarse (page- or table-level) granularity level. To illustrate the difference between both approaches, we use the same example as above and assume again a multi-version
system. However, in contrast to the example discussed above, the scheduler no longer maintains information on the read and write operations of transactions at the object level, but rather at the page level. Now let us assume that object x is stored on page p and objects y and z are contained in page q. Then, history H2 changes as follows:
H3 = r1[p0, 0] r2[p0, 0] r2[q0, 0] w2[p2, −10] w2[q2, 10] w2[q2, 0] c2 r1[q0, 10] w1[q1, 10] c1
The resulting history H3 is no longer serializable since, for any version order of p and q, there is a cycle in the corresponding multi-version serialization graph. The example nicely illustrates the occurrence of so-called false or permissible conflicts that occur due to information grouping. To eliminate most of those conflicts, we consider CC based on objects rather than pages.
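The grouping effect can be made explicit with a toy sketch (the object-to-page mapping and the helper function are hypothetical, not taken from the thesis): a transaction that reads only y and one that writes only z do not conflict at the object level, yet they appear to conflict once their footprints are coarsened to the shared page q.

PAGE_OF = {"x": "p", "y": "q", "z": "q"}  # assumed object-to-page mapping

def conflicts(reads, writes, granularity="object"):
    """Items on which a reader and a writer collide at the given granularity."""
    if granularity == "page":
        reads = {PAGE_OF[o] for o in reads}
        writes = {PAGE_OF[o] for o in writes}
    return reads & writes

print(conflicts({"y"}, {"z"}, "object"))  # set(): no real conflict
print(conflicts({"y"}, {"z"}, "page"))    # {'q'}: a false conflict due to grouping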
Allowing multiple versions of the same object to exist in the database has implications not only for the system's scheduling power and data storage requirements, but also for the currency of the data provided to the users. For example, in history H2, when transaction T1 tries to read object
y, it can either be provided with the latest version of y or an older version of y (i.e., the second
latest version). While choosing the first alternative would produce a non-serializable history, using the second option results in the serializable history H2. The example clearly illustrates that multi-versioning influences not only the system's scheduling flexibility, but also the currency
of the data provided to the application users. Additionally, the example provides evidence that
serializability alone is insufficient to ensure that transactions see up-to-date data. Note, however,
that data freshness is the most important requirement after data consistency to be satisfied by any
transaction issued in a multi-version dissemination-based environment. The reason is that in real-world information-centered applications such as stock trading and monitoring systems, sports and
traffic information systems, etc., the values of the data objects change frequently and are highly
dynamic [41], thus for the objects to be meaningful to the users they must be up-to-date or at least
“close” to it [145, 163]. Unfortunately, currently available definitions of isolation levels (ILs) either do not specify the degree of data currency they provide or, if they do define it, do not ensure serializability.
To rectify the problem and to take account of the fact that the majority of the transactions initiated in broadcast-based environments is of the read-only type [124] (and any read-only transaction
is easily serializable with all the other committed read-only and read-write transactions in the system by letting it read from a consistent, but not necessarily current database snapshot), we define
four new ILs that provide useful data consistency and currency guarantees to read-only transactions.
In contrast to read-write transactions, which modify the state of the database, read-only transactions
may not necessarily need data consistency guarantees as strong as serializability; thus, clients may want to allow read-only transactions to be executed under slightly weaker ILs which, however, still guarantee that they observe a transaction-consistent database state. Note that in order to observe a transaction-consistent database, a read-only transaction must not see the partial effects of any update transaction, but rather, for each update transaction, it must see either all or none of its effects. Two of our newly defined ILs, namely Strict Forward BOT Update Serializability and Strict Forward BOT View Consistency, provide such weaker consistency guarantees along with firm data currency guarantees (see Section 4.3.2 for information on the various types of data currency). As our ILs are defined in an implementation-independent manner by using a combination
of conditions on serialization graphs and transaction histories, they allow both pessimistic (locking and non-locking) and optimistic (non-locking) CC implementations. To implement the newly
defined ILs, we opted for multi-version (timestamp-based) optimistic CC schemes since they are
appealing in distributed dissemination-based client-server systems as they allow clients to execute
and commit read-only transactions locally using cached information without any extra communication with the broadcast server. Pessimistic schemes, however, may require such communication
when, for example, an object has to be accessed or modified. To efficiently implement the various data consistency and currency guarantees of our newly defined ILs, we take advantage of the
system’s communication structure and periodically broadcast CC-related information to the client
population at nearly no extra cost. Where applicable, we also utilize the “read forward” property, i.e., we allow transactions to read “forward” beyond their starting points as long as their underlying data consistency property is not violated, to optimize our schemes.
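As a rough sketch of the read-forward idea (a deliberate simplification under our own naming; the actual protocols are developed in Chapter 4), a client can keep advancing a read-only transaction's snapshot past each broadcast CCR whose committed write set does not intersect what the transaction has already read:

def advance_snapshot(read_set, snapshot_ts, ccr_batches):
    """Read-forward sketch: move a read-only transaction's snapshot forward
    past each concurrency control report (CCR) whose committed writes do
    not overlap the objects the transaction has already read."""
    for ccr_ts, committed_writes in ccr_batches:  # CCRs in timestamp order
        if read_set & committed_writes:
            break  # reading forward would invalidate an earlier read
        snapshot_ts = ccr_ts  # safe: the transaction may read newer versions
    return snapshot_ts

# A transaction that has read {x} may read forward past a CCR updating {z},
# but must stop at a CCR updating {x, y}.
print(advance_snapshot({"x"}, 0, [(1, {"z"}), (2, {"x", "y"}), (3, {"z"})]))  # 1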
A fundamental prerequisite for any multi-version CC scheme to be effective is that the server
or, even better, the client stores all those versions of data objects that are useful from the application
point of view. Obviously, the closer the data is located to the CPU running the application program,
the more responsive the application becomes. Therefore, the issue of client cache management is
very critical to the overall system performance. While the basic idea behind client caching is very
simple (i.e., keep the data objects with the highest utility of caching), in a multi-version storage
space-constrained environment it requires the application of various client mechanisms to make it
effective. The first issue is related to the garbage collection of old object versions that have become
useless for active transactions. To prevent useless object versions from competing with useful ones for scarce client storage space, it is highly desirable that the transaction manager provides hints to the client cache manager once an object version becomes eligible for garbage collection. Another, but much more complex, issue is to decide which object version to replace if the cache is full and space is required for a new, not yet cache-resident object version. (When a client cache slot is needed for a new object version being read in from the air-cache or broadcast server, the object version selected by the cache manager to free its slot is called the replacement victim.) Making cache replacement decisions in multi-version broadcast-based environments differs significantly from doing so in mono-version traditional pull-based systems for the following two reasons:
• In traditional pull-based systems the cost of a client cache miss (i.e., the requested object
is non-cache-resident) is the round trip time (RTT) of sending the data request to the server
and receiving the object itself through the network interface. If we ignore the variance in
network loads and the effects of caching and disk utilization at the server, the data fetch
latency can be assumed to be the same for all requested objects. In hybrid broadcast-based
systems, however, in which the majority of non-cache-resident data is downloaded from the
broadcast channel, the data access costs are no longer constant, but rather depend on the
current position of the respective objects in the broadcast cycle. As a consequence and in
contrast to traditional cache replacement policies such as Least Recently Used (LRU), Least
Frequently Used (LFU), or Least Recently/Frequently Used (LRFU) [98, 99], replacement
victims cannot be selected exclusively on the basis of their future access probability, but
also on their re-acquisition costs (see the victim-scoring sketch after this list).

2 When a client cache slot is needed for a new object version being read in from the air-cache or broadcast server, the object version selected by the cache manager to free its slot is called the replacement victim.
• Multi-version schemes are only effective in providing users with a transaction-consistent view
of the database, if the data to be observed by the clients is actually available in the system.
However, as storage space is not infinite and managing large numbers of versions can be
very costly, it is reasonable to assume that the broadcast server imposes an upper limit on
the number of versions that it keeps around simultaneously. Under such storage constraints
and the realistic assumption that objects are frequently modified [41], situations may occur
where useful object versions need to be evicted from the server since the upper bound on
the number of versions is exceeded. That is, in multi-version systems a local cache miss
may be accompanied by a global data miss, resulting in an abort of the affected transaction. Therefore, in order to avoid or reduce the number of such transaction aborts induced
by data fetch misses, the client cache manager should use an exclusive portion of its available storage space to store object versions which are locally important and potentially useful
for some ongoing transaction, but which have been evicted from the server. Additionally,
when choosing a replacement victim from the set of non-re-cacheable object versions, the
victim should be the object version which has the lowest local access probability and the
lowest re-acquisition costs. Since the object versions under consideration are no longer
re-cacheable, the re-acquisition costs correspond to the estimated amount of time it takes to
re-execute the transaction(s) for which the respective object version might be useful.
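The following sketch illustrates the kind of victim scoring the two points above call for; it is an assumption-laden toy (the names and the cost model are ours, not MICP, which Chapter 5 presents):

    # Hedged sketch: rank eviction candidates by the expected cost of losing
    # them. For re-cacheable versions the re-acquisition cost is the wait
    # until they re-appear on the broadcast; for non-re-cacheable versions
    # (already evicted at the server) it is the estimated time to re-execute
    # the dependent transaction(s). All inputs are illustrative assumptions.

    def victim_score(p_access, re_cacheable, broadcast_wait=0, reexec_time=0):
        """Lower score == cheaper to evict."""
        re_acquisition = broadcast_wait if re_cacheable else reexec_time
        return p_access * re_acquisition

    def pick_victim(candidates):
        # candidates: list of (version_id, parameter dict)
        return min(candidates, key=lambda c: victim_score(**c[1]))[0]

    candidates = [
        ("x1", dict(p_access=0.10, re_cacheable=True,  broadcast_wait=50)),
        ("y2", dict(p_access=0.02, re_cacheable=False, reexec_time=400)),
        ("z1", dict(p_access=0.30, re_cacheable=True,  broadcast_wait=5)),
    ]
    print(pick_victim(candidates))  # "z1": likely re-read soon, but cheap to re-fetch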
A third issue concerns prefetching object versions in anticipation of future accesses.
The intuition behind prefetching in a multi-version context is to pre-cache those object versions
that are likely to be requested in the near future in order to avoid any delay when the access eventually occurs. Prefetching is particularly appealing in dissemination-based environments because,
unlike in pull-based systems, it does not place any additional load on shared resources (server disk(s), server processor(s), network, etc.): the data flows past the client anyway, so prefetching impacts
only client resources. However, despite the much lower costs of prefetching in dissemination-based environments, the risk of excessive and imprudent prefetching still remains, i.e., scarce cache
space can be wasted by pre-caching object versions that are not important or are prefetched too
early. Additionally, the danger exists that an object version is replaced by another equally important object version and shortly after its eviction it becomes non-re-cacheable from the server. To
prevent such situations from occurring, prefetching decisions should be based not only on the cost
of re-acquisition and the probability of access, but also on the likelihood that the replaced version will still be re-cacheable
in the next broadcast cycle. As traditional and recently designed client caching and prefetching
algorithms [4, 5, 32, 83, 119, 151] do not adequately address the issues discussed above, we have
developed a new multi-version integrated caching and prefetching policy, called MICP, which is
presented along with an implementable approximation, termed MICP-L, in this thesis.
As we have shown by means of histories H1 and H2, multiple versions of data objects can
not only be exploited to improve the degree of concurrency between read-only and read-write transactions, but also between read-write transactions themselves. The higher concurrency can be realized
by providing the scheduler with the flexibility to map so-called “out-of-order” read requests to
appropriate, older versions of data and to select a version order for each object that need not necessarily correspond to the write or commit order of the transactions that created the object versions
in the history, i.e., the scheduler may choose the version order x2 ≪ x1 even though T1 committed
before T2. As a result of this flexibility, however, the scheduler may produce undesirable anomalies
caused by trading scheduling power for data freshness. For example in history H2 , T1 observes
the updates of transaction T0 (which is not shown in H2 ), but misses the effects of transaction T2 .
Now suppose that the clients that execute transactions T1 and T2 communicate with each other or,
alternatively, both transactions are executed at the same client: the client running transaction T1
may be confused about the (old) values read from objects x and y since it knows that both objects
were recently modified by transaction T2 . To prevent users of dissemination-based systems from
experiencing such possibly undesirable read phenomena, CC protocols need to incorporate a precise notion of data currency in their specifications, hence allowing users to initiate their read-write
transactions with well-defined data currency guarantees. As existing broadcast-based multi-version
concurrency control (MVCC) protocols providing serializability to read-write transactions do not
explicitly specify the degree of currency they provide, we rectify this shortcoming by defining a suite of
five new MVCC schemes, denoted MVCC-*, that ensure different levels of data freshness to their
users.
1.2 Contribution
In this thesis we broadly address the issue of providing good performance along with appropriate
data consistency and currency guarantees to read-only and read-write transactions in a broadcast-based, bi-directional client-server data dissemination environment. The issues of transforming the
database from one consistent state to another and providing a consistent view of the database in
spite of concurrency have been investigated earlier in the context of centralized, distributed and
mobile databases. However, while the basic problem being addressed is not new, the integration of
precise notions of data currency into the definitions of ILs ensuring reliable data currency guarantees
to read-only transactions is novel. We further contribute by defining various MVCC protocols that
provide appropriate data consistency and currency guarantees to read-only and read-write transactions alike and demonstrably outperform previously proposed CC schemes by significant margins.
Additionally, a novel cache replacement and prefetching policy is proposed that ideally supports
the data requirements of our MVCC protocols.
In particular, this thesis makes the following contributions to the main research issues, namely
read-only transaction management, client caching and prefetching, and read-write transaction management, in the area of broadcast-based, wired and wireless data dissemination networks:
Read-only Transaction Management:
• Taking account of the fact that read-only transactions constitute the vast majority of the
transactions executed in dissemination-based systems [124] and that current definitions of
ILs such as Conflict Serializability [120], Update Serializability [61, 162], or External Consistency/Update Consistency [29, 159] are not well suited for processing read-only transactions, as they lack any semantic guarantees as far as data currency is concerned, we specify
four new ILs dedicated to read-only transactions that provide useful data consistency and
currency guarantees to application programmers and users alike. Among the newly defined
levels, Strict Forward BOT View Consistency is the weakest one and allows the highest
degree of transaction concurrency. It ensures causally-consistent reads and permits a read-only transaction Ti to observe the updates of transactions
that committed after Ti ’s starting point as long as Ti sees a transaction-consistent database
state.
• While the newly defined ILs are applicable to any type of transaction-based database system and any network environment, we present tailor-made CC protocols that support the different ILs in push/pull hybrid data delivery environments. Our protocols, namely
MVCC-BS, MVCC-SFBS, MVCC-SFBUS, and MVCC-SFBVC, are optimistic in nature and
allow clients to validate and subsequently to either abort or commit read-only transactions
without communicating with the server, which offloads work from the server and thereby makes the system more scalable.
• We evaluate the performance of the implementations of our newly defined ILs via simulations. To the best of our knowledge, this is the first simulation study investigating the performance trade-offs of providing various levels of data consistency and currency to read-only
transactions in a mobile hybrid data delivery environment. The results show that implementations that provide Full Serializability, such as MVCC-SFBS, incur only modest performance
penalties (about 1-10%, depending on the average size of the read-only transactions) compared to schemes that provide weaker guarantees such as View Consistency. We also conducted a comparison study examining our protocols’ performance against that of existing CC
schemes which were specifically designed for dissemination-based systems. The results show
that our best performing CC protocol, MVCC-SFBVC, outperforms the other examined protocols by significant margins.
Client Caching and Prefetching:
• Caching and prefetching data at clients is one of the most effective techniques, if not the most effective one, for improving
the overall system performance of dissemination-based systems. As previously proposed
client cache replacement and prefetching strategies do not optimally support the data storage
and preservation requirements of clients using MVCC protocols to provide a current and
transaction-consistent view of the database regardless of concurrent read-write transactions
in the system, we introduce a new integrated caching and prefetching algorithm, called MICP.
MICP takes account of the dynamically changing cost/benefit values of client-cached and
air-cached object versions by making cache replacement and prefetching decisions sensitive
to the objects’ access probabilities, their re-acquisition costs, their update likelihood, etc. To
solve the problem that newly created or non-current, but re-cacheable, object versions replace
non-re-cacheable versions from the client cache, MICP divides the available cache space into
two variable-sized partitions, namely REC and NON-REC, for caching re-cacheable and
non-re-cacheable object versions, respectively (a minimal sketch of this partitioning idea follows after this list).
• We compare MICP’s performance in terms of transaction throughput w.r.t. one offline and
two state-of-the-art online cache replacement and prefetching policies (P, and LRFU and
W2R, respectively); the results show that MICP outperforms LRFU, the currently best online
caching algorithm, by about 19% and performs only about 40% worse than the offline strategy P, which keeps the objects with the highest probability of access in the cache.
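As a rough illustration of the REC/NON-REC split mentioned above, here is a hypothetical Python sketch (our own simplification, not MICP's actual admission logic): re-cacheable arrivals may only displace other re-cacheable versions, so versions that cannot be re-fetched are never crowded out by data that can.

    # Hedged sketch of a two-partition client cache; identifiers and the
    # victim choice are placeholders (MICP's real policy is cost-based).
    class PartitionedCache:
        def __init__(self, capacity):
            self.capacity = capacity
            self.rec = {}      # re-cacheable versions
            self.non_rec = {}  # non-re-cacheable versions (server evicted them)

        def _full(self):
            return len(self.rec) + len(self.non_rec) >= self.capacity

        def insert(self, vid, value, re_cacheable):
            if self._full():
                if re_cacheable:
                    if not self.rec:
                        # never displace a non-re-cacheable version to make
                        # room for one that could be re-fetched from the air
                        return False
                    self.rec.popitem()   # placeholder victim choice; a
                                         # cost-based score would go here
                else:
                    # NON-REC may grow at REC's expense, but not vice versa
                    (self.rec if self.rec else self.non_rec).popitem()
            (self.rec if re_cacheable else self.non_rec)[vid] = value
            return True

    cache = PartitionedCache(capacity=1)
    cache.insert("b1", "...", re_cacheable=False)
    print(cache.insert("c1", "...", re_cacheable=True))   # False: b1 is protected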
Read-Write Transaction Management:
• Another contribution of this thesis is to formally define three new MVCC protocols that
provide serializability along with well-defined data currency guarantees to read-write transactions,
namely MVCC-BOT, MVCC-IBOT, and MVCC-EOT, to present optimizations of the first two
schemes, to prove their correctness, and to evaluate their performance against each other and
w.r.t. the well-known and frequently implemented Snapshot Isolation scheme. The motivation for proposing yet another suite of CC protocols is the observation that currently available protocols designed to manage read-write transactions either do not provide serializability
guarantees [121] or enforce serializability in a suboptimal way [102, 110]. Furthermore, none
of those protocols that ensure serializability for read-write transactions explicitly specifies the
degree of data currency it provides.
• We discuss techniques that are applicable to identify and prevent false or permissible conflicts among those detected by our MVCC protocols, and we provide and evaluate ways to resolve
data conflicts without restarting the constituent read-write transaction. We also provide solutions to make our protocols resilient to network failures and present experimental results
quantifying the influence of client disconnections from the network on the performance of
our protocols.
• We explain why our cache replacement and prefetching strategy MICP, initially designed for
exclusive use in applications issuing read-only transactions, can be utilized to service read-write transactions, too. Performance results demonstrate that the performance penalty of
deploying LRFU instead of MICP when running read-write transactions is about 6% on
average.
1.3 Publications
The main contributions on which this thesis is based have already been the subject of technical
reports, conference papers, and journal articles:
• A. Seifert and M. H. Scholl. Processing Read-only Transactions in Hybrid Data Delivery
Environments with Consistency and Currency Guarantees. Tech. Rep. 163, University of
Konstanz, Dec. 2001.
• A. Seifert and M. H. Scholl. A Transaction-Conscious Multi-version Cache Replacement and
Prefetching Policy for Hybrid Data Delivery Environments. Tech. Rep. 165, University of
Konstanz, Feb. 2002.
• A. Seifert and M. H. Scholl. A Multi-Version Cache Replacement and Prefetching Policy for
Hybrid Data Delivery Environments. VLDB 2002, pp. 850-861, Aug. 2002.
• A. Seifert and M. H. Scholl. Processing Read-only Transactions in Hybrid Data Delivery
Environments with Consistency and Currency Guarantees. MONET 8(4): 327-342, Aug.
2003.
Publications that also arose from the author’s research work but are not covered by this thesis are
as follows:
• A. Seifert and M. H. Scholl. COBALT — A Self-Tuning Load Balancer for OLTP Systems,
submitted for publication, Oct. 2004.
• J.-J. Hung and A. Seifert. An Efficient Data Broadcasting Scheduler for Energy-Constraint
Broadcast Servers, accepted for publication in the International Journal of Computer Systems
Science and Engineering (IJCSSE), Jan. 2005.
1.4 Outline
The outline of the rest of the thesis is as follows:
• In the next chapter, we first provide background information on the basic concepts of wireless
data communications, highlight the specific characteristics of popular wireless network types,
and discuss the various types of asymmetry prevalent in mobile data networks. We enumerate
the limitations of mobile computing systems and discuss their impact on the provision of data
consistency and currency in the presence of concurrency in mobile computing environments.
• In Chapter 3 we give several reasons why traditional data pull delivery and its inverse approach, i.e., data push delivery, are not appropriate for building large-scale dissemination-based systems and present the concept of hybrid data delivery as a suitable alternative delivery
mode for such systems. We then present possible configuration options of the communication
media to facilitate hybrid data delivery and identify the major underlying assumptions of the
thesis. The chapter concludes with a survey of the literature discussing technological aspects
relevant for the implementation of hybrid data delivery systems which are not covered in the
remainder of the thesis.
• Chapters 4, 5, and 6 constitute the main body of this thesis. Chapter 4 points out that currently available definitions of ILs are not appropriate for managing read-only transactions,
since they lack any data currency guarantees and hence may lead to wrong decisions. To resolve this problem, we then propose four new ILs which provide useful data consistency and
currency guarantees to application programmers and users alike. Next, we present suitable
implementations of the proposed ILs which take account of the various limitations and peculiarities of a mobile hybrid data delivery system. The chapter concludes with the presentation
of the experimental results of a detailed performance study, showing the performance deviations among our protocols, and demonstrating the superiority of MVCC-SFBVC (which is
our best-performing CC protocol) over other previously proposed schemes.
• In Chapter 5 we introduce a novel multi-version integrated cache replacement and prefetching
algorithm, called MICP. The chapter first motivates the need for yet another cache replacement and prefetching policy and then presents the system design and basic assumptions of
MICP. We describe in detail how MICP determines cache replacement victims, how it organizes the client cache, and how it reduces the computational cost for making its replacement
and prefetching decisions. Again, the chapter concludes with a performance study validating
the applicability and practicability of our algorithms in fulfilling the data storage and preservation requirements of read-only transactions. Additionally, the performance penalty of using
other caching and prefetching strategies than MICP is quantified.
• Chapter 6 tackles the challenging problem of providing serializability guarantees along with
good performance to mobile applications that access, modify, and insert information into
a widely shared, universally accessible database. It presents MVCC-*, a suite of five new
MVCC protocols, whose design pays particular attention to the peculiarities of mobile hybrid data delivery systems. It also discusses possible extensions of the suite’s protocols
targeted towards identifying, and thus reducing, the number of false or permissible data conflicts among those detected by them. As in previous chapters, the presentation of results of
numerical experiments conducted to show the performance trade-offs among the protocols of
the MVCC-* suite themselves and between the suite’s protocols and the Snapshot Isolation
scheme concludes the chapter.
• Finally, Chapter 7 summarizes the contributions and results of our work, and indicates some
possible future research directions.
“Once a new technology rolls over you, if
you’re not part of the steamroller, you’re
part of the road.”
– Stewart Brand
Chapter 2
Background
The ultimate goal of mobile computing is to allow mobile users to access external data resources
and to provide consistent and timely access to their own information collections to anyone, at any time, from anywhere. The key enabling component to facilitate mobile computing is, besides the
portability of the mobile devices, ubiquitous connectivity, i.e., connectivity at any place, any time.
Wireless data technology provides mobile users with the capability to access all types of
data, including plain text, photos, graphics, audio, and video. Due to the importance of wireless
technology for mobile computing, in the following section we attempt to briefly describe the basics
of wireless data communications, highlight the specific characteristics of various wireless network
types, and finally discuss data management issues arising due to the various types of asymmetry that
occur in mobile data networks. Then, we enumerate the various limitations of mobile computing
systems such as frequent disconnection, limited battery life, low-bandwidth communication and
reduced storage capacity and discuss their impact on our objective of providing appropriate data
consistency and currency guarantees to both read-only and read-write transactions along with good
performance in spite of the presence of failures (e.g., disconnections) and concurrency in the system.
2.1 Basics of Wireless Communication Systems
A wireless communication system typically consists of numerous wireless communication links
which themselves include in their most primitive form a transmitter, a receiver, and a channel.
Wireless communication links can be classified as simplex, half-duplex or full-duplex. In simplex
systems, communication is constrained in only one direction. A data dissemination server that
broadcasts data to mobile users without the provision of a back-channel or an uplink channel to
the server is an example of a simplex system. Half-duplex systems provide means of two-way
communication, however, they use the same physical channel for data transmission and reception
and, therefore, the mobile user can either receive or transmit information at any given instant of
time. Examples of half-duplex systems are IrDA infrared links or most 802.11 WLAN links. Full-duplex systems such as cellular radio network systems allow mobile users to simultaneously send
and receive data by provision of either two separate radio channels (frequency division duplexing
or FDD) or adjacent time slots on a single radio channel (time division duplexing or TDD).
The two main propagation technologies used in wireless communication systems are infrared
(IR) and radio frequency (RF). Among those, RF is regarded as more flexible and practical as it
propagates through solid obstacles such as walls [36]. To provide radio communication service
simultaneously to as many mobile users as possible over a wide area, wireless multiple-access
techniques and frequency reuse must be adopted. There are two main types of multiple access
techniques, namely demand-assign based and random multiple access [8], and in practice their
deployment depends on the data traffic requirements. If continuous data flow along with high responsiveness is required, demand-assign based multiple access is applied. In this technique, the
available radio channels are divided in a static fashion and each user is exclusively assigned one
or more of those channels by a base station, even though the user might not require the
entire bandwidth capacity of the channel. Prominent examples of the demand-assign based multiple access method are frequency division multiple access (FDMA), time division multiple access
(TDMA) [133], and code division multiple access (CDMA) [129]. In FDMA the available radio
frequency spectrum S is divided among U simultaneous users such that each user is allocated
a channel bandwidth of C = S/U Hz. In TDMA the radio frequency spectrum is divided into time
slots that are allocated among the users. In this system, a radio channel can be thought of as a
particular time slot that cyclically re-occurs every transmission period. In CDMA all users share the
same carrier frequency and are allowed to transmit data simultaneously. CDMA is implemented
with direct sequence spread spectrum or frequency hopping and each user is allocated a different
pseudo-random spreading code or hopping pattern, respectively, which separates users from each
other.
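A quick numeric sketch of the channelization just described (the spectrum, user count, and frame period below are made up for illustration):

    # Worked example with assumed numbers: dividing spectrum S among U users.
    S_HZ = 25_000_000        # S = 25 MHz of available spectrum (assumption)
    U = 1_250                # U simultaneous users (assumption)

    # FDMA: each user holds a dedicated sub-band of C = S/U Hz.
    C_HZ = S_HZ / U
    print(f"FDMA channel bandwidth: {C_HZ / 1e3:.0f} kHz")      # 20 kHz

    # TDMA: the full band is shared in time; with U slots per frame, each
    # user transmits at full bandwidth for 1/U of every frame.
    FRAME_S = 0.005          # 5 ms frame period (assumption)
    print(f"TDMA slot length: {FRAME_S / U * 1e6:.0f} us")      # 4 us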
[Figure 2.1: Demand-assign based multiple access techniques: (a) FDMA, (b) TDMA, (c) CDMA. Each panel shows how channels 1 to N partition the frequency, time, and code dimensions.]
If network bandwidth requirements are highly dynamic with a high peak-to-average rate ratio,
random multiple access systems are used, in which many users share the same radio channel and
transmission occurs in a random, i.e., uncoordinated, or partially coordinated way. Popular examples are the ALOHA protocol [2] and the carrier sense multiple access (CSMA) protocol [92].
ALOHA is a contention-based scheme which allows users to access a channel whenever a message
is ready to be transmitted. The sender then listens to the channel to receive the acknowledgment
feedback to determine whether the transmission was successful or not. In case a packet collision
occurs, the sender waits a randomly determined period of time, and then retransmits the packet. In
the CSMA protocol, users monitor the status of the channel and transmit messages only if the
channel is idle. Obviously, packet collisions may still occur since just after a user sends a packet,
another user may be sensing the channel and may detect it to be idle. Consequently, it will send its
own packet, resulting in a collision with the other.
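The contention behaviour described above is easy to simulate; the toy below (all parameters are our assumptions) counts a slot as successful only when exactly one of N stations transmits, which at an offered load of about one packet per slot lands near the classical slotted-ALOHA limit of roughly 0.37:

    # Toy slotted-ALOHA simulation; the station count and send probability
    # are arbitrary assumptions chosen for an offered load of ~1 packet/slot.
    import random

    def aloha_throughput(n_stations=50, p_send=0.02, slots=100_000, seed=1):
        rng = random.Random(seed)
        successes = 0
        for _ in range(slots):
            senders = sum(rng.random() < p_send for _ in range(n_stations))
            if senders == 1:      # exactly one sender: packet gets through;
                successes += 1    # >1 senders collide, 0 senders leave it idle
        return successes / slots

    print(f"{aloha_throughput():.3f} packets/slot")   # close to 0.37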
In addition to using demand-assign based and random access methods separately, variants of
both access techniques can be combined to take advantage of both approaches. A hybrid scheme
has the advantage that if both underlying access methods are fine-tuned to each other, the available
radio channels may become neither overloaded nor underloaded in peak-load situations. This is
because the demand-assign based access method assigns radio channels on a request basis and can
therefore act as an admission control. If multiple users requiring network services are likely to
produce random data traffic, they are grouped together by a demand-assign based access technique
to operate through the same radio channel. Users of the same group then access the radio channel
based on a random multiple access technique.
In order to provide wireless communication services to many users over a wide coverage area,
frequency reuse strategies, typically based on spatial separation (in cellular distance, antenna angle, or signal polarization), can be exploited. All those techniques have in common that
they enable multiple radio channels in geographically separate areas or cells to use the same radio
spectrum with relatively low co-channel interference. Applying one or more of those strategies increases the capacity of the radio network and ultimately contributes to the efficient use of the radio
spectrum, which is a scarce natural resource.
As the knowledgeable reader has certainly recognized, there are many more technological aspects of wireless communication systems, such as radio wave propagation and interference, channel
coding, modulation, etc., that were not covered so far. The discussion of those topics, however, is
beyond the scope of this thesis and, therefore, we refer the interested reader to [54, 128, 132].
2.2 Wireless Network Types
To get an overview of the characteristics, capabilities, and limitations of existing and proposed mobile communication networks and to be able to evaluate them w.r.t. their applicability to support
data-intensive transaction-based mobile applications, we now give a more detailed description of
them. As with wired networks, wireless communication networks can be classified into different
types based on the geographic scope, i.e., the distances over which data can be transmitted:
Wireless Wide Area Networks (WWANs)
WWANs connect large geographic areas, such as cities or countries, via multiple antenna sites
(cellular base stations) or satellite systems. Currently deployed WWAN technologies correspond
to second-generation (2G) and third-generation (3G) wireless cellular systems and communication
satellite systems whose main features will be briefly described below. The majority of the 2G cellular networks deployed worldwide is based on the Global System for Mobile Communications (GSM) standard, which has
been deployed by carriers in Europe, Asia, Australia, South America and some parts of the US.
Other standards include the Interim Standard 136 (IS-136), Pacific Digital Cellular (PDC), and Interim Standard 95 (IS-95) which are used by service providers in North America, South America,
and Australia (IS-136), Japan (PDC), and North America, Korea, Japan, China, South America,
and Australia (IS-95), respectively. In 2G systems, the raw data transfer rate is only 9.6 Kbps,
which is certainly too slow for data-intensive applications. Currently deployed, so-called 2.5G networks, which include High Speed Circuit Switched Data (HSCSD), General Packet Radio Service
(GPRS), Enhanced Data Rate for GSM Evolution (EDGE), and IS-95B, provide much higher raw
transmission rates of up to 57.6 Kbps, 171.2 Kbps, 384 Kbps, and 115.2 Kbps, respectively [132].
In loaded networks, however, achievable data rates are much lower. For example, the GPRS rate
of 171.2 Kbps (8 × 21.4 Kbps) per channel includes all the Radio Link Control/Media Access
Control (RLC/MAC) overhead. After subtracting the protocol overhead required for the sharing of
the radio channel, the actual data rate the user sees is 130.24 Kbps. This rate, however, will only be
achieved in situations where all 8 slots in a TDMA frame are dedicated to the user and the transmission itself is error-free. Both assumptions are highly unlikely to hold in a congested wireless network and,
consequently, experienced GPRS throughput rates are only in the range of 10-40 Kbps downstream
and 10-20 Kbps upstream [105]. EDGE networks provide significantly higher data rates ranging
from 30-200 Kbps for the downlink channel and 30-60 Kbps for the uplink channel [156]. However, despite the significant bandwidth improvement of EDGE networks compared to 2G networks,
the transfer of a relatively small replica file of 1 MB will nevertheless take about 40 seconds at a
maximum data rate of 200 Kbps.
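The figures above can be checked with a few lines of arithmetic (counting 1 MB as 10^6 bytes is our assumption):

    # Back-of-the-envelope check of the GPRS/EDGE numbers quoted above.
    slots = 8                        # all 8 TDMA slots dedicated to one user
    raw_kbps = slots * 21.4          # 171.2 Kbps, including RLC/MAC overhead
    usable_kbps = 130.24             # rate left after RLC/MAC overhead
    print(f"{raw_kbps:.1f} Kbps raw, {usable_kbps} Kbps usable")

    # 1 MB replica at EDGE's 200 Kbps peak downlink rate:
    file_bits = 1_000_000 * 8        # 1 MB = 10^6 bytes (assumption)
    print(f"{file_bits / 200_000:.0f} seconds")   # about 40 seconds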
As for the transition from 2G to 2.5G, the primary incentives for the transition from 2.5G to 3G
are increased data rates and greater network capacity for operators. Like 2.5G systems, 3G networks
are characterized by built-in asymmetry in the uplink and downlink data rates. The uplink data rate
is limited by battery power consumption and complexity limitations of the mobile terminals and a
user is able to achieve about 64 Kbps. The bandwidth available in the downlink direction is 3-6
times higher than in the uplink direction, and pedestrian users are supported by downlink data rates of
up to 384 Kbps. Despite the increased bandwidth capacities and worldwide roaming capabilities
of 3G networks, the rollout of 3G has been delayed primarily for lack of good inexpensive handsets
and other technical issues. As a result, by the end of 2003 [135] only eight commercially operating
3G systems were deployed. Note, however, that good handsets are now starting to appear on the
market in greater numbers, and most of the technical issues have been resolved, which will certainly
help 3G networks to grow.
Like terrestrial cellular networks, satellite systems are rapidly evolving to meet the demand for
mobile services. Due to their large communication coverage areas, satellite systems can be considered
a complementary technology to cellular networks in unpopulated and low-traffic areas where
cellular networks are not competitive. However, in addition to providing “out-of-area” coverage
to mobile users, recent developments in satellite technology such as narrow beam antennas and
switchable spot beams enable satellites to be used to off-load congestion within highly populated
cellular network areas and to provide mobile application users with uplink and downlink bandwidth
that is up to an order of magnitude larger than in cellular networks. In what follows, we briefly
describe the various types of satellite systems and discuss their main characteristics.
Satellite systems can be classified according to their orbital altitudes into: (a) low-altitude
Earth orbit (LEO) satellites residing at about 500 − 2000 km above the Earth, (b) medium-altitude
Earth orbit (MEO) satellites circulating at about 5000 − 20000 km above the Earth, and (c) geostationary orbit (GEO) satellites located at 35,786 km above the Earth.
Today, the majority of satellite links is provided by GEO satellites. GEO satellites are tied
to the rotation of the Earth and are therefore in a fixed position in space in relation to the Earth’s
surface. Thus, each ground station is always able to stay in contact with the orbiting satellite at
the same position in the sky. Due to the high orbit of GEO satellites, their “footprints”, i.e., the
ground areas that are covered by their transponders (transmitters), are large in size. GEO satellites
“see” almost a third of the Earth, i.e., it takes only three GEO satellites to cover (almost) the whole
Earth. However, there are various obstacles as well: (a) Due to the long signal paths, the theoretical
propagation delay of the signal to travel the distance from the ground station to the satellite and back
again is 239.6 ms [106]. Therefore, the propagation delay of a data message and its corresponding
reply (one round-trip time (RTT)) takes at least 479.2 ms. In practice, however, signal RTTs are
slightly higher, ranging from 540 − 600 ms [65], depending on how fast the satellite and ground
station can process and re-send the signal. (b) As GEO satellites cover a large area with a diameter
in the range of about 10,000 − 15,000 km, available radio frequencies are used inefficiently. Note
that this problem can be alleviated by using spot beams. However, despite this technology GEO
systems will never be as efficient as equivalent LEO systems. (c) There is another problem due to
the long signal path between the Earth and the satellite. As the strength of a radio signal falls in
proportion to the square of the distance traveled, either a very high signal transmit power, large
receiver antennas, or an appropriate combination of both is required.
MEO satellites are mainly used in global positioning systems (GPS) and are not stationary in relation to the rotation of the Earth. MEO satellites as well as LEO satellites require the
use of constellations of satellites for constant coverage. That is, as one satellite leaves the ground
station’s sight, another satellite appears on the horizon and the channel is switched to it. Due to orbiting at an altitude of only about 1/3 of that of GEO satellites, MEOs obviously incur less round-trip delay
(220 − 300 ms [87]) than GEO satellites, but also have smaller footprints.
LEO satellites have become very popular in the last few years as demand for broadband communication has surged. Compared to GEO and MEO satellites, LEOs provide many advantages: (a)
Only low transmission power is required to reach a LEO satellite, which opens the door to pocket-sized transceivers. (b) Due to orbiting closer to Earth, short RTTs of about 40 − 50 ms [87] are
achievable. Note that there could be additional delays for global LEO networks over the terrestrial
network that could bring the RTT up to 100 ms. (c) As the satellite orbit is closer to Earth, LEOs
are also cheaper to launch. However, deploying LEO satellites is not free of problems: (a) As LEOs
are closer to Earth, their footprint is relatively small. Consequently, about 40 − 80 satellites are
required to attain global coverage. (b) As LEOs orbit close to Earth, they are forced to travel at high
speeds so that gravity does not pull them back into the atmosphere. LEO satellites achieve speeds
in the range from 4,000 − 27,000 km/h and thus circle the Earth in about 1.5 − 10 h. Hence, the
satellite is only in sight for about 5 − 30 minutes and thus inter-satellite hand-overs are frequent.
(c) Satellites experience orbital decay and have physical lifetimes determined almost entirely by
their interaction with the atmosphere. As the atmospheric density increases progressively towards
the ground, the orbital decay is much higher at lower altitudes. Thus, LEOs, with 5 − 10 years,
have a much shorter lifetime than MEOs or GEOs. Table 2.1 summarizes various key
performance metrics of cellular and satellite systems. As the satellite footprint and the downlink
and uplink data rates are system-specific, the figures in Table 2.1 refer to concrete systems, namely
iPSTAR (GEO) and Teledesic (LEO) [37].
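For intuition, the minimum RTTs quoted for the three orbit classes follow almost entirely from the altitudes and the speed of light; the lower bounds computed below (assuming a straight up-and-down path, our simplification) sit just under the measured figures, the gap being slant paths and processing time:

    # Lower-bound RTT from orbital altitude alone (vertical path, vacuum
    # speed of light); real links are slanted and add processing delay.
    C_KM_S = 299_792.458

    def min_rtt_ms(altitude_km):
        # request up + down, reply up + down: four traversals of the altitude
        return 4 * altitude_km / C_KM_S * 1000

    for name, alt_km in [("LEO", 1000), ("MEO", 10_000), ("GEO", 35_786)]:
        print(f"{name} at {alt_km} km: >= {min_rtt_ms(alt_km):.0f} ms")
    # GEO: ~477 ms, consistent with the ~479 ms theoretical figure above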
Wireless Metropolitan Area Networks (WMANs)
As the name implies, WMAN technologies enable users to establish fixed wireless connections
within a metropolitan area. The attribute “fixed” refers to the fact that the nodes exchanging radio frequency signals with each other are stationary, unlike mobile wireless technology. WMANs
offer an alternative, wireless means to cabled access networks, such as fiber optic links or coaxial
systems, for delivering broadband services to rural businesses and residences. Compared to cabled
access networks, WMAN technology is less expensive and faster to deploy and has the potential to
lead to more ubiquitous broadband access, as it provides network service options to areas with no or
insufficient wired infrastructure. WMAN technology can be classified into two groups: (a) Multi-channel Multi-point Distribution Service (MMDS) and (b) Local Multi-point Distribution Service
(LMDS).
MMDS systems operate in the 2.1 GHz to 2.7 GHz band and provide a line-of-sight (LOS)
service which means that data transmission does not work well around mountains, buildings, or any
other type of signal barrier. MMDS systems have a service range of up to 50 km and maximum
uplink and downlink bandwidth speeds of 256 Kbps and 10 Mbps, respectively [134]. LMDS is
a broadband wireless technology that occupies the largest chunk of spectrum (1,300 MHz in the
US) ever devoted to any wireless service. LMDS utilizes RF technology in the 27.5 − 31.225 GHz
band in the US and 40.5 − 43.5 GHz band in Europe for data transmission. Due to the high transmission frequencies, LMDS is capable of providing downlink and uplink bandwidth in the range of
20 − 50 Mbps and 10 Mbps, respectively [47]. However, compared to MMDS, LMDS has a much
smaller coverage range of 3 − 5 km. Like MMDS, LMDS requires LOS and is susceptible
to environmental influences such as rain. Finally, note that since 1999, the IEEE 802.16 working
group [72] has been developing specifications to standardize the development of WMAN technologies.
Wireless Local Area Networks (WLANs)
WLANs are smaller-scale wireless networks with a distance coverage range of up to
100 meters [16]. WLANs have been around since the late 1980s and can be seen either as a replacement for or as an extension of wired Ethernet. The most prevalent form of WLAN technology is
called Wireless Fidelity (WiFi), which includes a host of standards including 802.11a, 802.11b,
and 802.11g. 802.11b, approved by the IEEE in 1999, is an upgrade of the 802.11 standard, approved in 1997, and raised the transmission speed from 2 to 11 Mbps, which is approximately
the bandwidth capacity of wired Ethernet connections. Technically, 802.11b is a half-duplex
protocol which operates in the highly populated 2.4 − 2.483 GHz industrial, scientific, and medical
(ISM) band and its transmission distances vary from 20 − 100 meters depending on the equipment
used and the configuration selected. 802.11a was approved in Sep. 1999 and operates in the
5.15 − 5.25, 5.25 − 5.35, and 5.725 − 5.825 GHz bands and thus avoids the interference problems experienced by the 802.11b technology due to other products operating in the same frequency spectrum.
802.11a employs 300 MHz of bandwidth in the 5 GHz band, which accommodates 12 independent,
non-overlapping 20 MHz channels compared to 3 non-overlapping channels in 802.11b. Each channel supports up to
54 Mbps of throughput, shared among the mobile users operating in the same channel. As higher
frequency signals have more difficulties propagating through physical obstructions than those at
2.4 GHz, the operational range of 802.11a, at 5 − 30 meters, is somewhat less than that of 802.11b [16].
802.11g, approved in June 2003, attempts to combine the best of both 802.11a and 802.11b. 802.11g
supports bandwidth up to 54 Mbps by using 802.11a’s orthogonal frequency division multiplexing
(OFDM) modulation technique [46], it uses the 2.4 GHz frequency for greater coverage range and
is compatible with equipment based on the earlier 802.11b wireless standard.
Wireless Personal Area Networks (WPANs)
WPANs are short-range to very short-range (up to 10 meters) wireless ad-hoc networks that can be used to
exchange information between devices such as PDAs, cellular phones, laptops, or sensors located
within communication distance. Currently, the two dominant WPAN technologies are Bluetooth
and IrDA. Bluetooth [28], named after the Viking king Harald I Bluetooth, who unified Denmark
and Norway in the 10th century, operates in the ISM frequency band at 2.402-2.483 GHz in the
US and in most countries in Europe. This band is divided into 79 (1 MHz wide full-duplex)
channels, where each channel provides a data rate of 723.2 Kbps (raw data rate 1 Mbps). Bluetooth,
if applied for ad-hoc networking purposes, can support up to 8 devices – one of them selected
as a master – which together form a piconet similar to, but much smaller than, an IEEE 802.11
cell. To allow network nodes to form networks larger than 8 nodes, a node can act as a bridge
between two overlapping piconets and create a larger network called a scatternet. The scatternet
architecture allows devices that are not directly connected to each other (due to long distances or
too many devices sharing the same spatial location) to communicate. The Bluetooth
specification is driven by the Bluetooth Special Interest Group (SIG) and in its currently valid
version 1.2 [27], it defines two types of radio links to support voice and application data, namely
synchronous connection-oriented (SCO) and asynchronous connection-oriented (ACL1). An SCO
link is a symmetric, point-to-point link between the piconet master and a specific slave and can be
considered as a circuit-switched network. SCO provides up to three synchronous 64 Kbps voice
channels which can be used simultaneously. An ACL link is packet-switched and thus, is intended
for packet transfer, both asynchronous and isochronous (i.e., time-sensitive). An ACL link can
support an asymmetric link of maximally 723.2 Kbps in the downlink direction and 57.6 Kbps in
the uplink direction, or a 433.9 Kbps symmetric link [27].
1 It is obvious that the most appropriate abbreviation for asynchronous connection-oriented is ACO. However, this acronym has an alternative meaning in the Bluetooth 1.1 specification and is therefore not used in version 1.2.
The IrDA communication standard for transmitting data via infrared light waves was developed
by the Infrared Data Association (IrDA) [81], an industry-based group of over 150 companies
that was formed in 1993. IrDA is a LOS, point-to-point, ad-hoc data transmission standard. IrDA
operates in half-duplex mode since full-duplex communication is not possible: while transmitting
data, a device is blinded by the light of its own transmitter. In order to minimize interference
with surrounding devices, IrDA devices transmit infrared pulses in a cone that extends 15 degrees
half angle off center. IrDA is designed to operate over a distance of up to 1 meter and at data speeds
that fall into four different categories, namely (a) Serial Infrared (SIR) supporting speeds up to
115.2 Kbps, (b) Medium Infrared (MIR) supporting 0.576 Mbps and 1.152 Mbps data rates, (c)
Fast Infrared (FIR) supporting a 4.0 Mbps data rate, and (d) Very Fast Infrared (VFIR) supporting
16.0 Mbps [80]. Although Bluetooth was invented as an enhancement of IrDA and actually
copes with IrDA’s limitations by providing omnidirectional rather than unidirectional connections,
by extending the connectivity range between devices from 1 to 10 meters, and by supporting point-to-multipoint connections, the two technologies are quite complementary. While Bluetooth is very
well-suited for building ad-hoc personal area networks, Infrared is more appropriate for establishing
high-speed point-to-point connections, e.g., for synchronizing personal and corporate data between
the mobile device and desktop.
Concluding Remarks on Wireless Network Types
There are at least three interesting observations to be made about the previously described wireless
communication alternatives: First, there is no wireless network type that is universally superior to
the others, i.e., each has advantages and disadvantages w.r.t. the others. Due to the different design
and usage scenarios of the various existing wireless network types, the majority of them is complementary to, rather than competitive with, each other. To leverage the advantages of each individual
network type, it is desirable to seamlessly incorporate them into a hybrid network that allows data
to flow across the individual network boundaries, using many types of media, either satellite, wireless or terrestrial, transparently. Mobile users should be able to select their favorite network type
based on availability, QoS specifications, and user-defined choices. Currently, the issue of supporting global roaming across multiple wireless and mobile networks is one of the most challenging
problems faced by the developers of 4G network technology, whose deployment is not expected
until 2006 or even later.

Wireless        Wireless Network   Coverage           Downstream          Upstream           RTT
Network Type    Technology         Range              Bandwidth           Bandwidth
-------------------------------------------------------------------------------------------------------
WWAN            GPRS               up to 120 km       10 − 40 Kbps        10 − 20 Kbps       500 − 1000 ms [59]
WWAN            3G (CDMA)          up to 35 km        384 Kbps − 2 Mbps   64 Kbps            250 − 340 ms [59]
WWAN            Satellite (GEO)    10000 − 15000 km   10 Mbps             2 Mbps             540 − 600 ms
                                                      (iPSTAR [37])       (iPSTAR [37])
WWAN            Satellite (LEO)    160 km             64 Mbps             2 Mbps             40 − 50 ms
                                   (Teledesic [37])   (Teledesic [37])    (Teledesic [37])
WMAN            MMDS               up to 50 km        10 Mbps             256 Kbps           10 − 20 ms
WMAN            LMDS               3 − 5 km           20 − 50 Mbps        10 Mbps            10 − 20 ms
WMAN            802.16 (Wi-Max)    3 − 5 km           70 Mbps             25 Mbps            10 − 20 ms
WLAN            802.11a            5 − 30 meters      54 Mbps             54 Mbps            10 − 20 ms
WLAN            802.11b            20 − 100 meters    11 Mbps             11 Mbps            10 − 20 ms
WPAN            Bluetooth          10 meters          723.2 Kbps          57.6 Kbps          ∼ 10 ms [43]
                                                      (asynchronous)      (asynchronous)
WPAN            IrDA               up to 1 meter      16 Mbps             16 Mbps            10 − 20 ms [43]

Table 2.1: Various characteristics of current and emerging wireless network technologies.
Second, the majority of wireless networks exhibits asymmetry in their network characteristics,
i.e., the network load and service characteristics in one direction are quite different from those
in the opposite direction. Wireless communication asymmetry can take various forms and can
be classified as follows: (a) bandwidth asymmetry, (b) data volume asymmetry, (c) media access
asymmetry, and (d) packet loss asymmetry [3, 17, 18]. Bandwidth asymmetry is the most obvious
form of asymmetry and is characterized by a difference in the bandwidth capacity of the uplink
and downlink channels. Bandwidth asymmetric ratios between the downstream and upstream paths
vary significantly between and even within the same network type and range from about 3:1 to 6:1
in case of cellular packet radio networks to up to 640:1 in case of digital satellite TV networks
(e.g., AirTV) [12, 165]. Bandwidth asymmetry occurs because of the way available radio
resources are allocated to the uplink and downlink channels. There are technological, economical,
and usage-related reasons for the increasing popularity of asymmetric networks. Because providing a high-bandwidth
uplink channel requires expensive equipment and a high-power transmitter, wireless networks are often constructed asymmetrically. As will be noted
below, many mobile applications have asymmetric communication requirements and thus, the deployment of bandwidth asymmetric networks is highly desirable. Data volume asymmetry arises
due to the divergence in the amount of data transmitted in uplink and downlink direction. Unlike
full-duplex conversational voice communication, where the traffic volumes of the uplink and downlink are usually similar to each other, many mobile data communication applications, such as web
browsing, streaming live video or file transfer, place higher demands on the downlink than on the
uplink communication capacity. In such applications, short data requests on the order of several
tens of bytes are transmitted through the uplink, whereas much larger data files on the order of
several tens or hundreds of KB are transmitted in the opposite direction. This fact along with the
anticipated dominance of wireless data services in the future has encouraged network providers to
deploy asymmetric communication networks which, in turn, prevents bandwidth waste and capacity
degradation which would otherwise arise [84]. Media access asymmetry occurs due to lower MAC
costs incurred in transmitting data from a central cellular base station or satellite ground station to
a collection of geographically distributed mobile clients than in the opposite direction. The cause is
related to the hub-and-spokes network model underlying the majority of wireless networks. In this
model, a central coordinating entity (e.g., base station) has complete knowledge and control over
the downlink channel(s) and hence suffers a lower MAC overhead than the mobile hosts that compete for the uplink. Packet loss asymmetry takes place when the wireless network is significantly
more lossy in one direction than in the other. In wireless networks, the uplink path is significantly
more error-prone than the downlink path since mobile hosts transmit at much lower power and
need to contend for uplink channel slots whereas high-powered base stations can transmit relatively
loss-free.
Third, some of these network types are broadcast-based (e.g., Direct Broadcast Satellite (DBS),
LMDS, MMDS), which means that they inherently support information dissemination to many
users, possibly spread in a large geographic area, over a shared radio link and without any intermediate switching. Data broadcasting or data dissemination is an attractive data delivery model
for a number of newly emerging as well as traditional applications such as stock market and sports
tickers, news delivery, traffic information and guidance systems, video and audio entertainment,
emergency response and battlefield applications, etc. Compared to unicasting, broadcasting has
three major benefits if used for applications that require data to be transmitted to many destinations:
(a) First, it is more bandwidth- and energy-efficient since requests for the same data object by multiple clients can be satisfied simultaneously by the server with a single transmission rather than as
multiple point-to-point transmissions. (b) Second, it reduces the load on both the broadcast server
and mobile clients as locally missing data objects that are scheduled for broadcasting do not need
to be requested from the server, but rather can be downloaded from the broadcast stream as they
pass by. Obviously, this also improves the scalability of the system as the uplink channels as well
as the server can support more clients before they become a system bottleneck. (c) Last, but not
least, broadcast data delivery provides a convenient and cost-efficient way to mitigate the effects of
voluntary and unplanned disconnections and to enforce cache and transaction consistency.
Driven by the numerous advantages of data broadcasting for one-to-many applications and its
growing popularity amongst users, we choose periodic data broadcasting as the main data delivery model for this thesis and study various technological aspects centered around the objective
of building efficient data dissemination systems in the next chapter. Before doing so, however, we
briefly enumerate the various limitations of mobile computing systems and discuss their impact on
the objective of providing data consistency and currency guarantees efficiently to mobile users.
2.3 Limitations of Mobile Computing and their Impact on Mobile Transaction Processing
As the technological trends given in the introductory chapter indicate and everyone actively participating in today’s public life experiences, the wireless media play an increasingly important role
in everybody’s computing environment. However, providing reliable and efficient data communication and processing support to mobile users is more challenging than to users connected to a
fixed network such as a wireline local area network (LAN) due to the various constraints placed
upon mobile hosts. Environmental and system-immanent constraints of mobile systems are scarce
local resources (e.g., processors of today’s PDAs run at a speed of 16 to 400 MHz and their memory capacity ranges from 2 to 128 MB [19]), high packet loss rates, large transmission latency,
frequent bandwidth changes, variability of the supporting network infrastructure, poor data security, etc. Most of these will not be eliminated by technological progress in the near future. As a
consequence, a plethora of new research challenges are generated for the mobile computing community [13, 45, 73, 74, 125]. The provision of efficient access to consistent and current data, being
the main theme of the thesis, is one of them. The task of providing efficient transactional database support to mobile users differs from that in stationary systems due to the following non-exclusive
list of facts:
• Mobile computers do not enjoy the continuous connectivity provided by e.g., wireline LANs.
A mobile user may voluntarily (to save battery power or to reduce network service costs) or
involuntarily (due to poor microwave signal reception) disconnect from the mobile computing
network for an arbitrary amount of time. If users wish to keep transactions active even while
disconnected from the network, transactions may become extremely long-lived and
their operations are more likely to conflict with other concurrently active transactions.
• To ensure portability of mobile devices, they are battery-powered and built light and small in
size, making them much more resource-poor than, e.g., desktop devices. As a consequence,
mobile hosts provide either no or only restricted storage capacities to replicate interesting
subsets of the database and thus, the majority of the required data needs to be either directly
requested from the server or alternatively downloaded from the air-cache [146] or broadcast
channel, provided that frequently requested database objects or even a complete database
snapshot is disseminated to the client population. Clearly, if large portions of data need to be
obtained through the wireless network, transaction execution times are prolonged, ultimately
increasing the probability that transactions cannot be successfully reconciled into a common
database state.
As the enumeration given above indicates, the constraints inherent to mobile computing
systems affect transaction processing, among other things, through the fact that data conflicts due to multiple
concurrent read-write transactions operating on the same shared data may occur more frequently
than in fixed distributed database systems. Consequently, conflict avoidance/reduction, detection,
and resolution is a major research issue in mobile transaction processing and various measures
intended to cope with the problem are briefly discussed below.
2.3.1 Techniques to Avoid or Reduce Reconciliation Conflicts
As techniques that completely avoid data conflicts such as processing transactions in a sequential
manner or forcing application users and their invoked transactions to operate solely on privately
owned, non-shared fragments of the database are undesirable from a performance and data accessibility perspective, the essential issue to address here is to investigate and evaluate strategies to
reduce the number of data conflicts naturally arising in the presence of concurrent accesses and
updates to widely shared data objects.
Reducing the probability of data conflicts by improving transactions’ responsiveness
Viable options to reduce the number of conflict situations are manifold and should be exploited as
much as possible by mobile system designers. There are quite a few measures centering around the
goal of optimally utilizing scarce mobile networking and computing resources which help to reduce
transaction response times and, as a side effect, may diminish the data conflict rate, defined as the ratio
of the number of conflicting to non-conflicting data operations issued by concurrent transactions.
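Written out (with symbols of our own choosing), the conflict rate just defined is

    \[
      r_{\mathrm{conflict}} \;=\; \frac{|O_{\mathrm{conf}}|}{|O_{\mathrm{nonconf}}|},
    \]

where O_conf and O_nonconf denote the sets of conflicting and non-conflicting data operations issued by concurrently executing transactions.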
As a matter of fact, irrespective of the network technology used, bandwidth provided by wireless
networks is relatively low in comparison to fixed networks and the deliverable bandwidth per user
is even lower if it needs to be shared among users, as is the case in today’s cellular networks (e.g.,
GSM (Europe), PDC (Japan), and IS-95 (US)) around the globe.
Data broadcasting is an energy- and time-efficient approach to compensate, at least partially, for the prevalent bandwidth limitations of mobile networks. Rather than allotting dedicated RF channels to mobile users individually and answering their requests in a unicast fashion, data broadcasting takes advantage of the fact that data access patterns are typically skewed in nature [68], and, therefore, it is likely that multiple requests for the same object are pending in the request queue of the server and can thus be satisfied simultaneously. Consequently, data broadcasting is more bandwidth-efficient than point-to-point communication as more outstanding requests can be served at the same time. Another benefit of data broadcasting is its energy-efficiency, which is maximized
when a pure push-based data dissemination approach is applied. Using push-based data communication is more energy-efficient than the traditional pull-based method since transmitting data is about twice as energy-consuming as receiving messages [86, 94, 143]. With pure push-based data
delivery, the server provides data transfer without the reception of specific client requests and enables clients to receive required data via the broadcast channel without switching into the expensive
transmit mode. Note that data broadcasting is also more energy-efficient than unicasting if embedded into a hybrid or even pull-based data delivery system since clients always have the opportunity
to search the broadcast channel for the non-cache-resident object first before requesting it from the
server. Finally, data broadcasting may help mobile clients that run data-intensive applications to
improve their responsiveness. As indicated before, the wireless broadcast channel can be treated as
a global air-cache between the server and the client which extends the traditional memory hierarchy
structure of the wired client-server environment [3]. As such it can be used to reduce the average
data access costs required to fetch non-cached data objects from the server. Note, however, that the
instantaneous cost of retrieving an object from the air-cache is variable and depends on the respective object's position in the broadcast cycle. Consequently, the strategy of waiting until the requested object appears on the broadcast may not always be optimal, since fetching the data directly from the server via some dedicated radio channel may be cheaper, especially if the network and broadcast server load is low and a large number of data objects are scheduled to be broadcast.
Two other closely interrelated approaches to speeding up transaction execution are to prefetch data objects in anticipation of their access in the near future and to apply a judicious cache replacement strategy that always chooses the data object with the lowest caching utility as replacement victim. Designing such policies is a non-trivial task and involves the consideration of many factors, ranging from the recent history of client data references to less obvious issues such as the structure and contents of the broadcast program or the caching and versioning policy of the server. For a more in-depth discussion of this topic as well as the presentation of a concrete cache replacement and prefetching policy that is particularly suitable for transaction-oriented data dissemination applications, we refer to Chapter 5.
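To make the flavor of such a policy concrete, the following minimal sketch, written in Python with a deliberately invented utility function (recency of reference weighted by an estimate of the object's re-acquisition cost), evicts the cached object with the lowest caching utility; the policy actually developed in Chapter 5 is considerably more refined:

    import time

    class UtilityCache:
        """Toy client cache that evicts the object with the lowest caching utility.

        The utility function is purely illustrative: it favors recently referenced
        objects and objects that are expensive to re-fetch, e.g., because they
        reappear on the broadcast channel only after a long delay.
        """

        def __init__(self, capacity):
            self.capacity = capacity
            self.entries = {}  # oid -> (value, last_access_time)

        def _utility(self, oid, refetch_cost):
            _, last_access = self.entries[oid]
            recency = 1.0 / (1.0 + time.time() - last_access)
            return recency * refetch_cost(oid)

        def put(self, oid, value, refetch_cost):
            if oid not in self.entries and len(self.entries) >= self.capacity:
                victim = min(self.entries,
                             key=lambda o: self._utility(o, refetch_cost))
                del self.entries[victim]   # evict the lowest-utility object
            self.entries[oid] = (value, time.time())

        def get(self, oid):
            entry = self.entries.get(oid)
            if entry is not None:
                self.entries[oid] = (entry[0], time.time())  # refresh recency
                return entry[0]
            return None

Here refetch_cost is a caller-supplied callable, e.g., the object's distance to its next appearance in the broadcast cycle.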
Reducing the probability of data conflicts by deploying fine-granularity CC
A slightly different, but equally important, conflict-reducing strategy is to apply CC on a fine rather than a coarse granularity level. Enforcing this approach avoids, or at least diminishes the number of, false or permissible data conflicts otherwise detected by the client transaction manager or the reconciler at the server and, therefore, contributes to the performance improvement of the system. For an illustrative example showing how false conflicts may occur, we refer the reader to the previously presented Example 2 on page 4.
Reducing the probability of data conflicts by keeping multiple data versions
Another effective method to reduce the number of conflicts among concurrently executing transactions is to use versions for CC. Multi-versioning increases the scheduling power of the scheduler in the sense that it can produce more correct, i.e., conflict serializable, histories than a mono-version scheduler [161] and, therefore, it appears to be a promising technique for synchronizing mobile distributed transactions. In the standard mono-version transaction model, read operations are always directed to the most recent version of a requested object, thereby limiting the scheduling power of the transaction manager, as transactions may sometimes need to read an older version of a given object in order for the resulting schedule to remain conflict serializable. Clearly, greater concurrency due to potentially fewer data conflicts among transactions does not come for free: additional overhead in terms of space and processing power is required to benefit from this improvement. However, since multiple versions must be supported in a mobile distributed data communication environment anyway, and since MVCC protocols empirically outperform their mono-version counterparts (see Chapters 4 and 6 for a comparison), it is highly desirable to take advantage of them by designing appropriate protocols. For a representative example demonstrating how multi-versioning can be used to increase the concurrency in the system, we refer the reader again to a previously discussed example, namely Example 1 shown on page 4.
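The core idea can be illustrated in a few lines of Python: the version store below keeps, per object, the committed versions together with their commit timestamps, and a read issued by a transaction with start timestamp start_ts is served from the newest version committed no later than start_ts instead of the globally latest one. The names and the timestamp rule are illustrative only; the actual MVCC protocols are defined in Chapter 4.

    class VersionStore:
        """Minimal multi-version store: one list of committed versions per object."""

        def __init__(self):
            self.versions = {}  # oid -> list of (commit_ts, value), kept sorted

        def install(self, oid, commit_ts, value):
            """Install a new committed version (called at commit time)."""
            chain = self.versions.setdefault(oid, [])
            chain.append((commit_ts, value))
            chain.sort(key=lambda v: v[0])

        def read(self, oid, start_ts):
            """Return the newest version committed at or before start_ts.

            A mono-version scheduler would always return the latest version;
            handing out an older version keeps the reading transaction on a
            consistent snapshot and avoids a read-write conflict.
            """
            newest = None
            for commit_ts, value in self.versions.get(oid, ()):
                if commit_ts <= start_ts:
                    newest = value
                else:
                    break
            return newest

    store = VersionStore()
    store.install("x", commit_ts=10, value="x0")
    store.install("x", commit_ts=20, value="x1")
    print(store.read("x", start_ts=15))  # -> 'x0', not the latest version 'x1'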
Reducing the probability of data conflicts by exploiting semantic information
Another complementary approach to reduce the number of performance-reducing data conflicts
is to implement CC on a more elaborate transactional model than the well-established read/write
model in which synchronization is based on the analysis of read and write operations at system
runtime. The limitation of the read/write model is that it does not incorporate any semantic information of higher level operations into decision making and, therefore, fails to exploit a significant
amount of concurrency since it forbids certain execution orders of concurrent transactions that are
not conflict serializable, but nevertheless leave the database in a consistent state. A fundamental disadvantage of semantics-based CC models, however, is that they significantly complicate application programming (as, e.g., commutativity tables of higher-level operations need to be specified, or counters associated with upper and lower bounds need to be maintained in order to facilitate protocols such as Escrow locking [116]) and that they are inherently error-prone. Note that an extended discussion of exploiting semantic knowledge to increase concurrency in the context of mobile transaction processing is provided in Section 6.4.3.
2.3.2 Techniques to Detect and Resolve Reconciliation Conflicts
To deal with intermittent connectivity, to overcome latency and bandwidth problems, and to reduce server load during service peaks, mobile devices are allowed to replicate subsets or all of the server's primary database and to operate on those secondary data copies in an unsynchronized fashion. To prevent local database copies from diverging significantly from the server's version, mobile clients are regularly required to integrate their uncoordinated update operations into the common database state. However, depending on the length of time since the client's last reconciliation with the server and the update frequency of the database during that period of time, locally generated updates may conflict with other concurrently performed updates in the system and, therefore, some form of conflict detection and resolution is required. The issue of detecting and resolving inconsistencies between database copies can be addressed by using a syntactic or a semantic approach, or a hybrid of both [39].
Syntactic reconciliation approaches enforce scheduling correctness by inspecting the read and write sets of the client transactions executed since the last reconciliation. This approach completely ignores the semantics of the transactional operations as well as the semantics, structure, and usage of the data objects themselves when reasoning about the correctness of transaction interleavings. Obviously, reconciling divergent database states by only analyzing read and write operations fails to recognize some legitimate schedules as correct and thus may unnecessarily resolve false conflicts.
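In its simplest form, such syntactic (backward-oriented) validation reduces to set intersections, as the following deliberately simplified Python sketch shows; the reconciliation protocols used later in the thesis refine this basic scheme:

    def syntactic_validate(read_set, write_set, committed_write_sets):
        """Accept a client transaction only if no transaction that committed
        since the client's last reconciliation wrote an object the client
        read or wrote; otherwise report a (possibly false) conflict."""
        for other_writes in committed_write_sets:
            if read_set & other_writes or write_set & other_writes:
                return False  # conflict detected on purely syntactic grounds
        return True

    # T read {x, y} and wrote {y}; a concurrently committed transaction wrote
    # {x}, so T fails validation even if the interleaving were semantically
    # harmless -- a potential false conflict.
    print(syntactic_validate({"x", "y"}, {"y"}, [{"x"}]))  # -> False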
Semantic reconciliation approaches offset the disadvantage of the syntactic methods by using semantic information about the operations and the objects for automatically integrating divergent database copies. Having higher-level, application-specific knowledge about the constituent client transactions allows the resolver to use a correctness criterion weaker than Conflict Serializability [120] for preserving database consistency, which we call semantic correctness following Bernstein et al. [24, 25]. Informally speaking, under this criterion a reconciled schedule is semantically correct if the effect on the database of the interleaved execution of the set of integrated transactions is the same as that of some serial schedule of the same transactions. To ensure that only semantically correct schedules are produced by the resolver, semantic reconciliation requires the application programmer to specify the conditions under which transaction steps of concurrent transactions are allowed to be interleaved. Certainly, the complexity of analyzing all the different transaction types in the system and specifying the conditions they assume to be true before, during, and after their executions places a great burden on the application programmer. Due to this fact, and in order to achieve application-independence, we adopt a purely syntactic reconciliation approach in this thesis. In the same vein, it is also important to note that the two reconciliation approaches are complementary rather than competing concepts. Therefore, our approach can be extended to include object and operation semantics to increase the number of successful reconciliations.
“New capabilities emerge just by virtue
of having smart people with access to
state-of-the-art technology.”
– Robert E. Kahn
Chapter 3
Hybrid Data Delivery
In this chapter, we give reasons why traditional request/response data delivery and its inverse, push-based data delivery, are inadequate for building large-scale dissemination-based systems, and we introduce the concept of hybrid data delivery, the combination of the traditional request/response mechanism with data broadcasting, as a solution. Then, we present possible configuration options of the communication media to facilitate hybrid data delivery and identify the major
underlying assumptions of the thesis. In the remainder of the chapter we review and evaluate the
various technological approaches proposed to realize hybrid data delivery systems over a broadcast
medium. The key issues addressed here include the organization of the broadcast program, i.e., how
much broadcast channel bandwidth should be allocated to each given data object and the indexing
of the broadcast channel(s), i.e., which indexing technique provides the best performance/energy
consumption trade-off.
3.1 Why to Use Hybrid Data Delivery
Within the last two decades, the two- or three-tier client-server architecture has been the prevalent distributed computing model. Within this architecture, the traditional request/response or pull-
based data delivery model has been used. However, pure pull-based data delivery is unsuitable
for one-to-many data dissemination applications since it suffers from scalability problems due to
the limited uplink bandwidth capacity of wireless networks and the existing upper limit on the
service rate of the server at which it serves outstanding requests. Clearly, by following the KIWI
approach: “kill it with iron”, the congestion boundary of the server can be successfully increased.
However, taking a hardware-based approach can be uneconomic, especially if the average server
load significantly deviates from the worst case load and overload situations occur infrequently.
To overcome the scalability problem of the pull-based technique, push-based data broadcasting
or dissemination has been proposed as an alternative data delivery method [6, 30, 164]. Push-based
data delivery is a very attractive communication method for information-feed applications such as
stock quotes and sport tickers, electronic newsletters, mailing lists, etc., whose success critically
depends on the efficient dissemination of large volumes of popular data to a large collection of
users. However, as is often the case, the push-based data delivery method has weaknesses too: (a) First, as the server lacks feedback information about the popularity of the objects in the database, it has no means to adjust the broadcast content such that it optimally matches the current
data needs of the user population. Consequently, the server may broadcast data objects that nobody
requires and/or it may never deliver data objects that the clients desperately need and/or it may
transmit less popular objects too often or very popular objects too infrequently. Obviously, all those
factors have an adverse effect on the system performance and user satisfaction as precious downlink
bandwidth gets wasted, and the QoS is reduced. (b) The second shortfall of push-based data delivery
relates to the way data is retrieved from the broadcast channel. As with magnetic tapes, data
in the broadcast channel is accessed sequentially rather than randomly, i.e., the data access time,
defined as the time elapsed from the moment a client issues an object request to the point when
the required object is completely downloaded by the client, depends on the amount of data being
broadcast. Consequently, the more distinct data objects are broadcast and the larger those data
objects are, the higher the average data access latency. If we assume that any data object scheduled
for broadcasting is equally likely to be accessed and all data objects are disseminated with the same
frequency, the average access time for any object is equal to half the amount of time required to broadcast every member of the broadcast set once. As a consequence, objects to be broadcast need to
be selected judiciously such that the average access time does not become unsustainably long.
As the preceding discussion has shown, both data delivery methods have their limitations and thus, taken by themselves, are not appropriate for data dissemination applications. To overcome these limitations, the hybrid data delivery approach has been proposed [6, 38, 112, 113, 147], whose basic idea is to combine the push- and pull-based data delivery methods in a complementary manner, exploiting the advantages of each delivery type while avoiding their shortfalls. To achieve optimal performance results, hybrid data delivery concentrates on broadcasting only popular data objects and unicasts the rest of the database as demand arises. The amount of data that is considered popular and needs
to be broadcast depends on various system and workload parameters such as the ratio of uplink to
downlink bandwidth, the skew in the data access distribution, the request rate and its variation over
time, etc. As some of those parameters change with time, often in an unpredictable way, adaptive hybrid data delivery techniques have been proposed [112, 146, 147] that retain the advantages of hybrid data delivery while also adapting to changes in the workload and client behavior. However,
the modeling of an adaptive hybrid data delivery system is beyond the scope of this thesis and we
refer the interested reader to the literature listed above for technical details.
3.2 Hybrid Data Delivery Networks
The wireless network infrastructure facilitating hybrid data delivery is called a hybrid data delivery network. From the logical view, a hybrid network is a communication architecture that allows its network members to establish point-to-point connections with each other and also enables them (e.g., the server) to broadcast information to all other active parties of the network via a broadcast channel. From the physical perspective, hybrid networks may be established on the basis of:
1. An all-in-one communication medium. In this configuration, both point-to-point communication and data broadcasting are carried out over the same physical medium. A two-way satellite system, which typically consists of asymmetric satellite paths, namely a broadband satellite
path in the downstream direction (server-to-user) for the delivery of the actual broadcast contents and a narrowband upstream path (user-to-server) for the carriage of the user requests, would be a network architecture belonging to this category.
2. A separate point-to-point and broadcast medium. In this network setup, the physical communication medium used to transfer user requests to the server differs from that used for
information dissemination. An example network system would be a hybrid satellite system
that consists of multiple two-way terrestrial point-to-point channels (e.g., Public Switched
Telephone Network (PSTN) dial-up, Integrated Services Digital Network (ISDN) dial-up,
or leased-line connection) to allow users to request and receive data objects that are not
scheduled for broadcasting and a broadband satellite channel which is used by the server to
disseminate popular data.
3. A separate upstream and downstream medium. In this network topology, the point-to-point
medium is used exclusively to facilitate the upstream data transfer to the server, while the
downstream medium is used for both data broadcasting and unicasting. In this network type,
the available downstream bandwidth is logically divided into two parts, a commonly shared
broadcast channel to facilitate data dissemination and multiple dedicated point-to-point channels to respond to user requests. An exemplary network architecture would again be a hybrid
satellite system that comprises a one-way terrestrial communication path to the server and a
logically divided downstream satellite channel as described above.
To be able to abstract from issues such as which objects to select for broadcasting and how
frequently to disseminate them as well as how much downstream bandwidth is to be dedicated to
broadcasting and unicasting, we make the following assumptions within this thesis:
1. We opt for the separate broadcast and unicast media network model as the data communication
model of choice since the broadcast and point-to-point channels are physically independent
and a fixed amount of bandwidth is assigned to each of them. By doing so, we do not face the
problem of dynamically calculating the optimal bandwidth allocation between the broadcast
and point-to-point channels.
2. To exempt ourselves from the non-trivial issue of deciding which objects to broadcast and how often, we assume that the data access pattern follows the well-known 80:20 distribution, i.e., 80% of the data requests are directed to 20% of the data objects, and that broadcasting the most popular 20% of the data objects yields close-to-optimal request response times.
3. We assume a static client access behavior, i.e., we are not required to continuously adjust the
broadcast contents to react to changes in the client request pattern.
4. Last, but not least, we assume reliable communication connections and, therefore, do not
consider the effects of transmission errors.
In order to successfully deploy a hybrid data delivery system, a number of performance-critical and other crucial design issues need to be addressed, including: How can clients be enabled to synchronize with the channel and to interpret the data currently being broadcast? How should the broadcast channel be organized such that the average data access time is minimized and the requested data can be filtered from the channel in an energy-efficient way? The answers to those questions are provided in the remainder of this chapter.
3.2.1 Organizing the Broadcast Program
Broadcast program or structure refers to the content and organization of the broadcast and is one
of the most fundamental issues in data dissemination since it decides what, in which order, and
when to transmit from the server to the clients and thus, has a significant impact on the overall
system performance. The ultimate goal is to minimize the data access latency while keeping the
power consumption incurred by tuning to the broadcast channel at (almost) the minimum. However,
before we elaborate on specific broadcast structures, we discuss the general building blocks of the
conceptual design of the broadcast program.
The smallest logical unit of the broadcast program is called a bucket or frame [103], which
physically consists of a fixed number of packets, the basic unit for transmitting information over
the network. We distinguish between three different types of buckets: (a) concurrency control report
(CCR) buckets, (b) index buckets, and (c) data buckets. As the name implies, CCR buckets contain CC-related information which enables mobile clients to continuously pre-validate on-going read-only and read-write transactions and to (pre-)commit transactions once they have completed their execution. Additionally, CCRs help mobile clients to maintain cache consistency without continuously monitoring the broadcast. Index buckets provide mobile users with information about the arrival times of data objects on the broadcast channel. By accessing the index, mobile clients are able to predict the point in time when their desired objects appear in the channel. Thus, they can stay in the energy-efficient doze mode while waiting and tune into the broadcast channel only when the data object of interest arrives. Note that a typical wireless PC card like ORiNOCO consumes 60 mW during doze mode and 805–1,400 mW during active mode [154] and thus, air indexing
can facilitate considerable energy conservation. Data buckets store a number of data objects and
each object is identified by a unique object identifier (OID), which is independent of the value of
its attributes. The OID can be used as a search key to find and identify an object in the broadcast
channel. For reasons of practicability and for cost efficiency considerations, multiple buckets of
the same type are placed together in the broadcast program and we refer to such a set of contiguous buckets as a segment. Consequently, the broadcast program is made up of the following three
segment types: (a) CCR segments, (b) index segments, and (c) data segments.
In order to allow mobile clients to interpret the data instantly as they fly by and to synchronize
with the broadcast channel at any time, buckets are designed to be self-explanatory by including the
following header information in each bucket: (a) the bucket ID (BID), (b) the bucket type, (c) the
offset to next index segment, and (d) the offset to the next major broadcast cycle (see Definition 5
below). To facilitate the subsequent discussion and to provide plausible motivations for the various
design choices of the broadcast program, we provide the following definitions:
Definition 1 (Air-Cache Data Access Time (ADAT)). ADAT is the duration of time starting from
the moment when the client wants to fetch an object from the broadcast channel or air-cache to the
point when the desired object is actually downloaded into the local cache. ADAT can be logically
split into the air-cache index probe time and air-cache wait time that are defined next.
Definition 2 (Air-Cache Index Probe Time (AIPT)). AIPT is the period of time between probing
the broadcast channel for information on the broadcast time of the next index segment and getting
to the next index segment in order to obtain information on the position of the requested object in
the broadcast program.
Definition 3 (Air-Cache Wait Time (AWT)). AWT refers to the time interval between inspecting
the first index bucket of the index segment and downloading the requested object from the broadcast
channel.
Definition 4 (Air-Cache Tuning Time (ATT)). ATT is the portion of the ADAT that the client spends in active mode, listening to the broadcast channel to find the position of the required object and to download it. ATT is proportional to the power consumption of the mobile client.
Definition 5 (Major Broadcast Cycle (MBC)). MBC is the amount of time it takes to transmit all
data objects scheduled for broadcasting at least once.
Definition 6 (Minor Broadcast Cycle (MIBC)). An MBC may be further partitioned into a number of MIBCs. Each MIBC contains a sequence of objects with non-decreasing values in the OIDs,
begins with a CCR segment, and is likely to be followed by an index segment.
In the literature, a plethora of broadcast structures have been proposed [4, 75, 77, 112] that can
be distinguished along two dimensions: (a) flat vs. skewed organization and (b) single vs. multiple
channel organization. A flat broadcast organization is the simplest way to generate a broadcast
program and is characterized by the fact that each data object appears exactly once per MBC.
An example structure of a flat broadcast is shown in Figure 3.1(a) on page 48 and its underlying
assumptions are that the data file disseminated by the broadcast server consists of 18 data objects
of equal size and access probability and each bucket in the broadcast accommodates exactly one
of these objects. Additionally, we assume that one index bucket can capture indexing information
of 6 data objects and the entire index — rather than only a portion of it — is broadcast between
successive data segments. For pedagogical and comparison reasons, we retain these assumptions
for the presentation of the other broadcast structures discussed in the rest of the subsection and we
may refine them as demand arises. The design of the structure illustrated in Figure 3.1(a) is driven by the objective to provide clients with the ability to selectively tune into the broadcast channel (by interleaving an index with the data) and to allow them to efficiently validate their local transactions and to enforce cache coherence in an energy-conserving way (by interspersing a CCR with the data). As both the CCR and the index segment are broadcast once during an MBC, the average AIPT is equal to half the distance between two consecutive index segments, which is about equal to half the total length of an MBC. The AWT consists of inspecting the index, waiting for the object to occur on the channel, and downloading it.¹ On average, this is equal to the length of the index segment, $l_{index}$, plus half the length of the data segment, $l_{data}$, plus the time to download the desired object, $l_{object}$, i.e., $l_{index} + \frac{1}{2} \cdot l_{data} + l_{object}$. As a result, we can derive that the average ADAT is about equal to the length of one MBC, which may be unsustainably high, especially if the MBC becomes large.
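Under the stated assumptions these averages are easy to compute. The following back-of-the-envelope sketch in Python, with freely chosen segment lengths in bucket-transmission units, reproduces the observation that the average ADAT of such a flat (1,1) program is roughly one MBC:

    # Segment lengths in bucket-transmission units (illustrative values only).
    l_ccr, l_index, l_data, l_object = 1, 3, 18, 1
    mbc = l_ccr + l_index + l_data              # length of one major broadcast cycle

    avg_aipt = mbc / 2                          # half the gap between index segments
    avg_awt = l_index + l_data / 2 + l_object   # probe index, wait, download
    avg_adat = avg_aipt + avg_awt

    print(mbc, avg_adat)  # avg_adat (24.0) is close to one full MBC (22)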
To avoid such high data access latencies, an alternative method is to broadcast the index segment, possibly coupled with the CCR segment, multiple (i.e., $d_{index}$) times within an MBC as shown in Figure 3.1(b). Then, the average AIPT is only half of the sum of the length of the CCR segment, $l_{CCR}$, the length of the index segment, $l_{index}$, and the quotient of $l_{data}$ and $d_{index}$, i.e., $\frac{1}{2} \cdot \left( l_{CCR} + l_{index} + \frac{l_{data}}{d_{index}} \right)$. Obviously, the more index segments are interleaved with the data segment, the shorter the average AIPT. However, as a side effect of interspersing multiple index segments (and possibly CCR segments) into the broadcast program, the size of the MBC increases, which, in turn, translates into a longer average AWT. Therefore, the essential issue is to find an optimal value of $d_{index}$ so as to minimize the average ADAT, which is achieved by using the following formula [77]:

$$d_{index} = \sqrt{\frac{l_{data}}{l_{index} + l_{CCR}}}. \tag{3.1}$$
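Continuing with the toy segment lengths from above, Equation 3.1 can be evaluated directly (a sketch; the numbers are again purely illustrative):

    import math

    l_ccr, l_index, l_data = 1, 3, 18
    d_index = math.sqrt(l_data / (l_index + l_ccr))  # Equation 3.1
    print(d_index, round(d_index))  # ~2.12: interleave the index about twice per MBC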
Both broadcast organizations described above are flat broadcast schemes and are only optimal if every data object is accessed uniformly, which is rarely the case in real life. Typically, the probability distribution of client accesses is highly skewed [68], i.e., some data objects are more important and more commonly requested by the clients than others. Skewed broadcast schemes [4, 7, 60, 148, 153] take account of this fact and broadcast more popular data objects more frequently, which results in broadcast programs in which some objects may appear more than once within an MBC.

¹ Note that we assume that all object requests refer to data objects included in the broadcast program.
Acharya et al. [4, 7] were the first to propose a non-uniform data broadcast scheme, called the "multi-disk" broadcast generator. The multi-disk broadcast generator splits the data file into n partitions, where each partition consists of objects with the same or similar access probability. Each partition is then assigned to a separate broadcast disk i which spins with its own relative frequency λ_i. The spinning speeds of the individual disks are set in proportion to the average access probability of the objects within the various partitions and are expressed as integers. Now let λ denote the least common multiple of the λ_i, for all i. The multi-disk broadcast generator additionally splits the contents of each broadcast disk into c_i chunks, where $c_i = \frac{\lambda}{\lambda_i}$. The broadcast program is then built by interleaving chunks from the various broadcast disks by using the following algorithm:
begin
    for i ← 0 to λ − 1 do
        for j ← 1 to n do
            broadcast chunk (i mod c_j) of broadcast disk j
end
Algorithm 3.1: Multi-disk broadcast generation algorithm.
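A direct transcription of Algorithm 3.1 into Python might look as follows; as a simplifying representation assumption, each disk is handed over as a list of pre-built chunks, so that c_j = len(disks[j]) and, for the running example below, the least common multiple of the chunk counts coincides with the λ of Algorithm 3.1:

    from math import lcm

    def multi_disk_broadcast(disks):
        """Generate one major broadcast cycle from a list of broadcast disks.

        Each disk is a list of chunks (a chunk being a list of object IDs);
        disk j thus has c_j = len(disks[j]) chunks.
        """
        period = lcm(*(len(chunks) for chunks in disks))  # pattern repeats after this
        program = []
        for i in range(period):              # for i <- 0 to lambda - 1 do
            for chunks in disks:             #     for j <- 1 to n do
                program.extend(chunks[i % len(chunks)])  # chunk (i mod c_j) of disk j
        return program

    # Running example: BD1 = objects 1-4 (2 chunks), BD2 = objects 5-10
    # (3 chunks), BD3 = objects 11-18 (4 chunks), each chunk holding 2 objects.
    bd1 = [[1, 2], [3, 4]]
    bd2 = [[5, 6], [7, 8], [9, 10]]
    bd3 = [[11, 12], [13, 14], [15, 16], [17, 18]]
    mbc = multi_disk_broadcast([bd1, bd2, bd3])
    print(mbc.count(1), mbc.count(5), mbc.count(11))  # -> 6, 4, 3 broadcasts per MBC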
To exemplify the multi-disk data broadcast approach within the framework of our running example, we need to modify its underlying assumption that all objects are equally likely to be accessed. Obviously, this assumption is not appropriate for generating a skewed broadcast and we therefore assume a non-uniform data access behavior. More precisely, data objects 1–4, 5–10, and 11–18 are assumed to be accessed with a probability p of 6/13, 4/13, and 3/13, respectively. Given those groups of objects with different access probabilities, the multi-disk broadcast generator creates a 3-disk broadcast program with broadcast disks BD1, BD2, and BD3 consisting of data objects 1–4, 5–10, and 11–18, respectively. To take account of the varying access probabilities among the object groups, objects in BD1 are broadcast one and a half times as often as objects on BD2 and twice as frequently as objects on BD3. Thus, the broadcast generator spins disks BD1, BD2, and BD3 with relative frequencies of λ1 = 6, λ2 = 4, and λ3 = 3. Consequently, the contents
of BD1, BD2, and BD3 will be split into 2, 3, and 4 chunks, respectively, and each chunk consists of 2 objects on all disks. Figure 3.1(c) finally shows the resulting multi-disk broadcast program for the running example. The figure also illustrates a valuable property of the multi-disk broadcast algorithm: the inter-arrival time between successive broadcasts of the same object is fixed. Note that the generation of regular broadcasts is important for mobile clients for reasons such as simplifying the client caching and prefetching heuristics or retrieving data objects from the broadcast channel without (always) consulting the index. However, the regularity property does not come without cost. The problem is that the broadcast frequency and, therefore, the amount of broadcast bandwidth allocated to any data object does not properly reflect its access probability. As derived in [15], in an optimal broadcast program the amount of bandwidth allocated to any object should be proportional to the square root of its access probability. In our running example, and as reflected in the spinning speeds of the 3 broadcast disks, the access probabilities of the objects "stored" on disks BD1, BD2, and BD3 are 6/13, 4/13, and 3/13, respectively. The square-root formula for optimal bandwidth allocation prescribes that disks BD1, BD2, and BD3 should get 40%, 33%, and 27% of the bandwidth, respectively. However, the multi-disk scheduling approach gives the same bandwidth to all 3 disks, i.e., popular data objects (1–4) are given too little bandwidth and non-popular data objects (11–18) are given too much bandwidth. In the literature there exist alternative approaches that optimize the multi-disk approach and provide close-to-optimal bandwidth allocation to data objects [60, 148, 153]. However, those approaches trade off broadcast program regularity for better bandwidth allocation and may therefore not be the first choice of the system designer.
The broadcast structures discussed so far assumed that the broadcast server disseminates information over a single channel and all clients are tuned to this channel. An alternative approach is to conceive an environment in which the server broadcasts popular data on multiple channels, and clients listen to one or more channels in parallel, depending on the physical properties of the mobile device. In this respect, however, we argue that there is no need to model separate channels for data dissemination as long as the data is accessed uniformly. If that is the case, it does not matter whether we multiplex the index and its underlying data on a single or on multiple channels as long as CC-related and index data are still broadcast with the same frequency. The reason is that the combined capacity of multiple channels is equivalent to the capacity of a single channel and we can always find a mapping from a single-channel broadcast program to a multi-channel broadcast program that provides the same performance results.
This finding, however, may not hold if the access pattern on the data objects is skewed. Then, multi-channel broadcasting may provide a performance advantage over the single-channel approach. This is because the multi-channel broadcast approach allows the interleaving of multiple small indexes with data objects within each of the multiple channels rather than a single large index for an entire broadcast channel. If we now assume that data objects with the same or similar data access probabilities are broadcast over the same broadcast channel, the indexes are built upon objects with the same (or similar) popularity. Consequently, clients no longer waste time inspecting index entries of unpopular data objects when the requested data object is popular. Figure 3.1(d) finally illustrates a suitable multi-channel broadcast program for our running example. As in the previous broadcast organization structures, the program's underlying assumption is that each index bucket can accommodate index information of up to 6 objects and thus, the index segments of broadcast channels 1 and 2, denoted IS1,n and IS2,n, respectively, where n represents some consecutive index segment number with n ≥ 1, contain only one index bucket. Index segment IS3,n is double the size of IS1,n and IS2,n as it needs to accommodate index information of 8 objects.
3.2.2 Indexing the Broadcast Program
So far, we have not discussed specific channel indexing techniques that help mobile clients to efficiently find out what is being broadcast at what time. Before doing so, we briefly discuss reasons why an index should be an integral part of any broadcast program of a hybrid data delivery system: (a) The first reason is related to the key issue of saving the scarce battery power of mobile devices. Without an index, mobile clients need to continuously monitor the broadcast channel until the desired data object arrives. This consumes about an order of magnitude more energy, as mobile clients need to remain in active mode all the time, than if an air index were interleaved with the corresponding data buckets, in which case mobile clients could stay in doze mode while waiting and tune into
the channel only when the desired object arrives.

[Figure 3.1: Various possible organization structures of the broadcast program: (a) flat single-channel broadcast organization with (1,1) indexing; (b) flat single-channel broadcast organization with (1,2) indexing; (c) skewed single-channel broadcast organization with (1,2) indexing; (d) skewed multi-channel broadcast organization with multi-channel indexing.]

(b) Second, air indexing may help mobile clients
to reduce their average ADAT. At first glance, this point may seem contradictory, as the broadcast cycle is lengthened by the additional indexing information, clearly leading to longer average ADATs for those objects which are air-cache-resident. However, in a hybrid data delivery system the air-cache is not the only source available to mobile clients for data retrieval. For performance reasons, only the hottest database objects are continuously broadcast to the client population and the rest of the database can be requested from the server as demand arises. To exploit the scalability advantages of data broadcasting and to prevent the mobile network and the server from becoming the performance bottleneck, mobile clients should always listen to the broadcast first to find out whether the object of interest is air-cached before sending a request to the server asking for it. To enable clients to quickly differentiate air-cache hits from air-cache misses, an air-cache index is indispensable. If no index is interleaved with the data objects and the object of interest is not air-cached, then a mobile client needs to wait an entire MBC to find this out. An air-cache index can considerably speed up that process, resulting in much shorter average ADATs for non-air-cache-resident objects. Note, however, that the question whether an air-cache index reduces or increases the overall average ADAT is system-specific and depends on many tuning and workload parameters such as the repetition frequency of the index, the relative size of the index compared to its corresponding data, the number of data objects being broadcast, the data access patterns of the clients, the average load on the network and database server, etc.
Irrespective of whether indexing bears a performance trade-off or not, it helps to conserve energy and is therefore an indispensable technique for hybrid data delivery systems. In the literature, we can distinguish between three classes of index methods for broadcast channels: (a) signature-based, (b) hashing-based, and (c) tree-based indexing. In what follows, we briefly describe the basic working principles of these techniques and comparatively evaluate them w.r.t. the two key performance metrics, namely access latency and tuning time.
Signature-based Indexing
The signature method is a widely used method with applications in areas such as text retrieval [48], multimedia database systems [130], image database systems [100], and conventional database systems [34]. Signatures are densely encoded information about data objects and are significantly smaller than the actual objects themselves (< 20%). They are easy to calculate and provide a high degree of selectivity. The signature of a data object i, denoted S_i, is a bit vector generated by first hashing each attribute value of the data object into a bit sequence and then superimposing, i.e., ORing, the bit sequences together. Signatures are (periodically) calculated by the broadcast server for every scheduled data object and are typically broadcast as a group, either once or preferably multiple times, within an MBC. To determine whether a requested object might be contained within the broadcast program, a query signature S_query is constructed with the same hash function as used by the broadcast server and is then compared to each S_i in the broadcast signature file, denoted S_bcast. As a result of the comparison, a candidate list of data objects is returned containing those objects of the broadcast program that match the query signature, i.e., S_query ∧ S_i = S_query. Each object signature and the OID of its corresponding object are associated with the information where to find the respective object in the broadcast program and are stored as a triple (S_i, OID, BID) in S_bcast. Once the candidate objects are determined, the objects whose signatures indicate a match must be compared directly with the search criteria in order to eliminate the so-called false drops that may occur due to reasons such as hash collisions and/or disjunctively combining various signature terms. An example of a signature file along with a sample query is given below:
[Figure 3.2: Example illustrating the signature comparison process. A query SELECT * FROM Ticker WHERE Symbol = 'MSFT' with query signature S_query = 10100010 is compared against the signature file entries (S_i, OID, BID) of the broadcast data buckets; the comparison yields no match for S_1 and S_3, a match for S_2, and a false drop for S_4.]
In this example, the predicate Symbol = 'MSFT' has the signature 10100010 and the signature comparison indicates a match for objects 2 and 4 and rejects the other two objects. After accessing and inspecting objects 2 and 4 in the broadcast stream, only object 2 matches the search condition and object 4 is identified as a false drop.
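The superimposed coding and the match test S_query ∧ S_i = S_query translate almost literally into code; the Python sketch below uses an invented hash (two bits per attribute value in an 8-bit signature), so the concrete bit patterns differ from those in the figure:

    import hashlib

    SIG_BITS = 8  # illustrative signature width

    def attr_signature(value):
        """Hash one attribute value to a bit pattern with two bits set."""
        digest = hashlib.md5(str(value).encode()).digest()
        return (1 << digest[0] % SIG_BITS) | (1 << digest[1] % SIG_BITS)

    def object_signature(attrs):
        """Superimpose (OR together) the attribute signatures of an object."""
        sig = 0
        for value in attrs:
            sig |= attr_signature(value)
        return sig

    def is_candidate(query_sig, object_sig):
        # S_query AND S_i == S_query: the object is a candidate (a true match
        # or a false drop that must be weeded out by inspecting the object).
        return query_sig & object_sig == query_sig

    objects = {1: ("IBM", 86.75), 2: ("MSFT", 27.5), 3: ("ORCL", 11.2)}
    s_query = attr_signature("MSFT")
    candidates = [oid for oid, attrs in objects.items()
                  if is_candidate(s_query, object_signature(attrs))]
    print(candidates)  # always contains 2; any other OID would be a false drop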
Signature schemes proposed in the literature are the (a) simple, (b) integrated, and (c) multi-level signature methods [103]. In the simple signature scheme, a signature bucket is constructed for each data bucket and broadcast before it. The integrated signature scheme generalizes the simple scheme by generating a signature bucket for a group of one or more data buckets. As in the simple scheme, the signature bucket is disseminated before the corresponding data buckets. The multi-level signature scheme consists of multiple signature levels, with each level being broadcast before its corresponding data buckets. The multi-level scheme combines the simple and integrated signature methods by using the former to generate the lowest-level signatures and the latter to calculate the upper-level signatures.
Before concluding the discussion of signature-based air indexing, we specify the algorithm a mobile user employs to locate and retrieve data objects matching a query signature S_query from the air-cache if the integrated signature scheme is used for indexing. The algorithm, shown below, is executed as soon as a required data object is chosen by the mobile user and a local cache miss has occurred.
begin
    /* Initial probe */
    Tune into the broadcast channel, read the header part of the current bucket B_curr and find out when the next signature bucket will be broadcast.
    Go into doze mode and wake up when the signature bucket is broadcast.
    /* Integrated signature probe */
    foreach integrated signature of an entire broadcast cycle do
        if S_query matches the integrated signature then
            Check all data buckets associated with the signature for true signature matches and download them from the air-cache.
        else
            Go into doze mode and wait until the next integrated signature is broadcast.
end
Algorithm 3.2: Access protocol for retrieving data objects by using the integrated signature scheme.
Hashing-based Indexing
Hashing-based schemes differ from tree-based and signature-based indexing schemes by the fact
that they do not require separate indexing information to be broadcast with the data. Rather hashing
parameters are included in the header part of each data bucket. To help mobile users to orientate themselves within the broadcast stream and to enable them to determine the position of the
desired object in the broadcast program, each data bucket header contains the following information: (a) BID, (b) offset to the next MBC oMBC , (c) hash function h, and (d) shift value s. The
shift value is a pointer to the logical bucket B containing data objects with the hash key k such
that h(k) = bucket id(B), where bucket id denotes a function returning the BID of the bucket B.
The shift value is required since hash functions may not be perfect which means that there will be
hash collisions and fractions of the colliding data objects may need to be stored in overflow buckets
which immediately follow the actual bucket assigned to them by the hash function.2 As a result
and with the exception of the first logical data bucket in the broadcast program, the other logical
broadcast buckets might need to be shifted further down in the broadcast cycle which requires each
logical bucket to store redirection information in form of a shift value eventually guiding the user
to the true logical bucket.
After these preliminary remarks, we are in a position to discuss the Hashing A data access protocol as introduced in [76]. The access protocol involves the steps presented in Algorithm 3.3.
begin
    /* Initial probe phase */
    Tune into the broadcast channel, read the header part of the current bucket B_curr and calculate h(k).
    if bucket_id(B_curr) < h(k) then
        Go into doze mode and wait until bucket h(k) appears on the channel.
    else
        Go into doze mode and wake up at the beginning of the next MBC.
    /* First probe phase */
    if bucket_id(B_curr) = h(k) then
        Read the shift value s_curr from the header section of the current bucket and go into doze mode.
        if s_curr > 0 then
            Wake up after s_curr buckets.
        else
            Stay tuned to the broadcast channel.
    /* Final probe phase */
    Listen to the broadcast channel until either an object with search key k is encountered or an object with search key l is observed such that h(l) ≠ h(k).
end
Algorithm 3.3: Data access protocol of the Hashing A scheme.

² Note that grouping all overflow buckets at the end of the broadcast program, with each logical bucket having a pointer to the first bucket of its overflow chain, would be an alternative hashing method yielding tuning and data access times comparable to those of the Hashing A method described here.

To get a better understanding of the Hashing A access protocol, Figure 3.3 illustrates a simple application scenario where a mobile user wants to locate an object x with search key k = 18 in the broadcast channel. The hash function used to map objects to logical data buckets is h(k) = k mod 5. In Figure 3.3, buckets without a fill style denote logical buckets and those filled with dots denote overflow buckets of the preceding logical bucket. Besides, the numbers in the left and right hand corners of the bucket headers denote the bucket identifier and the shift value, respectively. In the example we assume that the initial probe takes place at the second physical bucket (BID = 1). The client probes this bucket and reads the bucket identifier and hash function from its bucket header
and verifies whether the bucket identifier is smaller than the hash value of the search key 18. Since this condition holds, the client goes into doze mode and wakes up to proceed with the first probe at the 4th physical bucket (BID = 3). If there were no overflow, this bucket would contain the candidate data objects that might match the search key 18. However, since there is overflow, the data objects which are mapped to logical bucket 4 (i.e., h(k) = 3) are shifted by 7 buckets further down in the broadcast stream. Again, the client goes into doze mode and continues with the final probe at logical bucket 4 (BID = 10). In order to find out which data objects with search key 18 are actually present in the broadcast program, the client finally needs to examine logical bucket 4 and its associated overflow bucket (i.e., BID = 11) for query matches.

[Figure 3.3: An example illustrating the Hashing A data access protocol using h(k) = k mod 5 as hash function. The three panels show the initial probe (at BID = 1), the first probe (at BID = 3, where a shift value of +7 is read), and the final probe (at BID = 10) on a 15-bucket broadcast cycle with interspersed overflow buckets.]

In Figure 3.3 we considered the case where the client tunes into the broadcast channel (to locate data objects with search key k) at a physical bucket B_curr whose associated bucket identifier is smaller than h(k), i.e., guiding or directory information to direct the user to the required data objects can still be obtained from the current broadcast cycle. However, if that is not the case, i.e.,
the client makes its initial probe after the bucket storing either the desired data objects themselves or, alternatively, if overflow exists, a pointer to the first of a chain of buckets keeping them, then the client needs to wait until the next MBC to find out about their presence in the broadcast program. Obviously, in the overflow case, missing the bucket containing the pointer to the true logical bucket may become costly as it requires the user to wait until the next MBC to locate the desired data objects. Its cost is proportional to the ratio of the number of overflow buckets within the broadcast program to the number of their corresponding logical buckets and can be reduced by using the Hashing B data access protocol [76], which refines the Hashing A scheme by modifying the hash function h(k) such that it takes into consideration the minimum size of the overflow chains of the logical buckets. By doing so, the probability of so-called directory misses can be reduced and thus, the average ADAT of the Hashing B protocol may be significantly lower than that of the Hashing A scheme [76]. For more information on the Hashing B protocol and a comparison study of both schemes, we refer the interested reader to [76].
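The three probe phases of Hashing A can be simulated over a broadcast cycle represented as a plain list of buckets, as in the following Python sketch (a toy model: the bucket at physical slot p has BID p mod cycle length, and a real client would of course doze between probes instead of indexing into a list):

    def hashing_a_lookup(cycle, key, h, start):
        """Simulate the Hashing A probes; returns the objects matching `key`.

        cycle -- one MBC as a list of buckets, each a dict with a 'shift'
                 value and a list of 'objects' (search keys)
        start -- the physical slot of the client's initial probe
        """
        n, target = len(cycle), h(key)
        pos = start
        if pos % n > target:              # logical bucket already passed:
            pos += n - pos % n            # doze until the next MBC begins
        pos += target - pos % n           # doze until bucket h(k) arrives
        pos += cycle[pos % n]["shift"]    # first probe: follow the shift value
        found = []
        for _ in range(n):                # final probe: scan the overflow chain
            objs = cycle[pos % n]["objects"]
            found += [o for o in objs if o == key]
            if any(h(o) != target for o in objs):
                break                     # a differently hashed key ends the scan
            pos += 1
        return found

The early-termination test in the final phase mirrors the protocol's stopping rule: the overflow chain of a logical bucket ends as soon as an object with a different hash value is observed.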
Tree-based Indexing
Last, but not least, various tree-based indexing schemes have been proposed in the literature to address the power conservation issue of broadcast channels [35, 75, 76, 77, 142, 167]. Among the proposed tree-based techniques, the (1, m) indexing method [75, 77] is one of the most prominent representatives of this category and is therefore briefly described in what follows. Like many other tree-based indexing schemes, the (1, m) indexing method applies a B⁺-tree for air indexing and broadcasts the whole B⁺-tree m times during the transmission of a single instance of the data file; the index is broadcast at the beginning of every 1/m fraction of the data file. Figure 3.4(a)
exemplifies how Imielinski et al. adapt the B+ -tree indexing technique for air indexing by storing
the arrival times of the data objects in the leaf nodes of the tree. The figure shows a B+ -tree of order
1 and height 2 which indexes a data file (consisting of data objects stored in 18 buckets) along a
single attribute. In the figure the rectangular boxes in the bottommost level depict the data file and
each box represents a collection of 2 data buckets. The B+ -tree is shown above the data buckets and
each leaf node bucket has 2 pointers to its associated data buckets. Obviously, the entries of the leaf
node buckets are (key value, data bucket pointer) pairs, while non-leaf node buckets contain (key
value, index bucket pointer) pairs.
Besides constructing the B⁺-tree³ of the data objects to be broadcast, we need a simple way of mapping the index tree to the channel-time space. We do so by traversing the B⁺-tree in a top-down, left-to-right fashion and map each index bucket to the broadcast channel-time space in the order of its selection. In Figure 3.4(b) we see an example of how to map the index buckets of the B⁺-tree shown in Figure 3.4(a) onto the channel-time space. For reasons of clarity, we represent the index buckets in Figure 3.4(b) using alphanumeric labels rather than the values of the keys as in Figure 3.4(a). Figure 3.4(b) also illustrates how the user is guided through the index when searching for a required data object. The example assumes that a data object with key 18 is to be accessed by the user. The index traversal starts after the user is routed to the root bucket R of the index tree. The user probes R, is directed to index bucket I2 by the search key comparison, and then to the leaf bucket L5. At each index probe, the user obtains the time offset at which the next required child index node is transmitted, enabling it to switch into doze mode between consecutive index probes.
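Such a guided traversal is easy to mimic in code: each index bucket maps separator keys to the arrival slots of its children, so a client wakes up for only O(height) buckets on the root-to-leaf path. The following Python sketch uses an invented bucket layout and slot numbering:

    def probe_air_index(channel, root_slot, key):
        """Walk a broadcast B+-tree towards `key`, dozing between probes.

        channel[slot] is either an index bucket ('index', [(separator_key,
        child_slot), ...]) or a data bucket ('data', [keys...]).  Each probe
        yields the slot of the next bucket to read, so the client can remain
        in doze mode in between; the probe count approximates the ATT.
        """
        slot, probes = root_slot, 0
        while True:
            kind, entries = channel[slot]
            probes += 1
            if kind == "data":
                return slot, probes
            # descend into the child covering `key` (last separator <= key)
            children = [child for sep, child in entries if sep <= key]
            slot = children[-1] if children else entries[0][1]

    # Invented three-level layout, loosely modeled on Figure 3.4(b):
    channel = {
        0: ("index", [(1, 1), (15, 2)]),   # root bucket R
        1: ("index", [(1, 3), (6, 4)]),    # I1
        2: ("index", [(15, 5), (18, 6)]),  # I2
        5: ("data", [15, 16, 17]),
        6: ("data", [18, 19, 20]),
    }
    print(probe_air_index(channel, 0, 18))  # -> (6, 3): three probes suffice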
³ B⁺-trees can be constructed either by repeatedly applying the B⁺-tree insertion algorithm [22, 93] or, if the tree needs to be built from scratch, by using the more efficient batch-construction algorithm [91].

[Figure 3.4: Tree-indexed broadcasting: (a) an example B⁺-tree of order 1 and height 2 indexing a data file of 18 data buckets, and (b) the index probing scenario for the data object with key 18. The index buckets are mapped onto the channel-time space in the order R, I1, L1, L2, L3, I2, L4, L5, L6, I3, L7, L8, L9.]

When building the B⁺-tree, the construction algorithm needs to know for each index node when and where (in a multi-channel set-up) the data or index buckets it refers to are to be transmitted, so that it may store logical pointers to them. The so-called pointer filling can be performed as follows:
Let $n_i$ denote the number of index buckets at the i-th level, level(i), of the B⁺-tree, $0 \leq level(i) \leq height$, and let $j$, $j > 0$, identify the j-th index bucket at the i-th index level in left-to-right order. Additionally, let height be the height of the B⁺-tree and let $n_{index}$ denote the number of index buckets required to store the entire B⁺-tree. The index bucket at position p(i, j) will then periodically be broadcast at time:

$$T^{index}_{i,j} = T_s + \sum_{l = d,\; d > level(i)}^{height} n_l + j + k \cdot \left(n_{index} + n_{non\text{-}index}\right), \qquad k \in \mathbb{N}, \tag{3.2}$$

where $T_s$ represents the (logical) time at which the server begins broadcasting data (initially, $T_s$ is 0) and $n_{non\text{-}index}$ denotes the number of non-index buckets disseminated between two consecutive index files. Note that in order to keep Equation 3.2 as simple as possible, the (1, m) index is assumed to be broadcast first within a MIBC.
Let m denote the number of times the whole B⁺-tree is broadcast during an MBC, let $n_{CCR}$ denote the number of CCR buckets reserved for CC-related information per MIBC, and let $n_{data}$ denote the number of data buckets required to store a single instance of the data file, which is assumed to be a multiple of m. Let $n_{bcast}$ further denote the length of an MBC in terms of buckets. Then, the periodic time at which the i-th data bucket, $0 \leq i < n_{data}$, is broadcast can be computed by using the following formula:

$$T^{data}_{i} = T_s + \left\lfloor \frac{i \cdot m}{n_{data}} \right\rfloor \cdot \left( n_{index} + n_{CCR} + \frac{n_{data}}{m} \right) + n_{index} + n_{CCR} + \left( i \bmod \frac{n_{data}}{m} \right) + 1 + k \cdot n_{bcast}, \qquad k \in \mathbb{N}. \tag{3.3}$$
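Rather than evaluating the closed forms, one can also lay one MBC out explicitly and read the broadcast slots off the resulting list, which is a convenient way to cross-check Equations 3.2 and 3.3; the Python sketch below uses freely chosen segment sizes:

    def build_mbc(m, n_ccr, n_index, n_data):
        """Lay out one major broadcast cycle of a (1, m) broadcast program.

        Each of the m minor cycles starts with a CCR segment, followed by the
        complete index and a 1/m fraction of the data file (cf. Definition 6).
        """
        per_mibc = n_data // m        # data buckets per minor broadcast cycle
        mbc = []
        for mibc in range(m):
            mbc += [f"ccr{c}" for c in range(n_ccr)]
            mbc += [f"idx{b}" for b in range(n_index)]
            mbc += [f"dat{mibc * per_mibc + d}" for d in range(per_mibc)]
        return mbc

    mbc = build_mbc(m=2, n_ccr=1, n_index=4, n_data=18)
    print(len(mbc))            # n_bcast = 28 buckets per MBC
    print(mbc.index("dat9"))   # slot of data bucket 9 within the MBC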
To complete the description of the (1, m) indexing algorithm, we finally specify the data access algorithm the mobile user uses to reach the desired data object(s) (see Algorithm 3.4). As before, the algorithm is executed as soon as the requested data object(s) are chosen and identified as non-cache-resident at the client.
begin
    /* Initial probe */
    Tune into the broadcast channel, read the header part of the current bucket B_curr and find out when the next index tree bucket will be broadcast.
    Go into doze mode and wake up when the first bucket (the root node bucket) of the next index segment is broadcast.
    /* Index probe */
    Traverse the B⁺-tree by successively probing non-leaf nodes until the leaf node in which the search key belongs is found.
    Power down the mobile device between successive index node probes to conserve energy.
    if search key k matches any leaf node entry of the tree then
        /* Air-cache hit */
        Tune into the broadcast channel when the first data bucket containing data objects with attribute value k is disseminated and download all data objects with matching attribute value k.
    else
        /* Air-cache miss */
        Request the required data objects with attribute value k directly from the server via point-to-point communication.
end
Algorithm 3.4: Access protocol for retrieving data objects by using the (1, m) indexing scheme.
Besides the (1, m) indexing technique, there exists a plethora of other tree-based indexing methods in the literature, some of which are summarized hereafter. The distributed indexing
method [75, 77] has been proposed to cut down the high number of index buckets replicated by (1, m) indexing within a broadcast cycle. To do so, the index tree is divided into a replicated and a non-replicated part, with the latter being broadcast only once in a MBC. This is possible as the non-replicated index part always indexes only the data objects immediately following its dissemination; hence, distributed indexing is able to reduce the relatively high replication and access latency costs of the (1, m) indexing technique, while still achieving good tuning time.
Both (1, m) indexing and distributed indexing assume that client data accesses are uniformly
distributed. In practice, this is hardly the case and therefore, unbalanced tree structures that optimize
the tuning time for non-uniform data accesses have been suggested [35, 142]. More precisely, k-ary
Alphabetic Huffman trees were proposed that minimize the average index search cost by reducing
the number of index node probes for data objects with high access probabilities at the expense
of spending more on those with a low popularity. To allow system designers to trade off between access latency and tuning time based on the respective application-specific requirements, the flexible and exponential indexing methods have been proposed [76, 167]. Both methods achieve this goal by providing the system designer with tuning parameters that offer great flexibility in trading access time against tuning time and vice versa.
An important question that has not been addressed so far is how the previously discussed indexing methods perform against each other in terms of ADAT and ATT. Unfortunately, and to the best of our knowledge, there is no study in the literature comparing all three indexing classes. We are therefore restricted in our comparative evaluation to the performance results gathered by two independently conducted performance studies which compare the signature-based and tree-based indexing methods [69], and the hashing-based and tree-based indexing methods [76, 155], in terms of ADAT and ATT. In what follows, we present a brief synopsis of the experimental results of both studies, starting with key observations drawn from comparing the Hashing B with the flexible indexing method [76, 155]:
• Tree-based and hash-based indexing techniques each have advantages over the other.
• Tree-based indexing techniques should be used if energy conservation is the main application
requirement and the key size is relatively small compared to the size of the data objects.
• Hash-based indexing techniques should be used if energy efficiency is of minor importance
for the application scenario and the key size is relatively large compared to the size of the
data objects.
The comparison study of the simple signature and distributed indexing methods [69] provided the
following results:
• Tree-based and signature-based indexing techniques each have advantages over the other.
• If the access time is more important for the usage scenario than the tuning time, use signature-based indexing.
• If the tuning time, i.e., power conservation, is more important for the envisaged application scenario than the access time, use tree-based indexing.
As a result of those two comparison studies, and due to the fact that optimizing the access latency and tuning time metrics are contradictory goals, we can conclude that none of the three indexing methods is superior to the others in terms of both access and tuning time. Additionally, the results show that if tuning time is the most important system design factor, tree-based indexing techniques should be used. However, to achieve good access times, either a signature-based or a hashing-based indexing method should be deployed.
“Ten years from now, all transactions
may be done wirelessly.”
– Jeff Bezos, CEO, Amazon.com
Chapter 4
Processing Read-only Transactions Efficiently and
Correctly
4.1
Introduction
Consider innovative applications such as road traffic information and navigation systems, online
auction and stock monitoring systems, news/sport/weather tickers, etc. that may employ broadcast
technology to deliver data to a huge number of clients. As those applications primarily consume
data, rather than produce it, the majority of them that need data consistency and currency guarantees initiate read-only transactions. Running such read-only transactions efficiently despite the
various limitations of a wireless broadcasting environment is addressed in this chapter. How transaction processing can be implemented and how data consistency and currency can be guaranteed is constrained, among other things, by the limited communication bandwidth of mobile networks. Today's wireless network technology such as cellular or satellite networks offers a client-to-server
bandwidth that is still severely restricted (see Table 2.1 for detailed information on the bandwidth
characteristics of various mobile networks). Fortunately, the server-to-client bandwidth is often
much higher than in the opposite direction and thus makes the broadcasting paradigm an attractive
choice for data delivery and ensures, as shown in this chapter, that read-only transaction processing
algorithms can be implemented efficiently.
4.1.1
Motivation
Irrespective of the environment (central or distributed, wireless or stationary) in which read-only
transactions are processed, they have the potential of being managed more efficiently than their
read-write counterparts especially if special concurrency control (CC) protocols are applied. Multiversion CC schemes [109, 159, 162] appear to be ideal candidates for read-only transaction processing in broadcasting environments since they allow read-only transactions to execute without
any interference with concurrent read-write transactions. If multiple object versions are kept in the
database system, read-only transactions can read older object versions and, thus, never need to wait
for a read-write transaction to commit or to abort in order to resolve the conflict. As with read-write
transactions, read-only transactions may be executed with various degrees of consistency. Choosing lower levels of consistency than serializability for transaction management is attractive for two
reasons: (a) The set of correct multi-version histories that can be produced by a scheduler can be increased and, hence, higher performance (i.e., transaction throughput) can be achieved. (b) Weaker
consistency levels may allow read-only transactions to read more recent object versions. Thus,
weaker consistency levels trade off consistency for throughput performance and data currency.
While reading current (or at least “close” to current) data is necessary for read-write transactions
to preserve database consistency during updates, such requirements are not necessary for read-only
transactions to be scheduled in a serializable way. That is, read-only transactions can be executed
with serializability correctness even though they observe out-of-date database snapshots. To prevent
read-only transactions from seeing database states that are too old, thus causing users to experience
transaction anomalies related to data freshness, we need well-defined isolation levels (ILs) which
guarantee both data consistency and data currency to read-only transactions. The ANSI/ISO SQL-92 specification [14] defines four ILs, namely Read Uncommitted, Read Committed, Repeatable Read, and Serializability. Those levels do not incorporate any currency guarantees, though, and thus
are unsuitable for managing read-only transactions in distributed mobile database environments.
Theory and practice have pointed out the inadequacy and imprecise definition of the SQL
ILs [23] and some redefinitions have been proposed in [10]. Additionally, a range of new ILs were
proposed that lie between the Read Committed and Serializability levels. The new intermediate ILs
were designed for the needs of read-write transactions, with only three of them explicitly incorporating the notion of logical time. One of those levels, called Snapshot Isolation (SI) [23], ensures data currency to both read-only and read-write transactions by forcing them to read from a data snapshot that
existed by the time the transaction started. Oracle’s Read Consistency (RC) level [117] provides
stronger currency guarantees than Snapshot Isolation by guaranteeing that each SQL statement in
a transaction Ti sees the database state at least as recent as it existed by the time Ti issued its first
read operation. For subsequent read operations/SQL statements RC ensures that they observe the
database state that is at least as recent as the snapshot seen by the previous read operation/SQL
statement. Finally, Adya [9] defines an IL named Forward Consistent View (FCV) that extends SI
by allowing a read-only (read-write) transaction Ti (T j ) to read object versions created by read-write
transactions after Ti ’s (T j ’s) starting point, as long as those reads are consistent in the sense that Ti
(T j ) sees the (complete) effects of all update transactions it write-read or (write-read/write-write)
depends on.
The above mentioned levels are not ideally suitable for processing read-only transactions for
the following reasons: (a) All of them are weaker consistency levels, i.e., read-write transactions
executed at any of these levels may violate consistency of the database since none of them requires
the strictness of serializability. Consequently, read-only transactions may observe an inconsistent
database state if they view the effects of transactions that have modified the database in an inconsistent manner. Inconsistent or only boundedly consistent reads may not be acceptable for some mobile applications, which makes non-serializability levels that do not ensure database consistency to such
transactions inappropriate. (b) Another problem arises from the fact that mobile database applications may need various data currency guarantees depending on the type of application and actual
user requirements. The ILs mentioned above provide only a limited variety of data currency guarantees to read-only transactions. All levels ensure that read-only transactions read from a database
state that existed at a time not later than the transaction's starting point. Such firm currency guarantees may be too restrictive for some mobile applications. Hence, there is a need for the definition of
new ILs that incorporate weaker currency guarantees. Moreover, we need to define new ILs that
meet the specific requirements of (mobile) read-only transactions.
4.1.2
Contribution and Outline
This chapter’s contributions are as follows: (a) We define four new ILs that provide useful consistency and currency guarantees to mobile read-only transactions. In contrast to the ANSI/ISO
SQL-92 ILs [14] and their modifications by [23], our definitions are not stated in terms of existing
concurrency control mechanisms including locking, timestamp ordering, and optimistic schemes,
but are rather independent of such protocols in their specification. (b) We design a suite of multiversion concurrency control algorithms that efficiently implement the proposed ILs. (c) Finally,
we present the performance results of our protocols and compare them. To our knowledge, this is
the first simulation study that validates the performance of concurrency control protocols providing
various levels of consistency and currency to read-only transactions in a mobile hybrid data delivery
environment.
The remainder of the chapter is organized as follows: In Section 4.2, we introduce some notations and terminology for use throughout this chapter. In Section 4.3, we define new ILs especially
suitable for mobile read-only transactions by combining both data consistency and currency guarantees. Implementation issues are discussed in Section 4.4. Section 4.5 reports on results of an
extensive simulation study conducted to evaluate the performance of the implementations of the
newly defined ILs. Section 4.6 concludes this chapter by presenting the main research results of
this work.
4.2
Preliminaries
Before proposing new ILs providing strong semantic data consistency and data currency guarantees
to read-only transactions, it is necessary to provide a formal framework upon which their definitions
are based. We do so by establishing our definitions of the important basic notions of database,
database state, and transaction:
Definition 7 (Database). A database D = {x1 , x2 , . . . , xi } is a finite set of uniquely identified data
objects that can be operated on by a single atomic database operation and each data object xi ∈ D
has a finite domain dom(xi ).
Definition 8 (Database State). A database state DS is an element of the Cartesian product of the
domains of elements of D, i.e., a state associates a value with each data object in the database. More
formally, DS ∈ dom(x1) × dom(x2) × . . . × dom(xi), ∀xi ∈ D.
Transactions are submitted against the database by multiple, concurrent users and it is assumed
that all transaction programs are correctly formed and execute on a consistent database state and,
in the absence of other transactions, will always produce a new consistent database state. In what
follows, we use ri [x, v] (or wi [x, v]) to denote that transaction Ti issued a read (or write) operation
on data object x and the value read from (or written into) x by Ti is v. To keep the notations simple,
we assume that no transaction Ti reads and writes any object x more than once during its lifetime.
Besides, we use bi , ci , and ai to denote Ti ’s transaction management primitives begin, commit, and
abort, respectively, and the set of all read and write operations issued by transaction Ti is denoted
by OPi . We are now ready to formally define the notion of a transaction:
Definition 9 (Transaction). A transaction is a partial order Ti = (∑i , <i ) where:
1. ∑i ⊆ OPi ∪ {bi , ci , ai };
2. ai ∈ ∑i iff ci ∉ ∑i;
3. let p denote either ai or ci ; for any other operation q ∈ ∑i , q <i p;
4. let p represent the transactional primitive bi ; for any other operation q ∈ ∑i , p <i q;
5. if ri [x, v], wi [x, v] ∈ OPi , then either ri [x, v] <i wi [x, v] or wi [x, v] <i ri [x, v].
Condition (1) enumerates the operations performed by a transaction. Statement (2) states that a
transaction contains either an abort or commit operation, but not both, and Point (3) guarantees that
all transaction operations occur before Ti ’s termination. Condition (4) says that all transaction oper-
ations are preceded by a begin operation and Point (5) finally ensures that read and write operations
on a common data object are ordered according to the ordering relation <i .
Definition 10 (Read-only and Read-Write Transaction). A transaction is a read-only transaction
if it contains no write operations, and is a read-write transaction otherwise.
As noted above, in practice, multiple transactions are likely to be executed in parallel against
the database, i.e., operations of different transactions may be interleaved. To record the relative
execution order of those operations, the transaction scheduler keeps an internal structure called a
history. Informally, a history is a partial order of the executions of transaction operations where
the operation ordering within transactions is preserved and all pairs of operations that conflict1
are ordered. Histories can be classified into two types: (a) single-version or mono-version and
(b) multi-version histories. A single-version history captures what “happens” in the execution of
a single-version or multi-version database system and is a special case of a multi-version history
since the scheduler or, more precisely, a version function f maps each read operation r j [x, v] on an
object x to its most recent write operation wi [x, v] that precedes it in the history, i.e., wi [x, v] < r j [x, v]
and if wk [x, v] also occurs in the history (i ≠ k) then either wk [x, v] < wi [x, v] or r j [x, v] < wk [x, v]. A
multi-version history relaxes this restriction by allowing read requests to be mapped to appropriate,
but not necessarily up-to-date versions of data. This flexibility gives the scheduler the potential
to produce more correct histories (since read operations that arrive too late need no longer be
rejected) which, in turn, may improve the degree of concurrency in the system. A prerequisite to
exploit the performance benefits of multi-versioning is, of course, to maintain multiple versions
of objects in the system. If multiple object versions are allowed to be kept, each write operation
on an object x by transaction Ti produces a new version of it, which we denote by xi , where the
version subscript represents the index of the transaction Ti that wrote the version. Thus, each write
operation by transaction Ti in a multi-version history is always of the form wi [xi ]. If the value v is
written into xi by Ti , we use the notation wi [xi , v]. To indicate that transaction Ti has read a version
of object x that was installed by transaction T j , we denote this as ri [x j ]. In case value v of object
1 Two operations p[x, v] and q[x, v] are said to conflict if one of them is a write operation.
version x j has been read by Ti , we represent this by ri [x j , v]. After these notational preliminaries,
we are now in the position to give the formal definition of a multi-version history:
Definition 11 (Multi-Version History). A multi-version history MVH over a set of transactions T = {T0, T1, . . . , Tn} consists of two parts — a partial order (∑T, <MVH) of events where:
1. ∑T = f(⋃ⁿᵢ₌₀ ∑i) for some multi-version function f;
2. for each Ti ∈ T and all operations p, q ∈ Ti, if p <i q, then f(p) <MVH f(q);
3. if f(rj[x, v]) = rj[xi, v], then wi[xi, v] <MVH rj[xi, v];
4. if f(rj[x, v]) = rj[xi, v], i ≠ j, and cj ∈ MVH, then ci <MVH cj;
and a version order, ≪, i.e., there is a total order on the committed object versions in MVH which may be different from the relative ordering of write or commit operations in MVH.
Point (1) says that the version function f maps all operations of all transactions in the history
into appropriate multi-version operations and Condition (2) states that the mapping respects the
individual transaction orderings. Statement (3) specifies that a transaction may not read an object
version before it has been created. To ensure that Statement (3) always holds, we assume the
existence of an initialization transaction T0 that creates a so-called zero version of each data object
stored in the database. Point (4) finally states that a transaction may only commit, if all other
transactions that created object versions it read are themselves committed.
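To ground the notation, here is a minimal sketch of a multi-version store (ours; names and structure are invented for illustration): each write by Ti installs a new version xi, the mono-version special case maps a read to the most recent version, and the general multi-version read may return any older committed version:

# Minimal sketch of a multi-version object store (illustrative names only).

class MultiVersionStore:
    def __init__(self):
        self.versions = {}      # object -> list of (writer_txn, value), version order

    def write(self, txn, obj, value):
        """w_i[x_i, v]: transaction txn installs a new version of obj."""
        self.versions.setdefault(obj, []).append((txn, value))

    def read_latest(self, obj):
        """Mono-version special case: map the read to the most recent version."""
        writer, value = self.versions[obj][-1]
        return writer, value

    def read_version(self, obj, writer):
        """General multi-version read r_j[x_i]: the version installed by `writer`."""
        for w, value in self.versions[obj]:
            if w == writer:
                return value
        raise KeyError(f"no version of {obj} written by T{writer}")

if __name__ == "__main__":
    db = MultiVersionStore()
    db.write(0, "x", "2:40 pm")     # T0 creates the zero version x0
    db.write(2, "x", "2:50 pm")     # T2 installs the successor version x2
    print(db.read_latest("x"))      # -> (2, '2:50 pm')
    print(db.read_version("x", 0))  # an older, still readable version: '2:40 pm'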
For notational convenience, we place some additional constraints on the definition of a multi-version history given above, and in what follows, we require that the version order of an object x in a multi-version history MVH corresponds to the temporal order in which write operations of x occur in MVH, i.e., whenever write operation wi[xi, v] immediately precedes write operation wj[xj, v] in MVH, then xi ≪ xj. Additionally, we require that the version order in MVH cannot be different from the order of commit events in MVH, i.e., if the version order of object x is xi ≪ xj, then ci <MVH cj. Finally, we require that each read-write transaction Tj that creates a (direct or indirect)
successor version of an object read by a read-write transaction Ti commits after Ti, i.e., if there are two operations ri[xk, v] and wj[xj, v] (i ≠ k, j ≠ k) in MVH and xk precedes xj in the version order (i.e., xk ≪ xj), then ci <MVH cj.
To determine whether a multi-version history MVH satisfies certain criteria defined by an IL, a
subhistory of MVH may need to be considered. The projection P of a multi-version history MVH
w.r.t. a single transaction is given below:
Definition 12 (Transaction Projection). Let MVH denote a multi-version history, Ti a transaction
occurring partially or completely in MVH and OP(Ti ) its finite set of transaction operations. A
transaction projection of a multi-version history MVH onto Ti, denoted P(MVH, Ti), is a subhistory MVH′ containing transaction operations OP(MVH′) = OP(Ti), i.e., MVH′ includes only the
operations issued by Ti .
In certain cases the projection onto transactions that committed within a certain logical time
interval is required in order to reason about scheduling correctness. This gives rise to the following
definition:
Definition 13 (Interval Committed Projection). Let MVH denote a multi-version history, and let
again p and q represent two distinct events in MVH, i.e., p, q ∈ ∑T , and p <MV H q. An interval committed projection of MVH onto the logical time interval [p, q], denoted P(MV H, [p, q]), is
the subhistory MVH′ obtained from MVH by deleting all events that do not belong to transactions committed within the (logical) time interval [p, q], including the interval boundaries themselves.
It is important to note that both projections preserve the relative order of the original operations.
To validate the correctness of multi-version histories w.r.t. ILs defined in Section 4.3, we need to
formalize possible direct and indirect data dependencies between transactions:
Definition 14 (Direct Write-Read Dependency). A direct write-read dependency (Tj δ^wr Ti) between transactions Ti and Tj exists if there is a write operation wj which precedes a read operation ri in MVH according to <MVH and Ti accesses the object version written by Tj. In what follows, we denote write-read dependencies by δ^wr or wr.

Definition 15 (Direct Write-Write Dependency). A direct write-write dependency (Tj δ^ww Ti) between transactions Ti and Tj exists if there exists a write operation wj which precedes a write operation wi in MVH according to <MVH, and wi produces the successor object version of some object version written by wj. We denote write-write dependencies by δ^ww or ww.

Definition 16 (Direct Read-Write Dependency). A direct read-write dependency (Tj δ^rw Ti) occurs between two transactions Ti and Tj if there is a read operation rj and a write operation wi in MVH in the order rj <MVH wi and wi installs the successor object version of the object version read by rj. Read-write dependencies are denoted by δ^rw or rw.

If the type of dependency between two distinct transactions does not matter, we say that they are in an arbitrary dependency:

Definition 17 (Arbitrary Direct Dependency). Two transactions Ti and Tj are in an arbitrary direct dependency in MVH, if there exists a direct rw-, ww-, or wr-dependency between Ti and Tj. We denote arbitrary direct dependencies by δ^? or ?.

Definition 18 (Arbitrary Indirect Dependency). Two transactions Ti and Tj are in an arbitrary indirect dependency in MVH, if there exists a sequence ⟨Tj δ^? Tk1 δ^? Tk2 . . . δ^? Tkn δ^? Ti⟩ for (n ≥ 1) in MVH. We denote arbitrary indirect dependencies by δ^∗ or ∗.
We are now in the position to define the direct multi-version serialization graph of a multiversion history MVH, called MVSG(MVH), which in contrast to the Serialization Graph definition
of [26] contains labeled edges indicating which dependencies occur between the transactions.
Definition 19 (Direct Multi-Version Serialization Graph). A direct multi-version serialization
graph MVSG(MVH) is defined on a multi-version history MVH, with nodes representing transactions that successfully terminated, and each labeled edge from transaction Ti to T j corresponds to a
direct read-write, write-write, or write-read dependency, i.e., there is a rw, ww, or wr-dependency
edge from transaction Ti to transaction T j if and only if T j directly rw, ww, or wr-depends on Ti .
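Definition 19 suggests a mechanical serializability test: collect the labeled dependency edges and check the resulting graph for cycles. A minimal sketch (ours; the edge set is supplied by hand rather than derived from a history parser):

# Sketch of an MVSG acyclicity test (illustrative; edges are given directly).

def has_cycle(nodes, edges):
    """Depth-first search for a cycle in a directed graph.

    edges: iterable of (Ti, Tj, label) triples, one per direct dependency.
    """
    succ = {n: [] for n in nodes}
    for src, dst, _label in edges:
        succ[src].append(dst)
    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}

    def dfs(n):
        color[n] = GREY
        for m in succ[n]:
            if color[m] == GREY or (color[m] == WHITE and dfs(m)):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and dfs(n) for n in nodes)

if __name__ == "__main__":
    # A hand-made wr/ww/rw edge set containing the cycle T1 -> T2 -> T1.
    nodes = ["T1", "T2", "T3"]
    edges = [("T1", "T2", "rw"), ("T2", "T1", "wr"), ("T3", "T1", "ww")]
    print(has_cycle(nodes, edges))   # -> True: such a history is not serializable

The example edge set is hand-made; deriving edges from a history would follow Definitions 14–16.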
4.3
New Isolation Levels Suitable for Read-only Transactions
4.3.1
Why Serializability may be Insufficient
Serializability is the standard criterion for transaction processing in both stationary and mobile
computing. Its importance and popularity are related to the fact that it prevents read-write transac-
one consistent state into another. With respect to read-only transactions, serializability as defined
in [26] guarantees that all read-only transactions perceive the same serial order of read-write transactions. Additionally, serializability requires that read-only transactions serialize with each other.
However, the serializability criterion in itself is not sufficient for preventing read-only transactions
from experiencing anomalies related to data currency as the following example shows:
Example 3.
MVH1 = b0 w0[x0, 2:40 pm] b1 r1[z0, cloudy] w0[y0, 2:50 pm] c0 w1[z1, blizzard] c1 b2 r2[z1, blizzard] r2[x0, 2:40 pm] w2[x2, 2:50 pm] c2 b3 r3[x0, 2:40 pm] b4 r4[x2, 2:50 pm] b5 r5[z1, blizzard] r5[y0, 2:50 pm] r3[y0, 2:50 pm] c3 w5[y5, 3:00 pm] c5 r4[y5, 3:00 pm] c4
[x0 ≪ x2, y0 ≪ y5, z0 ≪ z1]
History MV H1 might be produced by a flight scheduling system supporting multiple object versions, which is the rule rather than an exception in mobile distributed database systems. In MV H1 ,
transaction T0 is a blind write transaction that sets the take-off times of flights x and y, respectively,
and T1 is an event-driven transaction initiated automatically by the airport weather station to indicate an imminent weather change. Due to the inclement weather forecast, the Air Traffic Control
Center instantly delays both scheduled flights by 10 minutes through transactions T2 and T5 . At
the same time, two co-located members of the ground staff equipped with PDAs query the airport
flight scheduling system in response to passengers’ requests to check the actual take-off times of
flights x and y (T3 and T4 ). While one of the employees (who invokes transaction T3 ) may locate
the required data in his local cache, the other (who invokes transaction T4 ) may have to connect to
the central database in order to satisfy his data requirements. As a consequence, both persons read
from different database snapshots without serializability guarantees being violated, which can be
easily verified by sketching the multi-version serialization graph (MVSG) of MV H1 .
Figure 4.1: Multi-version serialization graph of MVH1 .
As the above example clearly illustrates, serializability by itself may not be a sufficient requirement for avoiding phenomena related to reading from old database snapshots. This shortcoming is eliminated in the following subsections.
4.3.2
BOT Serializability
Influenced by the findings of the previous example, we now define two new ILs that combine
the strictness of serializability with firm data currency guarantees (see below for the definition of
this notion). Unlike the ANSI definition of serializability, our definition ensures well-defined data
currency to read-only transactions. The existing ANSI specification of serializability and its redefinition by [10] do not contain any data currency guarantees for read-only transactions. Under those
levels, read-only transactions are allowed to be executed without any restrictions w.r.t. the currency
of the observed data. We will define our ILs in terms of histories. We associate a directed graph
with each newly defined isolation level ILi . A multi-version history MVH provides ILi guarantees,
if the corresponding graph is acyclic.
In what follows, we define only such ILs that are especially attractive for the mobile broadcasting environment where clients run data-dissemination applications forced to read (nearly) up-to-date database objects and are expected to be rarely disconnected from the server. Based on some
research done on real-time transactions [1, 63], we divide data currency requirements into three
categories: transactions with (a) strong, (b) firm, and (c) weak requirements.
Definition 20 (Strong Data Currency Requirements). We say that a read-only transaction Ti has
strong data currency requirements, if it needs to read committed data that is (still) up-to-date by Ti ’s
commit time. Since all read operations of Ti must be valid at the end of the transaction’s execution,
we also say that Ti runs with End of Transaction (EOT) data currency guarantees.
Note that the EOT data currency property requires only that writes of committed read-write transactions must not interfere with operations of read-only transactions, i.e., object updates of uncommitted transactions are not considered by that property.
The firm currency requirement, in turn, provides slightly weaker currency guarantees.
Definition 21 (Firm Data Currency Requirements). We say that a read-only transaction Ti has
firm data currency requirements, if it needs to observe committed data that is at least as recent as of
the time Ti started its execution.
Firm currency requirements are attractive for the processing of read-only transactions in mobile
broadcasting environments for mainly two reasons: (a) First, and most importantly from the data
currency perspective, they guarantee that read-only transactions observe up-to-date or nearly up-to-date data objects, which is an important criterion for data-dissemination applications such as news
and sports tickers, stock market monitors, traffic and parking information systems, etc. (b) Second,
and contrary to the strong currency requirements, they can easily and instantaneously be validated
at the client site without any communication with the server.
For some mobile database applications, however, weak data currency requirements may suffice.
Definition 22 (Weak Data Currency Requirements). We say that a read-only transaction Ti has
weak data currency requirements, if it sees a database state the way it existed at some point in time
before its actual starting point.
Despite the unquestionable attractiveness of weaker currency requirements, especially for ap-
plications running on clients with frequent disconnections, we believe that the majority of data-dissemination applications require firm currency guarantees, which is supported by the literature [145, 163]. Thus, in this thesis we focus on firm currency guarantees and leave the extension
a new IL that provides serializability along with firm data currency guarantees, some additional
concepts are to be introduced.
As defined so far, a multi-version history MVH consists of two components: (a) a partial order
of database events (∑T) and (b) a total order of object versions (≪). Now, we extend the definition of a multi-version history by specifying for each read-only transaction a start time order that
relates its starting point to the commit time of previously terminated read-write transactions. The
association of a start time order with a multi-version history was first introduced in the context of
the Snapshot Isolation (SI) level [23] to provide more flexibility for implementations. According
to the SI concept, the database system is free to choose a starting point for a transaction as long as
the selected starting point is some (logical) time before its first read operation. Allowing the system
to choose a transaction’s starting point without any restrictions is inappropriate in situations where
the user expects to read from a database state that existed at some time close to the transaction’s
actual starting point. Thus, for applications/transactions to work correctly, the database system
needs to select a transaction’s starting point in accordance with the order of events in MVH. We
now formally define the concept of start time order.
Definition 23 (Start Time Order). A start time order of a multi-version history MVH over a set
of committed transactions T = {T0 , T1 , . . . , Tn } is a partial order (ST , <ST ) of events such that:
1. ST = ⋃ⁿᵢ₌₀ {ci, bi};
2. ∀Ti ∈ T, bi <ST ci;
3. If Ti, Tj ∈ T, then either cj <ST bi or ci <ST bj or (bi <ST cj and bj <ST ci);
4. If wi, wj ∈ MVH, wi ≪ wj, and cj <ST bk, then ci <ST bk.
According to Statement 1 the start time order relates begin and commit operations of committed
transactions in MVH. Point 2 states that a transaction’s starting point always precedes its commit
point. Condition 3 states that a scheduler S has three possibilities in ordering the start and commit
points of any committed transactions Ti and T j in MVH. A scheduler S may choose Ti ’s (T j ’s)
starting point after T j ’s (Ti ’s) commit point or, if both transactions are concurrent, neither starts its
execution after the other transaction has committed. Condition 4 finally specifies that if S chooses
Tk ’s starting point after T j ’s commit point and T j overwrites the object installed by Ti then Ti ’s
commit point must precede Tk ’s starting point in any start time order.
For notational convenience, in what follows, we do not specify a start time order for all committed transactions in MVH. Instead, we only associate with each MVH the start time order between
read-only and read-write transactions. After laying these foundations, we are ready to define the
begin-of-transaction (BOT) data currency property required for the definition of the BOT Serializability IL.
Definition 24 (BOT Data Currency). A read-only transaction Ti possesses BOT data currency
guarantees if for all read operations invoked by Ti the following invariants hold:
1. If the pair w j [x j ] and ri [x j ] is in MVH, then c j <ST bi
2. If there is another write operation wk [xk ] of a committed transaction Tk in MVH, then either
(a) ck <ST bi and xk ≪ xj, or
(b) bi <ST ck .
Condition 1 and Condition 2 ensure that all read operations performed by a read-only transaction Ti are from a snapshot of committed data values valid as of Ti ’s starting point. Note that we
ignore transaction aborts in our definition of BOT data currency since subsequent definitions that
incorporate this criterion only consider MVHs of committed transactions. On the basis of the BOT
data currency property, the serializability IL can be extended as follows.
Definition 25 (BOT Serializability). A multi-version history MVH over a set of read-only and
read-write transactions is BOT serializable, if MVH is serializable in the sense that the projection
of MVH onto all committed transactions in MVH is equivalent to some serial history and the BOT
data currency property holds for all read-only transactions in MVH.
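Operationally, the BOT data currency property amounts to reading, for every object, the most recent version committed before the reader's begin event. A minimal sketch (ours; integer commit timestamps stand in for the logical <ST order):

# Sketch of a BOT-consistent read (illustrative; commit timestamps model <_ST).

def bot_read(version_log, obj, begin_ts):
    """Return the value of the latest version of `obj` committed before begin_ts.

    version_log: obj -> list of (commit_ts, value), in version order.
    """
    visible = [(ts, v) for ts, v in version_log[obj] if ts < begin_ts]
    if not visible:
        raise LookupError(f"no version of {obj} committed before {begin_ts}")
    return max(visible)[1]   # version order agrees with commit order, so max works

if __name__ == "__main__":
    log = {"x": [(1, "2:40 pm"), (5, "2:50 pm")]}   # x0 at ts 1, x2 at ts 5
    print(bot_read(log, "x", begin_ts=3))           # txn started at ts 3 -> '2:40 pm'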
Note that we do not explicitly define data currency guarantees for read operations of read-write
transactions at this stage, but we make up for it in Chapter 6. To determine if a given multiversion history MVH satisfies the requirements of the BOT Serializability IL, we use a variation
of the MVSG, called start time multi-version serialization graph (ST-MVSG), which is defined as
follows:
Definition 26 (Start Time Multi-Version Serialization Graph). Let MVH denote a history
over a set of read-only and read-write transactions T = {T0 , T1 , . . . , Tn } and commit(MVH) represent a function that returns the committed transactions of MVH. A start time multi-version serialization graph for history MVH, denoted ST-MVSG(MVH), is a directed graph with nodes
N = commit(MV H) and labeled edges E such that:
1. There is an edge Ti → Tj (Ti ≠ Tj) if Tj ?-depends on Ti;
2. There is an edge Ti → Tj (Ti ≠ Tj) whenever there exists a set of operations {ri[xj], wj[xj], wk[xk]} such that either wj ≪ wk and ck <ST bi, or bi <ST cj.
Theorem 1. Let MVH be a multi-version history over a set of committed transactions
T = {T0 , T1 , . . . , Tn }. Then, MVH is executed under BOT Serializability, if ST-MVSG(MVH) is
acyclic.
Proof. We prove Theorem 1 by contraposition, i.e., we show that if MVH violates either of the requirements of Definition 25, then ST-MVSG(MVH) contains a cycle. Suppose first that MVH is a non-serializable multi-version history, i.e., there exists a serialization order ⟨T0, T1, . . . , Tk, T0⟩ such that T1 ?-depends on T0, T2 ?-depends on T1, . . . , and T0 ?-depends on Tk. By Definition 26, there is an
edge Ti → T j in ST-MVSG(MVH) whenever T j ?-depends on Ti . Thus, ST-MVSG(MVH) contains
a cycle whenever MVH is non-serializable. Next, suppose that the BOT data currency property is
violated. That is, there exists a pair of operations ri[xj] and wj[xj] such that either wj ≪ wk and
ck <ST bi , or bi <ST c j . By Definition 26, there exists an edge Ti → T j . Since Ti wr-depends on T j ,
there is also the edge T j → Ti in ST-MVSG(MVH). Therefore, ST-MVSG(MVH) contains again a
cycle.
4.3.3
Strict Forward BOT Serializability
The currency requirements of the BOT Serializability IL may not be ideally suited for processing read-only transactions in mobile broadcasting environments for at least two reasons: (a) First,
mobile read-only transactions are mostly long-running in nature due to such factors as interactive
data usage, intentional or accidental disconnections, and/or high communication delays. Therefore,
disallowing a long-lived read-only transaction to see object versions that were created by committed read-write transactions after its starting point might be too restrictive. (b) Another reason for
allowing read-only transactions to read “forward” beyond their starting points is related to version
management costs. Reading from a snapshot of the database that existed at the time when a readonly transaction started its execution can be expensive in terms of storage costs. If database objects
are frequently updated, which is a reasonable assumption for data-dissemination environments [41],
multiple previous object versions have to be retained in various parts of the client-server system architecture. Allowing read-only transactions to view more recent data than permitted by the BOT
data currency property is efficient, since it enables purging out-of-date objects sooner, thus allowing
to keep more recent objects in the database system. An IL that provides such currency guarantees
while still enforcing degree 3 data consistency is called Strict Forward BOT Serializability. Prior
to defining this IL, we formulate a rule that is sufficient and practicable for determining whether a
read-only transaction Ti may be allowed to see the (complete) effects of an update transaction that
committed after Ti ’s starting point without violating serializability requirements.
Read Rule 1 (Serializable Forward Reads). Let Ti denote a read-only transaction that needs to observe the complete effects of an update transaction Tj that committed after Ti's starting point as long as the serializability requirement holds. Further, let T^i_before denote the set of read-write transactions that committed before Ti's starting point, and let T^i_after represent the set of read-write transactions that committed after Ti's starting point, but before the commit point of Tj, and whose effects have not been seen by Ti, i.e., ∀k (Tk ∈ T^i_after): (bi <ST ck ∧ ck <MVH cj ∧ if wk[xk] occurs in P(MVH, [bi, cj]), then there is no ri[xk] in MVH). Ti is forced to read "forward" and see the effects of Tj:

1. If the invariant ReadSet(Ti) ∩ WriteSet(T^i_after ∪ Tj) = ∅ is true, i.e., if the intersection of the actual read set of Ti and the write set of all read-write transactions that committed between Ti's starting point and Tj's commit point (including Tj itself) is an empty set.

Otherwise, Ti is forced to observe the database state that was valid as of the time Ti started, i.e., Ti is obliged to read the most recent version of an object produced by a read-write transaction in T^i_before. In what follows, we denote the fact that Ti is permitted to read forward on the object versions produced by Tj by Ti →sfr Tj.
Note that in Read Rule 1 the read set and the write set refer to data objects and not to their particular versions. This will be the case throughout the chapter if not otherwise specified.
An example illustrating how the invariants of Read Rule 1 are applied to decide whether a read-only transaction Ti can safely observe the effects of update transactions that committed after its starting point is as follows:
Example 4.
MVH2 = b0 w0[x0] w0[y0] w0[z0] c0 b1 r1[y0] b2 r2[x0] w2[z2] b3 c2 b4 r4[x0] r3[z2] w3[y3] c3 w4[x4] c4
[x0 ≪ x4, y0 ≪ y3, z0 ≪ z2, c0 <ST b1]
In MV H2 , T0 blindly writes the objects x, y, and z. After T0 ’s commit point, the read-only transaction
T1 starts running and reads the previously initialized value of y. T2 subsequently observes the value
of x and produces a new version of z, which is, in turn, read by T3 . In the meantime, T4 is started and
accesses object x. Thereafter, T3 creates a new version of object y. Finally, T4 updates the initialized
value of x and commits. Now suppose transaction T1 wants to read object z and thereafter object x.
If we assume that versions {x0 , x4 } and {z0 , z2 } are maintained in the database by the time when
T1's read requests arrive, the scheduler has to decide which version of the objects z and x T1 can safely observe. If T1 runs at the BOT Serializability IL, the scheduler's decision is straightforward since T1 needs to access the most recent object versions that existed by its starting point. In this case, T1 would have to read the versions created by T0. However, if the underlying IL requires that T1 should see the updates of transactions that committed after its BOT point as long as the serializability criterion is not violated, the scheduler has to check for every object T1 intends to read whether there exists any committed object version that was installed after T1's starting point and, if so, whether Read Rule 1 is satisfied. With regard to objects x and z, the reader can easily see from MVH2 that both objects were updated after T1's BOT point. Hence, the scheduler has to verify for both recently created object versions whether the invariant ReadSet(T1) ∩ WriteSet(T^1_after ∪ T2) = ∅ and/or ReadSet(T1) ∩ WriteSet(T^1_after ∪ T4) = ∅ holds. Object z is requested first and therefore the scheduler intersects the current read set of T1 (ReadSet(T1) = {y}) with the write set of all transactions that committed between T1's BOT point and the commit point of T2, which installed the latest version of z (WriteSet(T^1_after ∪ T2) = {z}). Since the result of the intersection is an empty set, the scheduler allows T1 to read the most up-to-date version of z. Now we repeat the same procedure for object x. This time, however, the read set of T1 consists of two objects (ReadSet(T1) = {y, z}) and the write set of the transactions that committed between T1's BOT point and T4's commit point comprises two objects as well (WriteSet(T^1_after ∪ T4) = {x, y}). Since the intersection of the read and write sets is non-empty, T1 is not allowed to read "forward" on object x as Read Rule 1 would otherwise be violated. Therefore, T1 is forced to observe the object version of x that existed by its BOT point, namely x0.
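The scheduler's decision in Example 4 reduces to a set intersection, which the following sketch (ours; the transaction metadata of MVH2 is encoded by hand) replays for the two read requests of T1:

# Sketch of the Read Rule 1 test: T_i may see T_j's effects iff its read set is
# disjoint from the writes of all unseen read-write transactions committed up
# to and including T_j (illustrative encoding).

def may_read_forward(read_set, unseen_after, j):
    """unseen_after: list of (txn, write_set) pairs, in commit order, for the
    read-write transactions that committed after T_i's starting point and
    whose effects T_i has not (yet) seen."""
    writes = set()
    for txn, write_set in unseen_after:
        writes |= write_set
        if txn == j:                       # include T_j itself, then stop
            break
    return read_set.isdisjoint(writes)

if __name__ == "__main__":
    # Example 4, first request: T1 (ReadSet = {y}) asks for z; latest version z2 by T2.
    unseen = [("T2", {"z"}), ("T3", {"y"}), ("T4", {"x"})]
    print(may_read_forward({"y"}, unseen, "T2"))       # True -> T1 may read z2
    # Second request: T1 has now seen T2's effects, so T2 has left T^1_after.
    unseen = [("T3", {"y"}), ("T4", {"x"})]
    print(may_read_forward({"y", "z"}, unseen, "T4"))  # False -> T1 must read x0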
It can be shown that Read Rule 1 produces only correct read-only transactions in the sense
that they are serializable w.r.t. all committed update transactions and all other committed read-only
transactions in a multi-version history MVH:
Theorem 2. In a multi-version history MVH that contains a set of read-write transactions Tupdate
such that all transactions in Tupdate are serializable, each read-only transaction Ti satisfying Read
Rule 1 is serializable w.r.t. Tupdate as well.
Proof. Let T^i_before denote the set of read-write transactions that committed before Ti's starting point and let T^i_after represent the set of read-write transactions that committed after Ti's starting point and whose effects have not been seen by Ti. Additionally, let T^i_forward denote the set of read-write transactions that committed after Ti's starting point, but whose effects have been observed by Ti. Suppose, by way of contradiction, that MVH contains a cycle ⟨Ti → Tj0 → . . . → Tjn → Ti⟩, where Ti is a read-only transaction and Tjn with n ≥ 0 is either a read-only or a read-write transaction. By assumption, the set of read-write transactions Tupdate = T^i_before ∪ T^i_after ∪ T^i_forward in MVH is serializable, thus the cycle can only occur if at least one read-only transaction is part of it.

For now suppose the existence of a so-called single-query cycle, i.e., Ti is the only read-only transaction involved in it. Then, in order for the cycle to be formed, Ti requires both an incoming and an outgoing edge. Since Ti is a read-only transaction which, by definition, performs read operations only, the outgoing edge Ti → Tj0 is to a read-write transaction Tj0 that wrote the successor version of an object read by Ti, and the incoming edge Tjn → Ti comes from a read-write transaction Tjn that installed an object version read by Ti. By Read Rule 1, Ti is guaranteed to see either all or none of the effects of any read-write transaction and thus, the outgoing edge Ti → Tj0 and the incoming edge Tjn → Ti must involve two distinct read-write transactions, i.e., Tj0 ≠ Tjn. A further prerequisite for a cycle to occur is that Tjn ?-depends or *-depends on Tj0. According to Definition 11 and the specified version and commit order constraints, such a dependency implies that Tj0 committed before Tjn in MVH, i.e., cj0 <MVH cjn. Since Tj0 rw-depends on Ti, it also follows that Tj0 is part of T^i_after. Additionally, since Tj0 committed before Tjn and Tjn wr-depends on Ti, it further follows that Tjn is included in T^i_forward. However, if all those dependencies and ordering relationships existed, Read Rule 1 would be violated since it enforces Ti to see the complete effects of Tj0 whenever the condition ReadSet(Ti) ∩ WriteSet(T^i_after ∪ Tj0) = ∅ holds. Since Tjn ?-depends or *-depends on Tj0 and Tj0 committed before Tjn in MVH, the condition ReadSet(Ti) ∩ WriteSet(T^i_after ∪ Tj0) = ∅ holds whenever ReadSet(Ti) ∩ WriteSet(T^i_after ∪ Tjn) = ∅ is true. Consequently, if Ti is allowed to see the effects of Tjn, it is also allowed to observe the effects of Tj0 and therefore, Tj0 cannot rw-depend on Ti, which is a prerequisite for the cycle to occur.

Now let us assume that a multi-query cycle exists. Again, for such a cycle to be produced, Ti needs to have at least one incoming and one outgoing edge to two distinct read-write transactions Tj0 and Tjn. However, and in contrast to the previous case, Tjn does not need to ?-depend or *-depend on Tj0 any more. Now, in order for a cycle to occur, it suffices that another read-only transaction Tjm with 0 < m < n (directly or indirectly) wr-depends on Tj0 and Tjn (directly or indirectly) rw-depends on Tjm, i.e., the cycle may have the form ⟨Ti → Tj0 → . . . → Tjm → . . . → Tjn → Ti⟩, where m > 0 and n > m. In order for Tj0 to rw-depend on Ti, Tj0 needs to be a member of T^i_after and the condition ReadSet(Ti) ∩ WriteSet(T^i_after ∪ Tj0) ≠ ∅ needs to be true. Similarly, for Tjn to rw-depend on Tjm, Tjn needs to be part of T^jm_after and the condition ReadSet(Tjm) ∩ WriteSet(T^jm_after ∪ Tjn) ≠ ∅ needs to hold. Now suppose without loss of generality that Tjn committed before Tj0 in MVH, i.e., cjn <MVH cj0. Then, in order for the cycle to occur, Tjm must have seen the effects of Tj0. But this clearly contradicts the condition ReadSet(Tjm) ∩ WriteSet(T^jm_after ∪ Tj0) = ∅ of Read Rule 1 since it can only be fulfilled if there exists no other read-write transaction Tjk ∈ T^jm_after that committed before Tj0 in MVH and that does not allow Tjm to read "forward" on its effects. Therefore, if a scheduler operates according to Read Rule 1, single- and multi-query cycles cannot occur in MVH.
The following new IL incorporates the Serializable Forward Reads property, and is defined as
follows:
Definition 27 (Strict Forward BOT Serializability). A multi-version history MVH over a set
of read-only and read-write transactions is a strict forward BOT serializable history, if all of the
following conditions hold:
1. MVH is serializable, and
2. if the pair ri [x j ] and w j [x j ] of a read-only transaction Ti and a read-write transaction T j is in
MVH, then either:
(a) cj <ST bi and there is no write operation wk[xk] of a committed transaction Tk in MVH such that ck <ST bi and xj ≪ xk, or
(b) bi <ST cj, wj[xj] <MVH ri[xj], Ti →sfr Tj, and there is no write operation wk[xk] of a committed transaction Tk in MVH such that xj ≪ xk, ck <MVH ri[xj], and Ti →sfr Tk.
To check whether a given history MVH is strict forward BOT serializable, we again use a
variant of the MVSG, called strict forward read multi-version serialization graph, which is defined
as follows:
Definition 28 (Strict Forward Read Multi-Version Serialization Graph). A strict forward read
multi-version serialization graph for a multi-version history MVH, denoted SFR-MVSG(MVH), is
a directed graph with nodes N = P(MVH), where P(MVH) denotes the committed projection of MVH, and labeled edges E such that:
1. There is an edge Ti → Tj (Ti ≠ Tj), if Tj ?-depends on Ti.
2. There is an edge Ti → Tj (Ti ≠ Tj), whenever there exists a pair of operations ri[xj] and wj[xj] of a read-only transaction Ti and a read-write transaction Tj such that wj ≪ wk and ck <ST bi.
3. There is an edge Ti → Tj (Ti ≠ Tj), whenever there exists a pair of operations ri[xj] and wj[xj] of a read-only transaction Ti and a read-write transaction Tj such that bi <ST cj, cj <MVH ri[xj], Ti →sfr Tj, and there is a write operation wk[xk] of a committed transaction Tk in MVH such that xj ≪ xk, ck <MVH ri[xj], and Ti →sfr Tk.
Theorem 3. A history MVH consisting of committed read-only and read-write transactions executes
under Strict Forward BOT Serializability, if SFR-MVSG(MVH) is acyclic.
Proof. We show that the contrapositive of Theorem 3 holds. If Property 1 of Definition 27 is
violated, then MVH is a non-serializable history and SFR-MVSG(MVH) contains a cycle according
to Point 1 of Definition 28. Now suppose that Requirement 2 of Definition 27 does not hold. That
is, there exists a pair of operations ri [x j ] and w j [x j ] such that either:
1. cj <ST bi and there exists a write operation wk[xk] such that ck <ST bi, xj ≪ xk, or
2. bi <ST cj, cj <MVH ri[xj], and ¬(Ti →sfr Tj), or
3. bi <ST cj, cj <MVH ri[xj], and there exists a write operation wk[xk] such that xj ≪ xk, ck <MVH ri[xj], and Ti →sfr Tk.
In the first case, there is an edge Ti → T j according to Point 2 of Definition 28 and an edge
T j → Ti since Ti write-read depends on T j . Thus, SFR-MVSG(MVH) contains a cycle. In the
second case, there is again an edge Tj → Ti since Ti write-read depends on Tj. By assumption, the property ¬(Ti →sfr Tj) holds, which, in turn, requires that there is another read-write transaction Tk in MVH that produced some object version xk that has been seen by Ti and Tj installed xk's (direct or indirect) successor object version or, alternatively, Tj ?-depends or *-depends on some read-write transaction Tl (k ≠ l), whose effects Ti is also forbidden to see, i.e., ¬(Ti →sfr Tl). This
implies that T j (directly or indirectly) rw-depends on Ti or Tl (directly or indirectly) rw-depends on
Ti . Thus, there is either an edge Ti → T j or a chain of edges from Ti to T j through Tl according
to Point 1 of Definition 28. Thus, SFR-MVSG(MVH) is cyclic. In the last case, there is an edge
Ti → T j from Point 3 of Definition 28 and an edge T j → Ti since Ti directly reads from T j . Again,
SFR-MVSG(MVH) contains a cycle.
4.3.4
Update Serializability
While the strictness of serializability may be necessary for some read-only transactions, the use of
such strong criteria is often overly restrictive and may negatively affect the overall system performance. Even worse, serializability not only trades consistency for performance, but it also has
an impact on data currency. Such drawbacks can be eliminated or at least diminished by allowing
read-only transactions to be executed at weaker ILs. Various correctness criteria have been proposed in the literature to achieve performance benefits by allowing non-serializable execution of
read-only transactions. While some forms of consistency such as Update Serializability/Weak Consistency [29, 61, 162] or External Consistency/Update Consistency [29, 159] require that read-only
transactions observe a consistent database state, others such as Epsilon Serializability [88] allow
them to view transaction-inconsistent data. We believe that the majority of read-only transactions
need to see a transaction-consistent database state and therefore we focus solely on ILs that provide
such guarantees. An IL that is strictly weaker than serializability and allows read-only transactions
to see a transaction-consistent state is the Update Serializability (US) level, which can be formally
defined as follows:
Definition 29 (Update Serializability). Let us denote the set of committed read-write transactions
by Tupdate = {T0 , T1 , . . . , Tn } and the projection of MVH onto Tupdate by P(MVH, Tupdate ). A multiversion history MVH over a set of read-only and read-write transactions is an update serializable
history, if for each read-only transaction Ti in MVH the subhistory P(MV H, Tupdate ) ∪ P(MV H, Ti )
is serializable.
If there are no read-only transactions in MVH, then only the subhistory
P(MV H, Tupdate ) needs to be serializable.
Update Serializability differs from the Serializability IL by allowing read-only transactions to
serialize individually with the set of committed read-write transactions in a multi-version history
MVH, i.e., it relaxes the strictness of the serializability criterion by requiring that read-only transactions are serializable w.r.t. committed read-write transactions, but not w.r.t. other committed
read-only transactions.
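Checking Update Serializability therefore amounts to one serializability test per read-only transaction, each run on the committed read-write transactions plus that single reader. A minimal sketch (ours; it uses a generic acyclicity test in the spirit of the MVSG sketch after Definition 19, with a hand-made edge set resembling MVH3 below):

# Sketch of an Update Serializability test (illustrative): every read-only
# transaction must serialize with the read-write transactions, but read-only
# transactions need not serialize with one another.

def acyclic(nodes, edges):
    """True iff the directed graph restricted to `nodes` has no cycle."""
    nodes = set(nodes)
    succ = {n: [d for s, d in edges if s == n and d in nodes] for n in nodes}
    seen, stack = set(), set()

    def dfs(n):
        seen.add(n); stack.add(n)
        ok = all(m not in stack and (m in seen or dfs(m)) for m in succ[n])
        stack.discard(n)
        return ok

    return all(n in seen or dfs(n) for n in nodes)

def update_serializable(updaters, readers, edges):
    return acyclic(updaters, edges) and \
        all(acyclic(updaters | {r}, edges) for r in readers)

if __name__ == "__main__":
    # Hand-made situation: the only cycle runs through both readers T4 and T5.
    updaters, readers = {"T2", "T3", "T6", "T7"}, {"T4", "T5"}
    edges = [("T2", "T6"), ("T3", "T7"), ("T6", "T5"), ("T5", "T3"),
             ("T7", "T4"), ("T4", "T2")]
    print(update_serializable(updaters, readers, edges))   # -> True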
4.3.5
Strict Forward BOT Update Serializability
Update Serializability as defined above allows different read-only transactions to view different transaction-consistent database states that result from different serialization orders of read-write transactions. By not requiring that all read-only transactions see the same consistent state, more concurrency between read-only and read-write transactions is possible. However, higher transaction throughput gained by relaxing the consistency requirement must not come at the cost of providing no or unacceptable data currency guarantees to users. Since Update Serializability obviously lacks any currency requirements, we need to extend the Update Serializability IL by incorporating such guarantees.
To illustrate this requirement, consider the following example history produced by our flight scheduling system:
Example 5.
MVH3 = b0 w0[x0, 2:40 pm] b1 r1[z0, cloudy] w0[y0, 2:50 pm] c0 w1[z1, blizzard] c1 b2 r2[z1, blizzard] b3 r3[z1, blizzard] r2[x0, 2:40 pm] w2[x2, 2:50 pm] c2 r3[y0, 2:50 pm] b4 r4[x0, 2:40 pm] w3[y3, 3:00 pm] c3 b5 r5[y0, 2:50 pm] b6 r6[z1, blizzard] r6[x2, 2:50 pm] w6[x6, 3:00 pm] c6 b7 r7[z1, blizzard] r7[y3, 3:00 pm] w7[y7, 3:10 pm] c7 r5[x6, 3:00 pm] c5 r4[y7, 3:10 pm] c4
[x0 ≪ x2, y0 ≪ y3 ≪ y7, z0 ≪ z1, c2 <ST b4, c3 <ST b5]
History MVH3 represents an extension of MVH1 since it contains two additional updates of the departure times of the flights x and y. As in Example 3, the take-off times of flights x and y need to be delayed due to an imminent change in local weather conditions. The first amendment of the flight schedule is performed by transactions T2 and T3. Since weather conditions are not going to improve in the foreseeable future, both flights need to be rescheduled once more, which is carried out by transactions T6 and T7. Between the two modifications of the departure times, two employees of the airport personnel are asked by passengers to query the flight scheduling system to get the latest data on the status of flights x and y. To improve application response time, this time both co-located employees initiate their read-only transactions T4 and T5 with Update Serializability guarantees. At the transactions' start time, both employees are disconnected from the central flight scheduling system due to their unfavorable position with respect to the access points of the wireless LAN at the airport. However, despite being disconnected, both read-only transactions start their operations since their first requested objects (x and y, respectively) are located in the memory of their PDAs. Since the other requested object is not cache-resident, both clients need to wait to be reconnected to proceed with transaction processing. By the time both clients are reconnected, both flights have been delayed again and transactions T4 and T5 read the latest data on flights y and x, respectively.
To illustrate that the scheduler has produced a correct schedule, Figure 4.2 shows the MVSG of history MVH3. It is easy to see that MVH3 is an update serializable history since the graph's cycle ⟨T5 → T3 → T7 → T4 → T2 → T6 → T5⟩ can be eliminated by removing either T4 or T5 from MVSG(MVH3). Although both read-only transactions are processed in compliance with the Update Serializability requirements, it is easy to imagine that the produced query results are undesirable since they may be confusing to the database users, especially if they communicate with each other in order to share the information obtained. Again, this example provides evidence that conventional isolation levels need to be redefined or extended in order to be appropriate for read-only transactions with data currency constraints.
[Figure: serialization graph over the transactions T0 through T7 with edges labeled wr, ww, and rw]
Figure 4.2: Multi-version serialization graph of MVH3.
The implications for transaction correctness due to reads from out-of-date objects, as shown in Example 5, can be diminished by adding data currency guarantees to the definition of Update Serializability. As data currency and consistency are orthogonal concepts, it is possible to combine Update Serializability with various types of currency. As before, we concentrate on the BOT data currency type, since applications frequently require the values of the disseminated objects to be up-to-date or at least "close" to the most current values [41, 145, 163]. Actually, there is no need to define a new IL that provides BOT data currency guarantees in combination with Update Serializability correctness, since such a level would produce scheduling results that are consistent with the results produced by the already defined BOT Serializability degree. Nevertheless, extending Update Serializability by the requirement that a read-only transaction Ti must perceive the most recent version of committed objects that existed at Ti's starting point or thereafter seems to be a valuable property in terms of currency and performance. However, forward reads beyond Ti's start point should only be allowed if the Update Serializability criterion is not violated. In order to determine whether a read-only transaction Ti can safely read "forward" on some object version x that has been created by a committed read-write transaction Tj after Ti's starting point, the following property can be used:
Read Rule 2 (Update Serializable Forward Reads). Let Ti denote a read-only transaction in a multi-version history MVH that is required to observe the (complete) effects of a read-write transaction Tj that committed after Ti's starting point as long as the Update Serializability requirements are not violated. Again, let Tbefore denote the set of read-write transactions that committed before Ti's starting point and let Tafter represent the set of read-write transactions that committed after Ti's starting point, but before the commit point of Tj, and whose effects have not been seen by Ti, i.e., ∀k (Tk ∈ Tafter): bi <ST ck ∧ ck <MVH cj ∧ (if wk[xk] occurs in P(MVH, [bi, cj]), then there is no ri[xk] in MVH). Ti is allowed to read "forward" and see the (complete) effects of Tj:

1. If the invariant ReadSet(Ti) ∩ WriteSet(Tafter ∪ Tj) = ∅ holds or

2. If the condition ReadSet(Ti) ∩ WriteSet(Tj) = ∅ holds and there is no read-write transaction Tk in MVH (j ≠ k, i ≠ k) such that bi <ST ck, ck <ST cj, ¬(Ti →usfr Tk), and Tj ?-depends or *-depends on Tk.

Otherwise, Ti is forced to observe the database state that was valid at its start time, i.e., Ti is obliged to read the most up-to-date version of an object produced by a read-write transaction in Tbefore. In what follows, we represent the fact that Ti is allowed to read "forward" to observe the effects of Tj by Ti →usfr Tj.
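The two conditions of Read Rule 2 translate directly into a set-intersection test. The following is a minimal sketch under our own assumptions: transactions carry read_set/write_set attributes, t_after approximates the set of candidate transactions Tk, depends(tj, tk) is an assumed helper testing whether Tj ?-depends or *-depends on Tk, and allowed_forward(ti, tk) stands for the (recursively evaluated) usfr relation; none of these names come from the dissertation itself.

    # Sketch of the Read Rule 2 test; all helper names are illustrative.

    def may_read_forward(ti, tj, t_after, depends, allowed_forward):
        write_union = set(tj.write_set)
        for tk in t_after:
            write_union |= set(tk.write_set)
        # Condition 1: Ti's reads are disjoint from all writes in (b_i, c_j].
        if not (set(ti.read_set) & write_union):
            return True
        # Condition 2: Ti's reads at least miss Tj's own writes ...
        if set(ti.read_set) & set(tj.write_set):
            return False
        # ... and Tj does not (transitively) depend on any transaction
        # whose effects Ti is forbidden to see.
        return not any(depends(tj, tk) and not allowed_forward(ti, tk)
                       for tk in t_after)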
As before, it can be shown that Read Rule 2 produces only correct histories in the sense that each
read-only transaction sees a serial order of all committed read-write transactions in a multi-version
history MVH.
Theorem 4. In a multi-version history MVH that contains a set of read-write transactions Tupdate
such that all read-write transactions in Tupdate are serializable, each read-only transaction Ti satisfying Read Rule 2 is update serializable w.r.t. Tupdate as well.
Proof. Again, let Tafter represent the set of read-write transactions that committed after Ti's starting point and whose effects have not been seen by Ti, and let Tforward^i denote the set of read-write transactions that committed after Ti's starting point, but whose effects have been observed by Ti. By Definition 29, a read-only transaction Ti is said to be update serializable if it observes a serializable database state, but unlike under serializability, the serial ordering observed by Ti may differ from that observed by other read-only transactions. As a result, Update Serializability permits multi-query cycles involving multiple read-only transactions and one or more read-write transactions, but prohibits single-query cycles involving a single read-only transaction and one or more read-write transactions. Now suppose, by way of contradiction, that MVH contains a single-query cycle ⟨Ti → Tj0 → ... → Tjn → Ti⟩, where Ti is a read-only transaction and Tjn with n ≥ 0 is a read-write transaction of Tupdate. By assumption, the set of read-write transactions Tupdate in MVH is serializable, thus the cycle can only occur between a single read-only transaction and the read-write transactions in Tupdate. In order for the above described cycle to be produced, Ti must have both an incoming and an outgoing edge. Thereby, the outgoing edge Ti → Tj0 points to a read-write transaction Tj0 that wrote the successor version of an object read by Ti, and the incoming edge Tjn → Ti originates from a read-write transaction Tjn that installed an object version read by Ti. By Read Rule 2, Ti is guaranteed to either miss the effects or observe the complete effects of any read-write transaction and thus, the outgoing edge Ti → Tj0 and the incoming edge Tjn → Ti must involve two distinct read-write transactions, i.e., Tj0 ≠ Tjn. A further prerequisite for the cycle to occur is that Tjn ?-depends or *-depends on Tj0. Now suppose (in addition to the facts that Tj0 rw-depends on Ti and that Ti wr-depends on Tjn) that Tjn ?-depends or *-depends on Tj0. This implies, according to Definition 11 and the additionally specified version and commit order constraints, that Tj0 committed before Tjn, i.e., cj0 <MVH cjn. Since Tj0 rw-depends on Ti and Tj0 committed before Tjn, it follows that Tj0 is contained in Tafter. Also, since Tjn wr-depends on Ti and it committed after Ti's starting point, it further follows that Tjn is part of Tforward^i. Then, however, Read Rule 2 would be violated since it prohibits Ti from observing the effects of Tjn if there exists a read-write transaction Tj0 in the subhistory P(MVH, [bi, cjn]) such that ReadSet(Ti) ∩ WriteSet(Tafter ∪ Tj0) = ∅ does not hold and Tjn ?-depends or *-depends on Tj0. Therefore, if each read-only transaction Ti adheres to Read Rule 2, a single-query cycle cannot occur in MVH, and thus, Read Rule 2 guarantees that each read-only transaction Ti is update serializable w.r.t. the transactions in Tupdate.
We can now define a new IL that ensures Update Serializability correctness along with strict
forward BOT data currency guarantees:
Definition 30 (Strict Forward BOT Update Serializability). A multi-version history MVH over a set of read-only and read-write transactions is strict forward BOT update serializable, if the following conditions hold:

1. MVH is update serializable, and

2. if the pair ri[xj] and wj[xj] of a read-only transaction Ti and a read-write transaction Tj is in MVH, then either

(a) Requirement 2a of Definition 27 is true or

(b) bi <ST cj, cj <MVH ri[xj], Ti →usfr Tj, and there is no write operation wk[xk] of a committed transaction Tk in MVH such that xj ≪ xk, ck <MVH ri[xj], and Ti →usfr Tk.
Again, we determine whether a given history MVH is strict forward BOT update serializable
by using a directed MVSG:
Definition 31 (Strict Forward Read Single Query Multi-Version Serialization Graph). A strict forward read single query multi-version serialization graph for MVH w.r.t. a read-only transaction Ti, denoted SFR-SQ-MVSG(MVH, Ti), is a directed graph with nodes N = Tupdate ∪ Ti and labeled edges E such that:

1. There is an edge Ti → Tj (Ti ≠ Tj), if Tj ?-depends on Ti.

2. There is an edge Ti → Tj (Ti ≠ Tj), whenever there exists a pair of operations ri[xj] and wj[xj] of a read-only transaction Ti and a read-write transaction Tj such that wj ≪ wk and ck <ST bi.

3. There is an edge Ti → Tj (Ti ≠ Tj), whenever there exists a pair of operations ri[xj] and wj[xj] of a read-only transaction Ti and a read-write transaction Tj such that bi <ST cj, cj <MVH ri[xj], Ti →usfr Tj, and there is a write operation wk[xk] of a committed transaction Tk in MVH such that xj ≪ xk, ck <MVH ri[xj], and Ti →usfr Tk.
Theorem 5. A history MVH consisting of committed read-only and read-write transactions executes under Strict Forward BOT Update Serializability, if for each read-only transaction Ti the corresponding SFR-SQ-MVSG(MVH, Ti) is acyclic.

Proof. We again show that the contrapositive of Theorem 5 holds. If Requirement 1 of Definition 30 is violated, then MVH is not consistent with the Update Serializability criterion and SFR-SQ-MVSG(MVH, Ti) contains a cycle according to Point 1 of Definition 31. Now suppose that Requirement 2 of Definition 30 does not hold. That is, there exists a pair of operations ri[xj] and wj[xj] such that either:

1. cj <MVH bi and there exists a write operation wk[xk] such that ck <MVH bi and xj ≪ xk or

2. bi <ST cj, cj <MVH ri[xj], and ¬(Ti →usfr Tj) or

3. bi <ST cj, cj <MVH ri[xj], Ti →usfr Tj, and there exists a write operation wk[xk] of a committed transaction Tk such that xj ≪ xk, ck <MVH ri[xj], and Ti →usfr Tk.

In the first case, there is an edge Ti → Tj according to Point 2 of Definition 31 and an edge Tj → Ti since Ti write-read depends on Tj. Thus, SFR-SQ-MVSG(MVH, Ti) contains a cycle. In the second case, there is an edge Tj → Ti since Ti write-read depends on Tj. Further, since the property ¬(Ti →usfr Tj) holds, there is either an edge Ti → Tj because ReadSet(Ti) ∩ WriteSet(Tafter ∪ Tj) ≠ ∅, or an edge from Ti to Tj via some read-write transaction Tk that committed after Ti's starting point, but before Tj's commit point, with ¬(Ti →usfr Tk). If the latter holds, then there exists a sequence ⟨Tk → Tl0 → Tl1 → ... → Tln → Tj⟩ (n ≥ 0) of edges in SFR-SQ-MVSG(MVH, Ti) because Tj ?-depends or *-depends on Tk according to Requirement 2 of Read Rule 2. Thus, SFR-SQ-MVSG(MVH, Ti) is cyclic. In the final case, there is an edge Ti → Tj from Point 3 of Definition 31 and an edge Tj → Ti since Ti directly reads from Tj. Again, SFR-SQ-MVSG(MVH, Ti) contains a cycle.
4.3.6 View Consistency
View Consistency (VC) is the weakest IL that ensures transaction consistency for read-only transactions provided that all read-write transactions modifying the database state are serializable. It was first informally defined in the literature in [159] under the name External Consistency. Due to the valuable guarantees it provides to read-only transactions, it appears to be an ideal candidate for use in all kinds of environments, including broadcasting systems. However, as noticed for the Full Serializability and Update Serializability degrees, the definition of View Consistency lacks the notion of data currency. We formally define the View Consistency level as follows:
Definition 32 (View Consistency). Let Tdepend^i denote the set of committed read-write transactions in MVH that the read-only transaction Ti (directly or indirectly) wr-depends on. A multi-version history MVH over a set of read-only and read-write transactions is view consistent, if all read-write transactions are serializable and for each read-only transaction Ti in MVH, the subhistory P(MVH, Tdepend^i) ∪ P(MVH, Ti) is serializable.
The IL’s attractiveness relates to the fact that all read-write transactions produce a consistent
database state and read-only transactions observe a transaction-consistent database state. However,
as with Update Serializability, there might be a concern that two read-only transactions executed
at the same or different clients may see different serial orders of read-write transactions. A further
undesirable property of View Consistency might be that it allows read-only transactions to see a
database state that would have existed if one or more read-write transactions had never been executed or if they had been aborted. Additionally, it allows read-only transactions to see a state that
might not be consistent with the current state of the database. While the first and second potential
problems can be resolved by running read-only transactions with Full Serializability and Update
Serializability IL guarantees, respectively, the latter issue can be compensated by extending the
View Consistency IL with appropriate currency guarantees. As for the Update Serializability IL,
there is no need to define a new IL that ensures View Consistency correctness in combination with
BOT data currency since such an IL would produce scheduling histories consistent with the previously defined BOT Serializability IL. However, extending the definition of the View Consistency
IL with a read “forward” obligation that requires a read-only transaction Ti to see the effects of
read-write transactions that committed after its starting point as long as the requirements of View
Consistency correctness are satisfied, appears to be a worthwhile approach. However, before we
formally define this new IL, we need to formalize a condition that allows us to determine whether
a read-only transaction Ti can observe the effects of a read-write transaction T j that committed its
execution after Ti ’s starting time.
Read Rule 3 (View Consistent Forward Reads). Let Ti denote a read-only transaction in a multi-version history MVH that is required to observe the (complete) effects of a read-write transaction Tj that committed after Ti's starting point as long as the View Consistency requirements are not violated. Additionally, let Tbefore denote the set of read-write transactions that committed before Ti's starting point and let Tafter represent the set of read-write transactions that committed after Ti's starting point, but before the commit point of Tj, and whose effects have not been seen by Ti, i.e., ∀k (Tk ∈ Tafter): bi <ST ck ∧ ck <MVH cj ∧ (if wk[xk] occurs in P(MVH, [bi, cj]), then there is no ri[xk] in MVH). Ti is allowed to read "forward" and see the (complete) effects of Tj:

1. If the invariant ReadSet(Ti) ∩ WriteSet(Tafter ∪ Tj) = ∅ holds or

2. If the invariant ReadSet(Ti) ∩ WriteSet(Tj) = ∅ holds and there is no read-write transaction Tk in MVH (j ≠ k, i ≠ k) such that bi <ST ck, ck <ST cj, ¬(Ti →vcfr Tk), and Tj wr- or ww-depends on Tk.

Otherwise, Ti is forced to see the database state as it existed at its starting point, i.e., Ti is obliged to read the most up-to-date version of an object produced by some read-write transaction in Tbefore. In what follows, we represent the fact that Ti is allowed to read "forward" to observe the effects of Tj by Ti →vcfr Tj.
Again, it can be shown that Read Rule 3 produces only syntactically correct histories in the
sense that read-only transactions see a transaction-consistent database state.
Theorem 6. In a multi-version history MVH containing a set of read-write transactions Tupdate such
that all read-write transactions in Tupdate are serializable, each read-only transaction Ti satisfying
Read Rule 3 is serializable w.r.t. all transactions in Tupdate that created object versions that have
been (either directly or indirectly) seen by Ti .
Proof. As in previous proofs, let Tafter represent the set of read-write transactions that committed after Ti's starting point and whose effects have not been seen by Ti, and let Tforward^i denote the set of read-write transactions that committed after Ti's starting point, but whose effects have been observed by Ti. According to Definition 32, a read-only transaction Ti is said to be view consistent if it serializes with the set of update transactions that produced values that are (either directly or indirectly) seen by Ti. Thus, View Consistency permits single-query cycles in the serialization graph as long as they can be broken by removing an rw-edge between two read-write transactions that are involved in the cycle. Note that an rw-edge between two read-write transactions Tj0 and Tj1 is formed due to a read operation by Tj0 followed by a conflicting write operation of Tj1, i.e., Tj1 installed the successor of the object version read by Tj0. Now suppose, by way of contradiction, that MVH contains a cycle ⟨Ti → Tj0 → ... → Tjn → Ti⟩, where Ti is a read-only transaction and Tjn with n ≥ 0 is a read-write transaction. Additionally, suppose that the cycle cannot be broken by removing rw-edges between the involved read-write transactions. Given those prerequisites, Ti must have both an incoming and an outgoing edge in order for the cycle to be produced. As in previous proofs, the incoming edge Tjn → Ti is from a read-write transaction Tjn that installed an object version read by Ti and the outgoing edge Ti → Tj0 is to a read-write transaction Tj0 that wrote the successor version of an object read by Ti. By Read Rule 3, Ti is guaranteed to see either all or none of the effects of any read-write transaction and thus, both edges must involve distinct read-write transactions, i.e., Tj0 ≠ Tjn. A further prerequisite for a cycle to occur is that Tjn (directly or indirectly) wr- or ww-depends on Tj0. According to Definition 11 and the specified version and commit order constraints, this dependency implies that Tj0 committed before Tjn in MVH, i.e., cj0 <MVH cjn. Because Tj0 rw-depends on Ti, it follows that Tj0 is part of Tafter. Also, because Tj0 committed before Tjn and Tjn wr-depends on Ti, Tjn is contained in Tforward^i. Under those conditions, however, Read Rule 3 would be violated as it allows Ti to read "forward" and observe the effects of Tjn only if there is no read-write transaction Tj0 in MVH such that bi <MVH cj0, cj0 <MVH cjn, ¬(Ti →vcfr Tj0), and Tjn write-read or write-write depends on Tj0. Therefore, if a scheduler operates according to Read Rule 3, single-query cycles involving read-write transactions that wr- or ww-depend on each other cannot occur in MVH. Thus, Read Rule 3 guarantees view consistency to read-only transactions.
We can now define our new IL that ensures View Consistency correctness in addition to strict forward BOT data currency guarantees:

Definition 33 (Strict Forward BOT View Consistency). A multi-version history MVH over a set of read-only and read-write transactions is strict forward BOT view consistent, if the following conditions hold:

1. MVH is view consistent, and

2. if the pair ri[xj] and wj[xj] of a read-only transaction Ti and a read-write transaction Tj is in MVH, then either

(a) Requirement 2a of Definition 27 is true or

(b) bi <ST cj, wj[xj] <MVH ri[xj], Ti →vcfr Tj, and there is no write operation wk[xk] of a committed transaction Tk in MVH such that xj ≪ xk, ck <MVH ri[xj], and Ti →vcfr Tk.
To show that a multi-version history MVH provides strict forward BOT view consistency guarantees, we associate a corresponding graph with MVH.

Definition 34 (Causal Dependency Strict Forward Read Single Query Multi-Version Serialization Graph). A causal dependency strict forward read single query multi-version serialization graph for a multi-version history MVH w.r.t. a read-only transaction Ti, denoted CD-SFR-SQ-MVSG(MVH, Ti), is a directed graph with nodes N = Tdepend^i ∪ Ti and labeled edges E such that:

1. There is an edge Ti → Tj (Ti ≠ Tj), if Tj ?-depends on Ti.

2. There is an edge Ti → Tj (Ti ≠ Tj), whenever there exists a pair of operations ri[xj] and wj[xj] of a read-only transaction Ti and a read-write transaction Tj such that wj ≪ wk and ck <ST bi.

3. There is an edge Ti → Tj (Ti ≠ Tj), whenever there exists a pair of operations ri[xj] and wj[xj] of a read-only transaction Ti and a read-write transaction Tj such that bi <ST cj, cj <MVH ri[xj], Ti →vcfr Tj, and there is a write operation wk[xk] of a committed transaction Tk in MVH such that xj ≪ xk, ck <MVH ri[xj], and Ti →vcfr Tk.
Theorem 7. A history MVH consisting of committed read-only and read-write transactions executes
under Strict Forward BOT View Consistency, if for each read-only transaction Ti the corresponding
CD-SFR-SQ-MVSG(MVH, Ti ) is acyclic.
Proof. The correctness of Theorem 7 can be proved using the same logical reasoning as given in
the proof of Theorem 5.
To conclude this subsection, Table 4.1 summarizes the main characteristics of the newly defined
ILs.
4.4 Implementation Issues
We now propose four protocols that implement the newly defined ILs in a space- and time-efficient manner. Before doing so, however, we illustrate the key characteristics of the target system environment for which our protocols are designed and present the general design assumptions that underlie the implementations of the ILs.

Data dissemination using both push and pull mechanisms is likely to become the prevailing mode of data exchange in mobile wireless environments. In the previous chapter, we have already elaborated on the fundamental principles and general operating guidelines of hybrid dissemination-based data delivery systems and therefore, we restrict our subsequent discussion to the imposed system properties and assumptions that are relevant for the design and performance of our protocols. Central to any dissemination-based system are the contents and structure of the broadcast program. Due to its simplicity, we use a flat broadcast disk system [4] to generate the broadcast program, which consists, in our configuration, of three types of segments: (a) an index segment, (b) a data segment, and (c) a concurrency control information segment. To make the disseminated data self-descriptive, we incorporate an index into the broadcast program. We choose (1, m) indexing [78] as the underlying index organization method and broadcast the complete index once within each MIBC. To provide cache consistency in spite of server updates, each minor cycle is preceded by a concurrency control report or CCR that contains the read and write sets along with the values of newly created objects of read-write transactions that committed in the previous MIBC. An entry in a CCR is a 4-tuple (TID, ReadSet, WriteSet, WriteSetValues), where TID denotes the globally unique transaction identifier of some recently committed transaction Ti, ReadSet and WriteSet represent Ti's read and write set, respectively, and WriteSetValues represents a list of pairs (xi, v) that maps each object version xi newly created by transaction Ti to its associated value
v. Transactions stored in a CCR are ordered by their commit time to ensure their efficient and correct processing by the clients. The data segment contains hot-spot data objects that are of interest to a large number of clients. The rest of the database is assumed to be accessed on-demand through a bandwidth-restricted back-channel. Figure 4.3 finally illustrates the basic layout of the broadcast program used in our system model, which corresponds to the structure depicted in Figure 3.1(b).

BOT Serializability
    Base isolation level: (Full) Serializability
    Consistency guarantees: Each read-only transaction in MVH is required to serialize with all committed read-write and all other read-only transactions in MVH.
    Currency guarantees: Read-only transactions are required to observe a snapshot of committed data objects that existed by their starting points.

Strict Forward BOT Serializability
    Base isolation level: (Full) Serializability
    Consistency guarantees: Each read-only transaction in MVH is required to serialize with all committed read-write and all other read-only transactions in MVH.
    Currency guarantees: Read-only transactions are required to read from a database snapshot valid as of the time when they started. However, read-only transactions are forced to read "forward" and observe the updates from read-write transactions that committed after their starting points as long as the serializability requirement is not violated by those reads.

Strict Forward BOT Update Serializability
    Base isolation level: Update Serializability [61, 162] / Weak Consistency [29]
    Consistency guarantees: Each read-only transaction in MVH is required to serialize with all committed update transactions in MVH, but does not need to be serializable with other committed read-only transactions.
    Currency guarantees: Enforces the same currency requirements as the Strict Forward BOT Serializability level with the difference that read-only transactions are obliged to issue forward reads as long as the update serializability requirements are not violated by those reads.

Strict Forward BOT View Consistency
    Base isolation level: View Consistency / Update Consistency [29] / External Consistency [159]
    Consistency guarantees: Each committed read-only transaction in MVH is required to serialize with all committed update transactions in MVH that had written values which have (either directly or indirectly) been seen by the read-only transaction.
    Currency guarantees: Enforces the same currency requirements as the Strict Forward BOT Serializability level with the difference that forward reads of read-only transactions are enforced whenever the view consistency criterion is not violated by those reads.

Table 4.1: Newly defined ILs and their core characteristics.
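Referring back to the CCR entry format described before Table 4.1, the 4-tuple lends itself to a straightforward record type. The following is a minimal sketch of our own (the field names mirror the 4-tuple; the explicit commit timestamp is our addition for convenience, since it is only implied by the commit ordering of the entries):

    # Minimal sketch of the CCR layout; names are illustrative only.

    from dataclasses import dataclass, field
    from typing import Dict, List, Set

    @dataclass
    class CCREntry:
        tid: int                            # globally unique transaction id
        read_set: Set[int]                  # OIDs read by the transaction
        write_set: Set[int]                 # OIDs written by the transaction
        write_set_values: Dict[int, bytes]  # new object version -> value
        commit_ts: int                      # convenience field (see note above)

    @dataclass
    class CCR:
        mibc_no: int                        # minor cycle this report precedes
        entries: List[CCREntry] = field(default_factory=list)

        def add(self, entry: CCREntry) -> None:
            # Entries are appended in commit order so that clients can
            # replay them sequentially when refreshing their caches.
            self.entries.append(entry)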
[Figure: a major broadcast cycle composed of four minor broadcast cycles (Minor BC1 through BC4), each containing a CCR segment, an index segment, and data segments; an inset details the CCR record layout (CCR ID, FirstCCRListEntry; TID, ReadSet, WriteSet, FirstWriteSetValueListEntry, NextListEntry; OID, ObjectValue, NextListEntry)]
Figure 4.3: Organization structure of the broadcast program.
With respect to the client and server architecture, we assume a hybrid caching system for both system components to improve the performance of our protocols. In a hybrid caching system the available cache memory is divided into a page-based segment and an object-based segment. The server uses its page cache to handle fetch requests and to fill the broadcast disk with pages containing hot-spot objects. The server object cache is utilized to save installation disk reads when writing modified objects onto disk. The latter is organized as a mono-version object cache similar to the modified object buffer (MOB) in [53]. With respect to concurrency control, the server object cache can be used to answer object requests in case a transaction-consistent page is not available from the client's perspective. The client also maintains a hybrid cache scheme to take full advantage of both cache types. The client page cache is used to keep requested and prefetched database pages in volatile memory. We assume a single-version page cache that maintains up-to-date server pages. The client object cache, on the other hand, is allowed to store multiple versions of an object x. To simplify the description of our protocols, we assume that an object x can either be stored in a page p or in the object cache of the client. To judge the correctness of a client read operation, each page p is assigned a commit timestamp CTS(p) that reflects the (logical) time when the last transaction that updated or newly created an object x in page p committed. Analogous to the page cache, each version of an object maintained in the client object cache is associated with a commit timestamp reflecting the point in time when the version was installed by a committed read-write transaction.
4.4.1 Multi-Version Concurrency Control Protocol with BOT Serializability Guarantees (MVCC-BS)
In this subsection, we present an algorithm that provides BOT Serializability to read-only transactions. To enforce database consistency, we assume that the state of the mobile database is exclusively modified by transactions that run with serializability requirements. We also assume that clients can only execute a single read-only transaction at a time. To avoid mixing up data consistency- and currency-related issues arising from connection failures with the basic working principles of our CC protocols, we assume that mobile clients do not suffer from intermittent connectivity and can always actively observe the broadcast channel. We refer the interested reader to [21] for a detailed description of three cache invalidation methods that could easily be adapted for use along with our subsequently defined CC protocols to prevent mobile clients from observing inconsistent and outdated data in case of communication failures or voluntary disconnections. Finally, note that the following algorithm forms the basis for the subsequent protocols that ensure weaker semantic guarantees than serializability and should therefore be studied carefully.

Our implementation of the BOT Serializability level allows concurrency control with nearly no overhead. For each read-only transaction Ti, the client keeps the following data structures and information for concurrency control purposes: (a) Ti's startup timestamp, (b) Ti's read set, and (c) an object invalidation list (OIL). The latter contains the identifiers and commit timestamps of objects that were created during Ti's execution time. Note that in order to ensure the correctness of the MVCC-BS protocol, read-only transactions are not required to keep track of their read sets. However, all subsequently defined protocols that are built upon the MVCC-BS protocol do require such information and we therefore decided to introduce and describe this structure's operations already here. Also note that all underlying data structures of our CC schemes are chosen for clarity of exposition rather than for efficient implementation.
The server data structures include the hybrid server cache, the CCR as described before, and
the temporary object cache (TOB). The TOB is used to record the modified or newly created object
versions of transactions that committed during the current MIBC. Additionally, the TOB is utilized
to store “shadow” versions of transactions that are not yet committed. Whenever an MIBC is
finished, all versions of committed transactions are merged from the TOB into the MOB and the
updated or newly created object versions will be available for the next MIBC.
Now we describe the protocol scheme by differentiating between client and server operations.
Client Operations
1. Read Object x by Transaction Ti on Client C

(a) Ti issues its first read operation.
Assign the number of the current MIBC to STS(Ti). Add x to Ti's read set (ReadSet(Ti)).

(b) Requested object x is cache-resident in the page or object cache.
If the requested object is stored in page p, it can be read by Ti whenever p's commit timestamp CTS(p) is smaller than STS(Ti) or there is no entry of x with commit timestamp CTS_OIL(x) in the object invalidation list (OIL) such that STS(Ti) ≤ CTS_OIL(x). Otherwise, Ti looks for the entry of object x in the object cache. If some version j of object x is in the object cache, Ti can read xj, provided the invariant CTS(xj) < STS(Ti) holds. Note that there is no requirement to check whether there is another version k of object x in the client cache such that CTS(xj) < CTS(xk) and CTS(xk) < STS(Ti): by assuming that clients do not run more than a single read-only transaction at any one time, only one version of object x with a commit timestamp smaller than the starting timestamp of the read-only transaction may be useful for it and is therefore maintained in the client object cache. The other object versions would only waste scarce memory space and are therefore garbage-collected as mentioned below. If Ti reads some version of x, add x to ReadSet(Ti). (A code sketch of this cache-read test follows the operations list below.)
(c) Requested object x is scheduled for broadcasting.
Read the broadcast index to determine the position of the object on the broadcast. The client is allowed to download the desired object x either if the commit timestamp of the page p in which x resides is smaller than Ti's starting point, i.e., CTS(p) < STS(Ti), and there is no object version of x in OIL such that CTS(p) < CTS_OIL(x) and CTS_OIL(x) < STS(Ti), or if there is no entry of x with commit timestamp CTS_OIL(x) in the object invalidation list (OIL) such that STS(Ti) ≤ CTS_OIL(x). If a consistent version of object x cannot be located in the air-cache, the client proceeds with Option 1(d). Otherwise, it reads the installed version of object x and adds x to ReadSet(Ti).
(d) Requested version of object x is neither in the local cache nor in the air-cache.
Send a fetch request for object x along with STS(Ti) to the server. The server processes the client request as described below. As a reply, the client either receives a transaction-consistent copy of a page p which contains the requested object x or, alternatively, a transaction-consistent version of x. If the request cannot be satisfied, the server notifies the client and Ti must be aborted.
2. Concurrency Control Report Processing on Client C
CCRs are disseminated at the beginning of each MIBC. The client processes a CCR as follows: For each object x included in the write set of a read-write transaction Tj that committed in the last MIBC, an entry is added to the OIL containing the identifier of object x along with its commit timestamp. Additionally, the contents of the page and object cache are refreshed. If object x kept in page p at client C was updated during the last MIBC, the old version of x is overwritten by the newly created version. Otherwise, the updated version of x is installed into the object cache, if x belongs to C's hot-spot objects. If a prior version of object x becomes useless for Ti, it is discarded from the object cache.
3. Transaction Commit
Transaction Ti is allowed to commit, if all read requests were satisfied and no abort notification was sent by the server.
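The cache-read test of Step 1(b) can be summarized in a few lines. The following is a sketch under our own assumptions (attribute and helper names such as current_mibc, lookup, and the OIL-as-dictionary representation are ours; the OIL is assumed to keep the relevant timestamp per OID):

    # Sketch of the MVCC-BS client-side read path for Steps 1(a)-1(b).

    def can_read_from_page(page, oid, sts_ti, oil):
        # Usable if the page predates Ti's start, or if x has no OIL entry
        # with commit timestamp >= STS(Ti).
        return page.cts < sts_ti or oil.get(oid, -1) < sts_ti

    def read_object(ti, oid, page_cache, object_cache, oil):
        if ti.sts is None:                          # Step 1(a): first read
            ti.sts = current_mibc()                 # assumed environment hook
        page = page_cache.lookup_page_of(oid)
        if page is not None and can_read_from_page(page, oid, ti.sts, oil):
            ti.read_set.add(oid)
            return page.read(oid)
        version = object_cache.lookup(oid)          # single useful version kept
        if version is not None and version.cts < ti.sts:
            ti.read_set.add(oid)
            return version.value
        return None            # fall back to the air-cache or a server fetch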
Server Operations
1. Fetch Request for Object x From Client C
If the server receives a fetch request for object x from transaction Ti , the server first checks if
the page p holding x is in the server cache. If p is cache-resident and the startup timestamp of
Ti is equal to the number of the current MIBC, the server will send page p to C after applying
to p all pending MOB entries. Otherwise, the server searches for x in the MOB. If it finds
an entry for object x such that CTS(x) < STS(Ti), the server will send object x to the client. Otherwise, if p is not cache-resident, but STS(Ti) equals the number of the current MIBC,
the server reads p from disk and applies all pending modifications recorded in the MOB of
objects that reside in p to the page. If none of the above conditions holds, the fetch request
cannot be satisfied for consistency reasons and an abort message will be transmitted to the
client.
2. Integration of the TOB into the MOB
At the end of each MIBC, the newly created and updated object versions are merged into the MOB. If the objects already exist in the MOB, their values are overwritten and their timestamps are updated.
3. Filling the Broadcast Disk
The server fills the memory storage space allocated to contain the data and index segments
of the broadcast program at the beginning of each MBC. In doing so, the server proceeds
as follows: If the desired page p containing hot-spot objects is not in the page cache of
the server, it is read into the cache from the disk and thereafter, it is updated to reflect all the
modifications of its objects recorded in the MOB. At the end of this process, every data page p
included in the broadcast program is completely up-to-date, i.e., it contains the most current
versions of its objects. Further, the server creates a (1, m) index containing entries of objects
scheduled for broadcasting and stores it into all index segments of the broadcast program.
The concurrency control segment of the broadcast program is updated at the beginning of
each MIBC just before its broadcast. This segment is filled with the CCR as described above.
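Server Operation 1 above can likewise be condensed into a short decision routine. This is a sketch only; the cache, MOB, and disk interfaces (lookup_page_of, apply_pending_mob_entries, and the client reply methods) are names we introduce for illustration:

    # Sketch of the server-side fetch handling (Server Operation 1).

    def handle_fetch(server, client, oid, sts_ti):
        page = server.page_cache.lookup_page_of(oid)
        if page is not None and sts_ti == server.current_mibc:
            server.apply_pending_mob_entries(page)   # bring page up to date
            return client.send_page(page)
        entry = server.mob.lookup(oid)               # search the MOB next
        if entry is not None and entry.cts < sts_ti:
            return client.send_object(entry)
        if sts_ti == server.current_mibc:            # read page from disk
            page = server.disk.read_page_of(oid)
            server.apply_pending_mob_entries(page)
            return client.send_page(page)
        return client.send_abort()                   # no consistent copy exists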
4.4.2 Multi-Version Concurrency Control Protocol with Strict Forward BOT Serializability Guarantees (MVCC-SFBS)
Having described an MVCC protocol that ensures BOT Serializability, we now extend this scheme to provide Strict Forward BOT Serializability. Recall that the Strict Forward BOT Serializability level differs from the BOT Serializability degree by requiring that read-only transactions observe the updates of read-write transactions that committed after their respective starting points, provided that Read Rule 1 is satisfied. To implement the latter requirement, we adopt a technique used by the multi-versioning with invalidation scheme in [123] and associate a read forward flag or RFF with each read-only transaction Ti in MVH that is initially (i.e., at transaction start time) set to true and indicates whether Ti has read a version of an object that was later modified by a read-write transaction Tj. If such an event occurs, the RFF of Ti is set to false and Tj's commit timestamp is recorded in a variable called the read forward stop timestamp or RFSTS. Equipped with this information, the scheduler can efficiently determine which version of a requested object x a read-only transaction Ti needs to observe by applying the following algorithm:
begin
    if read-only transaction Ti requests to read object x and RFF is set to false then
        Ti reads the latest committed object version of x with a commit timestamp CTS(x) that is smaller than RFSTS(Ti)
    else
        Ti reads the most recent object version of x
end

Algorithm 4.1: Algorithm used by the MVCC-SFBS scheduler to map an appropriate object version to some read request for object x issued by read-only transaction Ti.
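The part not shown in Algorithm 4.1 is how the RFF and RFSTS get set while a CCR is being processed. A minimal sketch of that maintenance step, under our assumed data structures (CCR entries carrying write sets and commit timestamps, arriving ordered by commit time):

    # Sketch: maintaining RFF/RFSTS during CCR processing (MVCC-SFBS).

    def process_ccr_for_sfbs(ti, ccr):
        for entry in ccr.entries:                  # commit order matters
            if ti.rff and (ti.read_set & entry.write_set):
                # Ti saw a version that this transaction has now overwritten:
                # freeze the forward-read horizon at its commit timestamp.
                ti.rff = False
                ti.rfsts = entry.commit_ts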
4.4.3 Multi-Version Concurrency Control Protocol with Strict Forward BOT Update Serializability Guarantees (MVCC-SFBUS)
Remember that the Update Serializability IL is less restrictive than the Full Serializability IL in that it allows different read-only transactions to see different serialization orders of read-write transactions. This weaker requirement affects the forward read behavior of read-only transactions running
under Strict Forward BOT Update Serializability. As for the MVCC-SFBS protocol, the mobile client has to determine for each active read-only transaction Ti whether it needs to observe the effects of a read-write transaction Tj that committed during Ti's execution time. To this end, each read-only transaction maintains two additional data structures: (a) First, an object version write prohibition list or OVWPL is associated with each read-only transaction Ti. An OVWPL is a set of pairs (OID, CTS) where OID denotes the identifier of an object whose value Ti is not allowed to see and CTS represents the logical time when the transaction that modified or created the object committed. The OVWPL of an active read-only transaction Ti is updated whenever a new CCR appears on the broadcast channel. (b) Additionally, for each active read-only transaction Ti the client maintains an object version read prohibition list or OVRPL that keeps track of the objects read by read-write transactions that committed during Ti's execution time and whose effects may not be seen by Ti. The identifiers of the objects read and created by a read-write transaction Tj, along with the corresponding commit timestamp in the case of the OVWPL structure, have to be added to Ti's OVRPL and OVWPL, respectively, if any of the following conditions holds:

1. ReadSet(Ti) ∩ WriteSet(Tj) ≠ ∅
2. OVWPL(Ti) ∩ ReadSet(Tj) ≠ ∅
3. OVWPL(Ti) ∩ WriteSet(Tj) ≠ ∅
4. OVRPL(Ti) ∩ WriteSet(Tj) ≠ ∅
Condition 1 implies that in order for Ti to read "forward" on objects written by Tj, the intersection between Ti's read set and Tj's write set must be empty. Condition 2 states that Tj may not have seen any objects contained in Ti's OVWPL. It ensures that Ti will only see the effects of Tj if there is no wr-dependency between any read-write transaction Tk whose updates are registered in Ti's OVWPL and Tj. Condition 3 states that Tj may not have overwritten an object whose corresponding object identifier is contained in Ti's OVWPL. This rule ensures that Ti will only see the effects of Tj if there is no read-write transaction Tk whose updates are registered in Ti's OVWPL such that Tj ww-depends on it. Condition 4 states that Tj must not have overwritten an object that is included in Ti's OVRPL. This condition guarantees that Ti will only see the updates of Tj if there exists no rw-dependency between any read-write transaction Tk that conflicts with Ti (i.e., whose read operations are recorded in Ti's OVRPL) and Tj. Note that whenever Tj has updated an object x that is already listed in Ti's OVWPL, the entry of x is not modified and the protocol proceeds with the next object written by Tj (if there is any). A sketch of this maintenance step is given after Algorithm 4.2 below.
A read-only transaction running under Strict Forward BOT Update Serializability sees a correct
state of the database, if the transaction scheduler running on the client uses the following algorithm
in order to map object requests to appropriate object versions:
begin
    if read-only transaction Ti requests to read object x and x is registered in Ti's OVWPL then
        Ti reads the latest committed object version of x with a commit timestamp CTS(x) that is smaller than the commit timestamp of the entry of object x in the OVWPL
    else
        Ti reads the most recent object version of x
end

Algorithm 4.2: Algorithm used by the MVCC-SFBUS scheduler to select an appropriate object version whenever read-only transaction Ti wants to read an object x.
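As announced above, the prohibition-list maintenance driven by Conditions 1 to 4 can be sketched as follows. The representation is our own (OVWPL as a dictionary from OID to commit timestamp, OVRPL as a plain set, CCR entries ordered by commit time); setdefault implements the rule that an existing OVWPL entry of an object is not modified:

    # Sketch of MVCC-SFBUS prohibition-list maintenance (Conditions 1-4).

    def process_ccr_for_sfbus(ti, ccr):
        for tj in ccr.entries:
            prohibited = (
                bool(ti.read_set & tj.write_set)            # Condition 1
                or bool(ti.ovwpl.keys() & tj.read_set)      # Condition 2
                or bool(ti.ovwpl.keys() & tj.write_set)     # Condition 3
                or bool(ti.ovrpl & tj.write_set)            # Condition 4
            )
            if prohibited:
                ti.ovrpl |= tj.read_set                     # reads Ti must ignore
                for oid in tj.write_set:
                    # Keep the oldest prohibited version timestamp; existing
                    # entries stay untouched (see the note above).
                    ti.ovwpl.setdefault(oid, tj.commit_ts)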
4.4.4 Multi-Version Concurrency Control Protocol with Strict Forward BOT View Consistency Guarantees (MVCC-SFBVC)
View Consistency is the weakest IL that provides transaction consistency to read-only transactions and has the potential to maximize the number of forward reads of read-only transactions without violating transaction correctness. To determine whether an active read-only transaction Ti is required to see the updates of a read-write transaction Tj that successfully finished its execution during Ti's lifetime, the satisfaction of the following conditions has to be tested:

1. ReadSet(Ti) ∩ WriteSet(Tj) ≠ ∅
2. OVWPL(Ti) ∩ ReadSet(Tj) ≠ ∅
3. OVWPL(Ti) ∩ WriteSet(Tj) ≠ ∅

If any of those conditions holds, the write set of Tj must be registered in Ti's OVWPL and Ti is not allowed to read "forward" and observe the effects of Tj. Otherwise, Ti is allowed to read an object x written by Tj provided that there exists no later version of the respective object that Ti is allowed to observe as well. In order to decide which version of a requested object x a read-only transaction Ti needs to observe, MVCC-SFBVC uses the same algorithm as the MVCC-SFBUS protocol (see Algorithm 4.2). As the conditions that decide whether a read-only transaction Ti is allowed to read "forward" on some read-write transaction Tj are a proper subset of the ones formulated for the MVCC-SFBUS scheme, it is obvious that the MVCC-SFBVC protocol provides strictly stronger currency guarantees than MVCC-SFBUS. Further, it is easy to see that MVCC-SFBVC has lower time and space overheads than MVCC-SFBUS since the former does not need to maintain the OVRPL data structure. Hence, we expect the MVCC-SFBVC scheme to outperform the MVCC-SFBUS protocol in our performance study.
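In terms of the earlier sketch, the MVCC-SFBVC test is simply the MVCC-SFBUS test minus the OVRPL clause, which makes the "proper subset" claim visible at a glance (same illustrative data representation as before):

    # Sketch: MVCC-SFBVC drops Condition 4, i.e., no OVRPL is needed.

    def forward_read_blocked_sfbvc(ti, tj):
        return (bool(ti.read_set & tj.write_set)         # Condition 1
                or bool(ti.ovwpl.keys() & tj.read_set)   # Condition 2
                or bool(ti.ovwpl.keys() & tj.write_set)) # Condition 3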
4.5 Performance Results

The performance study aims at measuring the absolute and relative performance of our proposed MVCC protocols in a wireless hybrid data delivery environment. Additionally, we compare our protocols with previously devised concurrency control schemes [123, 139] to detect performance trade-offs among these different schemes and their underlying consistency and currency guarantees. We analyze the performance of the new ILs' implementations and the other protocols using two key metrics, namely transaction commit rate and transaction abort rate. We restrict the subsequent analysis to those two performance metrics since they help us evaluate the protocols' performance and overhead in the most condensed form. For an extended version of the performance analysis, we refer the interested reader to [137].
4.5.1 System Model

Our simulation parameters are similar to the ones used in previous performance studies in the field of mobile data broadcasting and conventional distributed database systems [4, 7, 58]. The simulation model consists of the following core components: (a) a central broadcast server, (b) numerous mobile clients, (c) a database hosted by the broadcast server, (d) a broadcast program, and (e) a hybrid network allowing the server to establish point-to-point and point-to-multipoint conversations with mobile clients. The components are briefly described below.
Broadcast Server and Mobile Clients:
The broadcast server and the mobile clients are at the heart of the simulator and both are modeled as simple facilities where events are generated and handled according to pre-defined rules. Selected events are charged in terms of CPU instructions and are then converted into time using the server's and clients' processor speeds, which are specified in million instructions per second or MIPS. Events on the processors, disks, network interfaces, etc., are executed in FIFO fashion after a specified delay using an event queue, i.e., if any of these devices is heavily utilized, an event may be scheduled later than its specified time. Time is measured in broadcast ticks, where one tick is defined as the time it takes to broadcast a disk page of 4,096 bytes in size.

The clients' CPU speed is chosen to be 100 MIPS and the server's CPU speed is set to 1,200 MIPS. These values reflect typical processor speeds of mobile PDAs and high-performance workstations observed in production systems about three years ago when this study was conducted. Note, however, that despite the fact that the CPU speeds used are somewhat outdated, we believe that the gathered simulation results still approximate the relative performance differences between the investigated CC protocols (when deployed in a hybrid data delivery system using today's hardware technology) very well, since the actual sizes of system components are less important than their relative sizes when mirroring the characteristics of real systems. Moreover, the performance improvements of mobile and stationary computing devices have been in line with each other over the recent past. As mentioned above, we have associated CPU instruction costs with various system and user events, which are itemized in Table 4.2. Note that the client is charged an inter- and intra-transaction think time between two consecutive transactions and transaction operations, respectively, to simulate the user's decision-making process before proceeding to the next transaction or transaction operation.
The client cache size is chosen to be 2% of the database size. As in previous simulation studies that examined various performance issues within the context of stationary distributed database systems, we model the client cache as a hybrid cache by dividing it into a page-based segment and an object-based segment as described before. The client page cache is managed by the LRU replacement policy and the client object cache is organized by an eviction algorithm called P [7]. P is an offline replacement algorithm that uses knowledge of the objects' access probabilities to determine the cache replacement victims. We use this knowledge whenever the cache capacity is reached to evict those objects that have the lowest access probability. Client page and object cache freshness is achieved by downloading recently modified objects from the CCR segment of the broadcast cycle. The server cache size is set to 20% of the database size and it is also divided into a relatively small page cache and a relatively large object cache (see Table 4.2 for the split ratio between both caches). The server object cache is further split into a modified object cache (MOB) and a temporary object cache (TOB), with the latter not being explicitly modeled as a separate facility with its own dedicated storage space and cache replacement policy since we expect its storage requirements to be rather negligible. The MOB is modeled as a single-version object cache as described in [53] and is managed as a simple FIFO buffer, whereas the server page cache is managed by an LRU replacement policy.
Database and Broadcast Program:
A relatively small database size is used in order to make the simulations of our complex mobile broadcasting architecture computationally feasible with today's computer technology. Therefore, the database is modeled as a set of 10,000 objects. The size of each disk page is 4 KB and a page contains 40 objects of 100 bytes each. The database objects are stored on 4 disks at the server and page fetch requests are uniformly assigned to the disks independently of the workload. Note that disks are only available at the server, i.e., we assume diskless mobile clients. Each disk is modeled as a shared FIFO queue on which operations are scheduled in the order they are initiated. Disk delays are composed of a queuing delay and a disk access and transfer delay, where the disk access delay is the sum of the seek time (defined as the time it takes for the disk head to reach the data track) and the rotational latency (defined as the time it takes for the data sector to rotate under the disk head). We use an average seek time of 4.5 ms, an average rotational latency of 3.0 ms, and a disk bandwidth of 40 Mbps. These performance values correspond to the Quantum Atlas 10K III Ultra SCSI disk [107].
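The access and transfer portion of this delay composition can be made concrete with a small helper. This is a sketch of our reading of the model, not code from the simulator; the queuing delay is excluded here since it is handled by the event queue:

    # Back-of-the-envelope disk service time from the parameters above.

    def disk_access_ms(page_bytes=4_096, seek_ms=4.5, rot_ms=3.0, bw_mbps=40):
        transfer_ms = page_bytes * 8 / (bw_mbps * 1e6) * 1e3  # ~0.82 ms for 4 KB
        return seek_ms + rot_ms + transfer_ms                 # ~8.3 ms in total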
The broadcast program determines the structure and the contents of the underlying broadcast disk. We assume a flat single-version broadcast disk whose contents are cyclically disseminated along with the (1, m) index and the CCR to the client population. Since we want to model a hybrid data delivery environment, only the hottest 20% of the database is broadcast. At the beginning of each MBC, a dedicated broadcast disk server is filled with data pages containing the most popular 20% of data objects. Each MBC is subdivided into five MIBCs and each MIBC, in turn, consists of one-fifth of the data to be broadcast within an MBC, a (1, m) index to make the data self-descriptive, and a CCR as described before.

Server Database Parameters
    Database size (DBSize): 10,000 objects
    Object size (OBSize): 100 bytes
    Page size (PGSize): 4,096 bytes
Server Cache Parameters
    Server buffer size (SBSize): 20% of DBSize
    Page buffer memory size: 20% of SBSize
    Object buffer memory size: 80% of SBSize
    Page cache replacement policy: LRU
    Object cache replacement policy (MOB): FIFO
Server Disk Parameters
    Fixed disk setup costs: 5,000 instr
    Rotational speed: 10,000 RPM
    Media transfer rate: 40 Mbps
    Average seek time (read operation): 4.5 ms
    Average rotational latency: 3.0 ms
    Variable network costs: 7 instr/byte
    Page fetch time: 7.6 ms
    Disk array size: 4
Client/Server CPU Parameters
    Client CPU speed: 100 MIPS
    Server CPU speed: 1,200 MIPS
    Client/Server page/object cache lookup costs: 300 instr
    Client/Server page/object read costs: 5,000 instr
    Register/Unregister a page/object copy: 300 instr
    Register an object in prohibition list: 300 instr
    Prohibition list lookup costs: 300 instr
    Inter-transaction think time: 50,000 instr
    Intra-transaction think time: 5,000 instr

Table 4.2: Summary of the system parameter settings – I.
Hybrid Network:
Our modeled network infrastructure consists of three communication paths: (a) a broadcast channel, (b) back-channels or uplink channels from the clients to the server, and (c) downlink channels from the server to the clients. The network parameters of these communication paths are
modeled after a real system such as Hughes Network Systems' DirecPC [70], which is now called DIRECWAY [71]. We set the default
broadcast bandwidth to 12 Mbps and the point-to-point bandwidth to 400 Kbps downstream and
to 19.2 Kbps upstream. Both point-to-point connections were modeled as unshared resources and
in order to create network contention, we restricted the number of uplink and downlink communication channels to two in each direction. With respect to point-to-point communication costs, each
network message has a latency that is divided into four components: (a) CPU costs for sending the
message, (b) queuing delay, (c) transmission time, and (d) CPU costs for receiving the message.
CPU costs at each communication end consist of a fixed number of instructions (i.e., 6,000 instructions) and a variable number of instructions that are charged according to the size of the message
(i.e., 7.6 instructions/byte). The uplink and downlink network paths are modeled as a shared FIFO
queue on which operations are scheduled in the order they are initiated. The network bandwidth of
the uplink and downlink channels multiplied by the message size is used to determine how long the
the network queue is utilized for sending the message. Communication costs incurred by transmitting broadcast messages to the client population do not include queuing delays4 since we assume
that there is no congestion in the broadcast medium. Finally, the network parameters used in the
simulation study are once more summarized in Table 4.3. Note that we used an end-to-end latency
for the transmission of request messages (from the client to the server and back again) of only
20 ms. Although the DirecPC system has much higher message latency in reality (approximately
375 ms) [64], underestimating the propagation delay helped us speed up the simulation runs.
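The four-component message cost model can be sketched as follows. This is a minimal illustration with helper names of our own choosing, assuming the instruction counts and CPU speeds given above:

FIXED_INSTR = 6_000            # fixed CPU cost per message endpoint
VAR_INSTR_PER_BYTE = 7.6       # variable CPU cost per message byte

def cpu_cost_ms(msg_bytes, mips):
    instr = FIXED_INSTR + VAR_INSTR_PER_BYTE * msg_bytes
    return instr / (mips * 1_000_000) * 1_000

def message_latency_ms(msg_bytes, bandwidth_bps, sender_mips, receiver_mips,
                       queuing_ms=0.0):
    """CPU send cost + queuing delay + transmission time + CPU receive cost."""
    transmission_ms = msg_bytes * 8 / bandwidth_bps * 1_000
    return (cpu_cost_ms(msg_bytes, sender_mips) + queuing_ms +
            transmission_ms + cpu_cost_ms(msg_bytes, receiver_mips))

# Example: a 128-byte uplink request at 19.2 Kbps from a 100 MIPS client to a
# 1,200 MIPS server; transmission time dominates (about 53 ms).
print(round(message_latency_ms(128, 19_200, 100, 1_200), 2))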
4.5.2 Workload Model
Transaction processing is modeled in our simulation study as in a stock trading and monitoring database. Data objects are modified at the server by a data workload generator that simulates the effects of multiple read-write transactions.
Client Cache Parameters
Parameter                                          Parameter Value
Client cache size (CCSize)                         2% of DBSize
Client object cache size                           80% of CCSize
Client page cache size                             20% of CCSize
Page cache replacement policy                      LRU
Object cache replacement policy                    P

Broadcast Program Parameters
Number of broadcast disks                          1
Number of objects disseminated per MBC             20% of DBSize
Number of index segments per MBC                   5
Number of CCRs per MBC                             5
Bucket size                                        4,096 bytes
Bucket header size                                 96 bytes
Index header size                                  96 bytes
Index record size                                  12 bytes
OID size                                           8 bytes

Network Parameters
Broadcast bandwidth                                12 Mbps
Downlink bandwidth                                 400 Kbps
Uplink bandwidth                                   19.2 Kbps
Fixed network costs                                6,000 instr
Variable network costs                             7 instr/byte
Propagation and queuing delay                      20 ms
Number of point-to-point uplink/downlink channels  2

Table 4.3: Summary of the system parameter settings – II.
To produce data contention in the system, 100 data objects are modified by as many as 20 transactions during the course of an MBC, and their commit points are uniformly distributed over this period of time. There is no variance in the number of objects written by any of the initiated read-write transactions, i.e., each transaction updates exactly 5 data objects. Objects read and written by read-write transactions are modeled by using a Zipf distribution [168] with parameter θ = 0.80, and each object of the database is accessible and updateable by the read-write transactions. The ratio of the number of write operations versus the number of read operations is fixed at 0.25, i.e., only every fifth operation issued by the server results in an object modification. Read-only transactions are modeled as a sequence of 10 to 50 read operations. As for the read and write operations at the server, the access probability of client read operations follows a Zipf distribution with parameter θ = 0.80 (and θ = 0.95, respectively), i.e., about 75% (90%) of all object accesses are directed to 25% (10%) of the database. To account for the impact on the communication and server resources when the client sends a data request to the server, we model a multi-client environment consisting of 10 mobile clients. For simplicity, and as noted before, we assume that each mobile client runs only one read-only transaction at a time. We adopt a parameter, called uplink usage threshold [6], whose value determines whether a client may explicitly request an object even though it is scheduled for broadcasting. The chosen threshold of 100% means that an object version cannot be requested from the server if it is listed in the broadcast program. We have chosen an abort variance of 100%, which means that whenever a read-only transaction aborts due to a read conflict, the restarted transaction reads from a different set of objects. To conclude the description of the modeled hybrid data delivery system, Table 4.4 summarizes the simulator's workload parameter settings and Figure 4.4 shows the interrelationship between the various system components.
4.5.3 Experimental Results of the Proposed CC Protocols
All performance results presented are derived from executing 10,000 read-only transactions after the system reached its steady state. The results come from artificially generated traces, i.e., they give only an intuition about the performance of our IL implementations, but may not represent a real application's behavior.
Workload Parameters
Parameter                                               Parameter Value
Read-write transaction size (number of operations)     25
Read-only transaction size (number of operations)      10 – 50
Server data update pattern (Zipf distribution with θ)  0.80
Client data access pattern (Zipf distribution with θ)  [0.80, 0.95]
Number of updates per MBC                               1.0% of DBSize
Number of concurrent read-only transactions per client  1
Uplink usage threshold                                  100%
Abort variance                                          100%

Table 4.4: Summary of the workload parameter settings.
[Figure omitted. It depicts the simulation model: the server (database, page cache, object cache, broadcast manager, and pull manager), the broadcast channel/air-cache carrying one major broadcast cycle composed of five minor broadcast cycles (each consisting of an index, a data segment, and a CCR), and ten clients (Client 1 through Client 10), each equipped with a cache, a transaction generator, and a pull manager that services cache misses.]

Figure 4.4: An overview of the simulation model used to generate the performance statistics.
We now give a brief interpretation of the experimentally measured results w.r.t. the aforementioned metrics.

As Figures 4.5(a) and 4.5(b) show, the transaction throughput rate of all protocols decreases along the x-axis as the number of objects accessed by read-only transactions rises. Increasing the transaction length results in longer transaction execution times and hence fewer transaction commits per second. Furthermore, longer read-only transactions might abort at a later point of their execution, which results in higher abort costs, thus also reducing the transaction throughput. Additionally, as transaction execution time progresses, the likelihood that object read requests can be satisfied by some component of the database system (client cache, air-cache, or server memory) decreases. Thus, apart from increased abort costs, higher abort rates are another consequence of longer transaction execution times. As the tabular results show, the performance difference between MVCC-SFBVC and the other protocols widens slightly (w.r.t. the MVCC-SFBUS and MVCC-SFBS schemes) and significantly (w.r.t. the MVCC-BS protocol) with increasing read-only transaction length. The growing performance penalty is caused by a disproportionate increase in the number of messages sent per committed read-only transaction since fewer client cache and air-cache hits occur.

The increase in the transaction abort rate as a function of the transaction length is depicted in Figures 4.6(a) and 4.6(b). In terms of the abort rate, the relative difference between the protocols decreases when growing the transaction size from 10 to 50 read operations. The reason for the narrowing gap between the protocols is related to a decline in the relative difference in the number of prohibition list entries or PLEs (defined as the number of data objects that a read-only transaction is forbidden to read "forward" on by its commit point) with increasing transaction length.
[Graphs omitted: throughput in transaction commits per second plotted against transaction length (10–50 read operations) for client data access patterns θ = 0.80 (panel a) and θ = 0.95 (panel b).]

Throughput penalty relative to MVCC-SFBVC, access pattern θ = 0.80 (panel a):

Transaction Length    MVCC-SFBUS    MVCC-SFBS    MVCC-BS
10                    0.5%          0.8%         55.0%
20                    2.4%          1.6%         93.4%
30                    4.2%          6.9%         99.6%
40                    7.4%          7.7%         –
50                    9.0%          9.0%         –

Throughput penalty relative to MVCC-SFBVC, access pattern θ = 0.95 (panel b):

Transaction Length    MVCC-SFBUS    MVCC-SFBS    MVCC-BS
10                    0.4%          1.2%         29.7%
20                    2.1%          5.0%         70.8%
30                    3.6%          5.2%         91.6%
40                    4.1%          8.1%         98.0%
50                    4.5%          10.2%        99.8%

Figure 4.5: Throughput (transaction commits per second) achieved by MVCC-BS, MVCC-SFBS, MVCC-SFBUS, and MVCC-SFBVC under the baseline setting of the simulator. While graphs show absolute simulation values, tables present the performance penalty of the three ILs relative to the best performing CC protocol, namely MVCC-SFBVC.
[Graphs omitted: aborts per second plotted against transaction length (10–50 read operations) for client data access patterns θ = 0.80 (panel a) and θ = 0.95 (panel b).]

Abort-rate penalty relative to MVCC-SFBVC, access pattern θ = 0.80 (panel a):

Transaction Length    MVCC-SFBUS    MVCC-SFBS    MVCC-BS
10                    12.9%         17.9%        94.5%
20                    5.9%          8.9%         84.9%
30                    4.6%          7.0%         76.2%
40                    3.4%          6.0%         70.9%
50                    2.3%          4.5%         68.5%

Abort-rate penalty relative to MVCC-SFBVC, access pattern θ = 0.95 (panel b):

Transaction Length    MVCC-SFBUS    MVCC-SFBS    MVCC-BS
10                    6.3%          17.9%        91.6%
20                    5.8%          14.9%        82.7%
30                    3.5%          6.6%         72.9%
40                    2.4%          5.0%         66.0%
50                    2.4%          4.3%         62.1%

Figure 4.6: Wasted work (aborts per second) performed by MVCC-BS, MVCC-SFBS, MVCC-SFBUS, and MVCC-SFBVC under the baseline setting of the simulator. While graphs again show absolute simulation values, tables present the performance penalty of the three ILs relative to the best performing CC protocol, namely MVCC-SFBVC.
4.5.4 Comparison to Existing CC Protocols
In what follows, we present the results of experiments comparing the transaction throughput and abort rate of the best and worst performing protocols, namely MVCC-SFBVC and MVCC-BS, with CC schemes previously proposed in the literature which are suitable for mobile database systems [123, 124, 139]. A suite of protocols, namely the multi-versioning method, the multi-versioning with invalidation method, and the invalidation-only scheme, all providing serializability along with varying currency guarantees to read-only transactions, were devised in [123, 124]. Out of those protocols, we selected the invalidation-only (IO) scheme for the comparison analysis. The other two protocols were left out due to their similarity to the MVCC-BS and MVCC-SFBS schemes. Additionally, we compare our protocols with the APPROX algorithm [139], which provides View Consistency along with EOT data currency guarantees to read-only transactions. In [139], two implementations, namely F-Matrix and R-Matrix, were developed for the APPROX algorithm. We have selected a variant of the F-Matrix, called F-Matrix-No, for the comparison analysis, since it showed the best performance results among the four protocols, namely Datacycle [67], R-Matrix, F-Matrix, and F-Matrix-No, experimentally compared in [139]. F-Matrix-No differs from the F-Matrix protocol by ignoring the cost of broadcasting concurrency control information for each database object, and therefore can be used as a baseline for measuring the best possible performance of the protocol's underlying guarantees.
Figure 4.7 shows the relationship between our proposed protocols (printed in bold and italics) and the ones published in the literature (printed in normal type). As the figure indicates, both the IO scheme and the F-Matrix-No protocol ensure EOT data currency. Therefore, they are not expected to perform well, especially under the 0.95 workload, where the clients' access pattern is highly skewed, thus resulting in frequent conflicts.
We are now ready to present the results of the comparison study. As shown in Figures 4.8 and 4.9, MVCC-SFBVC turns out to be superior to the other compared protocols. On average, MVCC-SFBVC outperforms the F-Matrix-No by 90.9% for the 0.95 workload and by 85.7% for the 0.80 workload. The average performance degradation of the IO scheme relative to the MVCC-SFBVC protocol is 95.5% for the 0.95 workload and 93% for the 0.80 workload. The reason for the relatively poor performance of the F-Matrix-No and IO schemes is related to the strong data currency requirements these two protocols impose. The F-Matrix-No performs moderately better than the IO scheme since the former processes read-only transactions with weaker consistency guarantees than the latter. While the IO scheme forces read-only transactions to abort whenever they have read some object version that was later updated by some read-write transaction, the constraints imposed by the F-Matrix-No protocol are less severe.
[Figure omitted. The figure arranges the studied protocols along two axes: data consistency guarantees (Serializability; Update Serializability; Read Committed & Transaction Consistency) and data currency guarantees (BOT; BOT with read forward obligation; EOT). MVCC-BS, MVCC-SFBS, the multiversioning scheme, and the multiversioning with invalidation scheme appear at the Serializability level, MVCC-SFBUS at Update Serializability, and MVCC-SFBVC at Read Committed & Transaction Consistency, while the invalidation-only scheme and F-Matrix-No provide EOT data currency.]

Figure 4.7: Protocols studied with their respective data consistency and currency guarantees.
Here, only those read-only transactions need to be aborted which have observed an object version that was later updated by some read-write transaction Tj, where Tj belongs to the set of transactions whose effects have been (either directly or indirectly) seen by the respective read-only transaction. Thus, the transactions potentially aborted by the F-Matrix-No protocol form a proper subset of the ones aborted by the IO scheme.
[Graphs omitted: throughput in transaction commits per second plotted against transaction length for MVCC-SFBVC, MVCC-BS, F-Matrix-No, and the IO scheme under client data access patterns θ = 0.80 (panel a) and θ = 0.95 (panel b).]

Throughput penalty relative to MVCC-SFBVC, access pattern θ = 0.80 (panel a):

Transaction Length    MVCC-BS    F-Matrix-No    IO scheme
10                    55.0%      35.4%          64.3%
20                    93.4%      86.8%          97.9%
30                    99.6%      97.2%          100%
40                    100%       100%           100%
50                    100%       100%           100%

Throughput penalty relative to MVCC-SFBVC, access pattern θ = 0.95 (panel b):

Transaction Length    MVCC-BS    F-Matrix-No    IO scheme
10                    29.7%      53.4%          76.3%
20                    70.8%      96.5%          99.5%
30                    91.6%      99.9%          100%
40                    98.0%      100%           100%
50                    99.8%      100%           100%

Figure 4.8: Absolute and relative throughput results gained by the comparison study under the baseline setting of the simulator. Relative performance results presented in tabular form are related to the MVCC-SFBVC protocol.
[Graphs omitted: aborts per second plotted against transaction length for MVCC-SFBVC, MVCC-BS, F-Matrix, and the IO scheme under client data access patterns θ = 0.80 (panel a) and θ = 0.95 (panel b).]

Relative penalty w.r.t. MVCC-SFBVC, access pattern θ = 0.80 (panel a):

Transaction Length    MVCC-BS    F-Matrix-No    IO scheme
10                    55.0%      35.4%          64.3%
20                    93.4%      86.8%          97.9%
30                    99.6%      97.2%          100%
40                    100%       100%           100%
50                    100%       100%           100%

Relative penalty w.r.t. MVCC-SFBVC, access pattern θ = 0.95 (panel b):

Transaction Length    MVCC-BS    F-Matrix-No    IO scheme
10                    29.7%      53.4%          76.3%
20                    70.8%      96.5%          99.5%
30                    91.6%      99.9%          100%
40                    98.0%      100%           100%
50                    99.8%      100%           100%

Figure 4.9: Absolute and relative performance penalty of the MVCC-BS, F-Matrix, and IO scheme compared to the MVCC-SFBVC protocol in terms of transaction aborts per second.
4.6 Conclusion and Summary
In this chapter, we have presented formal definitions of four new ILs suitable for managing read-only transactions in mobile broadcasting environments. We have given concrete examples of how the individual ILs differ from each other, provided evidence that their underlying read rules produce correct histories, and identified possible anomalies that may arise when using weaker ILs, such as Strict Forward BOT Update Serializability or Strict Forward BOT View Consistency, instead of (Full) Serializability for processing read-only transactions. We have also described a suite of MVCC protocols that efficiently implement the newly defined ILs in a hybrid data delivery environment. Finally, the implementations of our defined ILs were compared by means of a performance study which experimentally confirmed the hypothesis that protocols with weaker correctness requirements outperform implementations of stronger ILs as long as they enforce the same data currency guarantees. A comparison study has shown that the MVCC-SFBVC scheme is the best concurrency control mechanism for cacheable transactions executed in mobile broadcasting environments. Thus, MVCC-SFBVC should always be the first choice for processing read-only transactions in mobile dissemination-based environments whenever read-only transactions are not required to serialize with the complete set of committed transactions in the system. Otherwise, the MVCC-SFBS protocol is to be preferred.
“The process of scientific discovery is, in effect, a continual flight from wonder.”
– Albert Einstein

Chapter 5

Efficient Client Caching and Prefetching Strategies to Accelerate Read-only Transaction Processing
5.1 Introduction and Motivation
Remember that a mobile hybrid data delivery network is a communication medium that combines push- and pull-based data delivery in an efficient way by broadcasting the data objects that are of interest to a large client population and unicasting less popular data objects only when they are requested by the clients. While a combined push/pull data delivery mode has many advantages, it also suffers from two major disadvantages: (a) The client data access latency depends on the length of the broadcast cycle for data objects that are fetched from the broadcast channel. (b) Since most of the data requests can be satisfied either by the clients themselves or by the broadcast channel, the server lacks clear knowledge of the client access patterns. While the latter weakness can be diminished by subscribing to the broadcast server and sending usage profiles to it [4, 38, 114] or by dynamically adjusting the broadcast content on the basis of "broadcast miss" information received through direct data requests submitted by clients through point-to-point channels [147], the former can be alleviated by designing and deploying an efficient cache replacement and prefetching policy that is closely coupled with the transaction manager of the mobile client. Such a tight coupling
of the transaction manager with the client cache replacement and prefetching manager is required since multi-versioning has been identified as a viable approach to diminish the interference between concurrent read-only and read-write transactions by servicing read requests with obsolete, but nevertheless appropriate, object versions (see Chapter 4). As client cache resources are typically scarce, and to support MVCC efficiently, the client cache manager's task is to maintain only those object versions in the cache that are very likely to be accessed in the future and to evict all those that are unlikely to be referenced and have become useless from the CC point of view. While statistical information on object popularity can be exploited to decide on the likelihood of an object version being accessed in the future, it requires in-depth scheduling knowledge of the transaction manager to determine whether an object version can be safely evicted from the cache because it is no longer required for servicing "out-of-order" read requests.

Unfortunately, exploiting multi-versioning to improve the degree of concurrency between read-write transactions is not as effective as for read-only transactions since read-write transactions typically need to access up-to-date (or at least "close" to current) object versions to provide serializability guarantees. Therefore, in the remainder of the chapter, we concentrate on multi-versioning as the means of improving the performance of mobile applications issuing read-only transactions exclusively. For a detailed qualitative and quantitative evaluation of the potential of using multi-version data caching and prefetching to efficiently satisfy the data requests of mobile read-write transactions, we refer the interested reader to Subsections 6.4.1 and 6.5.5.4 of the thesis.
5.1.1 Multi-Version Client Caching
So far, we have indicated that multi-versioning is a graceful and practicable approach to process
read-only and read-write transactions concurrently. In what follows, we highlight various issues
a mobile cache and prefetch manager needs to consider, so that key performance metrics such as
transaction throughput and transaction abort rate are maximized and minimized, respectively.
Since data caching is an effective, if not the most effective, and, therefore, indispensable way
of reducing transaction response times [33], cache replacement policies have been extensively
studied in the past two decades for stationary database management systems [44, 83, 85, 99, 115].
As conventional caching techniques are inefficient for wireless, broadcast-based networks, where communication channels form an intermediate memory level between the clients and the server and where communication quality varies over space and time, mobile cache management strategies [4, 5, 89, 90, 149, 166] have been designed that are tailored to the peculiarities and constraints of mobile communication systems. However, to our knowledge, none of the proposed caching policies designed either for the stationary or for the mobile client-server architecture tackles the problem of managing multi-version client buffer pools efficiently. Multi-version client caching differs from mono-version caching in at least two key respects: (a) The cost/benefit value of different versions of a data object in the client cache may vary over time depending on the storage behavior of the server, i.e., if the server discards an object version useful for the client, this version's cost/benefit value increases since it cannot be re-acquired from the server. (b) Versions of different data objects may, for the very same reason, have dissimilar cost/benefit values despite being equally likely to be referenced.
The following example illustrates the aforementioned peculiarities. Suppose a diskless mobile client executes a read-only transaction Ti with BOT Serializability consistency (see Section 4.3.2 for its formal definition), i.e., Ti is always forced to observe the most recent object versions that existed by its starting point. Assume the start timestamp STS of Ti is 1, i.e., the non-decreasing sequence number of the current MIBC is 1, and the database consists of four objects {a, b, c, d}. The client cache size is small and may hold only two object versions. Further, it is up to the client how many versions of each object it maintains. For space and time efficiency reasons, the database server holds a restricted number of versions, namely the last two committed versions of each data object. Also assume the client's access pattern is totally uniform, i.e., each object is equally likely to be accessed, and at the end of MIBC 5 the client cache holds the set of objects {a0, b0} and the server keeps objects {a1, a3, b0, b1, c0, c4, d2, d5}. Note that the subscripts assigned to object versions correspond to the identifier of the MIBC that existed when the transaction that created the respective version committed. Now suppose the client needs to read a transaction-consistent version of object c. Since there is no cache-resident version of object c, the client fetches the missing object from the server. By the time the object arrives at the client, the local cache replacement policy needs to select a replacement victim to free some cache space.
[Figure omitted. For MIBCs 0 through 6, the figure traces the newly created object versions, the resulting server cache content, the object versions transferred over the network (including cache misses for a0, d2, and c0), and the client cache content over time.]

Figure 5.1: An example illustrating the peculiarities of multi-version client caching.
In this case, a judicious cache replacement strategy would evict b0 since it is the only object version that can be re-acquired from the server, i.e., a cache replacement policy suitable for a multi-version cache needs to incorporate both probabilistic information on the likelihood of future object references and data re-acquisition costs. To conclude, and to deepen the understanding of the described caching scenario, Figure 5.1 specifies the example's underlying changes to the database state of the server and illustrates their impact on the server and client cache contents as time progresses.
5.1.2 Multi-Version Client Prefetching
Apart from demand-driven caching and judicious eviction of object versions from the client cache,
another technique that can be used to reduce on-demand fetches is data prefetching by which the
client optimistically fetches object versions from the server and/or broadcast channel into the local
cache in expectation of a later request. Since prefetching, especially if integrated with caching,
strongly affects transaction response time, various combined caching and prefetching techniques
have been studied for stationary database systems [32, 83, 119, 151]. Work on prefetching in mobile data broadcasting environments has been conducted by [5, 89, 90, 149]. Again, as for caching,
prefetching mechanisms proposed in the literature are inefficient for mobile dissemination-based
applications utilizing MVCC protocols to provide consistency and currency guarantees to read-only
transactions. The reasons are twofold: (a) First, existing algorithms on data prefetching for wireless
data dissemination, such as PT, its approximation APT [5], Gray [89, 90], OSLA, and its generalization, the W-step look-ahead scheme [149], are based on simplified assumptions such as no database updates and no use or availability of back-channels to the server. (b) Second, and even more importantly, all previous prefetching strategies were designed for mono-version database systems and, therefore, lack the ability to make proper prefetching decisions in a multi-version environment. In contrast, we base our model on more realistic assumptions and develop a prefetching algorithm that is multi-version compliant. As prefetching can only unfold its full strength if deeply integrated with data caching, our prefetching algorithm uses the same cost/benefit metric for evaluating prefetching candidates as the cache replacement algorithm. To ensure that the prefetching algorithm does not hurt, but rather improves, performance, we allow prefetches of only those object versions that have been recently referenced and whose cost/benefit value exceeds the lowest cost/benefit value among the currently cached object versions.
5.1.3 Outline
The chapter is structured as follows: Section 5.2 describes the system model and general design
assumptions underlying this research study. In Section 5.3 a new multi-version integrated caching
and prefetching policy, called MICP, is introduced along with an implementable approximation of
MICP that we refer to as MICP-L. Section 5.4 reports on detailed experimental results that show
the superiority of our algorithm compared to previously proposed online caching and prefetching
policies and quantifies the performance penalty of MICP compared to an offline probability-based
caching and prefetching algorithm P-P having full knowledge of the client access behavior. The
chapter’s conclusions and summary can be found in Section 5.5.
5.2 System Design and General Assumptions
The focus of this chapter is to develop efficient cache management and prefetching algorithms which provide mobile clients with good performance in a dissemination-based environment. In what follows, we present a brief overview of the core components of the system architecture for which MICP is developed and give the general assumptions about the environment for which it is designed. As the envisaged system architecture does not differ from that discussed in Chapter 4, the reader may skip Sections 5.2.1 and 5.2.2 in sequential reading.
5.2.1 Data Delivery Model
We have chosen a hybrid data delivery system as underlying network architecture for MICP since
a hybrid push/pull scheme has the ability to mask the disadvantages of one data delivery mode by
exploiting the advantages of the other. Since broadcasting is especially effective when used for
popular data, we assume that the server broadcasts only such data that is of interest to the majority
of the client population. Our broadcast structure is logically divided into three segments of varying
sizes: (a) index segment, (b) data segment, and (c) CCR segment. Each MIBC is supplemented with
an index to eliminate the need for the mobile clients to continuously listen to the broadcast in order
to locate the desired data object on the channel. We choose (1, m) indexing [78] as the underlying
index allocation method by which the whole index, containing, among other things, a mapping
between the objects disseminated and the identifiers of the data pages in which the respective objects
appear, is broadcast m times per MBC. The data segment, on the other hand, solely contains hot-spot
data pages. Note that we assume a flat broadcast disk approach for page scheduling, i.e., each and
every data page is only broadcast once within an MBC. For data consistency reasons, we model the
broadcast program so that all data pages disseminated represent a consistent snapshot as of the beginning of each MBC. Thus, modified or newly created object versions committed after the beginning of an ongoing MBC will not be included in any data page disseminated during the current MBC. To guarantee cache consistency despite server updates, each MIBC is preceded by a concurrency control report or CCR as described below.
The second core component of the hybrid data delivery system is the set of point-to-point channels. Point-to-point channels may be utilized by mobile clients to request locally missing, non-scheduled data objects from the broadcast server. Clients are also allowed to use the back-channel to the server when a required data object is scheduled for broadcasting, but its expected arrival time is above the uplink usage threshold [6] dynamically set by the server. This optimization helps clients improve their response times.
5.2.2 Client and Server Cache Management
Conventional caching and prefetching strategies are typically page-based as the optimal unit of transfer between system resources is pages with sizes ranging from 8 KB to 32 KB [55]. In mobile data delivery networks, caching and prefetching data on a coarse granularity such as pages is inefficient due to the physical constraints and characteristics of the mobile environment. For example, the communication in the client-server direction is handicapped by low-bandwidth communication channels. Choosing page-sized granules as the unit of transfer for data uploads would be a waste of bandwidth compared to sending objects of much smaller size in case of a low degree of locality. Since a broadcast server typically serves a large client population and each client tends to have its own set of frequently accessed data objects, it is not unrealistic to assume that the physical data organization of the server may not comply with the individual access patterns of the clients. Therefore, in order to increase the hit ratio of the client cache and to save scarce uplink bandwidth resources, we deploy our caching and prefetching schemes primarily on the basis of data objects. However, to allow clients to cache pages as well, we opt for a hybrid client cache consisting of a small-size page cache and a large-size object cache. While the page cache is primarily used as temporary storage memory to extract and copy requested or prefetched object versions into the object cache, the object cache's task is to efficiently maintain those object versions, i.e., it is used as permanent storage space. Note that our intuition behind such a cache structure was experimentally confirmed by a performance study [42] demonstrating that an object-based caching architecture is superior to a page-based one when physical clustering is poor and the client's cache size is small relative to the size of the database, which is typically the case in mobile environments.

The client object cache itself is partitioned into two variable-size segments: (a) the REC (re-cacheable) segment and (b) the NON-REC (non-re-cacheable) segment. As their names imply, the REC segment is used to store object versions that may be re-fetched from the server, while the NON-REC segment is exclusively used to maintain object versions that cannot be re-acquired from the server as they have been evicted from it. To avoid wasting scarce client memory space, the size of both segments is not fixed to some predefined value, but rather can be dynamically adjusted (within certain bounds) to the needs of the moment (see Section 5.3.1 for more information on that). Figure 5.2 illustrates the above described organization of the client cache along with information on how the page and object caches are implemented.
[Figure omitted. The client cache comprises a page cache, organized as a list with head and tail pointers, and an object cache partitioned into a REC and a NON-REC segment. Each object cache segment is organized as a binary min-heap, and an adjustable cache size delimiter, bounded by an upper cache size boundary, separates the two segments.]

Figure 5.2: Organization of the client cache.
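A minimal sketch of this organization, assuming min-heaps ordered by the PCB metric introduced in Section 5.3, may help. Class and method names are ours, not MICP's:

import heapq

class ObjectCacheSegment:
    """One object cache segment (REC or NON-REC): a binary min-heap on PCB."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.heap = []                        # (pcb_value, version_id) pairs

    def is_full(self):
        return len(self.heap) >= self.capacity

    def insert(self, pcb_value, version_id):
        heapq.heappush(self.heap, (pcb_value, version_id))

    def evict_min(self):
        """Remove and return the version with the lowest PCB value (the root)."""
        return heapq.heappop(self.heap)
        # (MICP recomputes PCB values at eviction time; static heap keys are a
        # simplification of this sketch.)

class ClientObjectCache:
    def __init__(self, total_slots):
        # NON-REC may grow at REC's expense but never beyond 50% of the cache.
        self.rec = ObjectCacheSegment(total_slots)
        self.non_rec = ObjectCacheSegment(total_slots // 2)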
Like the mobile clients, the broadcast server manages its cache as a hybrid system that permits both page and object caching. To increase cache memory utilization, the object cache is designed to be much larger than the page cache, and the former is partitioned into two segments: (a) a large modified object cache (MOB) and (b) a small temporary object cache (TOB). The structure of the MOB is similar to the one described in [53] with the exception that multiple versions of objects may be maintained to reduce the number of data miss-induced transaction aborts. The TOB, on the other hand, is used as temporary storage space for uncommitted and recently committed object versions, and the latter are merged into the MOB at the end of each MIBC. Again note that the use of both cache types (page and object caches) allows us to exploit the benefits of each. While the page cache is useful for storing data to the broadcast disk, handling installation reads [118], etc., the object cache is attractive for recording object modifications and servicing object requests.
5.2.2.1 Data Versioning
To be able to distinguish between different versions of the same data object and to correctly synchronize read-only transactions with committed and/or currently active read-write transactions in a mobile multi-version environment, each object version is assigned the identifier of the transaction that wrote the version, i.e., a write operation on an object x by transaction Ti installs object version xi. As we assume that a transaction Ti cannot modify an object x multiple times during its lifetime, this notation identifies each newly created object version unambiguously. Since multi-versioning imposes additional memory and processor overheads on the mobile clients and the server, we assume that the number of versions maintained in the involved memory levels is restricted. For clients it is sufficient to maintain at most two versions of each database object at any time as we assume that clients do not execute read-only transactions in parallel. In contrast, the server may need to maintain every object version in order to guarantee that any read-only transaction can read from a transaction-consistent database snapshot. Since such an approach is impracticable, we assume that the server maintains a fixed number of versions in the MOB (see Section 5.4.5 for a performance experiment on this issue).
5.2.2.2 Client Cache Synchronization
Hoarding, caching, or replicating data in the client cache is an important mechanism to improve data availability, system scalability, and application response time, and to reduce the power consumption of mobile clients. However, data updates at the server make cache consistency a challenge. An effective cache synchronization and update strategy is needed to ensure consistency and freshness between the primary or source data at the server and the secondary data cached at the clients. Although invalidation messages are space and time efficient compared to propagation messages, they lack the ability to update the cache with new object versions. Due to the inherent tradeoffs between propagation and invalidation, we employ a hybrid of the two techniques. On the one hand, the broadcast server periodically disseminates a CCR, which is a simple structure that contains, in addition to concurrency control information, copies of all those object versions that have been created during the last MIBC (see Section 4.4 for more information). Based on those reports, mobile clients operating in connected mode can easily update their caches at low cost. However, since CCRs contain only concurrency control information w.r.t. the last MIBC, those reports are useless for cache synchronization of recently reconnected clients that have missed one or more CCRs. To resolve this problem, we assume that the server maintains the update history of the last w MBCs as proposed in [20, 21]. This history is used for client cache invalidation as follows: when a mobile client wakes up from a disconnection, it waits for the next CCR to appear and checks whether the following condition holds: ID_{CCR,c} < ID_{CCR,l} + w, where ID_{CCR,c} denotes the timestamp of the current CCR and ID_{CCR,l} represents the timestamp of the latest CCR report received by the client. If so, a dedicated invalidation report (IR) can be requested by the client (containing the identifiers of the data objects that have been modified or newly created during the course of the disconnection) to invalidate its cache properly. If the client was disconnected for more than w MBCs, the entire cache content has to be discarded upon reconnection.
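The reconnection logic just described can be sketched as follows. The cache object and the IR-fetching callback are hypothetical stand-ins for the client's actual components:

def on_reconnect(id_ccr_current, id_ccr_last, w, cache, fetch_ir):
    """fetch_ir(since_id) -> iterable of object ids modified since since_id."""
    if id_ccr_current < id_ccr_last + w:
        # Disconnection was short enough: a dedicated invalidation report (IR)
        # listing objects modified or created in the meantime suffices.
        for oid in fetch_ir(id_ccr_last):
            cache.discard(oid)          # invalidate stale entries only
    else:
        # History window exceeded: the entire cache content must be discarded.
        cache.clear()

# Toy usage with a set as the cache and a stubbed IR request:
cache = {1, 2, 3}
on_reconnect(id_ccr_current=12, id_ccr_last=10, w=4, cache=cache,
             fetch_ir=lambda since: [2])
# cache is now {1, 3}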
5.3 MICP: A New Multi-Version Integrated Caching and Prefetching Algorithm
The design of MICP consists of two complementary algorithms that behave synergistically. The
first algorithm, responsible for selecting cache replacement victims, is called PCC (Probabilistic
Cost-based Caching) and the second one dealing with data prefetching is termed PCP (Probabilistic
Cost-based Prefetching). While PCC may be employed without PCP in order to save scarce CPU
processing and battery power of mobile devices, PCP’s potential can be exploited by coupling it
with a cache replacement policy that uses the same or a similar metric for decision making.
5.3.1 PCC: A Probabilistic Cost-based Caching Algorithm
The major goal of any cache replacement policy designed either for broadcasting or for unicasting environments is to minimize the average response time a user/process experiences when requesting data objects. Traditional cache replacement policies try to achieve this goal by making use of two different approaches: (a) the first category requires information from the database application. That information can either be obtained from the application directly or from the query optimizer that processes queries of the corresponding application. (b) The second category of replacement algorithms bases its decisions on observations of past access behavior. The algorithm proposed in this chapter belongs to the latter group, extends the LRFU policy [98, 99], and borrows from the 2Q algorithm [85]. As for the LRFU policy, PCC quantifies the probability of an object being re-referenced in the future by associating with each object x a score value that incorporates the effects of the frequency and recency of past references on that object. More precisely, PCC computes a combined recency and frequency value or CRF for each object x whenever it is referenced by a transaction according to the following formula:

CRF_{n+1}(x) = 1 + 2^{-\lambda \cdot (ID_{ref,c} - ID_{ref,l}(x))} \cdot CRF_n(x),    (5.1)

where CRF_n(x) is the computed combined recency and frequency value of object x over the last n references, ID_{ref,c} denotes the monotonically increasing reference identifier associated with the current object reference, ID_{ref,l}(x) is the reference identifier assigned to object x when it was last accessed, and \lambda (0 ≤ \lambda ≤ 1) is a kind of "slide controller" that allows PCC to weigh the importance of recency and frequency information for the replacement selection. Note that if \lambda converges towards 0, PCC behaves more like an LFU policy and, conversely, with \lambda approaching 1 it acts more like an LRU policy.
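A small sketch of this bookkeeping (dictionary-based, with names of our own choosing) may help; Equation 5.1 is applied on every reference:

def on_reference(x, current_ref_id, crf, last_ref, lam):
    """Update the CRF score of object x upon a reference (Equation 5.1)."""
    prev = crf.get(x, 0.0)
    gap = current_ref_id - last_ref.get(x, current_ref_id)
    crf[x] = 1.0 + 2.0 ** (-lam * gap) * prev
    last_ref[x] = current_ref_id

crf, last_ref = {}, {}
for ref_id, obj in enumerate(["a", "b", "a", "a", "c"]):
    on_reference(obj, ref_id, crf, last_ref, lam=0.5)
# lam -> 0 approximates LFU (pure reference counting);
# lam -> 1 decays old references quickly and approximates LRU.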
In contrast to the LRFU algorithm, PCC bases its replacement decisions not only on recency and frequency information of historical reference patterns, but additionally makes use of three further parameters (besides the future reference probability of objects as expressed by CRF). First, in order to reflect the fact that the instantaneous access costs of data objects scheduled for broadcasting are non-constant due to the serial nature of the broadcast medium, PCC's replacement decisions are sensitive to the actual state and contents of the broadcast cycle. More precisely, PCC accounts for the costs of re-acquiring object versions by evicting those versions that have low probabilities of access and low re-acquisition costs. To provide a common metric for comparing the costs of ejecting object versions that can be re-cached from the broadcast channel and/or database server, we measure re-acquisition costs in terms of broadcast units. Since we assume that the content and organization of the broadcast program do not change significantly between consecutive MBCs, and the clients are aware of the position of each object version in the MBC due to (1, m) indexing, determining the number of units, i.e., broadcast ticks, until an object version re-appears on the channel is straightforward. However, estimating the cost of re-fetching a requested version from the server is more difficult since that value depends on parameters such as the current load on the communication network and server as well as the effect of server caching. To keep our cache replacement algorithm as simple as possible, we use the uplink usage threshold [6] as a simple guideline for approximating data fetch costs. Since the uplink usage threshold provides a tuning knob to control the server and network utilization and, thus, affects data fetch costs, its dynamically fixed value correlates with the data fetch latency a client experiences when requesting data objects from the server. If the threshold is high, the system is expected to operate under a high workload and, therefore, data retrieval costs are high as well. In what follows, we denote the re-acquisition cost of an object version x_i at time t by RC_t(x_i).
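Under these assumptions, the re-acquisition cost estimate can be sketched as follows. The modular-distance computation for air-cached versions follows directly from the fixed broadcast layout, whereas the server-fetch branch, which scales a worst-case wait by the uplink usage threshold, is merely one plausible reading of the heuristic:

def reacquisition_cost(now_tick, slot_of_version, mbc_length,
                       uplink_usage_threshold):
    """Estimate RC_t(x_i) in broadcast ticks."""
    if slot_of_version is not None:          # version is on the broadcast disk
        return (slot_of_version - now_tick) % mbc_length
    # Server fetch: scale a worst-case wait by the load proxy (our assumption).
    return uplink_usage_threshold * mbc_length

# A version whose slot lies 30 ticks ahead of the current position in a
# 100-tick MBC:
print(reacquisition_cost(70, 0, 100, 0.5))   # -> 30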
A second parameter that PCC uses to make cache replacement decisions is the update probability of data objects. As noted before, multi-version database systems suffer from high processing and storage overheads if the number of versions maintained by the server for each object is not restricted. However, limiting the number of versions negatively affects the likelihood of data requests from the clients being satisfied by the server. To provide mobile clients with information on the probability that an object x is being updated during the next MBC, the server needs to estimate that value. It does so by using a well-known exponential aging method. At the end of each MBC, the server (re-)estimates the update probability of any object x that has been modified during the completed MBC by using the following formula:

UP_{n+1}(x) = (1 - \alpha) \cdot UP_n(x) + \frac{\alpha}{ID_{MBC,c} - ID_{MBC,l}(x)},    (5.2)

where ID_{MBC,c} is the non-decreasing identifier of the current MBC, ID_{MBC,l}(x) denotes the identifier of the MBC in which object x was last updated, UP_n(x) represents the update probability of object x based on its n previous updates, and \alpha (0 ≤ \alpha ≤ 1) is an aging factor to adapt to changes in the data update patterns. The higher \alpha, the more important are recent updates of x.
Last but not least, a cache replacement policy that wants to be effective in maintaining multi-version client caches needs to take into account the server's version storage policy. Besides the update probability of each data object, the version maintenance strategy of the server affects the likelihood that an obsolete object version can be re-acquired once evicted from the client cache. The more versions of an object x are kept by the server, the higher the probability that the server can satisfy requests for specific versions of x. PCC incorporates the versioning policy of the server by means of two complementary methods: (a) it computes re-acquisition costs of in-memory object versions based on their re-fetch probabilities (see Equations 5.4 and 5.5) and (b) it takes care of non-re-cacheable object versions by placing them into their own dedicated partition of the client object cache, namely the NON-REC segment. Re-cacheable object versions, on the other hand, are maintained in the REC segment of the client object cache as noted above.
The reason for cache partitioning is to prevent undesirable replacement of non-re-cacheable versions by referenced or prefetched re-cacheable object versions. With regard to the size of the cache partitions, we experimentally established that NON-REC should not exceed 50% of the overall client cache size, i.e., REC should never be smaller than 50% of the available amount of client memory space. The justification for those values is as follows: the majority of users issuing read-only transactions want to observe up-to-date (or at least "close" to current) object versions [41, 145, 163], i.e., they usually initiate read-only transactions with either BOT or strict forward BOT data currency guarantees (see Chapter 4). The assumptions that clients do not execute more than one read-only transaction at a time and that transactions are issued with at least BOT data currency requirements imply that at their starting points only up-to-date object versions are useful for them, i.e., the NON-REC segment of the client cache is empty at this stage. As transactions progress, more and more useful object versions may become non-re-cacheable and need to be placed into NON-REC. Since the storage space needed to maintain non-re-cacheable object versions is not known in advance and depends on factors such as the transaction size, the user's/transaction's data currency requirements, the rate at which objects are being updated, etc., PCC adapts to this situation by changing the size of NON-REC dynamically. That is, as demand for more storage space in NON-REC arises, PCC dynamically extends the size of the NON-REC segment by re-allocating object slots from REC to NON-REC as long as its size does not exceed 50% of the overall client cache size. Without this upper bound, the system performance could degrade due to insufficient cache space reserved for up-to-date or nearly up-to-date (re-cacheable) versions. It is important to note that this cache structure suits read-write transactions as well, since they have similar data requirements as read-only transactions, with the exception that potentially fewer non-current object versions are requested for correctness reasons (see Section 6.4.1 for further discussions).
As all of the aforementioned parameters influence replacement decisions, it is obvious that there is a need for a combined performance metric to enable a comparison of those values that would be meaningful for the client cache manager. To this end, we combine the estimates given above into a single performance metric, called probabilistic cost/benefit value (PCB), which is computed for each cache-resident object version x_i at eviction time t as follows:

PCB_t(x_i) = CRF_t(x) \cdot (T_{hit}(x_i) + T_{miss}(x_i)).    (5.3)

In the above formula, CRF_t(x) denotes the re-reference probability of object x at time t, T_{hit} is the weighted time in broadcast ticks it takes to re-fetch object version x_i if evicted from the cache, and T_{miss} represents the weighted time required to re-process the completed read operations of the active read-only transaction, denoted T_j in what follows, in case it needs to be aborted because x_i is no longer system-resident and thus cannot be provided to T_j.
The weighted time to service a request for object version x_i that hits either the air-cache or the server memory is the product of the following parameters:

T_{hit}(x_i) = (1 - UP(x)^{N_{ver}(x_i)}) \cdot RC(x_i),    (5.4)

where N_{ver}(x_i) denotes the number of versions of object x with CTSs equal to or older than x_i currently being kept, or potentially allowed to be kept, by the server according to its version management policy. Further on, we compute T_{miss}(x_i) as a weighted approximation of the amount of time it would take the client to restore the current state of T_j, for which x_i is useful, in case T_j has to be aborted due to a fetch miss of x_i:

T_{miss}(x_i) = UP(x)^{N_{ver}(x_i)} \cdot T_{re-exe,j},    (5.5)

where T_{re-exe,j} denotes the sum of the estimated retrieval and processing times it would take the client to re-execute T_j's data operations (if aborted) and is computed as follows:

T_{re-exe,j} = C_{hit} \cdot N_{op,j} + (1 - C_{hit}) \cdot N_{op,j} \cdot \frac{L_{MBC}}{2},    (5.6)

where L_{MBC} represents the average length of the MBC, C_{hit} denotes the average cache hit rate of the client, and N_{op,j} symbolizes the number of read operations executed so far by read-only transaction T_j. As Formula 5.6 indicates, we assume that the average latency to fetch a non-cache-resident object version into the client memory is half a broadcast period, independent of whether that object appears on the broadcast channel or has to be requested through point-to-point communication. We opted for this simplification to keep the algorithm's complexity from inflating further.
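Putting Equations 5.3 through 5.6 together, a PCB computation might look as follows. This is an illustration of the formulas, not MICP's original code; all parameter names mirror the symbols in the text:

def pcb(crf_x, up_x, n_ver, rc, c_hit, n_op, l_mbc):
    """Probabilistic cost/benefit value of a cached version (Equations 5.3-5.6)."""
    p_lost = up_x ** n_ver                   # probability x_i cannot be re-fetched
    t_hit = (1.0 - p_lost) * rc              # Equation 5.4
    t_re_exe = c_hit * n_op + (1.0 - c_hit) * n_op * l_mbc / 2.0   # Equation 5.6
    t_miss = p_lost * t_re_exe               # Equation 5.5
    return crf_x * (t_hit + t_miss)          # Equation 5.3

# A frequently referenced version (CRF = 2.0) of a rarely updated object
# (UP = 0.1, two server-kept versions) that re-appears in 40 broadcast ticks:
print(round(pcb(2.0, 0.1, 2, 40.0, 0.6, 10, 100.0), 2))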
The complete PCC algorithm, invoked upon a reference to an object version x_i, is illustrated below. Algorithm 5.1 contains a number of functions/procedures used to modularize the code. The function segment(x_i) determines the segment of the client cache in which object version x_i is or will be maintained, the procedure select_victim(CS_i) selects and evicts the object version with the lowest PCB value from the cache segment CS_i, and the function retrieval_latency(x_i) returns the estimated time to service a fetch request for object version x_i.

Notations:
  CS_i: Client object cache segment i, i ∈ {REC, NON-REC}.
  T(x_i): Estimated weighted time to service a fetch request for object version x_i.

begin
  if x_i is already cache-resident then
    break
  else
    /* determine replacement victim */
    if there is no free space for x_i in segment(x_i) then
      select_victim(segment(x_i))
    insert x_i into the free or recently freed cache slot
  compute CRF_{n+1}(x) according to Equation 5.1;
  ID_{ref,l}(x) <- ID_{ref,c}
end

Function segment(x_i)
begin
  if x_i is or will be stored into cache segment "REC" then
    return "REC"
  else
    return "NON-REC"
end

Procedure select_victim(CS_i)
begin
  min <- ∞;   /* track the lowest PCB value found so far */
  foreach object version x_i in segment CS_i do
    T(x_i) <- retrieval_latency(x_i);
    PCB_t(x_i) <- CRF(x) · T(x_i);
    if PCB_t(x_i) ≤ min then
      victim <- x_i;
      min <- PCB_t(x_i)
  evict the determined replacement victim from cache
end

Function retrieval_latency(x_i)
begin
  if segment(x_i) = "REC" then
    calculate T_hit(x_i) according to Equation 5.4;
    calculate T_miss(x_i) by using Equation 5.5
  else
    /* segment(x_i) = "NON-REC" */
    T_hit(x_i) <- 0.0;
    compute T_miss(x_i) according to Equation 5.5 with UP(x) set to 1
  T(x_i) <- T_hit(x_i) + T_miss(x_i);
  return T(x_i)
end

Algorithm 5.1: Probabilistic Cost-based Caching (PCC) Algorithm.

5.3.2 PCP: A Probabilistic Cost-based Prefetching Algorithm

While PCC achieves the goal of improving transaction response times by caching demand-requested object versions close to the database application, PCP tries to further reduce fetch latency by proactively loading useful object versions with high access probability and/or high re-acquisition costs into the client cache in anticipation of their future reference. As uncontrolled prefetching without reliable information might not improve, but rather harm, the performance, the greatest challenge of PCP is to decide when and which object version to prefetch and which cache slot to overwrite with the prefetched version when the cache is full. PCP tackles those challenges as follows: in order
to behave synergistically with PCC, PCP bases its prefetching decisions on the same performance
metric, namely PCB. Since calculating PCB values for every object version that flows past the
client is very expensive, if not infeasible, PCP computes those values only for a small subset of the
potential prefetching candidates, namely recently referenced objects.
The reason for choosing this heuristic is the assumption that reference sequences exhibit temporal locality [40]. Temporal locality states that once an object has been accessed, there is a high
probability that the same object (either the same or different version) will be accessed again in the
near future. To decide whether an object has recently been referenced, clients need to maintain
historical information on past object references. As will be explained later, we assume that clients
retain such information for the last r distinct object accesses where r depends on the actual client
cache size. Based on this statistical data, PCP selects its prefetch candidates by a simple policy.
In order for a disseminated object version xi to qualify for prefetching, there must exist a recent entry for object x in the reference history. The exact decision as to how recent an object reference has to be in order for the object to qualify for prefetching is left up to the client, since the prefetching
decision process is computationally expensive and has to be aligned with the client’s resources. If
the object qualifies for prefetching, PCP computes xi's PCB value and compares it with the PCB values of all currently cached object versions. If xi's PCB value is greater than the least PCB value of all cache-resident object versions, then xi is prefetched and replaces the least valuable version.
As for the PCC algorithm, prefetch candidates compete for the available cache space only with
those versions that belong to the same cache segment.
Apart from prefetching current and non-current versions of recently referenced objects, PCP
downloads from the broadcast channel all useful versions of data objects that will be discarded
from the server by the end of the MBC. The intuition behind this heuristic is to minimize the number of transaction aborts caused by fetch requests that cannot be satisfied by the server. A viable
approach to reducing the number of fetch misses is to cache those versions at the client before they
are garbage-collected by the server. To implement this approach, mobile clients need information as
to whether a particular object version will be disseminated for the last time on the broadcast channel. There are basically two ways in which clients could receive such information: (a) First, and most
conveniently, the server indicates whether an object version is about to be garbage-collected. That
information could be provided, for example, by adding a bit field in the header of each disseminated
data page containing a bit for each object version stored in the data page that indicates whether it
will be evicted from the server at the end of the MBC. (b) Alternatively, clients could determine
whether an object version becomes non-re-cacheable by keeping track of the object version history
and using knowledge of the server's version storage policy. As the latter approach may consume a lot of valuable client storage space, we opt for the first approach. The complete pseudo-code of PCP is depicted in Algorithm 5.2 below. To improve the readability of the PCP algorithm, its pseudo-code is modularized into two components: (a) the algorithm's
main procedure and (b) the function min_pcb(CSi), which returns the lowest PCB value among all the versions currently kept in cache segment CSi.
Notations:
CSi: Client object cache segment i, i ∈ {REC, NON-REC}.
Tj: Read-only transaction currently run by the client.
p: A disseminated data or CCR page.

begin
  foreach object version xi resident in p do
    if xi is useful for Tj and xi is not cache-resident and (there exists a CRF value for xi at the client or xi will be garbage-collected by the server at the end of the current MBC) then
      if there is a free cache slot in segment(xi) then
        insert xi into segment(xi)
      else
        T(xi) ← retrieval_latency(xi);
        PCBt(xi) ← CRF(x) · T(xi);
        if PCBt(xi) > min_pcb(segment(xi)) then
          select_victim(segment(xi));
          insert xi into the slot of the recently evicted object version
end

Function min_pcb(CSi)
begin
  min ← ∞;
  foreach object version xi in CSi do
    T(xi) ← retrieval_latency(xi);
    PCBt(xi) ← CRF(x) · T(xi);
    if PCBt(xi) ≤ min then
      min ← PCBt(xi)
  return min
end

Algorithm 5.2: Probabilistic Cost-based Prefetching (PCP) Algorithm.
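Building on the PCC sketch above, the prefetch test of Algorithm 5.2 can be stated in Python as follows. Again a hedged sketch: recently_referenced and non_recacheable stand in for the client's reference history and the server's garbage-collection announcements, the oid attribute is hypothetical, and the usefulness check against the active read-only transaction is assumed to be done by the caller.

```python
def min_pcb(seg, crf, latency):
    # Smallest PCB value among the cached versions of one segment;
    # infinite for an empty segment so that any candidate is admitted.
    if not seg.versions:
        return float("inf")
    return min(pcb(v, crf, latency) for v in seg.versions)

def maybe_prefetch(seg, v, recently_referenced, non_recacheable, crf, latency):
    # A disseminated version v qualifies if its object was referenced
    # within the retained-information period or if it is about to be
    # garbage-collected by the server (i.e., becomes non-re-cacheable).
    if v in seg.versions:
        return False
    if v.oid not in recently_referenced and v not in non_recacheable:
        return False
    if len(seg.versions) < seg.capacity:
        seg.versions.append(v)
    elif pcb(v, crf, latency) > min_pcb(seg, crf, latency):
        select_victim(seg, crf, latency)
        seg.versions.append(v)
    else:
        return False
    return True
```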
5.3.3 Maintaining Historical Reference Information

It has been noted that MICP takes into account both recency and frequency information on past data references in order to select cache replacement victims. Similar to LRFU, MICP maintains CRF values on a per-object basis that capture information on both the recency and frequency of accesses. However, in order for MICP to be effective, such values need to be retained in client memory not only for cache-resident objects, but also for evicted data objects. The necessity to keep historical information on a referenced object even after all versions of this object have been evicted from the cache was first recognized in [115], where it was termed the "reference retained information problem". This problem arises from the fact that in order to gather both recency and frequency information, clients need to keep history information on recently referenced objects for some time. This is in particular required for determining the frequency of object references. If CRF values are maintained only for cached data objects and the size of the client cache is relatively small compared to the database size, then there is a danger that MICP might over-estimate the recency information since frequency information is rarely available. On the other hand, storing reference information consumes valuable memory space that could otherwise be used for storing data objects.
To limit the memory size allocated for historical reference information, O'Neil et al. [115] suggest storing that information only for a limited period of time after the reference has been recorded. As a reasonable rule of thumb for the length of this period they use the Five Minute Rule [55]. However, applying it in a mobile environment may be inappropriate for the following reason: a time-based approach for keeping reference information ignores the available cache size and the reference behavior of the client. For example, if a client operates in disconnected mode due to lack of network coverage, its processing may be interrupted because a data request cannot be satisfied by the local cache. In such a situation the client needs to wait until reconnection for transaction processing to continue. Since disconnections might exceed 5 minutes, all the reference information would be lost during such a period. On the other hand, if the client cache size is small, the reference information on objects may have to be discarded even sooner than 5 minutes after their last reference. To resolve the problem of determining a reasonable guideline for maintaining CRF values, we conducted a series of experiments. We found that clients with a cache size in the range of 1 to 10% of the database size should maintain reference information on all recently referenced objects that would fit into a cache if it were about 5 times as large as its actual size (see Figure 5.6). First, due to its time-independence, such a rule avoids the aforementioned problem of discarding reference information during periods when clients are idle. Second, it limits the amount of memory required for storing historical information by coupling the retained information period to the client cache size.
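This rule of thumb translates directly into a bound on the reference-history size. The sketch below (hypothetical class and field names) caps the number of retained CRF entries at HCR ≈ 5 times the object capacity of the cache and evicts the least recently referenced entry first; with the baseline settings this is consistent with the retained information period of 1,000 references listed in Table 5.2.

```python
from collections import OrderedDict

class ReferenceHistory:
    """CRF bookkeeping for recently referenced objects, bounded by the
    HCR-based rule: retain entries for as many objects as would fit
    into a cache about five times the actual cache size."""

    def __init__(self, cache_capacity_objects, hcr=5):
        self.max_entries = hcr * cache_capacity_objects
        self.entries = OrderedDict()  # oid -> CRF value, oldest first

    def record(self, oid, crf_value):
        self.entries[oid] = crf_value
        self.entries.move_to_end(oid)  # most recent reference moves last
        while len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # drop the oldest entry
```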
5.3.4 Implementation and Performance Issues
The previous section has shown that MICP bases its replacement and prefetching decisions on a
number of factors combined into the PCB value. However, this metric is dynamic since it changes
at every tick of the MBC. Although in theory one could obtain the required values while a page is
being transmitted, such an approach would be much too expensive. To reduce overhead, we propose
that the estimate of PCB for each cached data object is updated either only when a replacement
victim is selected or at fixed points in time such as the end of an MIBC. While experimenting
with our simulator, we noticed that both approaches markedly reduce processing overhead while providing good performance results. However, we favor the latter technique since it may allow MICP to compute PCB values less frequently. In what follows, we refer to the version of MICP that calculates PCB values periodically as MICP-L, where L stands for "light".
Several statistical parameters are required when calculating PCB values for cache-resident object versions. While most of them can be acquired at the client side, UP and Nver values are best obtained directly from the server. Although clients could individually determine UP values for database objects, this approach would be too expensive in terms of CPU overhead and power consumption. Instead, we propose that the server centrally calculates UP values and periodically broadcasts them. Nver values, on the other hand, can only be determined with knowledge of the version storage policy of the server. To inform clients of how many versions of an object the server guarantees to maintain, the server assigns a backward version counter to each object version. When a new object version xi is created at the server, its version counter is initialized to some value v which equals the number of versions of object x the server is willing to maintain in its memory. Additionally, the counters of the existing versions of x are decremented by 1 at both the server and the mobile clients. If the value of a counter reaches zero, the object version is selected for garbage collection by the server and copied from the REC into the NON-REC segment if stored in the client cache.
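The counter maintenance can be sketched as follows; this is a simplified illustration in which the object and cache structures are hypothetical and error handling is omitted.

```python
def install_new_version(obj, new_version, v_max, client_cache):
    # A newly created version starts with the number of versions the
    # server guarantees to keep (backward version counter).
    new_version.counter = v_max
    for old in obj.versions:
        old.counter -= 1
        # A counter of zero marks the version for garbage collection at
        # the server; a cached copy migrates from REC to NON-REC.
        if old.counter == 0 and old in client_cache["REC"].versions:
            client_cache["REC"].versions.remove(old)
            client_cache["NON-REC"].versions.append(old)
    obj.versions.append(new_version)
```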
In addition to reducing processing overhead by restricting the frequency of calculating PCB
values, MICP requires a data structure that efficiently maintains the object versions along with
their PCB values. Like many other cache replacement algorithms, MICP can be implemented with
two (binary) min-heaps (see Figure 5.2) that maintain the ordering of object versions stored in the NON-REC and REC segments, respectively, by their PCB values. Using min-heaps that contain
the object version with the smallest PCB value at the root allows MICP to make cache replacement
decisions in O (1) time. Insert and delete operations take at most O (log2 n) time, where n denotes the
number of object versions maintained in the respective cache partition. Thus, the time complexity
of each cache replacement operation is O (log2 n), which is similar to that of the LFU policy but
considerably higher than that of LRU. As noted before, PCB values are re-calculated at fixed
time periods. Rebuilding the min-heaps has a time complexity of O (n log2 n).
5.4 Performance Evaluation
We studied and compared MICP's performance with other online and offline caching and prefetching algorithms numerically through simulation, rather than analytically, because the effects of such parameters as transaction size, client cache size, or the total number of versions maintained for each object depend on a number of internal and external system parameters that cannot be precisely
estimated by mathematical analysis. The simulator set-up and the generated workloads are based
on the same system model that was previously used for evaluating the performance of implementations of various new ILs defined to provide well-defined data consistency and currency guarantees
to read-only transactions (see Chapter 4). To gain an insight into the efficiency of our proposed
client caching and prefetching policy, we extended the simulator with a set of popular caching
and prefetching algorithms in addition to MICP and MICP-L. In what follows, we briefly describe
the main characteristics of the simulator on which our experimental results are based. The reader
who has studied the description of the simulator model given in Chapter 4 may skip the related
subsections in this part of the thesis and continue with Section 5.4.3 on page 147.
5.4.1 System Model
The simulation model consists of the following core components: (a) a broadcast server hosting
a central database, (b) mobile clients, (c) a broadcast disk, and (d) a hybrid network, which are briefly described below and are modeled analogously to those of the simulation model presented in
Chapter 4.
Both the broadcast server and the mobile clients are modeled as consisting of a number of
subcomponents including a processor, volatile cache memory, and magnetic disks, with the latter
being only available to the broadcast server, i.e., we assume diskless mobile clients. Data is stored
on 4 disks and data accesses are uniformly distributed among the disks by means of a shared FIFO
queue. The unit of data transfer between the server and disks is a page of 4 KB and the server keeps
a total of 250 pages in its stable memory. The size of an object is 100 bytes and the database consists
of a set of 10,000 objects. To reflect the characteristics of a modern disk drive, we experimented
with the parameters from the Quantum Atlas 10K III disk [107]. The client CPU speed is set to 100
MIPS and the server CPU speed to 1,200 MIPS, which were typical processor speeds of mobile and stationary computing devices when this study was conducted two and a half years ago. We have
associated CPU instruction costs with various events as listed in Table 5.1. The client cache size is
set to 2% of the database size and the server cache size to 20% of the database size. As described
in Section 5.2.2, we model the client cache as a hybrid system consisting of both a page-based and an object-based segment. The page-based segment is managed by an LRU replacement policy
and the object-based segment by various online and offline cache replacement strategies including
MICP and MICP-L. Similarly, the server cache is partitioned into a page cache and a modified
object cache (MOB). The page cache is managed using an LRU policy and the MOB is managed
in FIFO order. The MOB is initially modeled as a single-version cache. This restriction is later
removed to study the effects of maintaining multiple versions of objects in the server cache. Client
cache synchronization and freshness are accomplished by inspecting the CCR at the beginning of
each MIBC and by downloading newly created object versions whose PCB values are larger than
those of currently cached object versions.
The broadcast program has a flat structure. To account for the high degree of skewness in data access patterns [68] and to exploit the advantages of hybrid data delivery, only the latest versions of the most popular 20% of the database objects are broadcast. Note that we assume that clients regularly register at the server to provide their access profiles, so that the server can generate the clients' global access pattern. Every MBC is subdivided into 5 MIBCs, each consisting of a data segment (containing 10 data pages), a (1, m) index [77], and a CCR.
Server Database Parameters
  Database size (DBSize): 10,000 objects
  Object size (OBSize): 100 bytes
  Page size (PGSize): 4,096 bytes

Server Cache Parameters
  Server buffer size (SBSize): 20% of DBSize
  Page buffer memory size: 20% of SBSize
  Object buffer memory size: 80% of SBSize
  Page cache replacement policy: LRU
  Object cache replacement policy (MOB): FIFO
  Maximum number of versions maintained for each object in the MOB: 1 (1-5) version(s)

Server Disk Parameters
  Fixed disk setup costs: 5,000 instr
  Rotational speed: 10,000 RPM
  Media transfer rate: 40 Mbps
  Average seek time (read operation): 4.5 ms
  Average rotational latency: 3.0 ms
  Variable network costs: 7 instr/byte
  Page fetch time: 7.6 ms
  Disk array size: 4

Client/Server CPU Parameters
  Client CPU speed: 100 MIPS
  Server CPU speed: 1,200 MIPS
  Client/Server page/object cache lookup costs: 300 instr
  Client/Server page/object read costs: 5,000 instr
  Register/Unregister a page/object copy: 300 instr
  Register an object in prohibition list: 300 instr
  Prohibition list lookup costs: 300 instr
  Inter-transaction think time: 50,000 instr
  Intra-transaction think time: 5,000 instr

Table 5.1: Summary of the system parameter settings – I (Cache replacement and prefetching policies experiments). Sensitivity ranges are given in parentheses.
Our modeled network infrastructure consists of three communication paths: (a) a unidirectional broadband broadcast channel, (b) shared uplink channels from the client to the server,
and (c) shared downlink channels from the server to the client. The network parameters of those
communication paths are modeled after a real system such as Hughes Network System's DirecPC¹ [70]. We set the default broadcast bandwidth to 12 Mbps and the point-to-point bandwidth
to 400 Kbps downstream and to 19.2 Kbps upstream. The point-to-point network is modeled as
a shared FIFO queue and each point-to-point channel is dedicated to 5 mobile clients. Charged
network costs consist of CPU costs for message processing at the client and server, queuing delay,
and transfer time. Processor costs include a fixed and a variable cost component, where the latter depends on the message size. With respect to message latency, we experimented with a fixed end-to-end round-trip time (RTT) of 300 ms. To exclude problems arising when clients operate in disconnected
mode (e.g. cache invalidation/synchronization), we assume that clients are always tuned to the
broadcast stream and do not suffer from intermittent connectivity. Tables 5.1 and 5.2 summarize
the system parameters used in the study.
5.4.2 Workload Model
To produce data contention in our simulator, we periodically modify a subset of the data objects
maintained at the server by a workload generator that simulates the effects of read-write transactions
being executed at the server. In our system configuration, 20 objects are modified by four fixed-size read-write transactions during the period of an MIBC. Objects read and written by read-write
transactions are modeled using a Zipf distribution [168] with parameter θ = 0.80. The ratio of the
number of write operations versus the number of read operations is fixed at 0.25, i.e., only every fifth
operation issued by the server results in an object modification. Read-only transactions are modeled
as a sequence of 10 to 50 read operations and they are serialized with the other transactions by the
MVCC-SFBS scheme (see Chapter 4). The access probabilities of client read operations follow a Zipf distribution with parameter θ = 0.80 and θ = 0.95.

¹ DirecPC is now being called DIRECWAY [71].
Client Cache Parameters
  Client cache size (CCSize): 2% (1-5%) of DBSize
  Client object cache size: 80% of CCSize
  Client page cache size: 20% of CCSize
  Page cache replacement policy: LRU
  Object cache replacement policy: MICP-L
  Retained information period: 1,000 (200-2,000) references
  Aging factor α: 0.7
  Replacement policy control parameter λ: 0.01
  PCB calculation frequency: 5 times per MIBC

Broadcast Program Parameters
  Number of broadcast disks: 1
  Number of objects disseminated per MBC: 20% of DBSize
  Number of index segments per MBC: 5
  Number of CCRs per MBC: 5
  Bucket size: 4,096 bytes
  Bucket header size: 96 bytes
  Index header size: 96 bytes
  Index record size: 12 bytes
  OID size: 8 bytes

Network Parameters
  Broadcast bandwidth: 12 Mbps
  Downlink bandwidth: 400 Kbps
  Uplink bandwidth: 19.2 Kbps
  Fixed network costs: 6,000 instr
  Variable network costs: 7 instr/byte
  Propagation and queuing delay: 300 ms
  Number of point-to-point uplink/downlink channels: 2

Table 5.2: Summary of the system parameter settings – II (Cache replacement and prefetching policies experiments). Sensitivity ranges are given in parentheses.
Workload Parameters
  Read-write transaction size (number of operations): 25
  Read-only transaction size (number of operations): 25 (10-50)
  Server data update pattern (Zipf distribution with θ): 0.80
  Client data access pattern (Zipf distribution with θ): [0.80, 0.95]
  Number of updates per MBC: 1.0% of DBSize
  Number of concurrent read-only transactions per client: 1
  Uplink usage threshold: 100%
  Abort variance: 100%

Table 5.3: Summary of the workload parameter settings (Cache replacement and prefetching policies experiments). Sensitivity ranges are given in parentheses.
While the θ = 0.95 setting is intended to
stress the system by directing about 90% of all object accesses to 10% of the database, the θ = 0.80
setting models a more realistic medium-contention workload (about 75/25). To account for the
impact on shared resources (point-to-point communication network, server CPU, and magnetic
disks) when clients send fetch requests to the server, we model our hybrid data delivery network in
a multi-user environment that services 10 mobile clients. As previously noted, clients do not run
more than one read-only transaction at a time and they are only allowed to request object versions
from the server if they cannot be retrieved from the air-cache. The latter client behavior is enforced
by setting the uplink usage threshold [6] to 100%, which indicates that, independent of the current position of a required object version in the broadcast cycle, it may not be requested from the server. To control the data access behavior of read-only transactions that were aborted, we use an abort variance of 100%, which means that a restarted transaction reads a different set of objects.
Table 5.3 summarizes the workload parameters of our simulation study.
5.4.3 Other Replacement Policies Studied
In order to show MICP-L’s performance superiority, we need to compare it with state-of-the-art online cache replacement and prefetching policies. We experimented with LRFU since it is known to
be the best performing page-based cache replacement policy [98, 99]. However, since LRFU does
not use any form of prefetching, comparing MICP-L to LRFU directly would be unfair. Therefore, we decided to incorporate prefetching into LRFU and denote the resulting algorithm as LRFU-P. In order to treat LRFU-P as fairly as possible with respect to data prefetching, we adopt
the prefetching heuristic from MICP-L. That is, we select all newly created object versions along
with the versions of those objects that have been referenced within the last 1,000 data accesses as
prefetch candidates. Out of those candidates, LRFU-P prefetches those versions whose CRF values
are larger than the smallest CRF value of all cached object versions. The rest of the algorithm works
as described in [98, 99].
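For reference, a single CRF update can be sketched as follows, assuming the standard LRFU weighing function F(δ) = (1/2)^(λδ) from [98, 99] and the λ = 0.01 setting of Table 5.2; the time unit for δ is an assumption and must match that of the reference timestamps.

```python
def crf_update(old_crf, delta, lam=0.01):
    # One combined recency-frequency update at a new reference, assuming
    # the LRFU weighing function F(d) = (1/2) ** (lam * d) from [98, 99];
    # delta is the time elapsed since the object's previous reference.
    return 1.0 + (0.5 ** (lam * delta)) * old_crf
```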
In addition to comparing MICP-L to LRFU-P, we experimented with the W2R algorithm [83]. We selected the W2R scheme for comparison mainly because it is an integrated caching and prefetching algorithm similar to MICP, and performance results have shown [83] that W2R outperforms caching and/or prefetching policies such as LRU, 2Q [85], and LRU-OBL [144]. However, since W2R was designed for conventional page-based database systems, it has to be adapted to the characteristics of a mobile broadcast-based data delivery environment in order to be competitive with MICP. Our goal was to re-design W2R in such a way that its original design targets and structure are still maintained. In the following we refer to the amended version of W2R as W2R-B, where B stands for broadcast-based. Like W2R, W2R-B partitions the client cache into two segments, called the Weighing Room and the Waiting Room. While the Weighing Room is managed as an LRU queue, the Waiting Room is managed as a FIFO queue. In contrast to W2R, W2R-B admits both referenced and prefetched object versions into the Weighing Room. However, W2R-B grants admission to the Weighing Room only to newly created object versions, i.e., those listed in the CCR and whose underlying objects have been referenced within the last 1,000 data accesses. The other modified objects contained in the CCR and all the prefetch candidates from the broadcast channel are kept in the Waiting Room. As before, an object version xi becomes a prefetch candidate if some version of object x has been recently referenced. With regard to the segment sizes, we experimentally found that the following settings work well for W2R-B: the Weighing Room should receive 80% of the total available memory and the remaining 20% should be dedicated to the Waiting Room.
Last but not least, we experimented with an offline cache replacement algorithm, called P [4], to establish theoretical bounds on the performance of MICP-L. We have chosen P as an offline policy due to its straightforward implementation, as P determines its replacement victims by selecting the object with the lowest access probability. Since the client access pattern follows a Zipf distribution,
the access probability of each object is known at any point in time. Note that for a Zipf distribution, the probability of accessing the i-th most popular object is p_i = (1/i^θ) / (∑_{j=1}^{N} 1/j^θ), where N is the number of objects in the database and θ is the skewness parameter [31]. Like LRFU, P is a "pure" caching
algorithm. Therefore, we had to extend P by incorporating prefetching. To ensure that clients
cache all useful versions of objects with the highest likelihood of access, we added an aggressive
prefetching strategy to P and called the new algorithm P-P. P-P’s relatively simple prefetching
strategy is as follows: a newly created or disseminated object version xi is prefetched from the
broadcast channel if xi's access probability is higher than the lowest probability of all cached object versions. It should be intuitively clear that such a policy is suboptimal since it considers neither the update and caching behavior of the server nor the serial nature of the broadcast channel.
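The access probabilities that P relies on can be computed directly from the distribution. The following self-contained snippet illustrates the formula above and, for example, evaluates the probability mass that falls on the hottest 20% of the database, i.e., the share of read requests that the air-cache can serve under our broadcast program.

```python
def zipf_probabilities(n, theta):
    # p_i = (1 / i**theta) / sum_{j=1..n} (1 / j**theta), cf. [31]
    weights = [1.0 / (i ** theta) for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

# Probability mass of the 2,000 most popular objects in a 10,000-object
# database under the medium-contention (theta = 0.80) workload.
probs = zipf_probabilities(10_000, 0.80)
print(sum(probs[:2_000]))
```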
5.4.4 Basic Experimental Results
In the following subsection we compare the performance of MICP-L to that of the online and offline cache replacement and prefetching policies introduced above under the baseline setting of our
simulator. We later vary those parameters in order to observe the changes in the relative performance differences between the policies under different system settings and workload conditions.
We point out that all subsequently presented performance results lie within a 90% confidence level
with a relative error of ±5%. We now study the impact of the read-only transaction size on the
performance metrics when MICP-L and other policies are used for client cache management. Figures 5.3(a) and 5.3(c) show experimental throughput results of increasing the read-only transaction
size from 10 to 50 read operations. A superlinear decrease in the throughput rate is observed when
transaction length is increased. More importantly, and as shown in Figures 5.3(b) and 5.3(d), the
performance penalty of using LRFU-P or W2R-B in comparison to MICP-L as cache replacement
policy is, on average, 19% and 80%, respectively. Further, but not shown in the graphs for reasons
of visual clarity, the degradation of the throughput performance caused by computing PCB values
periodically (i.e., 5 times per MIBC), rather than every time a replacement victim is selected, is insignificant, since MICP outperforms MICP-L by only 3% on average. When comparing
the relative performance differences between MICP-L and the other online policies under the 0.80
workload to those under the 0.95 workload, it is interesting to note that the performance advantage
of MICP-L declines when the client access pattern becomes less skewed. The reason is related to
the degradation in the client cache effectiveness experienced when client accesses are more uniform
in nature and due to a weakening in the predictability of the future reference patterns by inspecting
the past reference string. In this situation the impact of the client caching policy on the overall
system performance is smaller, and, therefore, the throughput gap between the investigated online
policies narrows.
When considering the client cache hit rate, defined as the percentage of the object version
requests served by the client cache, we also notice MICP-L’s superiority compared to the other
two investigated online policies (see Figures 5.4(a) and 5.4(b)). On average, MICP-L’s cache hit
rate is 6% and 94% higher than that of LRFU-P and W2R-B, respectively. At first glance,
the relatively large performance gap between MICP-L and LRFU-P might be surprising since both
policies select replacement victims (at least partially) based on the objects’ CRF values. Thus, one
would expect cache hit rates of both policies to be fairly close to each other. But since MICP-L
tries to minimize broadcast retrieval latencies by replacing popular object versions that soon reappear on the broadcast channel with other less popular versions, which, if not cached, incur high
re-acquisition costs when requested, MICP-L’s hit rate is expected to be slightly lower than that
of LRFU-P. However, as both MICP-L and LRFU-P incorporate prefetching, the performance gain
from pre-loading objects into the cache is higher for MICP-L since it keeps more object versions in
the client cache that are of potential utility for the active read-only transaction, while LRFU-P, on
the contrary, maintains more up-to-date versions potentially useful for future transactions.
5.4.5 Additional Experiments
This subsection discusses the results of some other experiments conducted to determine how MICP-L and its counterparts perform under a varying amount of retained reference information and a varying number of versions maintained for each object by the server. As before, we report the results for both the 0.80 and the 0.95 workload.
[Figure 5.3 comprises four panels plotting results against transaction length (10-50): (a) throughput per second under the 0.80 workload, (b) relative performance penalty compared to MICP-L under the 0.80 workload, (c) throughput per second under the 0.95 workload, and (d) relative performance penalty under the 0.95 workload.]

Figure 5.3: Absolute and relative throughput performance of MICP-L compared to P-P, LRFU-P, and W2R-B under various read-only transaction sizes. Note that the relative performance "penalty" of P-P compared to MICP-L is not specified here, since P-P outperforms MICP-L.

[Figure 5.4 comprises two panels plotting the client cache hit rate against transaction length (10-50): (a) under the 0.80 workload and (b) under the 0.95 workload.]

Figure 5.4: Client cache hit rate of MICP-L compared to P-P, LRFU-P, and W2R-B under various read-only transaction sizes.
5.4.5.1 Effects of the Version Storage Policy of the Server on the Performance of MICP-L
To study the effect of keeping multiple versions per modified object at the server, we experimented
with varying the version storage strategy of the MOB. As noted before, in order to save installation reads, the server maintains modified objects in the MOB. In the baseline setting of the simulator,
the MOB was organized as a mono-version object cache, i.e., only the most up-to-date versions of
recently modified objects are maintained. We now remove that restriction by allowing the server
to maintain up to 10 versions of each object. However, as the MOB is organized as a FIFO queue
and limited to 20% of the database size, such a high number of versions will only be maintained
for a small portion of the frequently updated database objects. As intuitively expected, the system performance increases with a growing number of non-current object versions maintained at the server. However, it is interesting to note that the gain in throughput performance levels off when the server maintains more than four non-current versions of a recently modified object. Beyond this point, no significant performance improvement can be achieved by further increasing the version retention boundary. As shown in Figure 5.5, the performance gap of MICP-L relative to LRFU-P and W2R-B
narrows as the maximum number of versions maintained for each object increases. For example, for the 0.95 workload, the throughput performance gap between MICP-L and LRFU-P shrinks from 22% to only 3% as the maximum number of versions maintained by the server grows. The reason is that under a multi-version storage strategy potentially more non-current object version requests from long-running read-only transactions can be satisfied by the server and, thus, fewer read-only transactions have to be aborted. Further, it is worth noting that LRFU-P performs slightly better than MICP-L if the following two conditions are satisfied: (a) the server does not start overwriting obsolete object versions until at least two versions of each particular object are stored in the MOB, and (b) the client access pattern is not very skewed in nature (80/20 or less). The reason why LRFU-P outperforms MICP-L under such a system setting is the inaccuracy of the PCB values on which MICP-L bases its caching and prefetching decisions.
[Figure 5.5 comprises two panels plotting throughput per second against the maximum number of versions in the MOB (1-10): (a) client data access pattern θ = 0.80 and (b) client data access pattern θ = 0.95.]

Figure 5.5: Performance of MICP-L compared to its competitors with a varying number of non-current versions maintained by the server.
5.4.5.2 Effects of the History Size on the Performance of MICP-L
MICP-L requires historical information on the past reference behavior of the client in order to
make precise predictions about its future data accesses. In order to collect this information, objects’
reference history needs to be maintained in client memory even after their eviction from the cache. Since keeping superfluous history information in the form of CRF values wastes scarce memory resources, we wanted to determine a rule of thumb for estimating the amount of reference information clients need to retain in order to achieve good throughput performance. To this end, we use the
history size/cache size ratio (HCR), defined as

HCR = N_obj / S_mem    (5.7)

where N_obj denotes the total number of objects for which the client retains historical reference information and S_mem represents the client cache size available for storing object versions. As
shown in Figure 5.6, we measured MICP-L’s performance for various HCR and client cache size
combinations. The results show that MICP-L reaches its performance peak if clients maintain reference information on all those recently accessed objects that would fit into the client cache if it were about 5 times as large as its actual size. Beyond that point, i.e., when HCR is larger
than 5, MICP-L’s throughput slightly degrades. The reason for this degradation is related to an
increase in the number of prefetches caused by MICP-L’s prefetching heuristic that allows clients
to download useful object versions into their local caches if their corresponding object has been
referenced within the retained information period. As a result of those additional prefetches, object
versions useful for the active read-only transaction may be replaced by up-to-date object versions
which are of potential use for future transactions. This slightly hurts the cache hit rate, and hence
the throughput performance.
[Figure 5.6 comprises two panels plotting throughput per second against HCR (1-7) for cache sizes of 100, 200, 300, 500, and 1,000: (a) client data access pattern θ = 0.80 and (b) client data access pattern θ = 0.95.]

Figure 5.6: Performance of MICP-L under various cache sizes when HCR is varied.
5.5 Conclusion
We have presented the design and implementation of a new integrated cache replacement and
prefetching algorithm called MICP. MICP was developed to efficiently support the data requirements of read-only transactions in mobile hybrid data delivery environments. In contrast to many
other cache replacement and prefetching policies, MICP not only relies on future reference probabilities when selecting replacement victims and for prefetching data objects, but additionally uses
information about the content and the structure of the broadcast channel, the data update probability, and the server storage policy. MICP combines those statistical factors into a single metric,
called PCB, in order to provide a common basis for decision making and in order to achieve the
goal of maximizing the transaction throughput of the system. Further, in order to reduce the number of transaction aborts caused by the eviction of useful, but obsolete, object versions from the
server, MICP divides the client cache into two variable-size cache partitions and maintains non-re-cacheable object versions in a dedicated part of the cache, called NON-REC. We evaluated the
performance of MICP experimentally using simulation configurations and workloads observed in
a real system and compared it with the performance of other state-of-the-art online and offline
cache replacement and prefetching algorithms. The obtained results show that the implementable
approximation of MICP, termed MICP-L, improves the throughput rate, on average, by 19% when
compared to LRFU-P, which is the second best performing online algorithm after MICP-L. Further,
our experiments revealed that the performance degradation of MICP-L relative to MICP is a modest
3%.
“Theory provides the maps that turn an uncoordinated set of experiments or computer simulations into a cumulative exploration.”
– David Goldberg
Chapter 6
Processing Read-Write Transactions Efficiently and Correctly
6.1 Introduction
As current technology trends indicate, two issues pose a great challenge to mobile database management in the future, namely limited battery power and restricted wireless bandwidth capacities.
To contribute to a solution of the problem, we take a software approach and propose a suite of five
new MVCC protocols designed to provide good performance along with strong semantic guarantees
to read-write transactions despite the existence of the aforementioned environmental constraints
and limits in mobile and wireless technology. The family of MVCC algorithms that we present
provides, on the one hand, a range of useful data consistency and currency guarantees to mobile applications that are assumed to access and update data shared among multiple users and, on the other
hand, enables application programmers to trade off data currency for performance. The MVCC-*
suite, where the asterisk (*) is a placeholder for the names of the various protocols, takes account of
the environmental constraints of mobile computing as its underlying protocols were built based on
the following design objectives: (a) minimizing the amount of wasted work caused by continuing to process transactions that are doomed to abort, and (b) maximizing the degree of transaction concurrency in the system. Clearly, both factors contribute to the general goal
of maximizing overall system performance and thus help to save scarce network bandwidth and
battery power.
In contrast to previous approaches which try to support read-only transactions efficiently and
correctly [101, 123, 124, 138, 139], our protocols tackle the challenge of providing serializability guarantees to read-write transactions while delivering high QoS levels to mobile users, i.e., minimizing transaction response times and reducing the number of false aborts. To provide support
for general-purpose database applications where the inherent semantics have not yet been analyzed or are simply unavailable, we build our MVCC schemes by considering only the manner in which data
objects are accessed/modified by transactions, i.e., we chose the read/write model for CC purposes.
Another argument in favor of the read-write model is its simplicity and compatibility with the
semantics-based CC model. Since all higher-order transactional operations eventually boil down
to simple read and write operations, our adopted computational model and its underlying schemes
can be utilized as a solid building block for evolving semantics-based CC schemes. As those
approaches have the potential to improve transactional performance even further by exploiting the
semantic knowledge of the applications [11,25,141], objects [95,116,157], database structure, etc.,
we consider them a means to complement our protocols rather than an incompatible alternative.
6.1.1 Motivation
The reasons for proposing yet another set of CC protocols are the following two observations: (a) Currently available mobile CC protocols are either designed for read-only transactions
only [101, 123, 124, 138, 139] or those that efficiently support read-write transactions do not provide serializability guarantees [121]. (b) Those protocols that stick to the serializability criterion
either use a mono-version scheduler for CC purposes [102] or do not fully exploit the potential
of multi-versioning by keeping only a very limited number of object versions in the system [110].
Furthermore, none of the CC protocols that enforce serializability for read-write transactions explicitly specifies the degree of data currency it provides. Note that there exist a number
of conventional MVCC protocols and isolation levels, respectively, that incorporate precise data
currency guarantees in their specifications, such as Forward Consistent View [9], Snapshot Isolation [23], and Oracle’s Read Consistency [117]. None of them, however, ensures serializability to
read-write transactions.
Our protocols eliminate that shortcoming by providing well-defined data currency guarantees and by exploiting the scheduling opportunities offered by a multi-version database system with no a-priori version restriction, while not suffering from its high space overheads. Low storage costs
are achieved by continuously providing clients with the latest CC information on, among other
things, the most recent updates to the common database hosted by the broadcast server through the
broadcast channel and by deploying a judicious garbage collector at the clients, eagerly evicting
useless object versions as soon as identified by the local transaction manager.
6.1.2 Contribution and Outline
In this chapter, we present a suite of protocols designed for managing read-write transactions in
hybrid data delivery environments that differ in terms of performance, degree of data currency, and
space and time complexity. In particular, this chapter makes the following contributions: (a) We
present five new MVCC protocols that all provide serializability guarantees to read-write transactions, prove their correctness, present their performance results, and compare them to the Snapshot Isolation protocol. (b) We outline the possibilities of extending the protocols in order to reduce the
number of false conflicts that occur due to information grouping and in order to avoid transaction
restarts by applying conflict resolution techniques. Additionally, we quantify their effects on the
overall system performance through simulation studies. (c) We explain why MICP-L, as introduced in the previous chapter, should be utilized as the client cache replacement and prefetching policy independent of whether read-only or read-write transactions are processed at the clients. This claim is backed up by performance results demonstrating that MICP-L outperforms LRFU-P, which is an integrated caching and prefetching variant of the LRFU policy [98, 99], known as the best online
cache replacement policy proposed so far.
The rest of this chapter is organized as follows: In Section 6.2, the system architecture, design
assumptions, and notational framework are presented. The algorithms underlying the MVCC-*
schemes are incrementally introduced in Section 6.3. We start by motivating each protocol, proceed by presenting its basic algorithm, and then gradually refine and extend it in order to obtain a sophisticated and well-performing solution. Performance-related issues are elaborated in Section 6.4, and Section 6.5 presents performance results of our schemes and relates them to other
approaches. Additionally, we show the performance potential of MICP-L when used to quickly
satisfy the data requests of mobile applications that perform client-side transaction processing and
updates on shared data objects.
6.2 System Design and Assumptions
In this section, we briefly present the system model underlying the discussion that follows: we give some basic definitions, identify our major assumptions, and state the notational framework of this chapter. For reasons of efficiency and conformity with our previously presented research studies, we evolve the MVCC-* suite again in the context of a hybrid data delivery environment. Thus, the reader familiar with the basic concepts of hybrid data delivery and our assumptions on the content and organization of the broadcast program may omit the following subsection, if desired, or treat it as an opportunity to refresh their memory.
6.2.1 Data Delivery Model
Although the basic ideas intrinsic to the discussed protocols are not restricted to mobile computing,
the envisaged area of their use is a hybrid data delivery environment. Remember that we define
a hybrid data delivery network as a communications infrastructure that, on the one hand, allows
resource-poor clients to establish point-to-point connections with a resource-rich broadcast server,
and, on the other hand, allows the server to broadcast information to all clients tuned into the
broadcast channel. The reason for choosing a hybrid network as a building block for our MVCC-*
suite is fourfold: (a) Combining the pull/unicasting and push/broadcasting modes into a hybrid one
helps overcome the limitations of each individual method, resulting in a performance improvement over either basic communication technique. (b) Because hybrid communication networks exist in both
wired and wireless infrastructures, the proposed protocols are applicable for wireless and stationary
networks. (c) The prevalent bandwidth asymmetry often found in cellular or satellite networks
ideally fits the bandwidth requirements demanded by the CC and cache coherency protocols suitable
for mobile environments. Typically, database servers deployed in such environments are stateless
in nature, i.e., they do not keep any state information about their serviced clients due to scalability
problems. As a consequence, the server has neither information about the state of the transactions
processed by clients nor about the contents of their caches. Therefore, the server is forced to transfer, either instantaneously or periodically, newly accrued concurrency control information along with a
copy of each newly created object version to the client population. Obviously, those tasks are most
efficiently carried out via the broadcast channel of a hybrid network, whereas data requests for
unpopular data objects and transaction commit messages should be handled by means of point-to-point communication. (d) Last, but not least, as the previously presented research work (see Chapters 4 and 5) has been conducted within the context of a hybrid network, it is highly desirable to use the same network model in this work for comparison purposes as well.
As partially indicated above, the role of the broadcast medium in a transaction-oriented client-server environment is twofold: (a) to efficiently transmit popular data objects to the client community, and (b) to provide clients or, more precisely, their transaction and cache managers with concurrency and cache coherency information. The data chosen for dissemination is summarized within a broadcast program which has the following logical structure: (a) one or multiple index segments to make the data self-descriptive, (b) one or multiple data segments that contain popular database objects, and (c) one or multiple CCR segments which include information desirable for transaction validation and cache synchronization. We assume a simplified broadcast
program structured into a number of logical units called MIBC. Each MIBC commences with a
CCR, followed by a (1, m) index [78] which indicates the position of each disseminated object in
the broadcast program, and concludes with a sequence of popular data objects belonging to some
particular data segment. Since broadcast programs are typically large in size and clients should regularly be provided with CC and cache coherency information, we assume that only a subset of the data
objects scheduled for broadcasting is disseminated within each MIBC. Hence, an MBC consists
of a number of MIBCs and is completed after each data object scheduled for broadcasting has been disseminated once. For the reader's convenience, Figure 6.1 once more depicts the structure of the presumed broadcast program.
[Figure 6.1 shows a major broadcast cycle divided into four minor broadcast cycles, each consisting of a CCR segment, an index segment, and a data segment; each CCR entry carries a TID, ReadSet, WriteSet, ST, and a list of write-set values (OID, ObjectValue) for the newly created object versions.]

Figure 6.1: Structure of the broadcast program.
As the diagram illustrates, an entry in the CCR segment contains the following CC-related information for each recently (i.e., in the previous MIBC) committed read-write transaction Ti: (a) TID, (b) ReadSet, (c) WriteSet, (d) WriteSetValues, and (e) ST, where TID denotes the globally unique transaction identifier of Ti, ReadSet and WriteSet represent the identifiers of the objects observed and written, respectively, by Ti, WriteSetValues denotes a list of pairs that maps each object version xi newly created by a read-write transaction Ti to its associated value v, and ST denotes the set of transaction identifiers that are serialized after Ti. Note that we do not associate commit timestamps with read-write transactions contained in a CCR since transactions committed during the same MIBC all carry the same commit timestamp, being equal to the number of the MIBC in which they committed. This simplification helps us to ease the presentation of our protocols. Further note that CCRs are
not primarily required for protocol correctness, but merely for protocol efficiency, since they provide the following benefits to mobile users: (a) CCRs help identify useless object versions, thus ensuring that the effective client cache size is not lowered by unnecessarily maintaining those versions; (b) CCRs provide scalability since a major part of the transaction processing and validation work can be offloaded from the server to the clients; (c) CCRs reduce wasted work since they provide CC information to clients allowing them to recognize those transactions doomed to abort; and (d) CCRs keep client caches up-to-date, eliminating the risk of aborts due to stale object observations.
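A possible in-memory representation of one CCR entry, mirroring the fields listed above, is sketched below; the field types are illustrative assumptions, as the concrete wire format is left open here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CCREntry:
    tid: int                      # globally unique transaction identifier
    read_set: frozenset           # OIDs observed by the transaction
    write_set: frozenset          # OIDs written by the transaction
    write_set_values: tuple = ()  # (OID, new object value v) pairs
    st: frozenset = frozenset()   # TIDs of transactions serialized after it
    # No commit timestamp is stored: all transactions within one CCR
    # implicitly share the number of the MIBC in which they committed.
```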
6.2.2 Database and Transaction Model
In our system model, the database D = {x1, x2, ..., xi} consists of a set of uniquely identified objects, where i denotes the number of data objects in the database and each object has one or more versions that are totally ordered w.r.t. one another (with the ordering relation ≪) according to the commit times of the transactions that created the versions. The state of the database is modified by read-write transactions initiated by mobile applications. A transaction is a partial order of operations on
objects of the database (see Definition 9 for its formal definition). To keep our notations consistent
with those used in previous chapters, a read operation of a transaction Ti on object x is denoted
by ri [xk ], where the subscript k is the identifier of the transaction that installed the version xk , i.e.,
object versions are denoted by the index of the transaction that created the respective version. Write operations of a transaction Ti are denoted by wi[xi], and bi, ai, and ci specify the begin, abort, and
commit operation of transaction Ti , respectively. The read set of transaction Ti is the set of data
objects that Ti reads and is denoted by ReadSet(Ti ). Likewise, the write set of transaction Ti is
the set of data objects that Ti writes and is denoted by WriteSet(Ti). To argue about the timing relationships among transactions, the transaction scheduler records, for any read-write transaction Ti, its start timestamp, denoted by STS(Ti), and its commit timestamp, denoted by CTS(Ti).
We assume that read-write transactions do not issue blind writes, i.e., if a transaction writes
some object version, it is assumed to have read some previous version of the object before. We
make the latter assumption for two reasons: (a) Blind writes are infrequent in applications and (b)
accommodating them in our model would complicate the algorithms and correctness proofs of our
protocols significantly. We also assume that the same data object is not modified more than once
during a transaction’s lifetime and any mobile client runs at most one read-write transaction at a
time.
Central to any CC scheme is the notion of conflicts. Since we have opted for the read-write
model to perform concurrency control, three kinds of direct conflicts between any pair of transactions may occur, namely write-read, write-write, and read-write dependencies. The definitions
for those conflicts, which are essential for the descriptions and proofs of our protocols, have already
been given in Section 4.2 of this thesis and are once more summarized in Table 6.1.
Conflict / Dependency — Definition

Direct write-read dependency (denoted δwr or wr): A transaction Tj δwr-depends on a transaction Ti, if Ti installs some object version xi and Tj reads the created version, i.e., wi[xi] <MVH rj[xi].

Direct read-write dependency (denoted δrw or rw): A transaction Tj δrw-depends on a transaction Ti, if Ti reads some object version xk and Tj installs the successor version of xk, i.e., wk[xk] <MVH ri[xk] <MVH wj[xj] ∧ xk ≪ xj.

Direct write-write dependency (denoted δww or ww): A transaction Tj δww-depends on a transaction Ti, if Ti installs some object version xi and Tj installs the successor version of xi, i.e., wi[xi] <MVH wj[xj] ∧ xi ≪ xj.

Table 6.1: Definitions of possible conflicts between transactions.
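Restated operationally, and ignoring the version-order conditions (xk ≪ xj), the three direct dependencies can be detected from the read and write sets alone. The sketch below is a simplified approximation over CCR-style entries; a faithful implementation must additionally consult the version history to confirm successor relationships.

```python
def direct_dependencies(t_i, t_j):
    # Direct dependencies of a later transaction Tj on an earlier one Ti
    # (cf. Table 6.1), approximated on object identifier sets.
    deps = set()
    if t_i.write_set & t_j.read_set:
        deps.add("wr")  # Tj read a version that Ti installed
    if t_i.read_set & t_j.write_set:
        deps.add("rw")  # Tj installed a successor of a version Ti read
    if t_i.write_set & t_j.write_set:
        deps.add("ww")  # Tj installed a successor of a version Ti installed
    return deps
```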
We have selected a conflict detection-based or optimistic approach [96] for concurrency control since it frees clients from performing lock acquisitions and exempts the server from carrying
out deadlock detection. Another argument in favor of optimistic concurrency control (OCC) is
the fact that during periods of disconnection transaction processing will not be interrupted, as is the case when pessimistic schemes are applied. However, OCC protocols force transactions to be
validated or certified at their commit points in order to preserve serializability. Optionally, transactions may be pre-validated at earlier stages in order to identify transactions that are destined to
abort. Independent of the point in time at which an active read-write transaction Ti is validated, the
notion of conflict-based serializability requires that Ti must be certified against the set of already
committed transactions Tactive(Ti) = {T1, T2, ..., Tn} that were active during Ti's execution time, i.e., ∀Tj ∈ Tactive(Ti): bi <MVH cj <MVH ci. Depending on whether any of Ti's data operations (either
directly or indirectly) conflicts with any of T j ’s data operations (T j ∈ Tactive (Ti )) and the type of
the data dependency between the two transactions, T j can be classified w.r.t. Ti into one of the
following categories: (a) (in)dependent preceding transactions (PT), (b) (in)dependent succeeding
transactions (ST), and (c) conflicting transactions (CT). If T j belongs to PT (Ti ), it precedes Ti in
any valid serialization order and the opposite is true if T j is assigned to ST (Ti ). In case T j can be
assigned neither to PT (Ti ) nor to ST (Ti ), it automatically belongs to CT (Ti ). The rules for carrying
out the classification into those categories are protocol-specific and will be presented later in the
chapter. Additional notations will be added upon demand.
6.3 A New Suite of MVCC Protocols
In this section, we describe three (new) MVCC protocols, namely MVCC-BOT, MVCC-IBOT, and
MVCC-EOT, followed by optimizations of the first two schemes and their correctness proofs. Presentation of the MVCC-BOT and MVCC-IBOT protocols will start out by outlining a basic, not yet
very efficient design, and will then be gradually refined and extended until a sophisticated solution
is obtained. We opt for such an approach since it allows us to provide correctness arguments in an
incremental and concise manner.
6.3.1 MVCC-BOT Scheme
Performance studies conducted in [124] and by ourselves (see Chapter 4) have shown that MVCC
protocols that provide begin-of-transaction (BOT) data currency guarantees to read-only transactions such as the multi-version caching method or the MVCC-BS protocol outperform those with
EOT data currency guarantees such as the invalidation-only method [124]. The potential of improving the overall system performance by providing BOT data currency to read-write transactions
was first recognized by [23] and the property has been incorporated into the Snapshot Isolation (SI)
protocol. In contrast to many previously proposed CC protocols, SI is implemented in commercial
and non-commercial products such as Oracle 10g (referred to as the Serializable isolation level in
Oracle [82, 117]), SQL Server 2005 [152], and PostgreSQL [57]. SI’s popularity in practice arises
from its two fundamental properties: (a) Non-current object reads do not cause transaction restarts,
and (b) the validation of an active read-write transaction Ti is restricted to an intersection test of
Ti ’s write set with the write sets of Tactive (Ti ); as read-write transactions typically perform much
fewer write than read operations, the probability that Ti can be successfully validated is relatively
high. Despite SI’s performance attractiveness, it is not an option for mission-critical applications
since it may leave the database in a corrupted state due to its inability to prevent the “Write Skew”
phenomenon [23]. A protocol that rectifies SI’s consistency problems without losing all of its benefits is called MVCC-BOT. Contrary to SI, MVCC-BOT guarantees serializability to read-write
transactions, however, ensuring SI’s BOT data currency. Providing BOT data currency guarantees
to mobile applications is attractive in at least two respects: (a) Transactions can be provided with
useful data currency guarantees that cannot be ensured by serializability alone. For example, if
a meteorologist needs temperatures measured by all weather stations around New York as of the
current moment, he/she could run a query requesting a respective snapshot, and the system would
return the values valid as of the transaction's starting point, i.e., the meteorologist would not observe any temperature changes committed after the transaction initiated its processing; observing such changes might actually be undesirable for statistical reasons. (b) Transactions are allowed to observe stale object versions
which may result in fewer transaction restarts compared to conventional mono-version protocols
(which oblige transactions always to observe the most up-to-date object version) as the following
example shows:
Example 6.
MVH4 = b1 b2 b3 r1[x0] r2[x0] r2[y0] r3[y0] r3[z0] w1[x1] r3[x1] w3[z3] w2[y2] c1 c2 a3   [x0 ≪ x1, y0 ≪ y2, z0 ≪ z3]
MVH5 = b1 b2 b3 r1[x0] r2[x0] r2[y0] r3[y0] r3[z0] w1[x1] r3[x0] w3[z3] w2[y2] c1 c2 c3   [x0 ≪ x1, y0 ≪ y2, z0 ≪ z3]
History MVH4 differs from history MVH5 in that the former was produced by a conventional mono-version scheduler, while the latter was generated by a multi-version scheduler enforcing BOT data currency. In history MVH4, transaction T3 was aborted since otherwise the multi-version serialization graph of MVH4 would no longer have been acyclic (see Figure 6.2(a)). Conversely,
in history MV H5 all three transactions T1 , T2 , and T3 terminate successfully because they can be
serialized into the order T3 < T2 < T1 (see Figure 6.2(b)).
[Figure 6.2: Multi-version serialization graph of MVH4 and MVH5. Panel (a) shows MVSG(MVH4), panel (b) MVSG(MVH5); the edges among T0, T1, T2, and T3 are labeled with their wr-, ww-, and rw-dependencies.]
Having seen that MVCC-BOT incorporates attractive properties that are of much practical use to mobile application and database users, we now delve into the protocol's details. To
avoid wasted work by processing transactions that are destined to abort, MVCC-BOT validates any
read-write transaction Ti not only at its commit point, but also during its execution time, namely
whenever it issues a new data operation or its processing client receives a new CCR. To do so,
MVCC-BOT (like the other schemes of the MVCC-* suite) uses the backward validation technique [62]. Since MVCC-BOT ensures BOT data currency guarantees, the validation algorithm
is straightforward to implement. To provide serializability, a validating read-write transaction Ti
needs to be checked (only) against all those transactions that are in Tactive (Ti ), i.e., such transactions
that committed during Ti ’s execution time. MVCC-BOT successfully validates an active read-write
transaction Ti against a read-write transaction T j ∈ Tactive (Ti ), if the following condition is satisfied:
Serializability Condition 1. Ti must not (either directly or indirectly) rw-depend on T j , i.e., Ti must
not overwrite an object version (either directly or indirectly) read by T j .
In order to ensure BOT data currency guarantees and to detect violations of the serializability criterion instantly, the MVCC-BOT scheduler operates as follows:
1. Start Rule: As soon as Ti issues its first read operation, the identifier of the current
MIBC is stored into ST S(Ti ).
2. Read Rule: A read operation ri [x] is processed as follows:
(a) A read operation ri [x] is transformed into ri [xk ], where xk is the latest committed
version of x that was created by a read-write transaction Tk such that
CT S(Tk ) < ST S(Ti ).
(b) If there exists a write operation w j [x j ] in MVH such that T j ∈ Tactive (Ti ) and T j
rw-depends on Ti , i.e., it has overwritten the object version xk observed by Ti
(xk ≪ xj), then the read operation is rejected and Ti is aborted.
(c) To record the fact that Tk precedes Ti in any valid one-copy serializable history,
the scheduler inserts Tk into PT (Ti ).
3. Write Rule: A write operation wi [x] is processed as follows:
(a) If there exists a read operation r j [xk ] in MVH and
i. T j ∈ ST (Ti ) or
ii. T j (either directly or indirectly) wr- or rw-depends on some read-write
transaction Tk ∈ ST (Ti ),
then wi [x] is rejected and Ti is aborted.
(b) Otherwise, wi [x] is transformed into wi [xi ] and executed.
Algorithm 6.1: MVCC-BOT’s scheduling algorithm.
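The version function of the read rule (Point (2a)) reduces to a simple scan over the committed versions of an object. A sketch, assuming versions are kept as (commit timestamp, value) pairs in ascending order (function and parameter names are hypothetical):

    def select_version_bot(versions, sts_ti):
        """MVCC-BOT read rule, Point (2a): among the committed versions of an
        object, pick the latest one created before STS(Ti)."""
        chosen = None
        for cts, value in versions:
            if cts < sts_ti:
                chosen = value
            else:
                break
        return chosen  # None would mean no version was committed before STS(Ti)

    # A transaction that started in MIBC 10 ignores the version committed at 12:
    assert select_version_bot([(3, 'x3'), (7, 'x7'), (12, 'x12')], 10) == 'x7'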
CCRs enable clients (among other things) to pre-validate active read-write transactions without
contacting the server. MVCC-BOT uses Algorithm 6.2 to validate an active read-write transaction
Ti and to update its associated data structure (ST (Ti )) as shown below:
 1  begin
 2      foreach Tj in CCR do
 3          if Serializability Condition 1 holds then
 4              if Tj rw-depends on Ti then
 5                  insert Tj along with ST(Tj) into ST(Ti);
 6                  if ST(Ti) ∩ PT(Ti) ≠ ∅ then
 7                      abort Ti
 8          else
 9              abort Ti
10  end

Algorithm 6.2: CCR processing and transaction validation under MVCC-BOT.
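Read operationally, Algorithm 6.2 translates almost line by line into the following Python sketch; ti and the transactions in ccr are assumed to carry PT/ST as plain Python sets, and the two predicates stand for Serializability Condition 1 and the rw-dependency test of Table 6.1 (all names are ours):

    class Abort(Exception):
        """Raised when a validating transaction must be aborted and restarted."""

    def process_ccr_bot(ti, ccr, cond1_holds, rw_depends_on):
        """Algorithm 6.2, sketched. ccr is the list of transactions that
        committed during the last MIBC, each carrying its own tid and st set."""
        for tj in ccr:
            if cond1_holds(ti, tj):           # Ti does not rw-depend on Tj
                if rw_depends_on(tj, ti):     # Tj rw-depends on Ti
                    ti.st.add(tj.tid)         # insert Tj along with ST(Tj)
                    ti.st |= tj.st
                    if ti.st & ti.pt:         # ST(Ti) ∩ PT(Ti) is non-empty
                        raise Abort(ti.tid)
            else:
                raise Abort(ti.tid)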
6.3. A New Suite of MVCC Protocols
169
In case Ti fails the validation check, it will be aborted and subsequently restarted. Otherwise, the execution of Ti proceeds. Once Ti issues a commit primitive, i.e., all of its data operations were successfully executed by the client, it automatically pre-commits without any further pre-validation, and a final validation message, denoted FVM(Ti), is sent to the server. An FVM(Ti) is a 5-tuple (ReadSet(Ti), WriteSet(Ti), PT(Ti), ST(Ti), TSC(CCRlast)), where TSC(CCRlast) denotes the timestamp of the last CCR that has been successfully processed by client C; in case the client was disconnected for some time during Ti's execution and, therefore, might have missed one or more CCRs, C sends the timestamp of the most recent CCR up to which it had received a complete sequence of CCRs. Note that the client does not need to attach Ti's start timestamp to FVM(Ti) as such information is only required by the local scheduler to transform object read operations into consistent object version reads.
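For concreteness, the final validation message could be modeled as the following 5-tuple; the field names are hypothetical, only the structure is taken from the description above:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class FVM:
        """FVM(Ti): final validation message sent at Ti's pre-commit."""
        read_set: frozenset     # ReadSet(Ti)
        write_set: frozenset    # WriteSet(Ti)
        pt: frozenset           # PT(Ti): transactions preceding Ti
        st: frozenset           # ST(Ti): transactions succeeding Ti
        ts_ccr_last: int        # timestamp of the last completely received CCR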
When the server receives the validation information of a committing transaction Ti, it runs Algorithm 6.2 for all read-write transactions Tj where CTS(Tj) > TSC(CCRlast). If Ti's final validation succeeds, Ti's updates are applied to the central database hosted by the broadcast server and a respective notification is sent to the client through the point-to-point channel. Otherwise, Ti aborts and its restart is implicitly initiated through the abort message. To conclude, we show that MVCC-BOT produces only serializable histories that provide BOT data currency guarantees to read-write transactions.
Theorem 8. Every history generated by MVCC-BOT is serializable and any read-write transaction
Ti sees a database state as it existed at the beginning of the MIBC when Ti started executing.
Proof. The proof consists of two parts. In the first part we show that MVCC-BOT ensures BOT data currency. Thereafter, we prove by contradiction that MVCC-BOT produces only serializable histories. We do so by first considering the special case that MVSG(MVH) contains a cycle involving only two transactions and then turn to the more general case that the cycle is formed by three or even more transactions.
Part A: Let Ti denote a read-write transaction, STS(Ti) the logical time when Ti started its execution, and DS_STS(Ti) the database state as it existed at Ti's starting point. We will show that all data objects read by Ti belong to DS_STS(Ti) and, therefore, MVCC-BOT ensures BOT data currency guarantees. MVCC-BOT's read rule (i.e., Point (2a)) enforces that Ti always reads the latest committed object versions that were created by read-write transactions with commit timestamps < STS(Ti). Those object versions are, according to MVCC-BOT's start rule, the most up-to-date ones as of the beginning of the MIBC when Ti started its execution, i.e., they belong to DS_STS(Ti).
Part B: Let MVH denote any multi-version history produced by the MVCC-BOT scheme and MVSG(MVH) its multi-version serialization graph. To show that MVH is serializable, i.e., that MVSG(MVH) is acyclic, suppose, by way of contradiction, that MVSG(MVH) contains a cycle ⟨Ti → Tj0 → ... → Tjn → Ti⟩, where Ti and Tjn with n ≥ 0 denote read-write transactions that have been processed under the MVCC-BOT scheme.
(1) Cycle consists of exactly two transactions: To start with, let us assume that the cycle consists
of two transactions only, namely Ti and T j0 , i.e., n = 0. Then, one of the following dependency
relationships between Ti and T j0 must hold: (a) Ti and T j0 wr-depend on each other, (b) Ti and
T j0 rw-depend on each other, or (c) Ti wr-depends on T j0 and T j0 rw-depends on Ti or vice versa.
(a) Suppose that Ti wr-depends on T j0 and as MVCC-BOT ensures BOT data currency guarantees, it follows that the ordering relation c j0 <MV H bi holds. Because T j0 wr-depends on
Ti , it also follows that the ordering relation ci <MV H b j0 holds too. This, however, leads
to a contradiction since, according to Points (3) and (4) of Definition 9, a transaction can
only commit after it has been initiated. Thus, MVSG(MVH) cannot contain the cycle
⟨Ti δ^wr Tj0 δ^wr Ti⟩ when the MVCC-BOT scheme is used.
(b) Suppose that Ti rw-depends on T j0 , ci precedes c j0 in MVH, and T j0 does not rw-depend
on Ti by the time the former gets to know about Ti ’s commit. Then, it follows that Ti
is an element of ST (T j0 ) according to Algorithm 6.2. Now suppose at a later stage T j0
overwrites some object version observed by Ti which implies that T j0 now rw-depends on Ti .
Then, however, MVCC-BOT’s write rule (i.e., Point (3(a)i)) would be violated if this write
operation were not rejected. Suppose otherwise that T j0 had already rw-depended on Ti by
the time T j0 was validated against Ti . Then, Serializability Condition 1 of Algorithm 6.2
would be violated if T j0 were not aborted. Alternatively, suppose that Ti rw-depends on T j0
6.3. A New Suite of MVCC Protocols
171
and that c j0 precedes ci in MVH. Then, Serializability Condition 1 of Algorithm 6.2 would
be violated if Ti were not aborted. Hence, the cycle ⟨Ti δ^rw Tj0 δ^rw Ti⟩ cannot be produced under the MVCC-BOT scheme.
(c) Suppose finally that Ti wr-depends on T j0 . Then, it follows that c j0 precedes bi . Now,
however, T j0 may not rw-depend on Ti since otherwise MVCC-BOT’s BOT data currency
property would be violated. Using the same line of reasoning, the dependencies "Ti rw-depends on Tj0" and "Tj0 wr-depends on Ti" cannot co-exist without the MVCC-BOT scheme being violated. Thus, the cycles ⟨Ti δ^rw Tj0 δ^wr Ti⟩ and ⟨Ti δ^wr Tj0 δ^rw Ti⟩ cannot occur in MVSG(MVH) and, therefore, it may not contain a cycle involving exactly two transactions.
(2) Cycle consists of three or more transactions: In the more complex case, the cycle may involve
three or more read-write transactions. Irrespective of how many transactions form the cycle, it
must have an edge Ti → T j0 , where T j0 is a read-write transaction that either wrote the successor
version of an object read by Ti , i.e., T j0 rw-depends on Ti , or it observed an object version
written by Ti , i.e., T j0 wr-depends on Ti .
(a) Now let us initially assume that T j0 wr-depends on Ti which implies that ci <MV H b j0 and
Ti belongs to PT(Tj0). Suppose also that Tjn (either directly or indirectly) wr- and/or rw-depends on Tj0.
I) If T jn wr-depends on T j0 , the ordering relation c j0 <MV H b jn holds and, therefore,
ci <MV H b jn holds too. Now suppose that Ti rw- or wr-depends on T jn which implies that the ordering relations b jn <MV H ci and c jn <MV H bi hold, respectively. This,
however, leads to a contradiction since, according to Points (3) and (4) of Definition 9, a transaction may only commit after it has been initiated and it may commit only once during its lifetime. Thus, the cycles ⟨Ti δ^wr Tj0 δ^wr ... δ^wr Tjn δ^wr Ti⟩ and ⟨Ti δ^wr Tj0 δ^wr ... δ^wr Tjn δ^rw Ti⟩ cannot be formed in MVSG(MVH).
II) However, the cycle may be produced if T jn rw-depends on T j0 which implies that
b j0 precedes c jn and, thus, ci <MV H c jn holds. Because ci precedes c jn in MVH, Ti
cannot wr-depend on T jn and, therefore, may only rw-depend on it. Since the ordering
relations b jn <MV H b j0 and b j0 <MV H c jn hold, it follows that transactions T j0 and T jn
are executed in parallel with each other and, therefore, can commit in arbitrary order.
• Now suppose that Tj0 commits before Tjn and that Tjn does not rw-depend on
T j0 by the time the former gets to know about T j0 ’s commit. Now suppose at
a later point in time T jn overwrites some object version observed by T j0 which
implies that T jn now rw-depends on T j0 . Then, however, MVCC-BOT’s write
rule (i.e., Point (3(a)ii)) would be violated if this write operation were not rejected. Suppose otherwise that T jn had already rw-depended on T j0 by the
time T jn was validated against T j0 . Then, however, Serializability Condition 1
of Algorithm 6.2 would be violated if T jn were not aborted. Thus, the cycle
⟨Ti δ^wr Tj0 δ^rw ... δ^rw Tjn δ^rw Ti⟩ cannot be created in MVSG(MVH) when Tj0 commits before Tjn.
• Suppose, on the contrary, that Tj0 commits after Tjn and that Tjn does not rw-depend on Tj0 by the time the latter gets informed about Tjn's commit. Suppose further that at some later point in time Tj0 observes some object version whose successor version was installed by Tjn, thus Tjn rw-depends on Tj0. Then, however, MVCC-BOT's read rule (i.e., Point (2b)) would be violated if the read operation were not rejected. Suppose, alternatively, that Tjn had already rw-depended on Tj0 when Tj0 was validated against Tjn. Then, the test ST(Tj0) ∩ PT(Tj0) ≠ ∅ at line 6 of Algorithm 6.2 would succeed, forcing Tj0 to abort. The reason is as follows: Since Ti commits before Tjn and Ti rw-depends on Tjn, Ti is an element of ST(Tjn). Since Tjn rw-depends on Tj0, Tjn and Ti are included in ST(Tj0) according to Algorithm 6.2. Because Tj0 wr-depends on Ti, it follows that Ti is a member of PT(Tj0) according to MVCC-BOT's read rule. Obviously, the condition ST(Tj0) ∩ PT(Tj0) = ∅ is not satisfied any more and, therefore, Tj0 would be aborted by MVCC-BOT. Consequently, the cycle ⟨Ti δ^wr Tj0 δ^rw ... δ^rw Tjn δ^rw Ti⟩ cannot be created when Tj0 commits after Tjn.
(b) What remains to be shown is that the cycle cannot be created when T j0 rw-depends on Ti .
If such a dependency exists, bi precedes c j0 in MVH.
I) Now suppose that T jn (either directly or indirectly) wr-depends on T j0 which implies that c j0 occurs before b jn and as the relation <MV H is transitive, the relation
bi <MV H b jn holds too.
i) Now suppose that Ti wr-depends on T jn which implies that c jn occurs before bi .
Since <MV H is transitive, it follows that bi <MV H b jn . Because b jn <MV H c jn and
c jn <MV H bi hold, it follows that bi <MV H b jn <MV H bi holds too, leading to a contradiction since a transaction may start only once during its lifetime. As a result, the
cycle ⟨Ti δ^rw Tj0 δ^wr ... δ^wr Tjn δ^wr Ti⟩ cannot occur in MVSG(MVH).
ii) Therefore, Ti may only rw-depend on T jn which implies that b jn precedes ci . Since
the ordering relations bi <MV H b jn and b jn <MV H ci hold, Ti and T jn are concurrent to
each other and, therefore, can commit in arbitrary order.
• Initially suppose that Tjn commits before Ti and that Ti does not rw-depend on Tjn
by the time the former gets to know about T jn ’s commit. Suppose further that at
some later point in time Ti overwrites some object version observed by T jn , thus Ti
rw-depends on T jn . Then, however, MVCC-BOT’s write rule (i.e., Point (3(a)ii))
would be violated if the write operation were not rejected. Suppose, alternatively,
that Ti had already rw-depended on T jn when Ti was validated against T jn . In this
case, Serializability Condition 1 of Algorithm 6.2 would be violated if Ti were not aborted. That is, the cycle ⟨Ti δ^rw Tj0 δ^wr ... δ^wr Tjn δ^rw Ti⟩ cannot occur in MVSG(MVH)
when T jn commits before Ti .
• Alternatively, suppose that Tjn commits after Ti and that Ti does not rw-depend on Tjn by the time the latter gets to know about Ti's commit. Suppose further that at
T jn by the time the latter gets to know about Ti ’s commit. Suppose further that at
some later point in time T jn observes some object version whose successor version
was installed by Ti , thus Ti rw-depends on T jn . Then, however, MVCC-BOT’s read
rule (i.e., Point (2b)) would be violated if the read operation were not rejected. Suppose, alternatively, that Ti had already rw-depended on T jn when T jn was validated
against Ti. Given those facts, the condition ST(Tjn) ∩ PT(Tjn) = ∅ tested at line 6 of Algorithm 6.2 would be violated if Tjn were not aborted. Thus, the cycle ⟨Ti δ^rw Tj0 δ^wr ... δ^wr Tjn δ^rw Ti⟩ cannot occur in MVSG(MVH) even if Tjn commits after Ti.
II) As another alternative to produce a cycle suppose that T jn (either directly or indirectly)
rw-depends on T j0 . Then, Ti may wr- and/or rw-depend on T jn .
i) Suppose Ti wr-depends on T jn which implies that T jn committed before Ti ’s starting
point and T jn is an element of PT (Ti ). Since the relations bi <MV H c j0 and b j0 <MV H ci
hold, T j0 and Ti are concurrent to each other and, therefore, can commit in arbitrary
order.
• Suppose that Ti commits before Tj0 and that Tj0 does not rw-depend on Ti by the time the former gets to know about Ti's commit. Additionally, suppose that at some later point in time Tj0 overwrites some object version observed by Ti, thus Tj0 rw-depends on Ti. Then, however, MVCC-BOT's write rule (i.e., Point (3(a)ii)) would be violated if the write operation were not rejected. Suppose, alternatively, that Tj0 had already rw-depended on Ti when Tj0 was validated against Ti. Then, however, Serializability Condition 1 of Algorithm 6.2 would be violated if Tj0 were not aborted. That is, the cycle ⟨Ti δ^rw Tj0 δ^rw ... δ^rw Tjn δ^wr Ti⟩ cannot occur in MVSG(MVH) if Ti commits before Tj0.
• Suppose, on the contrary, that Ti commits after Tj0 and that Tj0 does not rw-depend on Ti by the time the latter gets to know about Tj0's commit. Now suppose that at some later point in time Ti observes some object version whose successor version was installed by Tj0, thus Tj0 rw-depends on Ti. Then, however, MVCC-BOT's read rule (i.e., Point (2b)) would be violated if the read operation were not rejected. Suppose, on the contrary, that Tj0 had already rw-depended on Ti when the latter was validated against Tj0. This, however, leads to a violation of Algorithm 6.2 since the intersection of ST(Ti) and PT(Ti) returns a non-empty result set and, therefore, Ti cannot commit. Thus, the cycle ⟨Ti δ^rw Tj0 δ^rw ... δ^rw Tjn δ^wr Ti⟩ cannot be created in MVSG(MVH) even if Ti commits after Tj0.
ii) Suppose as a final means for the cycle to be formed that Ti rw-depends on Tjn, which implies that the ordering relation bjn <MVH ci holds. Because the ordering relations bi <MVH cj0, bj0 <MVH cjn, and bjn <MVH ci hold, transactions Ti, Tj0, and Tjn are executed concurrently and, thus, can commit in arbitrary order. Without loss of generality, assume that the commit order is ci <MVH cj0 <MVH cjn and Tjn does not rw-depend on Tj0 by the time Tjn gets to know about Tj0's commit. Now suppose that at some later point in time Tjn overwrites some object version observed by Tj0, thus Tjn now rw-depends on Tj0. Then, however, MVCC-BOT's write rule (i.e., Point (3(a)ii)) would be violated if the write operation were not rejected. Suppose, alternatively, that Tjn had already rw-depended on Tj0 when Tjn was validated against Tj0. Then, however, Serializability Condition 1 of Algorithm 6.2 would be violated if Tjn were not aborted. Thus, the cycle ⟨Ti δ^rw Tj0 δ^rw ... δ^rw Tjn δ^rw Ti⟩ cannot occur in MVSG(MVH) and it can be concluded that MVCC-BOT produces only serializable histories.
6.3.2 Optimizing the MVCC-BOT Scheme
To keep the algorithm simple and focused on its basic principles, we have so far left out optimization techniques that extend the transaction manager's scheduling flexibility and thereby improve overall system performance by avoiding false transaction aborts. As a representative example of the improvement potential, consider the following history produced by a scheduler that ensures BOT data currency guarantees:
Example 7.
MVH6 = bc1 b1 r1[x0] bc2 r1[y0] b2 r2[x0] bc3 r2[z0] bc4 w2[z2] c2 w1[x1] c1   [x0 ≪ x1, z0 ≪ z2]
Note that compared to previous histories MV H6 has been enriched with logical time information.
Here each occurrence of bcn denotes the beginning of a new MIBC, where n stands for the nondecreasing identifier of the respective cycle. Note further that the scheduling scenario illustrated
in MV H6 would not be allowed by MVCC-BOT since Serializability Condition 1 is violated for
transaction T1 . However, as MVSG(MV H6 ) is acyclic (see Figure 6.3) and both transactions meet
the BOT data currency criterion, MV H6 should be accepted by the scheduler.
[Figure 6.3: Multi-version serialization graph of MVH6, showing the wr-, ww-, and rw-edges among T0, T1, and T2.]
If one examines MVH6 more closely, it soon becomes obvious where MVCC-BOT's validation inefficiency comes from. When validating a read-write transaction Ti, MVCC-BOT assumes that
each transaction T j ∈ Tactive (Ti ) has to be serialized after Ti . However, there may exist situations
where T j can be safely serialized before Ti . Informally speaking, this is the case if a transaction
T j ∈ Tactive (Ti ) has read from the same or even an earlier database snapshot than Ti and has not
written any object such that T j (either directly or indirectly) rw-depends on Ti . In case T j ∈ Tactive (Ti )
has read some object version xk such that CTS(Tk) ≥ STS(Ti), i.e., Tj has observed a later database snapshot than Ti, then, in order to be serialized before Ti, Tj must neither (directly or indirectly) rw-depend on Ti nor (directly or indirectly) wr- or rw-depend on some transaction Tk that itself (directly or indirectly) wr- or rw-depends on Ti.
A protocol that avoids spurious transaction aborts such as the one illustrated in history MVH6 by using a more sophisticated transaction validation algorithm than MVCC-BOT is called MVCC-BOTO; a description of its peculiarities follows. MVCC-BOTO optimizes the MVCC-BOT scheme by checking for each read-write transaction Tj ∈ Tactive(Ti) whether it can be serialized before or after Ti or whether Ti needs to be aborted because a serialization conflict has been identified. It does so by using Serializability Conditions 1, 2, 3, and 4, with the latter three being specified below:
Serializability Condition 2. T j must not (either directly or indirectly) rw-depend on Ti , i.e., T j
must not overwrite an object version (either directly or indirectly) read by Ti .
Serializability Condition 3. T j must not (either directly or indirectly) wr- or rw-depend on some
read-write transaction Tk ∈ ST (Ti ), i.e., it must not (either directly or indirectly) observe the effects
of Tk or overwrite an object version (either directly or indirectly) observed by Tk .
Serializability Condition 4. Ti must not (either directly or indirectly) wr- or rw-depend on some
read-write transaction Tk ∈ ST (T j ), i.e., it must not (either directly or indirectly) observe the effects
of Tk or overwrite an object version (either directly or indirectly) observed by Tk .
Classifying a read-write transaction Tj ∈ Tactive(Ti) as belonging to PT(Ti) or ST(Ti) is handled by MVCC-BOTO's CCR processing algorithm as illustrated below:
 1  begin
 2      foreach Tj in CCR do
 3          if Serializability Conditions 2 and 3 hold then
 4              insert Tj into PT(Ti)
 5          else
 6              if Serializability Conditions 1 and 4 are satisfied then
 7                  insert Tj into ST(Ti)
 8              else
 9                  abort Ti
10  end

Algorithm 6.3: CCR processing and transaction validation under MVCC-BOTO.
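In the same hypothetical Python style as before, Algorithm 6.3 becomes a three-way classification; the four predicates stand for Serializability Conditions 1 to 4 evaluated for the pair (Ti, Tj):

    class Abort(Exception):
        pass  # as in the MVCC-BOT sketch

    def process_ccr_bot_o(ti, ccr, cond1, cond2, cond3, cond4):
        """Algorithm 6.3, sketched: Tj is classified into PT(Ti) or ST(Ti),
        and Ti is aborted only if neither classification is possible."""
        for tj in ccr:
            if cond2(ti, tj) and cond3(ti, tj):
                ti.pt.add(tj.tid)            # Tj can be serialized before Ti
            elif cond1(ti, tj) and cond4(ti, tj):
                ti.st.add(tj.tid)            # Tj can be serialized after Ti
            else:
                raise Abort(ti.tid)          # Tj belongs to CT(Ti)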
In addition to modifying MVCC-BOT’s CCR processing algorithm, its scheduling algorithm
must be adapted to handle the fact that assigning validated transactions into PT (Ti ) is only preliminary until Ti has executed its last read operation. By the time a read-write transaction T j ∈ Tactive (Ti )
is assigned into PT (Ti ), the scheduler’s decision is based on information on Ti ’s current read set. As
Ti ’s read set may subsequently become larger, previous assignments of validated transactions into
PT (Ti ) might not anymore be valid under the changed conditions. Clearly, a read-write transaction
T j needs to be removed from PT (Ti ) whenever Serializability Condition 2 is violated. In that case T j
cannot any more be serialized before Ti , i.e., the only possible serialization order would be Ti < T j .
However, Serializability Condition 4 has to hold then. Determining whether some read-write transaction T j ∈ PT (Ti ) cannot be serialized before Ti and thus needs to be removed from PT (Ti ) is best
institutionalized in MVCC-BOTO ’s scheduling algorithm which is depicted below:
1. Start Rule: See Algorithm 6.1.
2. Read Rule: A read operation ri [x] is processed as follows:
(a) A read operation ri [x] is transformed into ri [xk ], where xk is the latest committed
version of x that was created by a read-write transaction Tk such that
CT S(Tk ) < ST S(Ti ).
(b) To record the information that Tk precedes Ti in any serial history, the scheduler
inserts Tk into PT (Ti ).
(c) Additionally, a transaction T j ∈ PT (Ti ) that (now) rw-depends on Ti together with
any transaction Tk ∈ PT (Ti ) that (either directly or indirectly) wr- or rw-depends
on T j is moved into ST (Ti ):
i. If xk has been overwritten by T j , i.e., Serializability Condition 2 is violated,
and Ti does not wr- or rw-depend on any transaction Tk ∈ ST (T j ), i.e.,
Serializability Condition 4 holds.
ii. Otherwise, T j belongs to CT (Ti ) and Ti needs to be aborted.
3. Write Rule: See Algorithm 6.1.
Algorithm 6.4: MVCC-BOTO ’s scheduling algorithm.
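The reclassification performed by Point (2c) of the read rule can be sketched as follows; overwriters_of and dependents_of are hypothetical helpers reporting, respectively, the transactions that installed a successor version of xk and the transactions that (directly or indirectly) wr- or rw-depend on a given transaction:

    class Abort(Exception):
        pass  # as in the MVCC-BOT sketch

    def reclassify_after_read(ti, xk, overwriters_of, dependents_of, cond4):
        """Point (2c) of Algorithm 6.4, sketched, over transaction ids."""
        for tj in ti.pt & overwriters_of(xk):  # Condition 2 now violated for Tj
            if not cond4(ti, tj):              # Ti depends on some Tk in ST(Tj):
                raise Abort(ti.tid)            # Tj is in CT(Ti), Ti must abort
            moved = {tj} | (dependents_of(tj) & ti.pt)
            ti.pt -= moved                     # Tj and its dependents can no longer
            ti.st |= moved                     # precede Ti; move them into ST(Ti)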
Once Ti has issued its last read operation, PT(Ti) will no longer change, in the sense that no read-write transaction Tj remains erroneously assigned to PT(Ti) and, therefore, needs to be
removed. Between the time of Ti ’s pre-commit at the client and the time of its final validation at
the server, additional transactions typically commit and, therefore, need to be validated against Ti .
In order to efficiently determine which of those transactions belong to PT (Ti ), ST (Ti ), and CT (Ti ),
the server requires information on the contents of ST (Ti ) as of Ti ’s pre-commit time. Therefore,
and analogous to the MVCC-BOT scheme, MVCC-BOTO should piggyback the identifiers of all
transactions recorded in ST (Ti ) on Ti ’s final validation message. It is important to note that clients
provide pre-committed transactions’ ST sets not for protocol correctness purposes, but merely to
reduce the CPU overhead incurred by their validation at the server. As before, we conclude this
subsection by showing that MVCC-BOTO produces only serializable histories that provide BOT
data currency guarantees to read-write transactions.
Theorem 9. MVCC-BOTO produces only correct histories in the sense that they are serializable
and any object version read by a committed read-write transaction Ti in MVH has been up-to-date
at the beginning of the MIBC when Ti started its execution.
Proof. In this proof we only need to show that MVCC-BOTO generates serializable histories since
both the MVCC-BOT and MVCC-BOTO schemes apply the same version function (read rule) for
mapping data requests to specific object versions and as has been shown in Part A of the proof of
Theorem 8, MVCC-BOT ensures BOT data currency guarantees to read-write transactions. Again, let MVH denote any multi-version history produced by the MVCC-BOTO scheme and let MVSG(MVH) be its corresponding multi-version serialization graph. Suppose, by way of contradiction, that MVSG(MVH) contains a cycle ⟨Ti → Tj0 → ... → Tjn → Ti⟩, where Ti and Tjn with n ≥ 0 are read-write transactions that have been executed according to the MVCC-BOTO scheme.
Without loss of generality, possible cycles can be grouped into two classes: (1) those that involve only two transactions, and (2) others that contain three or more transactions. Since cycles
consisting of only two transactions may not occur in the MVCC-BOTO scheme for similar or even
the same reasons as specified in the second paragraph of Part B of the proof given for Theorem 8,
we only need to show that cycles consisting of three or more transactions are not possible either.
(a) To do so, let us initially assume that (in the cycle) T j0 wr-depends on Ti which implies that
ci <MV H b j0 and Ti belongs to PT (T j0 ).
I) Suppose also that T jn (either directly or indirectly) wr-depends on T j0 which implies according to MVCC-BOTO ’s read rule that c j0 <MV H b jn and, thus, ci <MV H b jn holds.
Now suppose that Ti rw- or wr-depends on T jn which implies that the ordering relations
b jn <MV H ci and c jn <MV H bi hold, respectively. This, however, leads to a contradiction
since, according to Points (3) and (4) of Definition 9, a transaction may only commit
after it has been initiated and it may commit only once during its lifetime. Thus, the
cycles ⟨Ti δ^wr Tj0 δ^wr ... δ^wr Tjn δ^wr Ti⟩ and ⟨Ti δ^wr Tj0 δ^wr ... δ^wr Tjn δ^rw Ti⟩ cannot be formed in MVSG(MVH).
II) Consequently, T jn may only (either directly or indirectly) rw-depend on T j0 which implies
that bj0 precedes cjn and, thus, the ordering relation ci <MVH cjn holds. Because ci precedes cjn in MVH, Ti cannot wr-depend on Tjn and, therefore, may only rw-depend on it.
Since the ordering relations b jn <MV H b j0 and b j0 <MV H c jn hold, transactions T j0 and T jn
are executed concurrently and, therefore, can commit in arbitrary order.
• Now suppose that Tj0 commits before Tjn and that Tjn does not rw-depend on Tj0 by
the time the former gets to know about T j0 ’s commit. Now suppose at a later point in
time T jn overwrites some object version observed by T j0 which implies that T jn now
rw-depends on T j0 . Then, however, MVCC-BOTO ’s write rule (i.e., Point (3(a)ii) of
Algorithm 6.1) would be violated if this write operation were not rejected. Suppose
otherwise that T jn had already rw-depended on T j0 by the time T jn was validated
against T j0 . Then, however, Serializability Condition 1 or 3 of Algorithm 6.3 would
be violated if Tjn were not aborted. Thus, the cycle ⟨Ti δ^wr Tj0 δ^rw ... δ^rw Tjn δ^rw Ti⟩ cannot be created in MVSG(MVH) when Tj0 commits before Tjn.
• Suppose, on the contrary, that Tj0 commits after Tjn and that Tjn does not rw-depend
on T j0 by the time the latter gets informed about T jn ’s commit. Suppose further that
at some later point in time T j0 observes some object version whose successor version
was installed by T jn . Then, however, MVCC-BOTO ’s read rule (i.e., Point (2(c)i))
would be violated if the read operation were not rejected. Suppose, on the contrary,
that T jn had already rw-depended on T j0 when T j0 was validated against T jn . Then,
however, Serializability Condition 2 or 4 of Algorithm 6.3 would be violated if T j0
were not aborted. Consequently, the cycle ⟨Ti δ^wr Tj0 δ^rw ... δ^rw Tjn δ^rw Ti⟩ cannot be created when Tj0 commits after Tjn.
(b) What remains to be shown is that the cycle cannot be formed when T j0 rw-depends on Ti . If T j0
rw-depends on Ti , it follows that bi precedes c j0 in MVH.
I) Now suppose that T jn (either directly or indirectly) wr-depends on T j0 which implies that
c j0 occurs before b jn in MVH, thus the ordering relation bi <MV H b jn holds.
i) Now suppose that Ti wr-depends on T jn which implies that c jn occurs before bi .
Since <MVH is transitive, it follows that bi <MVH bjn. Because bjn <MVH cjn and cjn <MVH bi hold, it follows that bi <MVH bjn <MVH bi holds too, leading to a contradiction since a transaction may start only once during its lifetime. As a result, the cycle ⟨Ti δ^rw Tj0 δ^wr ... δ^wr Tjn δ^wr Ti⟩ cannot occur in MVSG(MVH).
ii) Therefore, Ti may only rw-depend on T jn which implies that the ordering relation
b jn <MV H ci holds. Since the ordering relations bi <MV H b jn and b jn <MV H ci hold, Ti and
T jn are concurrent to each other and, therefore, can commit in arbitrary order.
• Initially suppose that Tjn commits before Ti and that Ti does not rw-depend on Tjn
by the time the former gets to know about T jn ’s commit. Suppose further that at
some later point in time Ti overwrites some object version observed by T jn . Then,
however, MVCC-BOTO ’s write rule (i.e., Point (3(a)ii) of Algorithm 6.1) would be
violated if the write operation were not rejected. Alternatively, suppose that Ti had
already rw-depended on Tjn by the time Ti was validated against Tjn. Given those facts, Serializability Condition 1 or 3 of Algorithm 6.3 would be violated if Ti were not aborted. That is, the cycle ⟨Ti δ^rw Tj0 δ^wr ... δ^wr Tjn δ^rw Ti⟩ cannot occur in MVSG(MVH) when Tjn commits before Ti.
• Alternatively, suppose that Tjn commits after Ti and that Ti does not rw-depend on Tjn
by the time the latter gets to know about Ti ’s commit. Suppose further that at some
later point in time T jn observes some object version whose successor version was
installed by Ti . Then, however, MVCC-BOTO ’s read rule (i.e., Point (2(c)i)) would
be violated if the read operation were not rejected. Suppose, on the contrary, that Ti
had already rw-depended on T jn when T jn was validated against Ti . Then, however,
Serializability Condition 2 or 4 of Algorithm 6.3 would be violated if T jn were not
aborted. Thus, the cycle ⟨Ti δ^rw Tj0 δ^wr ... δ^wr Tjn δ^rw Ti⟩ cannot occur in MVSG(MVH) even if Tjn commits after Ti.
II) As another alternative to form the cycle suppose that T jn (either directly or indirectly)
rw-depends on T j0 . Then, Ti may wr- and/or rw-depend on T jn .
i) Suppose Ti wr-depends on T jn which implies that c jn precedes bi and that T jn is an
element of PT (Ti ). Because the relations bi <MV H c j0 and b j0 <MV H ci hold, T j0 and Ti
are concurrent to each other and, therefore, can commit in arbitrary order.
• Suppose that Ti commits before Tj0 and that Tj0 does not rw-depend on Ti by the time the former gets to know about Ti's commit. Additionally, suppose that at some later point in time Tj0 overwrites some object version observed by Ti. Then, however, MVCC-BOTO's write rule (i.e., Point (3(a)ii) of Algorithm 6.1) would be violated if the write operation were not rejected. Suppose, alternatively, that Tj0 had already rw-depended on Ti when Tj0 was validated against Ti. Then, however, Serializability Condition 1 or 3 of Algorithm 6.3 would be violated if Tj0 were not aborted. That is, the cycle ⟨Ti δ^rw Tj0 δ^rw ... δ^rw Tjn δ^wr Ti⟩ cannot occur in MVSG(MVH) if Ti commits before Tj0.
• Suppose, on the contrary, that Ti commits after Tj0 and that Tj0 does not rw-depend on Ti by the time the latter gets to know about Tj0's commit. Now suppose that at some later point in time Ti observes some object version whose successor version was installed by Tj0. Then, however, MVCC-BOTO's read rule (i.e., Point (2(c)i)) would be violated if the read operation were not rejected. Suppose, on the contrary, that Tj0 had already rw-depended on Ti when the latter was validated against Tj0. This, however, would lead to a violation of Serializability Condition 2 or 4 of Algorithm 6.3 if Ti were not aborted. Thus, the cycle ⟨Ti δ^rw Tj0 δ^rw ... δ^rw Tjn δ^wr Ti⟩ cannot be created in MVSG(MVH) even if Ti commits after Tj0.
ii) Suppose finally that Ti rw-depends on T jn which implies that the ordering relation b jn <MV H ci holds. Because the ordering relations bi <MV H c j0 , b j0 <MV H c jn ,
and b jn <MV H ci hold, transactions Ti , T j0 , and T jn are executed concurrently and thus,
can commit in arbitrary order. Without loss of generality, assume that the commit order is ci <MV H c j0 <MV H c jn and T jn does not rw-depend on T j0 by the time T jn gets to
know about T j0 ’s commit. Now suppose that at some later point in time T jn overwrites
some object version observed by T j0 . Then, however, MVCC-BOTO ’s write rule (i.e.,
Point (3(a)ii) of Algorithm 6.1) would be violated if the write operation were not rejected.
Suppose, alternatively, that Tjn had already rw-depended on Tj0 when Tjn was validated against Tj0. Then, however, Serializability Condition 1 or 3 of Algorithm 6.3 would be violated if Tjn were not aborted. Thus, the cycle ⟨Ti δ^rw Tj0 δ^rw ... δ^rw Tjn δ^rw Ti⟩ cannot occur in MVSG(MVH) and we can conclude that MVCC-BOTO produces only serializable histories.
6.3.3 MVCC-IBOT Scheme
By enforcing BOT data currency guarantees, read-write transactions may observe numerous out-of-date data objects, depending on the update frequency and update pattern of other concurrently running read-write transactions. While reading from a consistent database snapshot as it existed at a transaction's starting point might be a desirable characteristic for quite a number of applications, it may not be the best choice in terms of overall system performance. The reason is that the performance of a transaction-based database system is heavily impacted by the transaction abort ratio or, in
more general terms, the amount of wasted work done by the system. In systems utilizing optimistic
CC schemes, the abort ratio typically correlates with the number of transactions that a validating
transaction Ti is validated against. In case of the MVCC-BOT and MVCC-BOTO schemes, the
cardinality of ST (Ti ) determines (among other things) the probability of Ti being successfully validated by the scheduler. Due to the strictness of the BOT data currency guarantees enforced by both
schemes, all or nearly all transactions that committed between Ti ’s starting and commit point are
recorded in ST (Ti ) (some transactions might be assigned to PT (Ti ) by the MVCC-BOTO scheme).
Therefore, the conflict potential in both protocols is expected to be significant.
MVCC-IBOT is a protocol designed to address the performance problem that the MVCC-BOT
and MVCC-BOTO schemes are likely to experience due to frequent validation failures. As for
the MVCC-BOT and MVCC-BOTO schemes, MVCC-IBOT provides serializability consistency
to read-write transactions. However, and with the intention to improve system performance, it
slightly changes the way read operations are translated by the scheduler into corresponding object
version reads. In contrast to MVCC-BOT and MVCC-BOTO , MVCC-IBOT does not enforce that
read-write transactions observe a database snapshot as it existed at their starting points.
MVCC-IBOT implements a more timely approach by demanding that any object read by a read-write transaction Ti should be at least as recent as the version that existed at its starting point, i.e., the scheme ensures in-between-of-transaction (IBOT) data currency guarantees. The intuition behind
allowing read-write transactions to read “forward” from a database state later than their starting
points is to increase the scheme’s scheduling power which, in turn, may result in more correct
histories. In this respect, however, it should be noted that increasing the number of versions a
scheduler can choose from when mapping read operations to actual version read steps does not
automatically result in performance gains, but on the contrary, if performed injudiciously, it may
even cause performance degradations.
MVCC-IBOT incorporates straightforward but efficient heuristics when translating read operations into object version reads. The basic idea used by MVCC-IBOT is to force each read-write transaction Ti to read "forward" on object versions created after its starting point until an object version read by Ti is overwritten by some read-write transaction Tj ∈ Tactive(Ti). This gives rise to
the following definition:
Definition 35 (Transaction Invalidation). We say that an active read-write transaction Ti gets
invalidated by a read-write transaction T j ∈ Tactive (Ti ), if T j installs the successor version of some
object version read by Ti and commits.
When Ti finds out that it has been invalidated during the last MIBC, it stops reading forward
and from now on it reads “only” those object versions having the largest timestamp < RFST S(Ti ),
where RFSTS (which stands for read forward stop timestamp) denotes the commit timestamp of
the invalidating transaction T j . In order to indicate that a read-write transaction Ti is allowed to read
“forward” on the current database state, we associate a read forward flag or RFF to Ti . If RFF(Ti )
is set to false, it means that Ti has completed its read forward phase (RFP). Determining whether
the RFP of an active read-write transaction Ti has ended is carried out by MVCC-IBOT’s CCR
processing algorithm which additionally pre-validates Ti :
 1  begin
 2      RFSTS(Ti) ←− ∞;
 3      foreach Tj in CCR do
 4          if RFF is set to true then
 5              if Serializability Condition 2 is violated then
 6                  RFF(Ti) ←− false;
 7                  RFSTS(Ti) ←− CTS(Tj);
 8                  goto line 10
 9          else
10              if Serializability Condition 1 holds then
11                  if Tj rw-depends on Ti then
12                      insert Tj along with ST(Tj) into ST(Ti);
13                      if ST(Ti) ∩ PT(Ti) ≠ ∅ then
14                          abort Ti
15              else
16                  abort Ti
17  end

Algorithm 6.5: CCR processing and transaction validation under MVCC-IBOT.
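A sketch of the read-forward bookkeeping of Algorithm 6.5 in the same hypothetical Python style; the fall-through after the read forward phase ends mirrors the "goto line 10" of the pseudocode, and no abort can happen while RFF(Ti) is still true:

    import math

    class Abort(Exception):
        pass  # as in the MVCC-BOT sketch

    def process_ccr_ibot(ti, ccr, cond1, cond2_violated, rw_depends_on):
        """Algorithm 6.5, sketched; ccr must be ordered by commit timestamp."""
        if ti.rff:
            ti.rfsts = math.inf               # line 2: only meaningful during the RFP
        for tj in ccr:
            if ti.rff:
                if not cond2_violated(ti, tj):
                    continue                  # keep reading forward, no abort possible
                ti.rff = False                # Tj invalidated Ti: the RFP ends here
                ti.rfsts = tj.cts             # RFSTS(Ti) <- CTS(Tj)
                # fall through: Tj itself is now validated ("goto line 10")
            if cond1(ti, tj):
                if rw_depends_on(tj, ti):
                    ti.st.add(tj.tid)         # insert Tj along with ST(Tj)
                    ti.st |= tj.st
                    if ti.st & ti.pt:         # the test at line 13
                        raise Abort(ti.tid)
            else:
                raise Abort(ti.tid)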
Note that Algorithm 6.5 assumes that transactions contained in CCR are chronologically ordered
according to their commit timestamps and that transactions are processed in that order.
Assigning object versions to read operations is handled by MVCC-IBOT’s scheduler in the
following manner:
1. Read Rule: A read operation ri [x] is processed as follows:
(a) If RFF is set to true, ri [x] is translated into ri [xk ], where xk is the most recent
version of x received by the client.
(b) Otherwise, i.e., if RFF is set to false, ri [x] is mapped into ri [xk ], where xk is the
most recent version of x that was created by a read-write transaction Tk such that
CT S(Tk ) < RFST S(Ti ).
(c) If there exists a write operation w j [x j ] in MVH such that T j ∈ Tactive (Ti ) and T j
rw-depends on Ti , i.e., it has overwritten the object version xk observed by Ti
(xk ≪ xj), then the read operation is rejected and Ti is aborted.
(d) To record the information that Tk precedes Ti in any serial order of the transactions
in MVH, the scheduler inserts Tk into PT (Ti ).
2. Write Rule: See Algorithm 6.1.
Algorithm 6.6: MVCC-IBOT’s scheduling algorithm.
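The version function of Algorithm 6.6 differs from MVCC-BOT's only in the cut-off: during the read forward phase the newest received version qualifies, afterwards the cut-off is RFSTS(Ti) instead of STS(Ti). A sketch with hypothetical names:

    def select_version_ibot(versions, ti):
        """Read rule of Algorithm 6.6, sketched: versions is a list of
        (cts_of_creator, value) pairs in ascending commit-timestamp order."""
        if ti.rff:                            # Point (1a): read forward phase
            return versions[-1][1]            # most recent version received
        chosen = None                         # Point (1b): the RFP has ended
        for cts, value in versions:
            if cts < ti.rfsts:                # CTS(Tk) < RFSTS(Ti)
                chosen = value
        return chosen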
MVCC-IBOT's scheduling algorithm differs from the one used by the MVCC-BOT and MVCC-BOTO schemes in the following three respects: (a) Most obviously, it lacks a start rule since recording transactions' start timestamps is no longer required for CC; (b) the MVCC-IBOT scheduler maps transactions' read operations to corresponding object version reads on the basis of the actual conflict situation in the system rather than by an a priori specified version function; and (c) most importantly, transactions being validated cannot be aborted until RFF is set to false since Serializability Condition 1 needs to be checked by the MVCC-IBOT scheduler only if ST(Ti) is non-empty. As there will not be any transaction in ST(Ti) until RFF has been changed to false, Ti can never be aborted before this point.
As for the MVCC-BOT scheme, after a read-write transaction Ti has successfully executed its last data operation, Ti pre-commits and the local transaction manager initiates Ti's final validation by sending an FVM to the server. MVCC-IBOT's FVMs have the same contents as those transferred to the server by the MVCC-BOT scheme, with the difference that they additionally contain the value of Ti's RFF parameter. Upon arrival of FVM(Ti) at the server, Ti is validated by applying Algorithm 6.5 to all read-write transactions Tj where CTS(Tj) > TSC(CCRlast). If the validation
check succeeds, Ti ’s effects become visible to other transactions and a commit notification is sent
to the client. Otherwise, Ti aborts and a respective abort message will be delivered.
So far, we have not provided proof that MVCC-IBOT operates correctly in the sense that it
produces only serializable histories and it fulfills the IBOT data currency criterion. In what follows,
we show that MVCC-IBOT meets its promised guarantees by proving the following theorem:
Theorem 10. Every history generated by MVCC-IBOT is serializable and each read operation
of any committed read-write transaction Ti is done from a database state as it existed somewhere
between Ti ’s starting and its commit point (including them).
Proof. The proof consists of two parts. First, we show that MVCC-IBOT's read rule ensures IBOT data currency. Second, we prove that MVCC-IBOT produces only serializable histories.
Part A: Let Ti denote a read-write transaction with RFSTS(Ti) being the logical time when an object version xk previously read by Ti was first overwritten by some read-write transaction Tj ∈ Tactive(Ti), i.e., xk ≪ xj, DS_RFSTS(Ti) being the database state as it existed at RFSTS(Ti), and DS_EOT(Ti) representing the database state as it exists at Ti's commit point. We will now show that all data objects read by Ti belong either to DS_RFSTS(Ti) or to DS_EOT(Ti) and that MVCC-IBOT therefore ensures IBOT data currency. Which object version MVCC-IBOT's scheduler selects when mapping read operations to actual object version reads depends on the state of RFF(Ti). If RFF(Ti) is set to true, MVCC-IBOT's read rule ensures that Ti saw the database state that was up-to-date before the updates of a read-write transaction Tj ∈ Tactive(Ti) that invalidated Ti were incorporated into the database. If RFF(Ti) is still set to true at Ti's commit point, Ti has read from DS_EOT(Ti). Otherwise, Ti has seen DS_RFSTS(Ti). Object versions read before Ti's invalidation had not been updated prior to RFSTS(Ti) and, thus, belong to DS_RFSTS(Ti). Since DS_EOT(Ti) and DS_RFSTS(Ti) are at least as recent as DS_STS(Ti), and DS_EOT(Ti), by definition, is identical to the database state at EOT(Ti), MVCC-IBOT provides IBOT data currency guarantees.
Part B: Again, let MVH denote any multi-version history produced by the MVCC-IBOT scheme, with MVSG(MVH) being its multi-version serialization graph. To show that MVH is serializable, i.e., that MVSG(MVH) is acyclic, suppose, by way of contradiction, that MVSG(MVH) contains a cycle ⟨Ti → Tj0 → ... → Tjn → Ti⟩, where Ti and Tjn with n ≥ 0 denote read-write transactions that have been executed by the MVCC-IBOT scheme.
(1) Cycle consists of two transactions only: Let us initially assume that the cycle involves only
two transactions, namely Ti and T j0 , and is formed due to either of the following dependencies
between them: (a) Ti and T j0 wr-depend on each other, (b) Ti and T j0 rw-depend on each other,
or (c) Ti wr-depends on T j0 and T j0 rw-depends on Ti or vice versa.
(a) Suppose that Ti wr-depends on T j0 . Because MVCC-IBOT ensures that read-write transactions observe committed data only, it follows that the ordering relation c j0 <MV H ci
holds. Suppose further that T j0 wr-depends on Ti . Then, it follows that the ordering relation ci <MV H c j0 holds too. This, however, leads to a contradiction since, according to
Point (3) of Definition 9, a transaction may only commit once during its lifetime. Thus,
MVSG(MVH) cannot contain the cycle ⟨Ti δ^wr Tj0 δ^wr Ti⟩ if the MVCC-IBOT scheme is used.
(b) Suppose that Ti rw-depends on T j0 , ci precedes c j0 in MVH, and T j0 does not rw-depend on
Ti by the time the former is informed about Ti ’s commit. Then, it follows that Ti is a member
of ST (T j0 ) according to Algorithm 6.5. Now suppose at a later stage T j0 overwrites some
object version observed by Ti which implies that T j0 now rw-depends on Ti . Then, however,
MVCC-IBOT’s write rule (i.e., Point (3(a)i) of Algorithm 6.1) would be violated if this
operation were allowed to occur. Suppose otherwise that T j0 had already rw-depended
on Ti by the time T j0 was validated against Ti . Then, Serializability Condition 2 or the
condition specified at line 13 of Algorithm 6.5 would be violated if T j0 were not aborted.
Alternatively, suppose that Ti rw-depends on T j0 , c j0 precedes ci in MVH, and T j0 does not
rw-depend on Ti by the time the former gets to know about Ti ’s commit. Now suppose
at a later stage Ti observes some object version whose successor version was installed by
T j0 , thus T j0 rw-depends on Ti . Then, however, MVCC-IBOT’s read rule (i.e., Point (1b))
would be violated if this read operation were not rejected. Now suppose that T j0 had already
rw-depended on Ti by the time the latter was validated against T j0 . Then, Serializability
Condition 1 or the condition specified at line 13 of Algorithm 6.5 would be violated if Ti
were not aborted. Thus, the cycle ⟨Ti δ^rw Tj0 δ^rw Ti⟩ cannot be produced under the MVCC-IBOT scheme.
(c) Suppose Ti wr-depends on Tj0. Then, it follows that cj0 <MVH ci and RFSTS(Ti) > CTS(Tj0). Now suppose further that Tj0 rw-depends on Ti, which implies that bi precedes cj0 and CTS(Tj0) ≥ RFSTS(Ti), obviously leading to a contradiction. Using the same line of reasoning, we can actually show that the dependencies "Ti rw-depends on Tj0" and "Tj0 wr-depends on Ti" cannot co-exist without the MVCC-IBOT scheme being violated. Thus, the cycles ⟨Ti δ^rw Tj0 δ^wr Ti⟩ and ⟨Ti δ^wr Tj0 δ^rw Ti⟩ cannot occur in MVSG(MVH) and, therefore, it may not contain a cycle involving exactly two transaction vertices.
(2) Cycle consists of three or more transactions: Now suppose that the cycle involves three or even
more transactions. Irrespective of the size of the cycle, it must have an edge Ti → T j0 . This
edge occurs when T j0 wr- or rw-depends on Ti .
(a) Let us initially assume that Tj0 wr-depends on Ti, which implies that ci <MVH cj0, RFSTS(Tj0) > CTS(Ti), and Ti belongs to PT(Tj0) according to MVCC-IBOT's read rule (i.e., Point (1d)).
I) Suppose also that T jn (either directly or indirectly) wr-depends on T j0 which implies
that c j0 <MV H c jn , RFST S(T jn ) > CT S(T j0 ), and T j0 ∈ PT (T jn ).
i) Now suppose that Ti wr-depends on T jn which implies that c jn <MV H ci ,
RFST S(Ti ) > CT S(T jn ), and T jn ∈ PT (Ti ), leading to a contradiction since ci cannot precede and succeed c jn at the same time according to Point (3) of Definition 9.
Thus, the cycle ⟨Ti δ^wr Tj0 δ^wr ... δ^wr Tjn δ^wr Ti⟩ cannot be produced in MVSG(MVH).
ii) On the contrary, suppose that Ti rw-depends on T jn which implies that b jn precedes
ci and CT S(Ti ) ≥ RFST S(T jn ). Since the conditions CT S(Ti ) ≥ RFST S(T jn ) and
RFST S(T jn ) > CT S(T j0 ) hold, it follows that CT S(Ti ) > CT S(T j0 ) holds too, which,
in turn, implies the ordering relation cj0 <MVH ci. This, however, contradicts the ordering relation ci <MVH cj0 derived above. Thus, the cycle ⟨Ti δ^wr Tj0 δ^wr ... δ^wr Tjn δ^rw Ti⟩ cannot be formed in MVSG(MVH) either.
II) Consequently, the cycle may only be produced when Tjn (either directly or indirectly) rw-depends on Tj0, which implies that bj0 precedes cjn and that CTS(Tjn) ≥ RFSTS(Tj0).
i) Suppose further that Ti wr-depends on Tjn, which implies that cjn <MVH ci, RFSTS(Ti) > CTS(Tjn), and Tjn ∈ PT(Ti). Since the conditions CTS(Tjn) ≥ RFSTS(Tj0) and RFSTS(Tj0) > CTS(Ti) hold, it follows that the condition CTS(Tjn) > CTS(Ti) holds too, which, in turn, implies the ordering relation ci <MVH cjn. This, however, contradicts the ordering relation cjn <MVH ci derived above; thus, the cycle ⟨Ti δ^wr Tj0 δ^rw ... δ^rw Tjn δ^wr Ti⟩ cannot occur in MVSG(MVH).
ii) To conclude this series of possible cycles, suppose that Ti rw-depends on Tjn, which implies that bjn precedes ci and CTS(Ti) ≥ RFSTS(Tjn). Since the ordering relations bjn <MVH cj0 and bj0 <MVH cjn hold, transactions Tj0 and Tjn are executed concurrently and, therefore, can commit in arbitrary order.
• Suppose that Tj0 commits before Tjn and that Tjn does not rw-depend on Tj0 by the time Tjn gets to know about Tj0's commit. Suppose further that at some later point in time Tjn overwrites some object version observed by Tj0. Then, however, MVCC-IBOT's write rule (i.e., Point (3(a)ii) of Algorithm 6.1) would be violated if the write operation were not rejected. Suppose, alternatively, that Tjn had already rw-depended on Tj0 when Tjn was validated against Tj0. In this situation Serializability Condition 1 or 2 of Algorithm 6.5 would be violated if Tjn were not aborted. Thus, the cycle ⟨Ti δ^wr Tj0 δ^rw ... δ^rw Tjn δ^rw Ti⟩ cannot be created in MVSG(MVH) when Tj0 commits before Tjn.
• Suppose, on the contrary, that Tj0 commits after Tjn and that Tjn does not rw-depend on Tj0 by the time Tj0 gets to know about Tjn's commit. Suppose further that at some later point in time Tj0 observes some object version whose successor version was installed by Tjn, thus Tjn rw-depends on Tj0. Then, however, MVCC-IBOT's read rule (i.e., Point (1c)) would be violated if the read operation were not rejected. Suppose, alternatively, that Tjn had already rw-depended on Tj0 when the latter was validated against Tjn. Then, however, Serializability Condition 2 or the condition specified at line 13 of Algorithm 6.5 would be violated if Tj0 were not aborted. Thus, the cycle ⟨Ti δ^wr Tj0 δ^rw ... δ^rw Tjn δ^rw Ti⟩ cannot be created when Tj0 commits after Tjn.
(b) Alternatively, the cycle can be produced when T j0 rw-depends on Ti which implies that bi
precedes c j0 in MVH and CT S(T j0 ) ≥ RFST S(Ti ).
I) Now suppose that T jn (either directly or indirectly) wr-depends on T j0 which implies
that c j0 <MV H c jn , RFST S(T jn ) > CT S(T j0 ), and T j0 ∈ PT (T jn ).
i) Suppose further that Ti wr-depends on Tjn, which implies that cjn <MVH ci, RFSTS(Ti) > CTS(Tjn), and Tjn ∈ PT(Ti). Since the conditions RFSTS(Ti) > CTS(Tjn) and CTS(Tj0) ≥ RFSTS(Ti) hold, it follows that CTS(Tj0) > CTS(Tjn) holds too. This, however, contradicts the previously derived ordering relation cj0 <MVH cjn and thus, Ti cannot wr-depend on Tjn if the MVCC-IBOT scheme is used. As a result, the cycle ⟨Ti δ^rw Tj0 δ^wr ... δ^wr Tjn δ^wr Ti⟩ cannot occur in MVSG(MVH) when Ti wr-depends on Tjn.
ii) On the contrary, suppose that Ti rw-depends on T jn which implies that b jn
precedes ci in MVH and CT S(Ti ) ≥ RFST S(T jn ).
Since the ordering relations
b jn <MV H ci and bi <MV H c jn hold, transactions Ti and T jn are executed concurrently
and, therefore, can commit in arbitrary order.
• Suppose that Ti commits before Tjn and that Ti does not rw-depend on Tjn by the time Tjn gets to know about Ti's commit. Suppose further that at some later point in time Tjn observes some object version whose successor version was installed by Ti, thus Ti now rw-depends on Tjn. Then, however, MVCC-IBOT's read rule (i.e., Point (1c)) would be violated if the read operation were not rejected. Suppose, alternatively, that Ti had already rw-depended on Tjn when the latter was validated against Ti. Then, however, Serializability Condition 2 or the condition specified at line 13 of Algorithm 6.5 would be violated if Tjn were not aborted. That is, the cycle ⟨Ti δ^rw Tj0 δ^wr ... δ^wr Tjn δ^rw Ti⟩ cannot occur in MVSG(MVH) when Ti commits before Tjn.
• Suppose, on the contrary, that Ti commits after Tjn and that Ti does not rw-depend on Tjn by the time the former gets to know about Tjn's commit. Suppose further that at some later stage Ti overwrites some object version observed by Tjn, thus Ti now rw-depends on Tjn. Then, however, MVCC-IBOT's write rule (i.e., Point (3(a)ii) of Algorithm 6.1) would be violated if the write operation were not rejected. Suppose, alternatively, that Ti had already rw-depended on Tjn when Ti was validated against Tjn. Then, however, Serializability Condition 1 or 2 of Algorithm 6.5 would be violated if Ti were not aborted. Thus, the cycle ⟨Ti δ^rw Tj0 δ^wr ... δ^wr Tjn δ^rw Ti⟩ cannot occur in MVSG(MVH) despite the fact that Ti commits after Tjn.
II) Now suppose that T jn (either directly or indirectly) rw-depends on T j0 which implies
that b j0 precedes c jn and CT S(T jn ) ≥ RFST S(T j0 ).
i) Suppose also that Ti wr-depends on T jn which implies that c jn <MV H ci ,
RFST S(Ti ) > CT S(T jn ), and T jn ∈ PT (Ti ). Since the ordering relations b j0 <MV H ci
and bi <MV H c j0 hold, transactions Ti and T j0 are executed concurrently and, therefore,
can commit in arbitrary order.
• Suppose that Ti commits before Tj0 and that Tj0 does not rw-depend on Ti by the time Tj0 gets to know about Ti's commit. Suppose further that at some later point in time Tj0 overwrites some object version observed by Ti, thus Tj0 now rw-depends on Ti. Then, however, MVCC-IBOT's write rule (i.e., Point (3(a)ii) of Algorithm 6.1) would be violated if the write operation were not rejected. Suppose, alternatively, that Tj0 had already rw-depended on Ti when the former was validated against Ti. In this particular situation, Serializability Condition 1 or 2 of Algorithm 6.5 would be violated if Tj0 were not aborted. That is, the cycle ⟨Ti δ^rw Tj0 δ^rw ... δ^rw Tjn δ^wr Ti⟩ cannot occur in MVSG(MVH) if Ti commits before Tj0.
• Suppose, on the contrary, that Ti commits after Tj0 and that Tj0 does not rw-depend on Ti by the time the latter gets to know about Tj0's commit. Suppose further that at some later point in time Ti observes some object version whose successor version was installed by Tj0, thus Tj0 now rw-depends on Ti. Then, however, MVCC-IBOT's read rule (i.e., Point (1c)) would be violated if the read operation were not rejected. Suppose, alternatively, that Tj0 had already rw-depended on Ti when the latter was validated against Tj0. Then, however, Serializability Condition 2 or the condition specified at line 13 of Algorithm 6.5 would be violated if Ti were not aborted. Thus, the cycle ⟨Ti δ^rw Tj0 δ^rw ... δ^rw Tjn δ^wr Ti⟩ cannot be created in MVSG(MVH) even if Ti commits after Tj0.
ii) Finally, suppose that Ti rw-depends on Tjn, which implies that bjn precedes ci and CTS(Ti) ≥ RFSTS(Tjn). Since the ordering relations bi <MVH cj0, bj0 <MVH cjn, and bjn <MVH ci hold, transactions Ti, Tj0 and Tjn are executed concurrently and, therefore, can commit in arbitrary order. Without loss of generality, assume that the commit order is ci <MVH cj0 <MVH cjn and that Tjn does not rw-depend on Tj0 by the time Tjn gets to know about Tj0's commit. Suppose further that at some later point in time Tjn overwrites some object version observed by Tj0, thus Tjn now rw-depends on Tj0. Then, however, MVCC-IBOT's write rule (i.e., Point (3(a)ii) of Algorithm 6.1) would be violated if the write operation were not rejected. Suppose, alternatively, that Tjn had already rw-depended on Tj0 when the former was validated against Tj0. Then, again Serializability Condition 1 or 2 of Algorithm 6.5 would be violated if Tjn were not aborted. Thus, the cycle ⟨Ti δ^rw Tj0 δ^rw ... δ^rw Tjn δ^rw Ti⟩ cannot occur in MVSG(MVH). Consequently, MVSG(MVH) is acyclic and, therefore, MVCC-IBOT produces only correct histories in the sense that they are serializable.
We conclude this subsection by investigating the relationship between MVCC-BOT and
MVCC-IBOT. MVCC-IBOT appears to be less restrictive than MVCC-BOT (protocol P1 is said
to be more restrictive than another protocol P2 if P1 permits fewer histories than P2), leading us to the following theorem:
Theorem 11. MVCC-IBOT’s consistency and currency definitions are less restrictive than those of
the MVCC-BOT scheme.
Proof. We first show that there exist histories that are allowed by MVCC-IBOT, but are disallowed
by MVCC-BOT. For this purpose we use history MV H7 illustrated below:
Example 8.
MVH7 = bc1 b1 r1[x0] bc2 b2 r2[y0] bc3 r2[z0] w2[z2] c2 bc4 r1[z2] w1[x1] c1 [x0 ≪ x1, z0 ≪ z2]
MVH7 is disallowed by MVCC-BOT as T1 observed object version z2, which had been installed by T2 after T1's starting point (it can easily be seen that STS(T1) ≤ CTS(T2)). Unlike the MVCC-BOT scheme, MVCC-IBOT's scheduler allows T1 to read "forward" on T2 and thus see its effects, as the latter had not overwritten any object version observed by T1 and, therefore, Serializability Condition 2 is not violated. Further, as the operations in MVH7 do not violate MVCC-IBOT's read and write rules, MVH7 is allowed by MVCC-IBOT.
It remains to prove that all histories allowed by MVCC-BOT are allowed by MVCC-IBOT as well. This proof is straightforward as both protocols enforce (among other things) that a validating transaction Ti may not violate Serializability Condition 1 in order to be committed. In both protocols the validity of this condition can easily be verified by intersecting Ti's write set with the read sets of all transactions in ST(Ti). Under the MVCC-BOT scheme, this condition must hold for all concurrently active transactions that committed during Ti's execution time, i.e., |ST(Ti)| = |Tactive(Ti)|. Under the MVCC-IBOT scheme, however, Serializability Condition 1 needs to hold only for those transactions in Tactive(Ti) that committed after RFF(Ti) had been set to false, i.e., |ST(Ti)| ≤ |Tactive(Ti)|. Therefore, transactions executed under the MVCC-IBOT protocol are more likely to pass their validation checks than those run under the MVCC-BOT scheme and hence, MVCC-IBOT permits more correct histories.
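For illustration, the set-intersection check described above might be sketched in Python as follows; the class and helper names are our own assumptions, not the protocols' actual implementation.

    # A minimal sketch of the validation step shared by MVCC-BOT and
    # MVCC-IBOT: Serializability Condition 1 is checked by intersecting
    # Ti's write set with the read sets of the transactions in ST(Ti).
    # All names here are illustrative assumptions.

    from dataclasses import dataclass, field

    @dataclass
    class Txn:
        tid: int
        read_set: set = field(default_factory=set)
        write_set: set = field(default_factory=set)

    def passes_condition_1(ti: Txn, st: list) -> bool:
        """Ti passes iff no transaction in ST(Ti) read an object Ti wrote."""
        return all(not (ti.write_set & tj.read_set) for tj in st)

    # Under MVCC-BOT, ST(Ti) contains every transaction that committed
    # during Ti's execution; under MVCC-IBOT only those that committed
    # after RFF(Ti) was set to false, so the check succeeds more often.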
6.3.4 Optimizing the MVCC-IBOT Scheme
The MVCC-IBOT algorithm as previously presented does not yet include any optimization routines.
Similarly to MVCC-BOT, MVCC-IBOT’s performance may suffer from spurious data conflicts
between a validating transaction Ti and already validated transactions that are recorded in ST (Ti ).
Such erroneously identified conflicts occur if an already validated and committed transaction T j that
can be serialized before Ti is “erroneously” assigned into ST (Ti ) in lieu of PT (Ti ). The following
example illustrates such a situation:
Example 9.
MVH8 = bc1 b1 r1[z0] b2 r2[y0] bc2 w1[z1] b3 b4 r3[y0] c1 r4[z1] r4[x0] bc3 w4[z4] c4 w2[y2] r3[x0] c2 bc4 w3[x3] c3 [x0 ≪ x3, y0 ≪ y2, z0 ≪ z1 ≪ z4]
In MVH8 each read-write transaction is executed with serializability correctness and IBOT data currency guarantees. However, MVH8 cannot be produced by the MVCC-IBOT scheme despite the fact that MVSG(MVH8) is acyclic and each read operation in MVH8 satisfies MVCC-IBOT's data currency criterion. The MVCC-IBOT scheduler would actually abort T3 since its CCR processing and transaction validation algorithm, i.e., Algorithm 6.5, (erroneously) assigns T4 into ST(T3) instead of PT(T3), and T3's write operation w3[x3] conflicts with T4's read operation r4[x0]. But, as the MVSG(MVH8) in Figure 6.4 shows, T4 could actually be serialized before T3 and should therefore belong to PT(T3).
Figure 6.4: Multi-version serializability graph of MVH8.
As the previous example has shown, MVCC-IBOT falls short of recognizing whether a validated transaction Tj ∈ Tactive(Ti) with CTS(Tj) > RFSTS(Ti) can be serialized before Ti and, therefore, should be assigned to PT(Ti). A protocol that eliminates this problem of MVCC-IBOT is called MVCC-IBOTO and will be described in the remainder of this subsection. Like MVCC-BOTO, MVCC-IBOTO exploits the CC information periodically delivered to clients when identifying recently committed transactions that can be safely serialized before Ti. MVCC-IBOTO assigns a validated transaction Tj ∈ Tactive(Ti) into PT(Ti) if RFF is set to false and, additionally, Serializability Conditions 2 and 3 are satisfied.
The algorithm used by MVCC-IBOTO for transaction classification and validation is depicted
below:
1   begin
2       RFSTS(Ti) ←− ∞;
3       foreach Tj in CCR do
4           if RFF is set to true then
5               if Serializability Condition 2 is violated then
6                   RFF(Ti) ←− false;
7                   RFSTS(Ti) ←− CTS(Tj);
8                   goto line 10
9           else
10              if Serializability Conditions 2 and 3 hold then
11                  insert Tj into PT(Ti)
12              else
13                  if Serializability Conditions 1 and 4 are satisfied then
14                      insert Tj into ST(Ti)
15                  else
16                      abort Ti
17  end
Algorithm 6.7: CCR processing and transaction validation under MVCC-IBOTO.
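A compact Python rendering of Algorithm 6.7 may clarify the control flow; the predicates holds_1 to holds_4 stand for Serializability Conditions 1-4, and both these helpers and the transaction attributes are assumed names.

    # A sketch of Algorithm 6.7; holds_k(ti, tj) is assumed to return True
    # iff Serializability Condition k holds for Ti against the committed Tj.

    def process_ccr(ti, ccr, holds_1, holds_2, holds_3, holds_4):
        ti.rfsts = float("inf")
        for tj in ccr:
            if ti.rff:                      # read-forward period still open
                if holds_2(ti, tj):
                    continue                # Tj needs no classification yet
                ti.rff = False              # Tj invalidated Ti: end the RFP
                ti.rfsts = tj.cts           # remember Tj's commit timestamp
                # fall through: the "goto line 10" classification step
            if holds_2(ti, tj) and holds_3(ti, tj):
                ti.pt.add(tj)               # Tj can be serialized before Ti
            elif holds_1(ti, tj) and holds_4(ti, tj):
                ti.st.add(tj)               # Tj must be serialized after Ti
            else:
                ti.abort()                  # irreconcilable conflict
                return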
It is important to note that MVCC-IBOTO does not store any validated transaction
T j ∈ Tactive (Ti ) with CT S(T j ) < RFST S(Ti ) into PT (Ti ) since those transactions can be safely
serialized before Ti and, therefore, need not be recorded in PT (Ti ). However, those read-write
transactions that terminated with commit timestamps > RFST S(Ti ) need to be classified and their
identifiers along with their read and write sets must be recorded either in PT (Ti ) or ST (Ti ). Like
the MVCC-BOTO scheme, MVCC-IBOTO groups recently committed transactions into either of
the two sets once their successful termination is broadcast by means of a CCR. At this point, Ti
typically has not yet issued its last read operation and, therefore, classification decisions are based
on incomplete knowledge of Ti's final read set. To address this problem, we extend MVCC-IBOT's scheduling algorithm with an additional validation routine, integrated into the scheduler's read rule, which ensures that previously made transaction classifications are re-examined after Ti has processed a further read operation. Remember that this approach is exercised by the MVCC-BOTO scheme as well. The complete scheduling algorithm of the MVCC-IBOTO scheme is illustrated below:
1. Read Rule: A read operation ri [x] is processed as follows:
(a) If RFF is set to true, a read operation ri [x] is translated into ri [xk ], where xk is the
most recent version of object x received by the client.
(b) Otherwise, i.e., if RFF is set to false, a read operation ri [x] is mapped into ri [xk ],
where xk is the most recent object version created by some transaction Tk such
that CT S(Tk ) < RFST S(Ti ).
(c) Also, if RFF is set to false, Read Rule 2c of Algorithm 6.4 is enforced.
(d) To record the information that Tk precedes Ti in any serial history, the scheduler
inserts Tk into PT (Ti ).
2. Write Rule: See Algorithm 6.1.
Algorithm 6.8: MVCC-IBOTO ’s scheduling algorithm.
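The version-selection part of this read rule can be sketched as follows; versions_of(x) and the transaction attributes are assumed names, and the re-examination routine of Rule (c) is omitted.

    # A sketch of MVCC-IBOTO's read rule. versions_of(x) is assumed to
    # return the locally available versions of object x as (creator_id,
    # creator_cts, value) triples, newest first; names are illustrative.

    def read(ti, x, versions_of):
        if ti.rff:
            # Rule (a): during the RFP, read the most recent version received.
            creator, cts, value = versions_of(x)[0]
        else:
            # Rule (b): otherwise, read the newest version whose creator
            # committed before Ti's read-forward stop timestamp.
            creator, cts, value = next(
                v for v in versions_of(x) if v[1] < ti.rfsts)
            # Rule (c), the re-validation of earlier classifications
            # (Read Rule 2c of Algorithm 6.4), is omitted in this sketch.
        ti.pt.add(creator)  # Rule (d): record that Tk precedes Ti serially
        return value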
As for the MVCC-BOTO scheme, the server needs Ti ’s ST information in order to efficiently
perform its final transaction validation. It can actually be obtained in two ways: (a) Either the
client sends that information to the server or (b) the server itself computes the set of transactions
that (either directly or indirectly) rw-depend on Ti , and, therefore, cannot be serialized before it.
Availability of these two options allows clients to trade off network bandwidth for additional CPU
overheads at the server and vice versa. Since we expect ST (Ti ) to be relatively small in size,
whereas its associated computational costs are expected to be fairly large, the recommended default
is to send such data piggybacked on FV M(Ti ) to the server.
Again, we conclude this subsection by showing that MVCC-IBOTO produces only correct histories as stated in Theorem 12 below:
Theorem 12. MVCC-IBOTO produces only correct multi-version histories in the sense that they
are serializable and any committed read-write transaction Ti in MVH has observed a consistent snapshot of the database as it existed at some point between Ti's starting and commit points (inclusive).
Proof. The following proof is again split into two parts. We first show that MVCC-IBOTO ensures
IBOT data currency to read-write transactions and thereafter, we provide evidence that it produces
only serializable histories.
Part A: Similarly to the proof of Theorem 10, we start by proving that MVCC-IBOTO enforces
IBOT data currency to any committed read-write transaction Ti . Which object versions Ti actually
reads during its execution is determined by MVCC-IBOTO ’s read rule. In the proof of Theorem 10,
we have shown that MVCC-IBOT’s read rule provides IBOT data currency to read-write transactions. Since MVCC-IBOTO ’s read rule contains only a slight modification of MVCC-IBOT’s rule,
it is sufficient to show that the modified part does not violate the IBOT data currency criterion.
Compared to the MVCC-IBOT scheme, MVCC-IBOTO's read rule contains a modified version of Statement (1c). Since the operations enforced by Statement (1c) do not influence the way in which MVCC-IBOTO maps read operations to object version reads, but are rather concerned with checking an active transaction's serializability, it follows that MVCC-IBOTO ensures IBOT data currency guarantees too.
Part B: We will now show that MVCC-IBOTO produces only serializable histories. Again,
let MVH denote a serializable multi-version history with MVSG(MVH) being its multi-version
serialization graph. Since MVH is serializable, it follows that MVSG(MVH) is acyclic. Now
suppose, by way of contradiction, that MVSG(MVH) contains a cycle ⟨Ti → Tj0 → ... → Tjn → Ti⟩, where Ti and Tjn with n ≥ 0 denote read-write transactions that have been executed by the MVCC-IBOTO scheme.
(1) Cycle consists of two transactions only: As in previous proofs, we start by assuming that the
cycle involves only two transactions, namely Ti and Tj0, i.e., n = 0. Then, to form the cycle, one of the following conflict dependencies between Ti and Tj0 must hold: (a) Ti and Tj0 wr-depend on each other, (b) Ti and Tj0 rw-depend on each other, or (c) Ti wr-depends on Tj0 and
T j0 rw-depends on Ti or vice versa.
(a) Suppose that Ti wr-depends on Tj0. Because MVCC-IBOTO ensures that read-write transactions observe committed data only, it follows that the ordering relation cj0 <MVH ri[xj0] holds (provided that Tj0 installs an object version xj0 and Ti later reads the created version). Since Tj0 wr-depends on Ti, it follows that the ordering relation ci <MVH rj0[xi] holds too (provided that Ti installs an object version xi and Tj0 later reads the created version). This, however, leads to a contradiction since, according to Points (3) and (4) of Definition 9, a transaction can only commit after it was initiated. Thus, MVSG(MVH) cannot contain the cycle ⟨Ti δ^wr Tj0 δ^wr Ti⟩ if the MVCC-IBOTO scheme is used.
(b) Suppose that Ti rw-depends on Tj0 and, provided that ci precedes cj0 and Tj0 does not rw-depend on Ti by the time the former is informed about Ti's commit, it follows that RFSTS(Tj0) ≤ CTS(Ti) and Ti is a member of ST(Tj0) according to Algorithm 6.7. Now suppose at a later stage Tj0 overwrites some object version observed by Ti, which implies that Tj0 now rw-depends on Ti. Then, however, MVCC-IBOTO's write rule (Point (3(a)i) of Algorithm 6.1) would be violated if this operation were allowed to occur. Suppose otherwise that Tj0 had already rw-depended on Ti by the time Tj0 was validated against Ti. Then, Serializability Condition 1 or 2 of Algorithm 6.7 would be violated if Tj0 were not aborted. Alternatively, suppose that Ti rw-depends on Tj0, cj0 precedes ci in MVH, and Tj0 does not rw-depend on Ti by the time the former gets to know about Ti's commit. Now suppose at a later stage Ti observes some object version whose successor version was installed by Tj0, thus Tj0 rw-depends on Ti. Then, however, MVCC-IBOT's read rule (i.e., Point (1c)) would be violated if this read operation were not rejected. Now suppose that Tj0 had already rw-depended on Ti by the time the latter was validated against Tj0. Then, Serializability Condition 1 or 2 of Algorithm 6.7 would be violated if Ti were not aborted. Thus, the cycle ⟨Ti δ^rw Tj0 δ^rw Ti⟩ cannot be produced under the MVCC-IBOTO scheme.
(c) Suppose that Ti wr-depends on Tj0. Then, it follows that RFSTS(Ti) > CTS(Tj0). Now suppose that Tj0 rw-depends on Ti, which implies that bi precedes cj0 and CTS(Tj0) ≥ RFSTS(Ti), leading to a contradiction. Using the same line of reasoning, we can show that the dependencies "Ti rw-depends on Tj0" and "Tj0 wr-depends on Ti" cannot co-exist without violating the MVCC-IBOTO scheme. Thus, the cycles ⟨Ti δ^wr Tj0 δ^rw Ti⟩ and ⟨Ti δ^rw Tj0 δ^wr Ti⟩ cannot occur in MVSG(MVH) and, therefore, it may not contain any cycle involving exactly two transactions.
(2) Cycle consists of three or more transactions: In the more complex case, the cycle may involve
three or more read-write transactions. Irrespective of how many transactions form the cycle, it
must have an edge Ti → T j0 .
(a) Let us initially assume there is a wr-edge from Ti to T j0 , i.e., T j0 wr-depends on Ti which
implies that ci <MV H c j0 , RFST S(T j0 ) > CT S(Ti ), and Ti belongs to PT (T j0 ).
I) Suppose further that Tjn (either directly or indirectly) wr-depends on Tj0, which implies that cj0 <MVH cjn, RFSTS(Tjn) > CTS(Tj0), and Tj0 ∈ PT(Tjn).
i) Suppose also that Ti wr-depends on Tjn, which implies that cjn <MVH ci, RFSTS(Ti) > CTS(Tjn), and Tjn ∈ PT(Ti), leading to a contradiction since ci cannot precede and succeed cjn at the same time. As a result, the cycle ⟨Ti δ^wr Tj0 δ^wr ... δ^wr Tjn δ^wr Ti⟩ cannot be formed in MVSG(MVH).
ii) Suppose, alternatively, that Ti rw-depends on T jn which implies that b jn precedes
ci and CT S(Ti ) ≥ RFST S(T jn ). Since the conditions CT S(Ti ) ≥ RFST S(T jn ) and
RFST S(T jn ) > CT S(T j0 ) hold, it follows that CT S(Ti ) > CT S(T j0 ) holds too, which,
in turn, implies the ordering relation cj0 <MVH ci. This, however, contradicts the ordering relation ci <MVH cj0. Thus, the cycle ⟨Ti δ^wr Tj0 δ^wr ... δ^wr Tjn δ^rw Ti⟩ cannot be produced under the MVCC-IBOTO scheme.
II) Consequently, the cycle may only occur when Tjn (either directly or indirectly) rw-depends on Tj0, which implies that bj0 precedes cjn and CTS(Tjn) ≥ RFSTS(Tj0).
i) Suppose further that Ti wr-depends on Tjn, which implies that cjn <MVH ci, RFSTS(Ti) > CTS(Tjn), and Tjn ∈ PT(Ti). Since the conditions CTS(Tjn) ≥ RFSTS(Tj0) and RFSTS(Tj0) > CTS(Ti) hold, it follows that the condition CTS(Tjn) > CTS(Ti) holds too, which, in turn, implies the ordering relation ci <MVH cjn. This, however, contradicts the ordering relation cjn <MVH ci derived above. Consequently, the cycle ⟨Ti δ^wr Tj0 δ^rw ... δ^rw Tjn δ^wr Ti⟩ cannot occur in MVSG(MVH) as well.
ii) To conclude this series of possible cycles, suppose that Ti rw-depends on T jn which
implies that b jn precedes ci and CT S(Ti ) ≥ RFST S(T jn ). Since the ordering relations
b jn <MV H c j0 and b j0 <MV H c jn hold, transactions T j0 and T jn are executed concurrently and, therefore, can commit in arbitrary order.
• Suppose that Tj0 commits before Tjn and that Tjn does not rw-depend on Tj0 by the time Tjn gets to know about Tj0's commit. Suppose further that at some later point in time Tjn overwrites some object version observed by Tj0, thus Tjn rw-depends on Tj0. Then, however, MVCC-IBOTO's write rule (i.e., Point (3(a)ii) of Algorithm 6.1) would be violated if the write operation were not rejected. Suppose, alternatively, that Tjn had already rw-depended on Tj0 when Tjn was validated against Tj0. In this case Serializability Condition 1 or 3 of Algorithm 6.7 would be violated if Tjn were not aborted. Thus, the cycle ⟨Ti δ^wr Tj0 δ^rw ... δ^rw Tjn δ^rw Ti⟩ cannot be created in MVSG(MVH) when Tj0 commits before Tjn.
• Suppose, on the contrary, that Tj0 commits after Tjn and that Tjn does not rw-depend on Tj0 by the time the latter gets to know about Tjn's commit. Suppose further that at some later point in time Tj0 observes some object version whose successor version was installed by Tjn, thus Tjn rw-depends on Tj0. Then, however, MVCC-IBOTO's read rule (i.e., Point (1c)) would be violated if the read operation were not rejected. Suppose, alternatively, that Tjn had already rw-depended on Tj0 when Tj0 was validated against Tjn. Then, however, Serializability Condition 2 or 4 of Algorithm 6.7 would be violated if Tj0 were not aborted. Thus, the cycle ⟨Ti δ^wr Tj0 δ^rw ... δ^rw Tjn δ^rw Ti⟩ cannot be created when Tj0 commits after Tjn.
(b) What remains to be shown is that the cycle cannot be formed when T j0 rw-depends on Ti . If
T j0 rw-depends on Ti , it follows that bi precedes c j0 in MVH and CT S(T j0 ) ≥ RFST S(Ti ).
I) Now suppose that T jn (either directly or indirectly) wr-depends on T j0 which implies
that c j0 <MV H c jn , RFST S(T jn ) > CT S(T j0 ), and T j0 ∈ PT (T jn ).
i) Suppose further that Ti wr-depends on Tjn, which implies that cjn <MVH ci, RFSTS(Ti) > CTS(Tjn), and Tjn ∈ PT(Ti). Since the conditions RFSTS(Ti) > CTS(Tjn) and CTS(Tj0) ≥ RFSTS(Ti) hold, it follows that CTS(Tj0) > CTS(Tjn) holds too. This, however, contradicts the previously derived ordering relation cj0 <MVH cjn and thus, the cycle ⟨Ti δ^rw Tj0 δ^wr ... δ^wr Tjn δ^wr Ti⟩ cannot occur in MVSG(MVH) when Ti wr-depends on Tjn.
ii) Suppose, on the contrary, that Ti rw-depends on T jn which implies that b jn
precedes ci in MVH and CT S(Ti ) ≥ RFST S(T jn ).
Since the ordering relations
b jn <MV H ci and bi <MV H c jn hold, transactions Ti and T jn are executed concurrently
and, therefore, can commit in arbitrary order.
• Suppose that Ti commits before Tjn and that Ti does not rw-depend on Tjn by the time Tjn gets to know about Ti's commit. Suppose further that at some later point in time Tjn observes some object version whose successor version was installed by Ti, thus Ti rw-depends on Tjn. Then, however, MVCC-IBOTO's read rule (i.e., Point (1c)) would be violated if the read operation were not rejected. Suppose, alternatively, that Ti had already rw-depended on Tjn when the latter was validated against Ti. Then, however, Serializability Condition 2 or 4 of Algorithm 6.7 would be violated if Tjn were not aborted. That is, the cycle ⟨Ti δ^rw Tj0 δ^wr ... δ^wr Tjn δ^rw Ti⟩ cannot occur in MVSG(MVH) when Ti commits before Tjn.
• Suppose, on the contrary, that Ti commits after Tjn and that Ti does not rw-depend on Tjn by the time Ti gets to know about Tjn's commit. Suppose further that at some later stage Ti overwrites some object version observed by Tjn, thus Ti now rw-depends on Tjn. Then, however, MVCC-IBOTO's write rule (i.e., Point (3(a)ii) of Algorithm 6.1) would be violated if the write operation were not rejected. Suppose, alternatively, that Ti had already rw-depended on Tjn when Ti was validated against Tjn. Then, however, Serializability Condition 1 or 3 of Algorithm 6.7 would be violated if Ti were not aborted. Thus, the cycle ⟨Ti δ^rw Tj0 δ^wr ... δ^wr Tjn δ^rw Ti⟩ cannot occur in MVSG(MVH) even if Ti commits after Tjn.
II) Now suppose that T jn (either directly or indirectly) rw-depends on T j0 which implies
that b j0 precedes c jn and CT S(T jn ) ≥ RFST S(T j0 ).
i) Suppose also that Ti wr-depends on T jn which implies that c jn <MV H ci ,
RFST S(Ti ) > CT S(T jn ), and T jn ∈ PT (Ti ). Since the ordering relations b j0 <MV H ci
and bi <MV H c j0 hold, transactions Ti and T j0 are executed concurrently and, therefore,
can commit in arbitrary order.
• Suppose that Ti commits before Tj0 and Tj0 does not rw-depend on Ti by the time Tj0 gets to know about Ti's commit. Suppose further at some later point in time Tj0 overwrites some object version observed by Ti, thus Tj0 rw-depends on Ti. Then, however, MVCC-IBOTO's write rule (i.e., Point (3(a)ii) of Algorithm 6.1) would be violated if the write operation were not rejected. Suppose, alternatively, that Tj0 had already rw-depended on Ti when Tj0 was validated against Ti. In this case Serializability Condition 1 or 3 of Algorithm 6.7 would be violated if Tj0 were not aborted. That is, the cycle ⟨Ti δ^rw Tj0 δ^rw ... δ^rw Tjn δ^wr Ti⟩ cannot occur in MVSG(MVH) if Ti commits before Tj0.
• Suppose, on the contrary, that Ti commits after Tj0 and that Tj0 does not rw-depend on Ti by the time Ti gets to know about Tj0's commit. Suppose further at some later point in time Ti observes some object version whose successor version was installed by Tj0, thus Tj0 rw-depends on Ti. Then, however, MVCC-IBOTO's read rule (i.e., Point (1c)) would be violated if the read operation were not rejected. Suppose, alternatively, that Tj0 had already rw-depended on Ti when Ti was validated against Tj0. Then, again Serializability Condition 2 or 4 of Algorithm 6.7 would be violated if Ti were not aborted. Thus, the cycle ⟨Ti δ^rw Tj0 δ^rw ... δ^rw Tjn δ^wr Ti⟩ cannot be created in MVSG(MVH) even if Ti commits after Tj0.
ii) Last, but not least, suppose that Ti rw-depends on Tjn, which implies that bjn precedes ci and CTS(Ti) ≥ RFSTS(Tjn). Since the ordering relations bi <MVH cj0, bj0 <MVH cjn, and bjn <MVH ci hold, transactions Ti, Tj0 and Tjn are executed concurrently and, therefore, can commit in arbitrary order. Without loss of generality, assume that the commit order is ci <MVH cj0 <MVH cjn and that Tjn does not rw-depend on Tj0 by the time Tjn gets to know about Tj0's commit. Suppose further at some later point in time Tjn overwrites some object version observed by Tj0, thus Tjn rw-depends on Tj0. Then, however, MVCC-IBOTO's write rule (i.e., Point (3(a)ii) of Algorithm 6.1) would be violated if the write operation were not rejected. Suppose, alternatively, that Tjn had already rw-depended on Tj0 when Tjn was validated against Tj0. In this case Serializability Condition 1 or 3 of Algorithm 6.7 would be violated if Tjn were not aborted. Thus, the cycle ⟨Ti δ^rw Tj0 δ^rw ... δ^rw Tjn δ^rw Ti⟩ cannot occur in MVSG(MVH). Consequently, MVSG(MVH) is acyclic and, therefore, MVCC-IBOTO produces only correct histories in the sense that they are serializable.
6.3.5 MVCC-EOT Scheme
Previously described protocols do not provide strong data currency guarantees to read-write transactions, i.e., guarantees that all object versions processed by their read operations are still up-to-date at the transaction's commit point. Strong data currency guarantees along with adherence to the serializability criterion are required in many conventional and mobile database-supported applications such as stock market, air traffic control, factory floor, and on-board airline databases, to name just a few. Apart from the data currency issue, a further motivating factor for yet another CC protocol is the relatively high space and time costs intrinsic to all previously proposed schemes, which make them unattractive for very resource-poor mobile clients. The protocol that eliminates those problems, though at the cost of some performance degradation (see Section 6.5.4), is called MVCC-EOT.
As its name implies, MVCC-EOT provides clients with end-of-transaction data currency guarantees along with serializability correctness. Since the basic idea and implementation underlying
MVCC-EOT are akin to the invalidation-only method [123, 124], we only briefly sketch its core
components. Like other protocols in the MVCC-* suite, MVCC-EOT exploits periodically dissem-
inated CCRs in order to pre-validate active read-write transactions. Whenever a new CCR appears
on the broadcast channel, MVCC-EOT, or, more precisely, the client transaction manager validates
an active read-write transaction Ti against each read-write transaction Tj included in the report by the following algorithm:
1   begin
2       foreach Tj in CCR do
3           if Serializability Condition 2 is violated then
4               abort Ti
5   end
Algorithm 6.9: CCR processing and transaction validation under MVCC-EOT.
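In Python, the whole of Algorithm 6.9 reduces to a few lines; we assume here that checking Serializability Condition 2 against a committed Tj amounts to intersecting Tj's write set with Ti's read set, and that abort_and_restart is a client-side helper.

    # A sketch of MVCC-EOT's CCR processing: an active Ti is aborted as
    # soon as some committed Tj overwrote an object version Ti observed.
    # read_set/write_set hold object identifiers; names are illustrative.

    def process_ccr_eot(ti, ccr, abort_and_restart):
        for tj in ccr:
            if tj.write_set & ti.read_set:   # Condition 2 violated
                abort_and_restart(ti)        # enforces EOT data currency
                return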
To enforce EOT data currency guarantees, MVCC-EOT aborts and subsequently restarts any
active read-write transaction Ti that has observed a stale object version. With regard to the data
currency guarantees, MVCC-EOT is obviously strictly more restrictive than MVCC-IBOT, which results in permitting fewer correct histories. The reason is that the MVCC-IBOT protocol is less vulnerable to invalidation notifications saying that a read-write transaction Tj ∈ Tactive(Ti) has overwritten an object version previously read by the active read-write transaction Ti. While MVCC-EOT always responds to such a message by immediately aborting Ti, for MVCC-IBOT it is merely an indication to terminate Ti's RFP and to change the way the scheduler translates read operations into actual version read steps. Under the MVCC-IBOT scheme, an invalidating transaction Tj causes an active read-write transaction Ti to abort only in those cases where either (a) Ti rw-depends on Tj, or (b) Tj rw-depends on Ti, another read-write transaction Tk is contained in ST(Tj), i.e., Tk wr- or rw-depends on Tj, and Ti, in turn, wr-depends on Tk, i.e., Tk ∈ PT(Ti). Therefore, MVCC-IBOT's performance is expected to be superior to that of the MVCC-EOT scheme. Determining whether MVCC-BOT or MVCC-EOT permits more histories is not as straightforward as before since the two protocols are incomparable: both schemes ensure different data currency guarantees by employing dissimilar transaction validation algorithms. We will not discuss this issue further here and refer the interested reader to our simulation study, which shows the comparative performance of both protocols.
To complete the protocol's description, we formulate the rules according to which MVCC-EOT's scheduler processes read and write operations issued by an active read-write transaction Ti:
1. Read Rule: A read operation ri [x] is transformed into ri [xk ], where xk is the latest
committed version of x received by the client up to now.
2. Write Rule: A write operation wi [x] is transformed into wi [xi ] and executed.
Algorithm 6.10: MVCC-EOT’s scheduling algorithm.
Similarly to other protocols in the MVCC-* suite, MVCC-EOT initiates Ti ’s final validation
as soon as Ti ’s last data operation has been processed by the client. For that purpose, the client
transaction manager transmits FVM(Ti) containing the following components to the server: (a) ReadSet(Ti), (b) WriteSet(Ti), and (c) TSc(CCRlast). It is interesting to note that final validation messages emitted by the MVCC-EOT scheme contain a proper subset of the information contained in those of the other schemes of the MVCC-* suite. Thus, MVCC-EOT imposes the lowest local storage and network costs on mobile clients among all protocols of the MVCC-* suite.
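A final validation message under MVCC-EOT could be modeled as a simple record like the following; the field names mirror components (a)-(c) above, while the class itself is purely illustrative.

    # A sketch of MVCC-EOT's final validation message FVM(Ti).

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class FinalValidationMessage:
        read_set: frozenset   # (a) ReadSet(Ti): identifiers of objects read
        write_set: frozenset  # (b) WriteSet(Ti): identifiers of objects written
        last_ccr_ts: int      # (c) TSc(CCRlast): timestamp of the last CCR seen

    def build_fvm(ti) -> FinalValidationMessage:
        return FinalValidationMessage(frozenset(ti.read_set),
                                      frozenset(ti.write_set),
                                      ti.last_ccr_ts)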
Last, but not least, we give the proof that MVCC-EOT produces only correct histories in the
sense that they are serializable and all committed transactions fulfill EOT data currency guarantees.
Theorem 13. MVCC-EOT produces only serializable multi-version histories that ensure EOT data currency guarantees to read-write transactions, i.e., for each read operation ri[xj] ∈ ReadSet(Ti) of any committed read-write transaction Ti in MVH there exists no object version xk at Ti's commit point such that xj ≪ xk.
Proof. Again, the proof is divided into two parts. We start by proving that MVCC-EOT ensures
EOT data currency and then show that the protocol generates only serializable histories.
Part A: Let Ti denote a read-write transaction with EOT(Ti) being the logical time of Ti's final validation, and DS_EOT(Ti) representing the database state at Ti's commit time. We claim that the values read by Ti correspond to DS_EOT(Ti). Now suppose, by way of contradiction, that Ti has read an object version xj not belonging to DS_EOT(Ti), i.e., xj has been overwritten by some version xk and has therefore been discarded from DS_EOT(Ti). Then, however, Serializability Condition 2 of Algorithm 6.9 would be violated and, therefore, Ti's commit is not possible. Thus, MVCC-EOT provides EOT data currency guarantees.
Part B: Let MVH denote any serializable multi-version history with MVSG(MVH) being
its multi-version serialization graph. As MVH is serializable, MVSG(MVH) is acyclic. Let
us now suppose, by means of contradiction, that MVSG(MVH) contains a cycle of the form
⟨Ti → Tj0 → ... → Tjn → Ti⟩, where Ti and Tjn with n ≥ 0 denote read-write transactions that have been processed under the MVCC-EOT scheme. Then, in order for the cycle to be produced, Ti must have both an incoming and an outgoing edge. Thereby, the outgoing edge Ti → Tj0 leaves a read-write transaction Ti that either observed an object version whose successor version was installed by Tj0, or that installed an object version read by Tj0; the incoming edge Tjn → Ti enters a read-write transaction Ti that either created the successor of an object version read by Tjn or observed an object version written by Tjn. It is important to note that Tj0 and Tjn do not necessarily need to be distinct
transactions.
(a) To start with, suppose that Tj0 rw-depends on Ti.
I) Suppose further that Ti commits after Tj0, i.e., cj0 <MVH ci, and that the rw-dependency between Tj0 and Ti had already existed by the time Ti was validated against Tj0. Then, Serializability Condition 2 of Algorithm 6.9 would be violated if Ti were not aborted. Suppose, alternatively, that Tj0 had not rw-depended on Ti by the time the latter was validated against Tj0. Then, however, MVCC-EOT's read rule would be violated if Ti had missed Tj0's effects. Thus, the cycle ⟨Ti δ^rw Tj0 δ^* ... δ^* Tjn δ^* Ti⟩ cannot occur in MVSG(MVH) provided that the condition cj0 <MVH ci holds.
II) Now suppose that Tj0 rw-depends on Ti and Ti commits before Tj0, i.e., ci <MVH cj0. Suppose also that Tjn (either directly or indirectly) wr- or rw-depends on Tj0, which implies that the ordering relation cj0 <MVH cjn holds. Note that it is possible to reason that Tj0 commits before Tjn since otherwise MVCC-EOT's validation algorithm (see Algorithm 6.9) or read rule (see Algorithm 6.10) would be violated. Suppose further that Ti wr- or rw-depends on Tjn, which implies that the ordering relation cjn <MVH ci holds. Then, however, the relation ci <MVH cj0 <MVH cjn <MVH ci holds too, leading to a contradiction according to Point (3) of Definition 9. Thus, Ti cannot be involved in a cycle when Tj0 rw-depends on it, i.e., the cycle ⟨Ti δ^rw Tj0 δ^* ... δ^* Tjn δ^* Ti⟩ cannot be produced by MVCC-EOT even if Ti commits before Tj0.
(b) It remains to prove that Ti does not generate a cycle when T j0 wr-depends on it. To do so,
suppose that T j0 wr-depends on Ti which implies that the ordering relation ci <MV H c j0 holds.
Suppose further that T jn (either directly or indirectly) wr- or rw-depends on T j0 which implies
that the ordering relation cj0 <MVH cjn holds. Last, but not least, suppose that Ti wr- or rw-depends on Tjn, which implies that the ordering relation cjn <MVH ci holds. Consequently, and due to the transitive nature of <MVH, it follows that the ordering relation ci <MVH cj0 <MVH cjn <MVH ci holds too. This, however, contradicts Point (3) of Definition 9 since a transaction can only commit once during its lifetime. Thus, the cycle ⟨Ti δ^wr Tj0 δ^* ... δ^* Tjn δ^* Ti⟩ cannot be produced without violating the MVCC-EOT scheme and, therefore, we can conclude that MVCC-EOT produces only serializable histories.
6.4 Performance-related Issues
In the subsequent subsections, we elaborate on some issues that considerably influence the performance of the MVCC-* schemes, namely data caching, intermittent network connectivity, and the exploitation of semantic knowledge for concurrency control.
6.4.1 Caching
As all protocols of the MVCC-* suite rely on multi-versioning and thus require that multiple versions of frequently modified objects are available in the system, a global version storage strategy needs to be devised to address this issue. A simple, but (very) costly, strategy would be to maintain all the versions of each database object somewhere in the system. However, such a storage policy could exceed even today's primary and secondary storage capacities if we (not unrealistically) assume that object updates occur frequently. Fortunately, there is no need to maintain a
complete history of all the generated versions of each database object to efficiently facilitate multi-version concurrency control. In order to provide optimal data support for any of the protocols in
the MVCC-* suite, the system needs to maintain only those object versions that are at least as current as the versions that existed by the time the oldest active read-write transaction in the system
started its execution. Even though such an approach prevents the number of versions from growing
indefinitely, there is still no guarantee that the server is capable of storing all those versions. As a
matter of fact, since read-write transactions are usually long running in nature as clients may become disconnected and experience long network propagation delays, the number of versions that
need to be kept could still exceed the storage capacities of the server. Taking that into account, we suggest imposing a limit on the number of versions the server is allowed to maintain. The experimental results previously presented in Subsection 5.4.5.1 have shown that MVCC protocols designed for
processing read-only transactions achieve good performance results if the server keeps up to five of
the most recently installed versions of each database object. Since the MVCC protocols used in the
simulation study presented in Subsection 5.4.5.1 impose similar data storage requirements on the
database system as the MVCC-* schemes, we expect such a version limit to be practicable even for
systems where read-write transactions are processed.
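A server-side version store with such a limit could look like the sketch below, which keeps the k most recently installed versions per object; the class is an illustration, not the system's actual storage layer.

    # A sketch of a bounded version store: only the k most recently
    # installed versions of each object are retained (k = 5 matched the
    # experiments of Subsection 5.4.5.1). Illustrative names throughout.

    from collections import defaultdict, deque

    class VersionStore:
        def __init__(self, k: int = 5):
            self.versions = defaultdict(lambda: deque(maxlen=k))

        def install(self, obj_id, cts, value):
            # appending to a bounded deque silently evicts the oldest version
            self.versions[obj_id].append((cts, value))

        def lookup(self, obj_id, max_cts=float("inf")):
            # newest version whose creator committed at or before max_cts
            for cts, value in reversed(self.versions[obj_id]):
                if cts <= max_cts:
                    return cts, value
            return None  # version already evicted; the requester must abort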
Apart from the fact that as many of the requested object versions as possible should be available in the system, another important performance issue is that the desired object version should be located
the system, another important performance issue is that the desired object version should be located
as close as possible to the client application. Ideally, all database versions that are of potential
use for the client would be stored within the client cache. However, the size of the client cache is
typically much smaller than the size of the database, allowing the client to cache only a subset of
the useful versions of the database. Therefore, the basic problem is to determine which versions
of which objects should be cached at the client to achieve the best overall system performance and
to decide what object version should be evicted once the client cache is full and a new version is
requested. The latter issue is referred to as a cache replacement problem and can be resolved by
deploying a judicious cache replacement policy that selects among the cached object versions the
one with the lowest expected utility for the client in the future. As a matter of fact, finding optimal
cache replacement victims in a multi-version dissemination-based environment is more complex
than in conventional distributed environments since the caching utility of an object version x j for a
client running a transaction Ti depends on a number of factors, namely (a) the access recency and
frequency of x in the recent past, (b) the update probability of x in the near future, (c) the version
storage policy of the server, (d) the re-acquisition costs of x j once evicted from the client cache,
and (e) the re-processing costs of any active transaction Ti that would occur when a fetch request for
x j fails because the version has been evicted from the system and, therefore, the transaction needs
to be aborted.
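To make factors (a)-(e) concrete, a cost-based eviction metric might combine them roughly as sketched below; the weighting and the attribute names are assumptions for illustration, since MICP's actual metric is defined in Chapter 5.

    # A sketch of a cost-based replacement metric over factors (a)-(e);
    # the version with the lowest expected caching utility is evicted.

    def caching_utility(v) -> float:
        gain = v.access_probability * (1.0 - v.update_probability)  # (a), (b)
        cost = v.reacquisition_cost                                 # (d)
        if not v.recacheable:       # (c): version already purged at the server
            cost += v.restart_cost  # (e): an eviction would force an abort
        return gain * cost

    def pick_victim(cached_versions):
        return min(cached_versions, key=caching_utility)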
In Chapter 5 we introduced an efficient integrated cache replacement and prefetching policy,
called MICP, which takes all those parameters into account when determining eviction victims.
However, MICP was designed to efficiently support read-only transactions and, therefore, might
either not be suitable for read-write transactions or might need to be adapted to meet their specific
requirements. If one considers the above parameters applied by MICP to determine replacement
victims, it is obvious that they are not transaction type-specific and are therefore applicable for
read-write transactions as well. Additionally, MICP's underlying partitioned cache structure (see Figure 5.2) suits read-write transactions as perfectly as read-only ones, as both types require current and non-current data objects for transaction processing and suffer heavily from object version misses. Remember that MICP tries to minimize the performance penalty produced by such cache misses by (a) allocating dedicated memory space to non-re-cacheable object versions to keep them from competing with re-cacheable object versions for memory slots and (b) using a cost-based rather
than probability-based metric to choose replacement victims. There is, however, one facet of MICP
where its operations are not independent of the transaction type concerned, namely the conditions for evicting useless object versions. Garbage collection of object versions is actually
not only transaction type-dependent, but is even a matter of the CC scheme being used. As different protocols enforce dissimilar correctness and currency guarantees, the conditions for discarding
object versions from the cache deviate from one protocol to another. MICP in this respect relies on
a close cooperation with the client transaction manager whose task is to provide the information
necessary to enforce the correct and instant eviction of locally cached object versions once they
become useless. Due to this decoupling, MICP is suitable for virtually any transaction type and CC scheme and can therefore be used with the MVCC-* suite without any adaptation as well. Whether or not MICP is able to outperform other well-known caching and prefetching policies when used by clients processing read-write transactions rather than read-only ones is examined in Subsection 6.5.5.4.
6.4.2 Disconnections
For reasons of presentational convenience, the MVCC-* suite was introduced with the fundamental
assumption that clients and the server are persistently connected. However, this does not match the
reality of mobile computing. Mobile clients are typically only weakly connected to the server, i.e.,
there are periods where clients may be disconnected as well as other periods when data transfer
between both parties can occur. Network disconnections can be either involuntary such as due to a
network, client, or server failure, or voluntary, such as when the user temporarily unplugs the device from the network. Irrespective of the cause of the disconnection and the CC protocol deployed, disconnection from the network has a detrimental effect on the client's operations. (a) First, as clients no longer receive CCRs, they cannot validate their ongoing transactions against recently
committed ones and, therefore, may perform wasted work by continuing to process transactions
destined to abort. (b) Second, since clients are unable to update their caches during disconnection
periods, the currency of the client cache degrades and may subsequently cause stale cache reads.
(c) Third, and most important from the performance perspective, transaction processing may be
hindered or even interrupted, e.g., if a requested object version is not cache-resident or a final commit message cannot instantly be transferred to the server. It is obvious that all of the aforementioned
limitations undoubtedly restrict client transaction processing irrespective of the CC protocol being
deployed. However, the impact of network disruptions on the protocols of the MVCC-* suite varies
from one scheme to another.
In what follows, we discuss whether the protocols in the MVCC-* suite operate correctly under
disconnections and if not, we propose measures to rectify the problem. As a matter of fact, clients do
not receive CCRs when disconnected from the broadcast channel. Fortunately, with the exception
of the MVCC-IBOT and MVCC-IBOTO protocols, the MVCC-* suite does not require CCRs for protocol correctness reasons, but rather to guarantee short response times to applications using it.
The MVCC-IBOT and MVCC-IBOTO protocols, however, require those reports in order to figure
out for each active read-write transaction Ti whether its RFP is completed and, if so, what its RFSTS is. Remember that the RFSTS is necessary for the scheduler of both protocols in order to correctly map object read operations into the respective version reads. Also, the RFSTS of an active
read-write transaction Ti influences its conflict relationships with the other read-write transactions
in the history MVH. Therefore, RFST S(Ti ) decides on Ti ’s serialization order within MVH and
hence, if chosen wrongly, may turn MVH into an unserializable history as the following example
shows:
Example 10.
MVH9 = bc1 b1 r1[z0] r1[x0] b2 r2[x0] bc2 w1[z1] w2[x2] r2[y0] bc3 w2[y2] c2 bc4 r1[y2] c1 [x0 ≪ x2, y0 ≪ y2, z0 ≪ z1]
History MVH9 depicts the operations of two concurrent transactions T1 and T2 that run at clients C1 and C2, respectively. As MVH9 shows, transaction T2 committed during MIBC 3 and hence, its CC-related information was broadcast at the beginning of MIBC 4. Now suppose that C1 lost its connection to the broadcast channel between the middle of MIBC 3 and the end of MIBC 4 and, therefore, missed the CCR containing T2's CC information. By the time C1 reconnects to the network, RFF(T1) is still set to true, allowing T1 to read "forward" on recently committed transactions. This, however, may actually result in an "incorrect" or unserializable history as
illustrated in Figure 6.5.
Figure 6.5: Multi-version serialization graph of MVH9.
To protect MVCC-IBOT and MVCC-IBOTO from producing unserializable histories due to
CCR misses, there are basically two alternatives: (a) We could pessimistically assume that any
active read-write transaction Ti becomes invalidated during a client’s disconnection period. To
comply with this assumption, at disconnection time we could automatically set RFSTS(Ti) to the largest commit timestamp of any read-write transaction reported to the client as having successfully completed its execution, which has the same effect as if Ti had been invalidated just before the disconnection actually occurred. (b) We could optimistically assume that disconnection periods are
of short duration and invalidations occur only infrequently. Consequently, Ti ’s RFP does not need
to be automatically ended by the time the disconnection actually takes place. Rather, upon reconnection the client could determine whether Ti had been invalidated during the disconnection period
and, if so, RFST S(Ti ) could be set accordingly.
If we compare both alternatives, at first glance the latter alternative may appear to be more
attractive since it prevents an active read-write transaction Ti from unnecessarily terminating its
RFP and guarantees that RFST S(Ti ) reflects Ti ’s actual invalidation time. However, the optimistic
approach has a serious disadvantage w.r.t. reconnections since it forces transaction processing to be
blocked until every missed CCR has been processed by the client. As communication bandwidth
in mobile environments is severely limited and round trip times (RTTs) are high compared to stationary computing, transaction processing is expected to be interrupted for a significant amount of
time. The pessimistic approach avoids such reconnection-induced waiting times at the cost of a potentially increased number of transaction aborts caused by additional data conflicts between concurrent read-write transactions. Furthermore, this approach is much easier to implement and has therefore been selected for our experimental study. Finally, we can conclude that disconnections may be useful in order to preserve the scarce battery power of mobile devices. As far as CC is concerned, however, they are rather detrimental, causing the overall system performance to degrade. Nevertheless, it is important to note that despite those drawbacks intermittent connectivity does not violate the correctness of our proposed protocols.
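The pessimistic rule adopted for the experiments can be stated in a few lines of Python; the attribute names are assumed, and last_reported_cts denotes the largest commit timestamp reported to the client so far.

    # A sketch of the pessimistic disconnection rule: every active
    # transaction is treated as if it had been invalidated right before
    # the disconnection occurred.

    def on_disconnect(active_txns, last_reported_cts):
        for ti in active_txns:
            if ti.rff:
                ti.rff = False                # end Ti's read-forward period
                ti.rfsts = last_reported_cts  # as if just invalidated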
6.4.3 Conflict Reducing Techniques
In this subsection we will briefly sketch some methods aimed at reducing the number of data conflicts that occur when applying the MVCC-* suite. The basic idea behind those methods is to exploit
semantic knowledge, which can be derived from the objects being accessed and modified, from the application that operates on them, or from the database structure and integrity constraints, or which can be provided by the application programmer or user, in order to identify so-called false or permissible conflicts among those detected by the protocols' transaction validation algorithms. In
the following we examine methods for increasing the amount of permissive concurrency generated
by the MVCC-* suite.
Remember that the MVCC-* suite has been designed to provide consistency management for
general purpose database applications at relatively low implementational costs. To meet those prerequisites, all the protocols of the MVCC-* suite operate on a relatively low level of abstraction,
namely simple object reads and writes. As those operations do not capture any semantics of their
higher-level operations or other valuable knowledge, conflict avoidance measures are limited to
avoiding conflicts caused by grouping of information and by restricting the number of versions of
the same data object stored in the system. False conflicts due to information grouping are avoided since our protocols carry out concurrency control on an object rather than on a page basis. The
underlying system architecture of the MVCC-* suite does not impose any fixed upper limit on the
number of object versions the system is allowed to preserve simultaneously. Even though there is a
physical limit on the number of versions that the server can maintain for each database object, there
exists no system-wide upper limit on this number because clients may compensate for the version
restriction of the server by storing non-re-cacheable objects in their local caches.
All of those conflict-reducing techniques can be extended by the following non-exclusive list
of methods: The first approach to diminishing false conflicts is by specifying dependency relations
among read and write operations of the same read-write transaction. One of the inherent problems of the read/write model is that the transaction manager does not have any information as to
whether some write operation wi [xi ] depends on some earlier read operation, such as ri [x j ] or ri [yk ],
invoked by the same read-write transaction Ti . As a consequence, the read/write model conservatively assumes that any object version written by a transaction Ti depends on the values of all object
versions previously read. To eliminate transaction aborts due to that conservatism, the application
programmer or skilled user could specify, on the basis of program-specific knowledge, for each
write operation wi [xi ] of Ti its dependency relation DR(xi ) w.r.t. previous read operations. Under
this concept, a read operation ri [x j ] with ri [x j ] <MV H wi [xi ] is contained in DR(xi ) if and only if
wi[xi] depends on ri[xj]. By specifying dependency relations for any created object version of a read-write transaction Ti, false write-read conflicts concerning Ti and some previously committed transaction Tj can easily be detected as the following example shows:
Example 11.
MVH10 = bc1 b1 r1[y0] r1[x0] b2 r2[y0] bc2 w1[x1] r2[x0] bc3 w2[y2] c2 bc4 c1
[DR(x1) = {x0}, DR(y2) = {x0, y0}, x0 ≪ x1, y0 ≪ y2]
History MVH10 indicates which read and write operations of transactions T1 and T2 are executed in what order. Both transactions modify the database state by updating objects x and y, respectively. Additionally, two dependency relations are associated with MVH10 (added below MVH10, on the right of the page, in square brackets), indicating that object version x1 depends on the value of x0, whereas object version y2 depends on the values of x0 and y0. Because of that information, we know that T1's read operation r1[y0] does not contribute to T1's write operation and, therefore, r1[y0] can be conceptually treated as a separate transaction T1' that can be disentangled from the rest of transaction T1. As we now have three separate transactions T1, T1', and T2, MVH10 can be serialized in the order T1' → T2 → T1, which was not possible before splitting T1 into two transactions.
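To make the use of dependency relations concrete, the following sketch shows how a validator could identify reads that feed no write of the same transaction and can thus be split off as a separate read-only transaction. This is a minimal illustration; the class and function names (Transaction, independent_reads) are hypothetical and not taken from the thesis' implementation.

    class Transaction:
        def __init__(self, tid):
            self.tid = tid
            self.reads = []    # versions read, e.g. ("y", 0) for y0
            self.writes = []   # versions written, e.g. ("x", 1) for x1
            self.dep = {}      # DR: written version -> set of read versions it depends on

    def independent_reads(t):
        # Reads that feed no write of t; conceptually they form a separate
        # read-only transaction t' that can be serialized independently of t.
        needed = set().union(*t.dep.values()) if t.dep else set()
        return [r for r in t.reads if r not in needed]

    # Example 11: T1 reads y0 and x0 and writes x1 with DR(x1) = {x0},
    # so r1[y0] can be disentangled as T1'.
    t1 = Transaction(1)
    t1.reads = [("y", 0), ("x", 0)]
    t1.writes = [("x", 1)]
    t1.dep = {("x", 1): {("x", 0)}}
    print(independent_reads(t1))   # -> [('y', 0)]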
Another technique for reducing the number of transaction aborts is to provide alternatives for
all those write operations that are likely to cause conflicts with other concurrent transactions [126,
127, 150]. Such alternative writes can be used as fallback operations to resolve ww- or rw-conflicts
detected by the client or server transaction manager. As an example illustrating the basic idea
behind this concept, consider a mobile train ticket selling and reservation system:
Example 12.
In such a system, train passengers can reserve and buy tickets directly on the train. Suppose a customer is trying to book a second-class window seat on the train from Zurich to Milan leaving Zurich at 8:10 pm on 11/10/2004. Further suppose that by the time the train guard enters
the ticket request into the system, the mobile device running the ticket application is disconnected
from the central database server. Additionally, assume the guard gets informed by the local system
that the desired seat category was nearly sold out by the time the client got disconnected from the
server. Hence, the customer would be asked to provide alternative travel arrangements in order to
prepare for the case that the intended booking fails. By providing alternative write operations to
those specified in the initial booking transaction, the customer increases the likelihood of the transaction being validated positively at the server since the transaction manager now has the opportunity
to prevent the transaction from being aborted due to ww- or rw-conflicts with other concurrent, but
previously committed, transactions.
Last, but not least, it is important to note that this approach is especially attractive for operation-heavy and conflict-troubled transactions. However, as shown in Subsection 6.5.5.2, even short
transactions can significantly benefit from this approach.
A further approach that has the potential to achieve a higher degree of concurrency among transactions is to analyze higher-level application and database operations with the purpose of identifying pairs of operations that either unconditionally commute or commute only in specific database states. The concept of commutativity is well-known to the database community [11, 25, 66, 140, 157, 158] and is therefore only briefly discussed in the context of the MVCC-* suite. Since state-based commutativity is a generalization of unconditional commutativity, the former is the more general notion and hence underlies the following discussion. In order to apply and decide on state-based commutativity, transactions and their statements are associated with pre- and post-conditions, i.e., a transaction Ti is represented as a triple of the form {CPre,i} Ti {CPost,i} and the j-th atomic statement Si,j of Ti is specified analogously. Pre-conditions are a set of assertions about the expected state of database objects as well as program parameters that must be obeyed in order to guarantee the correct execution of the transaction/operation; post-conditions are a set of conditions about the state of the database and the program after the transaction/operation has finished its execution. Now let us turn to an example illustrating the concept of state-based commutativity and its potential in the framework of the MVCC-* suite:
Example 13.
Consider a mobile airport flight control and ticket sales database providing information to the ground personnel and to all the computerized staff at the airport. Suppose there exists a portable travel agency whose services can be utilized by any person at the airport who possesses the equipment necessary to communicate with the agency's portable computer/PDA through the wireless airport network. The travel agency offers its customers the means to reserve and book so-called "last minute" flights. Suppose the travel database contains, among other things, the following relation (the underlined attribute denotes the primary key of the relation): Flight-Leg(LegId, AirplainId, Avail-Seats, Reserved-Seats). To achieve short response times, Flight-Leg is directly accessible and modifiable by all authorized customers of the airport travel agency. Now suppose that the following multi-level history has been produced by two agency customers C1 and C2 that have issued transactions T1 and T2, respectively:
Figure 6.6: Two-level history showing lower- and higher-order operations of a ticket buying and a ticket reservation transaction. At level 1, T1 executes SELECT AirplainId, Avail-Seats FROM Flight-Leg WHERE LegId='B1234D' followed by UPDATE Flight-Leg SET Avail-Seats = Avail-Seats − :x WHERE LegId='B1234D'; T2 executes SELECT AirplainId, Reserved-Seats, Avail-Seats FROM Flight-Leg WHERE LegId='B1234D' followed by UPDATE Flight-Leg SET Reserved-Seats = Reserved-Seats + :y WHERE LegId='B1234D'. At level 0, these statements map to the interleaved object operations r1[x0] r2[x0] r1[x0] w1[x1] r2[x0] w2[x2].
It is intuitively clear that the database is in a consistent state if the following conditions are satisfied:
I1 : Avail-Seats ≥ Reserved-Seats and I2 : Avail-Seats ≥ 0. Therefore, T1 and T2 can be characterized by the triples {Avail-Seats − x ≥ 0 ∧ Temp = Avail-Seats} T1 {Avail-Seats = Temp − x} and
{Reserved-Seats + y ≤ Avail-Seats ∧ Temp = Reserved-Seats} T2 {Reserved-Seats = Temp + y},
respectively. For simplicity, we do not define individual pre- and post-conditions for
the SQL-statements of T1 and T2 and assume that the transactional conditions also apply to them.
Now let us go back to the multi-level history in Figure 6.6. If one considers read and write operations at level 0 of the history, it becomes obvious that T1 and T2 are not serializable since the
corresponding MVSG is cyclic. Despite this fact, the history is semantically correct in the sense
that its post-condition is the same as the post-condition of a serial history of transactions T1 and
T2 if we assume that the following pre-condition was valid before both transactions started their execution: I3: ((Avail-Seats − x) − (Reserved-Seats + y)) ≥ 0. If I3 holds at their starting points, it
does not matter whether T1 ’s operations are executed before, during, or even after T2 ’s operations
since I3 covers the pre-conditions of T1 and T2 . This means that the history’s result is independent of the execution order of T1 and T2 and integrity constraints I1 and I2 will not be invalidated.
Consequently, T1 and T2 can be positively validated, i.e., accepted, at the clients and server.
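To make the role of I3 tangible, here is a minimal sketch of a state-based commutativity test for this example. It assumes the simplified reading given above, namely that T1 sells x seats and T2 reserves y seats on the same flight leg; the function name and its signature are illustrative only.

    def commute_under_state(avail_seats, reserved_seats, x, y):
        # T1 (sell x seats) and T2 (reserve y seats) commute on this flight
        # leg iff I3: (Avail-Seats - x) - (Reserved-Seats + y) >= 0 holds at
        # their starting points, since I3 implies the pre-conditions of both
        # transactions in either execution order.
        return (avail_seats - x) - (reserved_seats + y) >= 0

    # With 10 available and 4 reserved seats, selling 2 and reserving 3
    # commutes; selling 5 and reserving 3 violates I3, so order matters.
    print(commute_under_state(10, 4, x=2, y=3))   # True
    print(commute_under_state(10, 4, x=5, y=3))   # False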
To sum up, we conclude that the three proposed techniques are actually compatible with each
other in the sense that they are complementary rather than competitive. These techniques have the
potential to achieve a higher degree of concurrency by either splitting original transactions into
logically independent subunits, or by providing alternative write operations to those specified in
the original transaction, or by specifying non-interference conditions for transactional operations.
6.5 Performance Evaluation
This subsection describes the experimental setup for evaluating the MVCC-* suite presented above.
The experiments were performed using a discrete event-driven simulator implemented with the
CSIM simulation package [136]. We opted for a simulation approach rather than a mathematical
analysis since performance metrics such as throughput, abort rate, etc. are dependent on various
parameters such as workloads, connectivity between clients and servers, cache replacement strategy, etc., which are not particularly amenable to analysis. The subsection is organized as follows:
a description of the experimental setup including the system used and the workload model is given
in Sections 6.5.1 and 6.5.2. In Section 6.5.3, we present the motivation behind choosing SI as a
comparison scheme for the various protocols of the MVCC-* suite in the performance study. Sections 6.5.4 and 6.5.5 present experimental results obtained by our simulation work. The results show
that MVCC-IBOTO is superior to the other protocols of the MVCC-* suite. The results also show that
the cost of providing strong consistency (serializability) to read-write transactions is relatively high,
i.e., it is much more expensive to provide serializability to mobile clients than weaker consistency
guarantees such as SI, contradicting the results experimentally measured in the stationary client-server environment [9]. Results further show that MICP even outperforms LRFU-P when used as
client cache replacement and prefetching policy to efficiently support read-write transactions and
not only read-only transactions.
6.5.1 Simulator Model
We constructed the simulator and workloads by extending the settings used in the simulation studies
presented in Chapters 4 and 5. These studies were conducted to evaluate the performance tradeoffs
when providing various degrees of consistency and data currency to read-only transactions in mobile broadcast-based environments and to quantify the performance improvements achievable when
using MICP as cache replacement and prefetching policy at mobile clients in lieu of other well-known policies. In order to provide means for comparing the performance results to those in our
previous studies, we preserved key simulation parameters and only extended them where necessary.
The parameters of the study are listed in Tables 6.2 and 6.3. The simulation components will only
be briefly discussed in the following since they are well-known from previous chapters.
Broadcast Server and Mobile Clients:
The core of the simulator consists of 10 mobile clients and a single broadcast/database server.
Client processors run at 100 MIPS while the server’s CPU has the power of 1,200 MIPS. These values reflect typical processor speeds of mobile PDAs and high-performance workstations observed
in production systems about three years ago. CPU costs are associated with the events listed in
Table 6.2. With respect to storage capacity, clients are diskless and have a relatively small memory
cache capable of storing at most 2% of the objects maintained in the database. The client cache is
modeled as a hybrid consisting of a small page cache (20% of the CCSize) and a large object cache
(80% of the CCSize). The page cache and the object cache are managed by the LRU and MICP-L
policies, respectively. The server, on the other hand, is equipped with a relatively large memory
cache that may store up to 20% of the database. Similar to the client cache, the server cache is
partitioned into a page cache and an object cache. The page cache is managed using an LRU policy
and the object cache is implemented as a mono-version cache maintained in FIFO order, i.e., the object cache is treated similarly to the MOB as proposed in [53]. Besides, the server has secondary storage which is modeled as a disk array consisting of 4 disks. Data pages are statically assigned to one of the available disks and each disk is modeled as a FIFO queue scheduling operations in the order of their arrival. Disk parameters are listed in Table 6.2 and reflect typical transfer and access times of existing devices.

Server Database Parameters
  Database size (DBSize)                          10,000 objects
  Object size (OBSize)                            100 bytes
  Page size (PGSize)                              4,096 bytes
Server Cache Parameters
  Server buffer size (SBSize)                     20% of DBSize
  Page buffer memory size                         20% of SBSize
  Object buffer memory size                       80% of SBSize
  Cache replacement policy                        LRU
Server Disk Parameters
  Fixed disk setup costs                          5,000 instr
  Rotational speed                                10,000 RPM
  Media transfer rate                             40.00 Mbps
  Average seek time (read)                        4.5 ms
  Average rotational latency                      3.0 ms
  Variable network costs                          7 instr/byte
  Page fetch time                                 7.6 ms
  Disk array size                                 4
Client/Server CPU Parameters
  Client CPU speed                                100 MIPS
  Server CPU speed                                1,200 MIPS
  Client/Server page/object cache lookup costs    300 instr
  Client/Server page/object read costs            5,000 instr
  Register/Unregister a page/object copy          300 instr
  Register an object in prohibition list          300 instr
  Prohibition list lookup costs                   300 instr
  Inter-transaction think time                    50,000 instr
  Intra-transaction think time                    5,000 instr

Table 6.2: Summary of the system parameter settings – I.
Database and Broadcast Program:
The simulated database consists of a set of 10,000 objects sized 100 bytes each. As each
disk page is 4 KB in size, the database is stored on 250 disk pages. We selected a relatively
small database size in order to make the simulations computationally feasible for today’s computer
hardware. The broadcast program determines which objects of the database are disseminated to the client population, and how frequently. For reasons of simplicity, the program is modeled by a single
broadcast disk, i.e., all data objects are disseminated with the same frequency. To account for the
fact that data access in databases is typically skewed [68] and, therefore, access probabilities differ
widely among objects, only the most popular 20% of the database are broadcast. Units of data
transfer are disk pages, and in order to keep the size of an MBC short, the server broadcasts only
the most recent version of any scheduled data object. The broadcast program is static in nature
and is organized into 5 equally structured segments. Each segment consists of a data segment, a
(1,m) index [78], and a CCR as described in Table 6.2.1.
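As a rough illustration of this broadcast organization, the following sketch assembles one major broadcast cycle (MBC) from the hot set, replicating the index once per segment in the spirit of (1,m) indexing. The structure names (INDEX, CCR, DATA) and the function build_broadcast_cycle are illustrative assumptions, not the simulator's actual data structures.

    def build_broadcast_cycle(hot_objects, num_segments=5):
        # Split the hot set (the most popular 20% of the database) into
        # num_segments equal data segments; each segment is preceded by a
        # copy of the index over the hot set and a concurrency control
        # report (CCR), mimicking the 5-segment program described above.
        per_seg = len(hot_objects) // num_segments
        index = ("INDEX", tuple(sorted(hot_objects)))   # replicated m times
        cycle = []
        for s in range(num_segments):
            data = hot_objects[s * per_seg:(s + 1) * per_seg]
            cycle += [index, ("CCR", s), ("DATA", data)]
        return cycle

    hot = list(range(2000))            # 20% of a 10,000-object database
    mbc = build_broadcast_cycle(hot)
    print(len(mbc), mbc[2][0], len(mbc[2][1]))   # 15 DATA 400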
Network Model:
The network infrastructure of a complete hybrid data delivery environment consists of three
communication paths: (a) a broadband broadcast channel, (b) multiple uplink channels, and (c)
multiple downlink channels. The network parameters of the communication paths are modeled after a real system such as Hughes Network System's DirecPC (see Footnote 1) [70]. The broadcast bandwidth is set
to 12 Mbps and the unicast bandwidth is set to 400 Kbps downstream and to 19.2 Kbps upstream.
The unicast network is modeled as a FIFO queue and in order to model communication bandwidth
restrictions, the number of uplink and downlink channels is limited to two communication links
in each direction. Charged network costs consist of fixed and variable cost components and are
levied for point-to-point messages only. To model the fact that mobile communication networks are inherently unreliable and communication links can thus be interrupted, we use an additional client parameter termed disconnection probability. Here, a disconnection probability of zero means that
clients do not suffer from intermittent connectivity and a probability of one indicates that network
connection between the server and the clients cannot be established at all. In order to determine
how the protocols perform under perfect network conditions, we experimented with a disconnection probability of zero. However, we later changed this assumption in the sensitivity analysis of
the simulator.
Footnote 1: DirecPC is now being called DIRECWAY [71].
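The following sketch shows one plausible way to realize this connectivity model: per minor broadcast cycle (MIBC), a disconnection of a fixed period begins with the configured probability. The sampling scheme and names are assumptions for illustration; the thesis does not spell out the simulator's exact mechanism.

    import random

    def connectivity_trace(num_mibcs, p_disconnect=0.1, period=5, seed=42):
        # Yield one True (connected) / False (disconnected) flag per MIBC;
        # with probability p_disconnect a disconnection of `period` MIBCs
        # begins, matching the 5-MIBC disconnection period in Table 6.3.
        rng = random.Random(seed)
        trace, remaining = [], 0
        for _ in range(num_mibcs):
            if remaining > 0:
                trace.append(False)
                remaining -= 1
            elif rng.random() < p_disconnect:
                trace.append(False)
                remaining = period - 1
            else:
                trace.append(True)
        return trace

    trace = connectivity_trace(100, p_disconnect=0.1)
    print(sum(not c for c in trace), "of 100 MIBCs disconnected")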
Client Cache Parameters
  Client cache size (CCSize)                          2% of DBSize
  Client page cache size                              20% of CCSize
  Client object cache size (OCSize)                   80% of CCSize
  Page cache replacement policy                       LRU
  Object cache replacement policy                     MICP (LRFU-P, P-P)
  REC size                                            ≥ 50% of OCSize
  NON-REC size                                        ≤ 50% of OCSize
  Aging factor α                                      0.7
  Replacement policy control parameter λ              0.01
  PCB calculation frequency                           5 times per MIBC
Broadcast Program Parameters
  Number of broadcast disks                           1
  Number of objects disseminated per MBC              20% (20 – 100%) of DBSize
  Number of index segments per MBC                    5
  Number of CCRs per MBC                              5
  Bucket size                                         4,096 bytes
  Bucket header size                                  96 bytes
  Index header size                                   96 bytes
  Index record size                                   12 bytes
  Object ID size                                      8 bytes
Network Parameters
  Broadcast bandwidth                                 12 Mbps
  Downlink bandwidth                                  400 Kbps
  Uplink bandwidth                                    19.2 Kbps
  Fixed network costs                                 6,000 instr
  Variable network costs                              7 instr/byte
  Propagation and queuing delay                       300 ms
  Number of point-to-point uplink/downlink channels   2
  Client disconnection probability                    0% (10–50%)
  Client disconnection period                         5 MIBCs

Table 6.3: Summary of the system parameter settings – II.
6.5.2 Workload Model
The simulated workload is synthetically produced by two different workload generators. The workload generators differ from each other in the way and in the place where they produce read-write
transactions. In our simulator we use one generator that continuously produces read-write transactions at the server. Its purpose is to generate data contention within the system and hence, it
indirectly controls the data conflict rate of concurrent transactions. In the standard setting of the
simulator, the server workload generator issues two read-write transactions per MIBC and each
transaction has a fixed length of 25 data operations. Objects read and written by those read-write
transactions follow a Zipf distribution [168] with parameter θ = 0.80 and the write-read ratio, i.e.,
the number of writes versus reads, amounts to 1/4, which approximately reflects the average data
access and update behavior of transactions in production systems [68]. The second workload generator operates at mobile clients and differs from that at the server in two ways: (a) It produces
read-write transactions of variable length, from 5 to 25 data operations, depending on the respective simulator setting, with a default transaction size of 10 data operations. (b) Data access and update operations of client transactions are slightly more skewed than those of server transactions and follow
a Zipf distribution with parameter θ = 0.95. To produce resource contention in the network and at
the server, the basic setup of the simulator imitates the activities of 10 clients. This number is then
varied up to 50 clients in the sensitivity analysis. The data access behavior of read-write transactions
that have to be aborted due to an irresolvable conflict is controlled by the abort variance parameter
which is set to 100 percent, meaning that restarted transactions do not necessarily read or write the
same set of objects as their original transactions. Such a parameter value stresses the system since
re-issued transactions do not profit from caching operations of their initial transactions. Table 6.4
summarizes the workload parameters used in the experimental study.
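For concreteness, here is a small sketch of a client-side workload generator along the lines just described: Zipf-skewed object selection with θ = 0.95 and a write-read ratio of 1/4 (one write per four reads, i.e., a write probability of 0.2). The Zipf normalization used here is one common variant and, like the function names, an illustrative assumption rather than the simulator's exact code.

    import random

    def zipf_weights(n, theta):
        # P(rank i) proportional to (1/i)^theta for i = 1..n
        w = [(1.0 / i) ** theta for i in range(1, n + 1)]
        total = sum(w)
        return [x / total for x in w]

    def gen_transaction(num_ops, theta=0.95, db_size=10_000, seed=7):
        # Write-read ratio of 1/4: one write per four reads, so P(write) = 0.2.
        rng = random.Random(seed)
        weights = zipf_weights(db_size, theta)
        objs = rng.choices(range(db_size), weights=weights, k=num_ops)
        return [("w" if rng.random() < 0.2 else "r", obj) for obj in objs]

    print(gen_transaction(10))   # e.g. [('r', 3), ('w', 0), ('r', 17), ...]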
6.5.3 Comparison with other CC Protocols and Integrating Conflict Prevention Measures into the MVCC-* Suite
In order to be able to evaluate the performance of our protocols in comparison to previously proposed ones, we decided to implement, in addition to the algorithms of the MVCC-* suite, the
Workload Parameters
  Parameter                                                      Parameter Value (Sensitivity Range)
  Number of database servers                                     1
  Number of clients                                              10 (10 – 50)
  Number of server-initiated read-write transactions per MIBC    2
  Read-write transactions size (server)                          25 operations
  Server data update pattern (Zipf distribution with θ)          0.80
  Read-write transactions size (clients)                         10 (5 – 25) operations
  Client data access pattern (Zipf distribution with θ)          0.95
  Number of concurrent read-only transactions per client         1
  Write-read ratio of client/server transactions                 1/4
  Abort variance                                                 100%
  Uplink usage threshold                                         100%

Table 6.4: Summary of the workload parameter settings.
Snapshot Isolation (SI) scheme [23] into the simulator. We chose the SI scheme for comparison
since it provides the same currency guarantees as the MVCC-BOT protocol, but ensures strictly
weaker data consistency to transactions. As it does not avoid all known anomalies and may thus
produce histories containing phantoms or write skews, SI does not guarantee serializability. However, as this protocol avoids many anomalies and is nowadays implemented in many database products such as Oracle 10g [82, 117] or PostgreSQL [57], it is attractive to benchmark our protocols
against it.
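As a reminder of why SI is strictly weaker than serializability, the sketch below shows SI's first-committer-wins validation in its textbook form: only write-write conflicts with transactions that committed after the snapshot was taken cause aborts, so rw-conflicts such as write skew slip through. The function name and data layout are illustrative, not the simulator's actual code.

    def si_can_commit(txn_writes, txn_start_ts, committed_log):
        # committed_log: list of (commit_ts, set of written object ids).
        # Abort iff some transaction committed after our snapshot was taken
        # and wrote an object that we also write (first-committer-wins).
        for commit_ts, writes in committed_log:
            if commit_ts > txn_start_ts and writes & txn_writes:
                return False
        return True

    log = [(5, {"x"}), (9, {"y"})]
    print(si_can_commit({"y"}, txn_start_ts=7, committed_log=log))  # False: ww-conflict on y
    print(si_can_commit({"x"}, txn_start_ts=7, committed_log=log))  # True: x committed before the snapshot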
Besides those comparisons, we wanted to quantify the performance impact of extending the
protocols of the MVCC-* suite by some of the conflict reducing and resolving measures proposed in
Section 6.4.3. From the three proposed approaches, we selected one that is based on the prerequisite
that clients provide alternatives for each intended write operation. We opted for the alternativity
technique since there exists a wide spectrum of applications (e.g., think of sales, appointment, or
procurement applications) where it can be applied, and it neither requires any severe adaptations of
the MVCC-* suite nor does it have to be presented in the context of some application scenario. The
alternativity technique has been integrated into the MVCC-* suite as follows: Whenever a write
operation is performed by an active read-write transaction Ti, the user or, to be more exact, the workload generator randomly selects from a set of non-conflicting writes up to 3 alternative write operations for any of the original write actions of Ti. Those alternative operations will then be used whenever a rw-conflict between Ti and some validated read-write transaction Tj is detected. If any
of those additionally provided write operations can resolve the conflict, processing of Ti continues.
Otherwise, Ti is aborted as usual.
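A minimal sketch of this integration is given below: each original write carries up to three alternatives, and a detected rw-conflict is resolved by substituting the first non-conflicting alternative. The conflict test is deliberately simplified to a set of conflicting object ids; all names are illustrative assumptions.

    def validate_with_alternatives(writes, alternatives, conflicting_objects):
        # writes: object ids written by Ti, in order.
        # alternatives: object id -> list of up to 3 fallback object ids.
        # Returns the (possibly substituted) write set, or None to signal abort.
        final = []
        for obj in writes:
            if obj not in conflicting_objects:
                final.append(obj)
                continue
            fallback = next((a for a in alternatives.get(obj, [])
                             if a not in conflicting_objects), None)
            if fallback is None:
                return None          # no usable alternative: abort as usual
            final.append(fallback)
        return final

    alts = {"seat-12A": ["seat-12B", "seat-14A"]}
    print(validate_with_alternatives(["seat-12A"], alts, {"seat-12A", "seat-12B"}))
    # -> ['seat-14A']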
6.5.4 Basic Experimental Results
We now present the results of the experiments run under the baseline settings of the simulator. Regarding the statistical accuracy of all subsequently illustrated performance measures, it is important
to note that they lie within a 90% confidence interval with a relative error of ±5%. Figures 6.7(a)
and 6.7(b) depict these results as the number of operations per transaction increases from 5 to 25.
Figure 6.7(a) plots the throughput rate per second as a function of the transaction length for both the protocols of the MVCC-* suite and SI. Note that we do not show performance results for MVCC-IBOT and MVCC-BOT in Figure 6.7 since both protocols are inferior to their optimized variants. The results show that MVCC-IBOTO outperforms MVCC-BOTO and MVCC-EOT by about 31% and 83%, respectively, in the sense that the system performance would degrade by the specified percentage were MVCC-IBOTO not deployed as CC protocol. Additionally,
it can be seen that SI’s performance is superior to that of all the protocols of the MVCC-* suite.
Figure 6.7(b), in its turn, shows the relative performance difference between SI and the MVCC-*
suite. On average, the performance penalty relative to SI is about 40% for the best performing
MVCC-* protocol, i.e., the penalty of providing serializability to mobile transactions is significant.
The plots also show that the penalty of ensuring serializability rises with increasing transaction
length since more transactions need to be restarted by the protocols of the MVCC-* suite relative
to SI. It is interesting to note that the results visualized in Figure 6.7 diverge significantly from
those experimentally investigated for stationary environments. In [9] the performance degradation
from providing serializability (IL 3) relative to protocols with lower consistency guarantees (IL 2)
was examined for a conventional client-server database system where communication is carried
out through a high bandwidth, reliable, and low latency network. There, the experimental results
show that serializability can be achieved at a performance penalty of only 1% to 9% relative to
protocols that ensure IL 2 data consistency guarantees. Consequently, the author concludes that it
is not worthwhile to execute transactions under weaker consistency levels than serializability since
lower levels do not exclude the possibility of violating the integrity of the database. Despite significantly higher serialization costs experienced in mobile environments, we still believe that clients
(whether mobile or stationary) should not trade off data consistency for performance improvements
by means of violating database integrity constraints.
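The thesis does not spell out the formula behind the relative performance penalty (RPP) reported in these plots; one reading that is consistent with penalties exceeding 100% in Figure 6.7(b) is the following, stated here as an assumption:

    % T_P: throughput under protocol P; T_ref: throughput under the
    % reference protocol (SI in Figure 6.7, MVCC-IBOT^O in Figure 6.8).
    \mathrm{RPP}(P) = \frac{T_{\mathrm{ref}} - T_{P}}{T_{P}} \cdot 100\,\%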
Figure 6.7: Transaction throughput and relative performance penalty (RPP) of the MVCC-* suite compared to SI as the transaction size is increased. (a) Absolute throughput performance: throughput per second vs. transaction length (5–25) for SI, MVCC-IBOTO, MVCC-BOTO, and MVCC-EOT. (b) Relative throughput performance: RPP compared to SI (percent) vs. transaction length.
Additionally, we conducted experiments in order to quantify the degradation of the overall
system performance that occurs when using MVCC-BOT and MVCC-IBOT in lieu of their optimized companions. The results of the experiments are summarized in Figure 6.8 showing that the
penalty of using the unoptimized protocols grows with increasing transaction length. The performance difference of MVCC-IBOT and MVCC-BOT is, on average, about 8% relative to
the optimized versions. Since the measured performance degradation due to inefficient transaction
validation is quite significant, we believe that the additional processing overhead of the optimized
protocols compared to their basic variants is well compensated by the performance gain.
6.5.5 Results of the Sensitivity Analysis
In the following subsections we present the results of a sensitivity analysis conducted to understand
how different system and workload parameters affect the overall system performance of the protocols of the MVCC-* suite in comparison to each other and to SI. We also report on the performance
Figure 6.8: Relative performance penalty of deploying MVCC-BOT and MVCC-IBOT, respectively, in lieu of their optimized variants: RPP compared to the optimized protocols (percent) vs. transaction length (5–25).
impact on the protocols of the MVCC-* suite and SI when providing alternative write operations
to those specified in the original transaction. Last, but not least, we present results of a performance analysis comparing the cache replacement and prefetching policy MICP-L, used to improve
response times of read-write transactions, with LRFU-P and P-P.
6.5.5.1 Effects of Varying the Data Contention Level
To understand the impact of data contention on the protocols of the MVCC-* suite as well as
the SI scheme, we varied the number of update transactions generated per MBC by the server
workload generator. Remember, in the default setting of the simulator the generator produced 10
read-write transactions per MBC. For the sensitivity experiments, we varied the number of read-write transactions issued by the server, starting from 5 up to 25 transactions per MBC. As the
results in Figures 6.9(a) and 6.9(b) show, the performance of the MVCC-* schemes compared
to SI degrades both in terms of absolute numbers and relative percentages. The reason is that if
there are only few server transactions executing in parallel to the transactions run at the clients,
the data conflict rate is relatively low and, therefore, transaction restarts are rare. However, if we
increase the number of concurrently active transactions gradually, the transaction abort rate will
grow superlinearly causing the overall system performance to degrade at the same rate. Besides,
the MVCC-* schemes suffer from higher data contention levels to a greater extent than SI due to
the relatively faster increasing probability of transaction aborts for the former.
Figure 6.9: Absolute and relative transaction throughput by varying the number of update transactions issued per MBC (5–25). (a) Absolute throughput performance: throughput per second for SI, MVCC-IBOTO, MVCC-BOTO, and MVCC-EOT. (b) Relative throughput performance: RPP compared to SI (percent).
6.5.5.2 Effects of Providing Alternative Write Operations to Transactions
The MVCC-* suite was designed to provide serializability and well-defined data currency guarantees to general purpose applications without having to exploit any application or user knowledge
for carrying out CC. In Section 6.4.3, we described that the technique of using alternatively specified write operations to prevent active read-write transactions from experiencing data conflicts
with recently committed transactions has the potential of achieving a higher degree of concurrency
in the system. However, the method has the fundamental disadvantage of undermining the user-friendliness of the application since it is the user who is expected to specify write alternatives for transactions. In order to be able to better evaluate the attractiveness of this technique, we ran experiments quantifying its impact on the overall system performance. For this purpose, we varied
the number of alternative write operations provided for each transactional object write from 0 to 3.
The results of the experimental studies are represented in Figures 6.10(a) and 6.10(b), respectively.
They show a notable performance improvement of at least 40% for the investigated protocols if just
one additional write operation is associated with each original write operation. Additionally, the
plots show that providing more than one additional write operation for each original one results
in a sublinear performance increase. Nonetheless, transaction throughput nearly doubles (irrespective of the investigated protocol) if each original write operation is backed up by three additional
ones. The reader may wonder why we do not present experimental results for the MVCC-EOT
protocol. The reason is that the scheme does not benefit from this conflict-reducing technique since
rw-conflicts between a validating transaction Ti and some read-write transaction Tj ∈ Tactive(Ti) do
not matter under the MVCC-EOT protocol. As the scheme enforces EOT data currency guarantees,
a validating transaction Ti will always be serialized after all previously committed transactions and,
therefore, rw-conflicts are not an issue.
Figure 6.10: Absolute and relative performance improvements by providing alternative write operations (0–3 per original write). (a) Absolute throughput performance: throughput per second for SI, MVCC-IBOTO, and MVCC-BOTO. (b) Relative throughput performance: improvement in transaction throughput (percent).
6.5.5.3 Effects of Intermittent Connectivity
Mobile clients may either voluntarily or involuntarily get disconnected from the hybrid data delivery
network. In order to determine the effect of disconnections on the overall system performance, we
have run the simulator under two connectivity setups: (a) First, we simulated the case of mobile
clients being partially disconnected from the network for fixed periods of time. By the notion of
partial disconnections we mean that clients’ networking capabilities are restricted to accessing and
downloading data from the broadcast channel, i.e., if clients operate in partial disconnection mode,
there is no means for them to communicate with the server through a point-to-point channel. (b)
Second, we simulated the case of communication between clients and the server being completely
interrupted for fixed time intervals.
The results of periodically interrupting the point-to-point communication between mobile
clients and the server are presented in Figures 6.11(a) and 6.11(b), respectively. The plots show
the performance degradation of the investigated protocols as a function of the disconnection probability. Remember that the disconnection probability specifies the likelihood of a client's inability to communicate over one or more communication media. As the results show, the performance
penalty experienced by clients due to partial disconnections is negligible if disconnections occur
relatively infrequently, i.e., up to 10% of the overall simulation time. Otherwise, the performance degrades moderately, by about 8 to 14%, if the point-to-point communication is interrupted for about
a quarter to a half of the clients’ total processing time. The reason for that relatively small performance drop due to unreliability of the back-channel to the server is twofold: (a) The majority of
the client data requests can either be satisfied from the client cache or from the broadcast channel,
i.e., the uplink communication channel plays only a tangential role for satisfying data requests. (b)
Since transactions are long-lived due to high RTTs over wireless channels and are therefore likely
to experience many data conflicts, the number of transaction pre-commits is relatively low and thus,
the back-channel is only seldom required to initiate final transaction validations.
Figure 6.11: Absolute and relative performance degradation with increasing disconnection probability (partial disconnection). (a) Absolute throughput performance: throughput per second vs. disconnection probability (0–0.5) for SI, MVCC-IBOTO, MVCC-BOTO, and MVCC-EOT. (b) Relative throughput performance: RPP of intermittent connectivity (percent) at probabilities 0.1, 0.25, and 0.5.
The results of the experiments simulating the scenario where the clients are detached from
both communication channels while being in disconnected mode are shown in Figures 6.12(a)
and 6.12(b). As before, increasing the disconnection probability causes the transaction throughput rate to drop. However, in contrast to the previous experiments, the decline in the overall system performance is significant even in cases of rare disconnections. For example, if the
probability of a client being separated from the mobile network is 10%, the system performance
degrades by 24 to 42% depending on the protocol used. Further, the plots show that the performance decline accelerates with increasing disconnection probability, which is explained by the fact that clients operating in total disconnection mode may miss CCRs that are needed to ensure cache consistency and freshness and to pre-validate active read-write transactions. Another major drawback of
total disconnections is that they hinder clients from acquiring non-cache-resident object versions,
i.e., whenever a cache miss occurs, transaction processing is blocked until the client reconnects to
the server. Depending on the frequency and duration of disconnection periods, transaction processing is impeded significantly, as Figure 6.12 clearly demonstrates.

Figure 6.12: Absolute and relative performance degradation with increasing disconnection probability (total disconnection). (a) Absolute throughput performance: throughput per second vs. disconnection probability (0–0.5) for SI, MVCC-IBOTO, MVCC-BOTO, and MVCC-EOT. (b) Relative throughput performance: RPP of intermittent connectivity (percent) at probabilities 0.1, 0.25, and 0.5.
6.5.5.4 Effects of Using Various Caching and Prefetching Policies
Our last experiment was aimed at proving the performance superiority of MICP-L over LRFU-P, which is a modified variant of the LRFU cache replacement policy [98, 99], used in order to improve the response times of read-write transactions. Remember, MICP-L's original goal was to improve the response times of read-only transactions. In Chapter 5 we showed that the performance degradation of using LRFU-P is about 19% vs. MICP-L when used to meet the data storage requirements of read-only transactions. Since the CC protocols proposed for read-only transactions [123, 124, 138, 139] have similar requirements on the client cache manager to those of the MVCC-* schemes (see Footnote 2), we expect MICP-L to outperform LRFU-P if used along with the MVCC-* suite.
In the following we present the experimental results combining the MVCC-IBOTO protocol with various cache replacement and prefetching policies such as P-P, LRFU-P, and MICP-L. We chose MVCC-IBOTO as CC protocol for these experiments since it turned out to be the best performing protocol among those of the MVCC-* suite. The reason for selecting P-P in addition to LRFU-P and MICP-L is that the P-P protocol's characteristics, such as perfect knowledge of the access probabilities of all database objects and ease of implementation, allow us to use its experimental results as a baseline for the comparison with the other two protocols. Further, to be able to compare the results with those measured for read-only transactions in Chapter 5, we ran our simulator with
the same system and workload settings as used there. To do so, we increased the number of data
updates produced by the server workload generator from 50 to 100 per MBC. The results of the
experiments are graphically shown in Figures 6.13(a) and 6.13(b). As in our previously reported
experiments (see Section 5.4.4), P-P significantly outperforms both online cache replacement and
prefetching policies MICP-L and LRFU-P. However, and more importantly, MICP-L is also superior over LRFU-P if used to accelerate the response times of read-write transactions. On average,
relative performance degradation when deploying LRFU-P in lieu of MICP-L is about 6%. The
drop in the performance advantage of MICP-L vs. LRFU-P if used for read-write transactions and
Footnote 2: To achieve an optimal level of concurrency, all aforementioned protocols exploit multi-versioning and, therefore, expect the cache manager to maintain non-current as well as current object versions in the client cache.
not for read-only transactions is due to the fact that read-write transactions do not benefit to the same
extent as their read-only counterparts from observing non-current data objects. Since MICP-L and
LRFU-P primarily differ from each other in the way they handle those object versions, MICP-L’s
superiority over LRFU-P diminishes somewhat. In order to enable the reader to compare both sets of results directly, we adjusted the scaling of the plots presented in Section 5.4.4 and present them in Figures 6.14(a) and 6.14(b) once more.
Figure 6.13: Absolute and relative performance deviation between P-P, MICP-L, and LRFU-P when varying the read-write transaction size. (a) Absolute throughput performance: throughput per second vs. transaction length (5–25). (b) Relative throughput performance: RPP relative to P-P (percent).
Figure 6.14: Absolute and relative performance deviation between P-P, MICP-L, and LRFU-P when varying the read-only transaction size. (a) Absolute throughput performance: throughput per second vs. transaction length (5–25). (b) Relative throughput performance: RPP relative to P-P (percent).
In the preceding sections of this chapter we have given a detailed description of a suite of MVCC
protocols ideally applicable in hybrid data delivery environments. Besides, we have provided the
correctness proofs of the protocols, have shown their performance under various system settings and workloads, have evaluated them against the SI protocol, have given an indication of how their
and workloads, have evaluated them against the SI protocol, have given an indication on how their
performance could be improved by exploiting the semantics available from the user and/or application, and finally have investigated MICP-L’s performance when servicing clients that execute
read-write transactions. The quantitative performance analysis has shown that MVCC-IBOTO is the
best performing protocol in the MVCC-* suite. Performance-wise MVCC-IBOTO is followed by
MVCC-BOTO and MVCC-EOT, which indicates MVCC protocols’ superiority over mono-version
schemes in mobile environments. Additionally, comparing and contrasting the performance results
of MVCC-IBOTO and MVCC-BOTO demonstrates that forcing read-write transactions to read object versions that were current at the transaction’s starting point is not the optimal strategy when
application responsiveness is to be maximized. Instead, when mapping read operations to actual
version reads, the read forward policy should be applied, which allows reads of object versions written after the starting point of a read-write transaction Ti as long as Ti has not been invalidated by any
concurrent read-write transaction Tj. It is also important to note that despite the underperformance of the other protocols of the MVCC-* suite compared to MVCC-IBOTO, their existence is fully justified. Since both MVCC-BOTO and MVCC-EOT provide data currency guarantees to read-write transactions different from those of MVCC-IBOTO, and each of those degrees may be desirable in one application scenario or another, all these protocols are useful for CC purposes.
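To illustrate the read forward policy just summarized, the sketch below maps a logical read to a version read: the newest committed version is returned as long as the transaction has not been invalidated; otherwise it falls back to the version visible at the transaction's starting point. This is a simplified, hypothetical rendering, not the protocols' actual version-selection code.

    def read_version(versions, start_ts, invalidated):
        # versions: [(commit_ts, value), ...] for one object, ascending by ts.
        # Read forward: take the newest committed version unless Ti has been
        # invalidated, in which case fall back to the BOT-visible version.
        if not invalidated:
            return versions[-1][1]
        visible = [v for ts, v in versions if ts <= start_ts]
        return visible[-1] if visible else None

    vs = [(3, "x0"), (8, "x1")]
    print(read_version(vs, start_ts=5, invalidated=False))  # 'x1' (forward read)
    print(read_version(vs, start_ts=5, invalidated=True))   # 'x0' (BOT semantics)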
To provide some assistance when selecting the most appropriate CC scheme out of the MVCC-*
suite, we summarized the protocols' major characteristics in Table 6.5, which contains the protocols' features described from the perspective of a client processing a read-write transaction Ti. In
summary, the information presented in Section 6.5 shows that MVCC-EOT is the cheapest protocol in the MVCC-* suite in terms of both space and processing overhead. However, it clearly
suffers from its weak performance results and, therefore, should only be used in situations where
EOT data currency guarantees are necessary for correctness reasons. Overhead-wise MVCC-EOT
is followed by MVCC-IBOT and its optimized variant MVCC-IBOTO. In contrast to MVCC-EOT,
however, MVCC-IBOT and MVCC-IBOTO significantly outperform MVCC-EOT. As MVCC-IBOT
and MVCC-IBOTO perform better than MVCC-BOT and its optimized variant, and as the latter
pair incurs the highest storage and processing costs, MVCC-IBOT and MVCC-IBOTO are the first
choice if the overall system performance is to be maximized. MVCC-BOT or MVCC-BOTO may
be applied if BOT data currency requirements are imperative from the application point of view.
Besides, if the application response time provided by any of the proposed protocols is not satisfactory from the user's point of view, the protocols can be extended by a number of measures such as associating dependency relations with transactions, providing alternative write operations to those specified in the original transaction, etc. The implication of exploiting such techniques for transaction processing has been quantified through simulation, and the results have clearly proven their attractiveness for mobile computing. However, when deploying such measures, one always has to remember that they are application-dependent, that their use is error-prone, and that they complicate application programming and may degrade the application's user-friendliness.
Protocol: MVCC-BOT
  Data currency guarantee:                    database state as of the transaction's starting point
  Storage space overhead at the client:       moderate
  Processing overhead at the client:          moderate
  Influence of disconnections:                no influence on protocol correctness, but on transaction throughput
  Performance penalty relative to MVCC-IBOTO: 36%

Protocol: MVCC-BOTO
  Data currency guarantee:                    see MVCC-BOT protocol
  Storage space overhead at the client:       moderate
  Processing overhead at the client:          moderate, but higher than MVCC-BOT
  Influence of disconnections:                see MVCC-BOT protocol
  Performance penalty relative to MVCC-IBOTO: 31%

Protocol: MVCC-IBOT
  Data currency guarantee:                    database state between the transaction's starting and commit point
  Storage space overhead at the client:       moderate
  Processing overhead at the client:          low, if RFF(Ti) is set to true; otherwise, moderate
  Influence of disconnections:                if RFF(Ti) is set to false, same influence as on MVCC-BOT; otherwise, it requires Ti to end its RFP
  Performance penalty relative to MVCC-IBOTO: 8%

Protocol: MVCC-IBOTO
  Data currency guarantee:                    see MVCC-IBOT protocol
  Storage space overhead at the client:       moderate
  Processing overhead at the client:          low, if RFF(Ti) is set to true; otherwise, moderate
  Influence of disconnections:                see MVCC-IBOT protocol
  Performance penalty relative to MVCC-IBOTO: –

Protocol: MVCC-EOT
  Data currency guarantee:                    database state as of the transaction's commit point
  Storage space overhead at the client:       low
  Processing overhead at the client:          low
  Influence of disconnections:                see MVCC-BOT protocol
  Performance penalty relative to MVCC-IBOTO: 83%

Table 6.5: The MVCC-* suite at a glance.
“All truths are easy to understand once
they are discovered; the point is to discover them.”
– Galileo Galilei
Chapter 7
Conclusion and Future Work
In this thesis, we have focused on the problem of efficiently providing consistent and current data
to dissemination-based applications run at clients being part of a hybrid data delivery network. In
this last chapter, we summarize all the results presented. We then conclude with a discussion of the
various directions to extend this work.
7.1 Summary and Conclusion
Owing to the widespread deployment of wireless networks and ever-increasing capabilities of mobile devices, wireless data services quickly emerge as data-hungry users require instant access to
timely information no matter where they are located [71, 79, 108]. Due to the intrinsic constraints
of mobile systems such as asymmetric bandwidth, limited power supply, and unreliable communication, the efficient and cost-effective provision of wireless data services poses many research
challenges in itself. One of the most important issues is to provide data consistency and currency to
dissemination-based applications, and this topic has been intensively discussed within this thesis.
The thesis first presented background information on the basic concepts of wireless data communications, highlighted the characteristics, capabilities, and limitations of existing and newly
emerging wireless communication networks and discussed the various forms of asymmetry that
occur in mobile data networks. We then enumerated the various limitations of mobile computing
and discussed their influence on our objective to efficiently provide data consistency and currency
to information-centered applications despite frequent updates of the data source. Thereafter, we
proposed hybrid data delivery as the basis of providing highly scalable and efficient transaction
support to dissemination-based applications and presented its potential performance and scalability benefits in contrast to its underlying basic data delivery mechanisms, namely the traditional
request/response (or pull/unicast) and the rather novel push/broadcast. It was followed by a discussion of various performance-critical and other crucial issues — besides transaction support — that
are vital to the successful deployment of hybrid data delivery services. In this context, we focused
on the air-cache which serves as an abstract vehicle or intermediate memory level between the
mobile clients and the server, identified its special properties compared to other types of caching,
presented different ways to organize the air-cache and discussed their advantages and disadvantages
w.r.t. the critical issue of providing access efficiency. We identified power conservation as a second system-critical design component for hybrid data delivery networks and introduced air-cache
indexing as a solution to the problem of reducing the energy consumption of mobile devices when
locating and retrieving requested data objects in the air-cache. We distinguished three classes of
indexing: (a) signature-based, (b) hashing-based, and (c) tree-based indexing, described their basic
working principles and reported on results of two comparison studies that quantitatively evaluated
the performance of various instances of the three classes. As a result of the trade-off between tuning
time and access latency (and as reported by the performance studies), none of the three indexing
methods is superior to any of the other in terms of both performance metrics. The results, however,
showed that if the application scenario favors short latency at the cost of more energy consumption, a signature-based or hashing-based indexing method is the way to go. Otherwise, a tree-based
indexing method should be deployed.
Following those preliminary discussions, the thesis then focused on the main problem of
this work, the cost-efficient and adequate provision of data consistency and data currency to
dissemination-based applications. We first concentrated on the provision of efficient and reliable
data consistency and data currency support for queries, i.e., read-only transactions, as they constitute the majority of the transactions initiated by dissemination-based applications. We addressed the
limitations of existing IL definitions by showing that most of them lack any data currency guarantees. To rectify the problem, we proposed four new ILs, namely BOT Serializability, Strict Forward
BOT Serializability, Strict Forward BOT Update Serializability, and Strict Forward BOT View Consistency, that provide a set of useful data consistency and currency guarantees to dissemination-based applications. In contrast to the ANSI ILs, our specifications of the proposed levels are
implementation-independent and use a combination of conditions on serialization graphs and transaction histories. Furthermore, we presented new and efficient implementations of the newly defined
ILs based on optimism and multi-versioning. We also presented the results of a simulation study that
evaluated the relative performance of the different ILs’ implementations and additionally compared
their performance with previously proposed schemes. The results showed that the cost of providing
Full Serializability to read-only transactions compared to View Consistency, which is the weakest
consistency level that ensures a transaction-consistent view of the database, is relatively low ranging from as little as 1% up to 10%. Thus, if the application writer is in doubt whether running a
read-only transaction at a weaker IL would produce anomalous reads that may result in false or
misleading decisions, then serializability is preferable to any other level. Further, we conducted a
comparison study of our worst and best performing ILs’ implementations, namely MVCC-SFBVC
and MVCC-BS, with the invalidation-only and F-MATRIX-No schemes [124, 139]. The results
showed that MVCC-SFBVC and MVCC-BS are both superior to the other two protocols which is
a result of the strong data currency guarantees that the latter two enforce, obliging the scheduler to
produce only mono-version histories.
As a second major topic we tackled the issues of client cache management and data prefetching as they are fundamental techniques to improve the performance and scalability of hybrid data
delivery systems. We started by emphasizing that currently available client caching and prefetching policies, whether designed for the stationary or for the mobile client-server architecture, are not effective in supporting the data preservation and storage requirements imposed by MVCC protocols
suitable for read-only transactions. The reason for their inadequacy is that they treat all versions
of an object the same way, ignoring the fact that two distinct versions of the same object may have
different values to the client. To address this shortcoming, we proposed a novel multi-version integrated cache replacement and prefetching algorithm, called MICP. MICP logically divides the
available client cache size into two variable-sized partitions, coined REC and NON-REC, in order
to separate re-cacheable from non-re-cacheable object versions, to prevent re-cacheable and non-re-cacheable versions from competing with each other for scarce storage space, and to ensure that non-re-cacheable versions cannot be replaced by re-cacheable ones. In contrast to traditional caching and
prefetching policies, MICP does not only consider the access probability of an object to determine
whether it should be evicted from or pre-fetched into the client cache, but also its re-acquisition
costs and the likelihood that it can be re-cached. MICP combines estimates of those three parameters
into a single performance metric, called probabilistic cost/benefit value (PCB), and calculates it for
any cache-resident object version whenever a demand-fetched or prefetched object version is to be
brought into a full cache. Then, an existing cached object version must be chosen as a replacement
victim and MICP does so by selecting the cached version with the lowest PCB value. As PCB
values are dynamic (they change with every “tick” of the broadcast), prefetching from the air-cache is potentially very expensive to implement. MICP solves this problem by calculating PCB
values only for a small subset of the potential prefetching candidates, namely versions of recently
referenced objects and makes a prefetching decision only if a version of a recently referenced object is broadcast. To gain insight into the algorithms’ performance, we validated MICP or, more
precisely, MICP-L, which is a lightweight version of MICP that calculates PCB values of cached
object versions only at pre-defined events rather than at every broadcast tick, with experimental
results drawn from a highly-detailed simulation model. The results demonstrate that the performance penalty of using LRFU-P or W2R-P as cache replacement and prefetching policy compared
to MICP-L is, on average, about 19% and 80%, respectively, and MICP-L’s cache hit rate is about
6% and 94% higher than that of LRFU-P and W2R-P, respectively. As MICP-L significantly outperforms LRFU-P, which extends LRFU by prefetching from the air-cache, and LRFU is currently
known to be the best performing online caching algorithm, we can deduce that MICP-L is able to
give dissemination-based applications issuing read-only transactions a much higher improvement
in response time than any other proposed caching and prefetching policy.
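As an illustration of the replacement decision described above, the sketch below scores each cached version by a probabilistic cost/benefit value and evicts the version with the lowest score. The concrete scoring formula is an assumed stand-in for MICP's actual PCB definition, which combines the same three ingredients (access probability, re-acquisition cost, re-cacheability).

    def pcb(access_prob, reacq_cost, p_recacheable):
        # Assumed stand-in for MICP's PCB metric: the benefit of keeping a
        # version grows with its access probability and its re-acquisition
        # cost, and shrinks the easier it is to re-cache later.
        return access_prob * reacq_cost * (2.0 - p_recacheable)

    def choose_victim(cache):
        # cache: version-id -> (access_prob, reacq_cost, p_recacheable);
        # evict the version with the lowest cost/benefit value.
        return min(cache, key=lambda v: pcb(*cache[v]))

    cache = {
        ("x", 1): (0.30, 10.0, 1.0),   # current version, re-cacheable from the air-cache
        ("y", 0): (0.05, 40.0, 0.0),   # stale, non-re-cacheable version
    }
    print(choose_victim(cache))        # -> ('x', 1) under this toy scoring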
Last, but not least, we addressed the issue of providing data consistency and currency along with
good performance to broadcast-based applications that not only want to access/consume shared information, but also want to produce it and integrate it into a universally accessible database.
We first showed that currently available ILs and CC protocols are not suitable for broadcast-based
applications issuing read-write transactions since they either lack any data currency guarantees or
do not ensure serializability. To rectify the problem, we proposed a suite of five new MVCC protocols, dubbed MVCC-BOT, MVCC-BOTO , MVCC-IBOT, MVCC-IBOTO , and MVCC-EOT, which
all ensure serializability along with BOT, IBOT, and EOT data currency, respectively. For each of
those protocols, we first specified their intended semantic guarantees, then defined rules and conditions that need to be applied and satisfied by the scheduler in order to enforce those guarantees,
and finally showed that the protocols produce only correct histories in accordance to their specifications. We also discussed various issues influencing the performance of the MVCC-* suite including
caching, intermittent connectivity, and exploiting semantic information about the read-write transactions being processed. We then argued that our integrated caching and prefetching policy MICP
also suits the data preservation and storage requirements of the MVCC-* family and should therefore be deployed at any mobile client that requires transaction support. We further discussed the
impact of network disconnections and network failures on the performance and operations of the
MVCC-* protocols. We identified the MVCC-IBOT and MVCC-IBOTO protocols as being vulnerable to disconnections and proposed two alternative ways to rectify the problem. Thereafter, we
showed how semantic knowledge about the objects and the operations that operate on them can be
used to identify false or permissive conflicts that occur when using the MVCC-* suite, and how real
conflicts can be resolved by specifying alternative write choices as part of the original write
operations. Again, we concluded the chapter by presenting the results of a set of simulation-based
studies that investigated the relative performance differences of the protocols of the MVCC-* suite
and compared their performance with the well-known Snapshot Isolation scheme, which provides
slightly weaker consistency guarantees than our protocols, hence allowing us to study the performance trade-off between serializability and a weaker consistency level in mobile broadcast-based
environments. The results showed that MVCC-IBOTO is the best-performing protocol of the
MVCC-* suite, followed by MVCC-BOTO and MVCC-EOT. Further, the experiments
revealed that the cost of providing Full Serializability instead of the weaker Snapshot Isolation
guarantees to read-write transactions is, on average, about 40% in mobile networks, which is significantly higher than the cost imposed on clients in stationary networks [9]. We also presented
results of a detailed sensitivity analysis, which was conducted to understand how different system
and workload parameters, including the data contention level, the provision of alternative write operations, network connectivity, and the caching and prefetching policy, affect the relative performance of
the protocols of the MVCC-* suite and the Snapshot Isolation scheme. The results can be summarized as follows: (a) Increasing the number of read-write transactions in the workload increases the
data contention in the system and thus widens the throughput difference between the protocols of
the MVCC-* suite and the Snapshot Isolation protocol. (b) Specifying alternative write operations
in addition to their originally scheduled updates leads to a significant performance improvement of
at least 40% for the investigated protocols. (c) If the client suffers from intermittent connectivity
to the server, the throughput performance decreases only moderately if the disconnections are short
and constrained to the back-channel to the server. Otherwise, if clients are completely disconnected
from the hybrid network, the performance of the investigated protocols may degrade significantly
even under relatively low disconnection probabilities. (d) Using MICP-L rather than LRFU-P as
client caching and prefetching policy improves the system performance by about 6% on average,
making it the first choice for multi-version client-server environments.
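To illustrate the validation step and the alternative-write fallback summarized above, the following Python sketch shows a simplified backward validation at the server; the bookkeeping (read_set, write_set, alternative_writes) is a deliberately simplified assumption, not the exact rule set of the MVCC-* protocols.

    # Simplified sketch: Ti passes validation if no transaction that
    # committed after Ti started wrote an object Ti has read. On a real
    # conflict, application-supplied alternative write choices are tried
    # before the transaction is aborted.

    def try_commit(ti, committed_since_start):
        overwritten = set()
        for t in committed_since_start:
            overwritten |= t.write_set
        if ti.read_set.isdisjoint(overwritten):
            return "commit"
        for alt in ti.alternative_writes:
            if alt.read_set.isdisjoint(overwritten):
                return "commit with alternative write"
        return "abort"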
7.2 Future Work
We believe that the results of the thesis have far-reaching implications, as they provide system
developers with guidance in deploying large-scale information dissemination systems whose task is
to deliver consistent and timely data to read-only and read-write transactions executing at mobile
clients. There is much interesting research work to be done in the future; some possible
directions are highlighted below:
• The thesis has shown that the cost of providing serializability to read-only transactions compared to weaker consistency guarantees, including Update Serializability and View Consistency, is not excessively high (≤ 10%) in broadcast-based data delivery networks. However, the cost of providing strong consistency guarantees such as Full Serializability to read-write transactions compared to slightly weaker consistency conditions such as Snapshot Isolation is significantly higher (approximately 40%). Thus, it would be beneficial to execute
dissemination-based applications that not only inspect, but also modify, the database state
below serializability. However, and as mentioned in previous parts of the thesis, an important drawback of consistency guarantees weaker than serializability is that applications may
destroy database integrity if the application programmer does not analyze the program code
for potential conflicts with other transactions and prevent them from occurring. Analyzing and detecting viable, i.e., non-interfering, interleavings of the execution of transactions at weaker
ILs than serializability is a non-trivial and error-prone task and should therefore be supported by a tool providing the programmer with an intuitive interface to perform the analysis
semi-automatically. The development of a preferably graph-based tool that partially automates the analysis process and visually supports the conflict detection and avoidance task,
thus taking a great burden off the application programmer, is a particularly important and
challenging research issue currently under investigation by two independent research
groups [50, 104]. We believe that the existence of such a tool would certainly contribute
to the further promotion of semantics-based CC mechanisms in academia and, more importantly, in the commercial world. A toy sketch of the kind of check such a tool could automate follows.
• This thesis presented various MVCC protocols that provide efficient and scalable transaction
support for dissemination-based applications issuing both read-only and read-write transactions. Our protocols have been presented under the assumption that the sizes of the objects
being disseminated are relatively small (in the range from a few bytes to several dozen
bytes). While the size of the objects is relatively unimportant for read-only transactions as
long as it is small relative to the cache size, this is certainly not true for read-write transactions. In contrast to read-only transactions, which can immediately be committed by the
clients once the last read operation has been processed, read-write transactions require the
clients to communicate with the server to finally validate and, if successful, commit them. To
enable the server to validate a committing read-write transaction Ti against previously committed read-write transactions, the client maintains various information about the data
accessed and written by Ti (e.g., Ti’s read and write sets) and sends it to the server along
with copies of the objects modified by Ti. Obviously, if the copies of modified objects are relatively small in size compared to Ti’s validation information, their transmission through the
wireless medium does not take much additional time and, therefore, Ti’s final validation can
be performed without much delay. However, if updated objects are large and thus require
a large amount of time to be transferred to the server, the probability of the transaction not
being successfully validated increases as the effective degree of transaction concurrency and
data contention grows. In such a situation a more appropriate strategy might be to propagate
the operations (represented as plain text, e.g., SQL, as compiled code, or as calls to stored
procedures) that modify the objects to the server rather than the modified objects themselves.
Clearly, using a so-called function-shipping approach to integrate the transaction updates into
the common database state does not come for free since it incurs additional load on the server
to re-execute the operations once again, i.e., the approach trades off reduced network communication costs against increased CPU costs at the server. In this respect, it would be an
interesting research topic to investigate under which system and workload conditions
a data-shipping or a function-shipping approach is superior when used to transfer
client updates to the broadcast server.
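To give a feel for this trade-off, the following back-of-the-envelope Python sketch compares the two strategies; all cost parameters are hypothetical and would in practice have to be measured for the given network and server.

    # Hypothetical cost model for the data- vs. function-shipping decision.

    def choose_shipping(update_bytes, op_bytes, uplink_bps,
                        server_secs_per_op, num_ops):
        data_cost = update_bytes * 8 / uplink_bps              # ship modified objects
        func_cost = op_bytes * 8 / uplink_bps + num_ops * server_secs_per_op
        return "data-shipping" if data_cost <= func_cost else "function-shipping"

    # A 50 KB update over a 9600 bit/s uplink clearly favors shipping a
    # short SQL statement and re-executing it at the server:
    print(choose_shipping(50_000, 200, 9600, 0.01, 1))   # -> function-shipping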
• In this thesis, we have proposed a suite of new MVCC protocols, called MVCC-*, that are
based on optimism allowing clients to immediately execute read-write transactions using the
information stored in the client cache or air-cache without any extra communication with the
broadcast server in case an object has to be accessed or modified. This is clearly in contrast
to lock-based schemes that may require such communication to obtain appropriate permission (e.g., read or write permission) before an object can be accessed or modified. While
the independence of client read and write operations from the server activity is useful to
overcome the latency and bandwidth problems prevalent in mobile environments, at the end
of a transaction Ti the broadcast server nevertheless needs to check Ti’s validation information
against the CC information of earlier committed transactions with the objective of detecting non-serializable transaction executions. Whenever Ti is identified as being involved in a consistency
violating conflict with one or more recently committed read-write transactions, the detected
conflict needs to be resolved somehow. Like the majority of transaction control protocols proposed in the literature, our MVCC-* protocols resolve data conflicts by simply aborting the
conflicting transaction. Using this approach has the advantage that no human intervention
is required to resolve data conflicts, i.e., no application-specific conflict detection and resolution rules need to be specified by the application programmer and/or users. On the other
hand, the abort-based conflict resolution approach suffers from poor resource utilization and
has the drawback of potentially causing transactions to starve, which is the case when a transaction repeatedly fails to commit. As starvation is a well-known problem for optimistic CC
schemes, numerous solutions have been proposed in the literature. The pioneering paper on
optimistic schemes by Kung and Robinson [96] suggests detecting starving transactions by counting
the number of successive transaction aborts. To rescue the system from starving transactions, the authors propose to use semaphores to effectively lock the entire database during the
transaction’s restart to ensure that the re-execution succeeds. Rahm and Thomasian [131] proposed to
solve the problem by using page locks rather than database locks, acquired after a read-write transaction is aborted for the very first time, to avoid multiple transaction restarts. More
recently, and within the context of mobile databases and intermittent connectivity, Preguiça et al. [126] suggested using object-type- and operation-specific reservations,
i.e., locks, that are associated with time leases which guarantee that reservations will not be
held forever, even if the mobile client that holds the reservation becomes permanently disconnected. An interesting area of future work is to use simulation to investigate the impact of the
various approaches trying to solve the starvation problem on the overall system performance.
The solution space to be examined comprises two components: (a) starvation detection and
(b) starvation resolution. For starvation detection, it would be worthwhile to study different heuristics that may indicate starvation, such as the number of transaction restarts (in the range of 1 to some upper bound N), the transaction age, or the amount of wasted resources (e.g., battery power), with respect to their impact on the overall system performance under various workload conditions, in order to gain insights into the pros and cons of the individual approaches. For starvation resolution of a starving read-write transaction Ti, the following
four questions arise: What objects should be protected from being updated during Ti’s restart
execution? Should the server protect all objects accessed and updated during Ti’s last unsuccessful execution, or only those that produced conflicts? What level of lock granularity should
be used to guarantee that no conflict will arise when Ti is re-executed? If the restart change
probability is 0%, i.e., Ti accesses and updates the same set of objects during its re-execution,
then object-level or even object field-level locking would be appropriate. If Ti performs new
accesses and updates, then page-level or even table-level locking should be used. What lease
time should be attached to locks and what is the impact of mispredicting the optimal lease
times (too short or too long) on the system performance?
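To sketch one point in this solution space, the following Python fragment combines restart counting for starvation detection with time-leased object reservations in the spirit of [126]; the threshold and lease length are hypothetical tuning knobs, and the attribute names are illustrative.

    import time

    RESTART_THRESHOLD = 3      # hypothetical upper bound N on successive aborts
    LEASE_SECONDS = 30.0       # hypothetical lease time attached to a reservation

    reservations = {}          # object id -> (client id, lease expiry)

    def on_abort(txn):
        txn.restarts += 1
        if txn.restarts >= RESTART_THRESHOLD:      # starvation detected
            expiry = time.time() + LEASE_SECONDS
            for obj in txn.conflict_objects:       # protect only conflicting objects
                reservations[obj] = (txn.client, expiry)

    def may_update(client, obj):
        holder = reservations.get(obj)
        if holder is None or holder[1] < time.time():
            return True                            # no live lease on obj
        return holder[0] == client                 # only the lease holder may update

Simulating variants of this scheme (different thresholds, lease times, and lock granularities) against the abort-only baseline would directly address the questions raised above.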
• In this thesis all performance evaluations have been carried out using a synthetic workload
model to generate a stream of read and write requests issued by the client and server transactions. In light of the industry’s growing interest in wireless data services and
the emergence of a continuous broadcast network, called the DirectBand Network [108], which uses
FM radio sub-carrier frequencies to deliver timely information to people with SPOT-enabled
devices (e.g., PDAs or watches), it would be useful to re-validate the performance of our algorithms by experimenting with real workload traces gathered from a large number of clients
using DirectBand Network services. By re-running the experiments with real traces and
comparing the gathered results with those of our experiments, it would be interesting to see
whether the choice of the workload model (synthetic vs. real) affects the relative performance
and overall ranking of the studied CC protocols and cache replacement and prefetching policies. Should the experiments reveal differences in performance among the algorithms,
it would then be important to investigate the cause of this inconsistent behavior in order to
draw conclusions for future studies in this area.
Bibliography
[1] R. Abbott and H. Garcia-Molina, “Scheduling Real-Time Transactions: A Performance Evaluation,” in VLDB 1988, 1988, pp. 1–12.
[2] N. Abramson, “The ALOHA System — Another Alternative for Computer Communication,”
in Proc. of the Fall Joint Computer Conference, 1970, pp. 281–285.
[3] S. Acharya, “Broadcast Disks: Dissemination-based Data Management for Asymmetric
Communication Environments,” Brown University, Department of Computer Science, Tech.
Rep. CS-97-15, 1997.
[4] S. Acharya, R. Alonso, M. Franklin, and S. Zdonik, “Broadcast Disks: Data Management
for Asymmetric Communications Environments,” in Proc. ACM SIGMOD Conf., 1995, pp.
199–210.
[5] S. Acharya, M. Franklin, and S. Zdonik, “Prefetching from a Broadcast Disk,” in ICDE 1996,
February 1996, pp. 276–285.
[6] ——, “Balancing Push and Pull for Data Broadcast,” in Proc. ACM SIGMOD Conf., 1997,
pp. 183–194.
[7] S. Acharya, M. J. Franklin, and S. B. Zdonik, “Dissemination-Based Data Delivery Using Broadcast Disks,” IEEE Personal Communications, vol. 2, no. 6, pp. 50–60, December
1995.
[8] F. Adachi, “Fundamentals of Multiple Access Techniques,” in Wireless Communications in
the 21st Century, M. Shafi, S. Ogose, and T. Hattori, Eds. Wiley-IEEE Press, 2002.
[9] A. Adya, “Weak Consistency: A Generalized Theory and Optimistic Implementations for
Distributed Transactions,” MIT Laboratory for Computer Science, Cambridge, MA, Tech.
Rep. MIT/LCS/TR-786, March 1999.
[10] A. Adya, B. Liskov, and P. O’Neil, “Generalized Isolation Level Definitions,” in ICDE 2000,
2000, pp. 67–78.
[11] D. Agrawal, A. E. Abbadi, and A. K. Singh, “Consistency and Orderability. Semantics-Based
Correctness Criteria for Databases,” ACM TODS, vol. 18, no. 3, pp. 460–486, 1993.
[12] AirTV, Inc., “The Official AirTV Website,” 2004. [Online]. Available: http://www.airtv.net
[13] R. Alonso and H. F. Korth, “Database System Issues in Nomadic Computing,” in Proc. ACM
SIGMOD Conf. 1993, 1993, pp. 388–392.
[14] ANSI X3.135-1992 — Database Language SQL, American National Standard for Information Systems, 1819 L Street, NW, Washington, DC 20036, USA, 1992.
[15] M. H. Ammar and J. Wong, “The Design of Teletext Broadcast Cycles,” Performance Evaluation, vol. 5, no. 4, pp. 235–242, 1985.
[16] Anonymous, “Wireless LANs: Comparison of Wireless LAN Standards — 802.11a versus
802.11b,” 2001. [Online]. Available: http://www.mobileinfo.com/Wireless LANs/802.11a
802.11b.htm
[17] H. Balakrishnan and V. N. Padmanabhan, “How Network Asymmetry Affects TCP,” IEEE
Communications Magazine, vol. 39, no. 4, pp. 60–66, April 2001.
[18] H. Balakrishnan, V. N. Padmanabhan, G. Fairhurst, and M. Sooriyabandara, “TCP
Performance Implications of Network Path Asymmetry,” 2002. [Online]. Available:
ftp://ftp.isi.edu/in-notes/rfc3449.txt
[19] K. Banh, “Kenny’s PDA (Personal Digital Assistant) Guide,” 2004. [Online]. Available:
http://www.pages.drexel.edu/∼kvb22/
[20] D. Barbará and T. Imielinski, “Sleepers and Workaholics: Caching Strategies in Mobile
Environments,” in Proc. ACM SIGMOD Conf., 1994, pp. 1–12.
[21] D. Barbará and T. Imielinski, “Sleepers and Workaholics: Caching Strategies in Mobile
Environments,” VLDB Journal, vol. 4, no. 4, pp. 567–602, 1995.
[22] R. Bayer and C. McCreight, “Organization and Maintenance of Large Ordered Indexes,” in
Acta Informatica 1, 1972, pp. 173–189.
[23] H. Berenson, P. Bernstein, J. Gray, J. Melton, E. O’Neil, and P. O’Neil, “A Critique of ANSI
SQL Isolation Levels,” in Proc. ACM SIGMOD Conf., June 1995, pp. 1–10.
[24] A. Bernstein, D. Gerstl, and P. Lewis, “Concurrency Control for Step-Decomposed Transactions,” Information Systems, vol. 24, no. 8, 1999.
[25] A. J. Bernstein, D. S. Gerstl, W. H. Leung, and P. M. Lewis, “Design and Performance of an
Assertional Concurrency Control System,” in ICDE 1998, 1998, pp. 436–445.
[26] P. A. Bernstein, V. Hadzilacos, and N. Goodman, Concurrency Control and Recovery in
Database Systems.
Addison-Wesley, 1987.
[27] Bluetooth Special Interest Group, “Bluetooth Core Specification v1.2,” Nov. 2003. [Online].
Available: https://www.bluetooth.org/spec
[28] ——, “The Official Bluetooth SIG Website,” 2004. [Online]. Available: http://www.bluetooth.com
[29] P. M. Bober and M. J. Carey, “Multiversion Query Locking,” in VLDB 1992, August 1992,
pp. 497–510.
[30] T. Bowen, G. Gopal, G. Herman, T. Hickey, K. Lee, W. Mansfield, J. Raitz, and A. Weinrib,
“The Datacycle Architecture,” CACM, vol. 35, no. 12, pp. 71–81, 1992.
[31] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker, “Web Caching and Zipf-like Distributions: Evidence and Implications,” in Infocom 1999, 1999, pp. 126–134.
[32] P. Cao, E. W. Felten, A. Karlin, and K. Li, “A Study of Integrated Prefetching and Caching
Strategies,” in ACM SIGMETRICS 1995, 1995, pp. 188–197.
[33] E. Chang and R. Katz, “Exploiting Inheritance and Structure Semantics for Effective Clustering and Buffering in an Object-Oriented DBMS,” in Proc. ACM SIGMOD Conf., 1989,
pp. 348–357.
[34] W. W. Chang and H. J. Schek, “A Signature Access Method for the Starburst Database System,” in VLDB 1989, 1989, pp. 145–153.
[35] M. S. Chen, K. L. Wu, and P. S. Yu, “Optimizing Index Allocation for Sequential Data
Broadcasting in Wireless Mobile Computing,” IEEE TKDE, vol. 15, no. 1, pp. 161–173,
2003.
[36] C. M. Cordeiro, H. Gossain, R. L. Ashok, and D. P. Agrawal, “The Last Mile: Wireless Technologies for Broadband and Home Networks,” in 21st Brazilian Symposium on Computer Networks, 2003, pp. 119–178.
[37] M. Dankberg and J. Puetz, “Comparative Approaches in the Economics of Broadband Satellite Services,” January 2002. [Online]. Available: http://www.satelliteonthenet.co.uk/white/viasat1.html
[38] A. Datta, A. Celik, J. G. Kim, D. E. VanderMeer, and V. Kumar, “Adaptive Broadcast Protocols to Support Power Conservant Retrieval by Mobile Users,” in ICDE 1997, 1997, pp.
124–133.
[39] S. B. Davidson, H. Garcia-Molina, and D. Skeen, “Consistency in a Partitioned Network: A
Survey,” ACM Comput. Surv., vol. 17, no. 3, pp. 341–370, 1985.
[40] P. J. Denning, “On Modeling Program Behaviour,” in Proceedings Spring Joint Computer
Conference, Arlington, VA., 1972, pp. 937–944.
[41] P. Deolasee, A. Katkar, A. Panchbudhe, K. Ramamritham, and P. Shenoy, “Dissemination of
Dynamic Data,” in Proc. ACM SIGMOD Conf., 2001, p. 599.
[42] D. J. DeWitt, D. Maier, P. Futtersack, and F. Velez, “A Study of Three Alternative
Workstation-Server Architectures for Object-Oriented Database Systems,” in VLDB 1990,
1990, pp. 107–121.
[43] G. Diviney, “An Introduction to Short-Range Wireless Data Communications,” in Embedded
Systems Conference, San Francisco, April 2003.
[44] W. Effelsberg and T. Haerder, “Principles of Database Buffer Management,” ACM TODS,
vol. 9, no. 4, pp. 560–595, 1984.
[45] L. D. Fife and L. Gruenwald, “Research Issues for Data Communication in Mobile Ad-Hoc
Network Database Systems,” ACM SIGMOD Record, vol. 32, no. 2, pp. 42–47, 2003.
[46] M. Engels, Wireless OFDM Systems: How to Make Them Work? Kluwer Academic Publishers, 2002.
[47] S. Evans, “Last Mile Technologies,” 2000. [Online]. Available: http://www.telsyte.com.au/feature/last mile.htm
[48] C. Faloutsos and S. Christodoulakis, “Signature Files: An Access Method for Documents
and its Analytical Performance Evaluation,” ACM Trans. Inf. Syst., vol. 2, no. 4, pp. 267–
288, 1984.
[49] A. A. Farrag and M. T. Özsu, “Using Semantic Knowledge of Transactions to Increase Concurrency,” ACM TODS, vol. 14, no. 4, pp. 503–525, 1989.
[50] A. Fekete, D. Liarokapis, E. O’Neil, P. O’Neil, and D. Shasha, “Making Snapshot Isolation
Serializable,” 2004, accepted for publication in ACM TODS.
[51] M. J. Franklin and S. B. Zdonik, “Dissemination-Based Information Systems,” IEEE Bulletin
of the Technical Committee on Data Engineering, vol. 19, no. 3, pp. 20–30, September 1996.
[52] H. Garcia-Molina, “Using Semantic Knowledge for Transaction Processing in a Distributed
Database,” ACM TODS, vol. 8, no. 2, pp. 186–213, 1983.
[53] S. Ghemawat, “The Modified Object Buffer: A Storage Management Technique for Object-Oriented Databases,” MIT Laboratory for Computer Science, Cambridge, MA, Tech. Rep.
MIT/LCS/TR-666, September 1995.
[54] J. D. Gibson, Ed., The Mobile Communications Handbook, 2nd ed.
IEEE Press, 1999.
[55] J. Gray and G. Graefe, “The Five-Minute Rule Ten Years Later, and Other Computer Storage
Rules of Thumb,” SIGMOD Record, vol. 26, no. 4, pp. 63–68, 1997.
[56] Groningen Growth and Development Centre, “60-Industry Database,” 2003. [Online].
Available: http://www.ggdc.net/dseries/60-industry.shtml
[57] PostgreSQL Global Development Group, PostgreSQL: The Most Advanced Open Source Database System in the World, 2004. [Online]. Available: http://www.postgresql.org
[58] R. E. Gruber, “Optimism vs. Locking: A Study of Concurrency Control for Client-Server
Object-Oriented Databases,” MIT Laboratory for Computer Science, Cambridge, MA, Tech.
Rep. MIT/LCS/TR-708, 1997.
[59] A. Gurtov, “Efficient Transport in 2.5G/3G Wireless Wide Area Networks,” Ph.D. dissertation, University of Helsinki, 2002.
[60] S. Hameed and N. H. Vaidya, “Log-Time Algorithms for Scheduling Single and Multiple
Channel Data Broadcast,” in MobiCom 1997, 1997, pp. 90–99.
[61] R. C. Hansdah and L. M. Patnaik, “Update Serializability in Locking,” in International Conference on Database Theory, 1986, pp. 171–185.
[62] T. Härder, “Observations on Optimistic Concurrency Control Schemes,” Information Systems, vol. 9, no. 2, pp. 111–120, 1984.
[63] J. R. Haritsa, M. J. Carey, and M. Livny, “On Being Optimistic about Real-Time Constraints,”
in ACM PODS, 1990, pp. 331–343.
[64] T. Henderson, “Networking over Next-Generation Satellite Systems,” Ph.D. dissertation,
University of California at Berkeley, 1999.
[65] T. Henderson and R. Katz, “Transport Protocols for Internet-compatible Satellite Networks,”
IEEE Journal on Selected Areas of Communications, vol. 17, no. 2, pp. 345–359, 1999.
[66] M. Herlihy and W. E. Weihl, “Hybrid Concurrency Control for Abstract Data Types,” JCSS,
vol. 43, no. 1, pp. 25–61, 1991.
[67] G. Herman, G. Gopal, K. Lee, and A. Weinrib, “The Datacycle Architecture for Very High
Throughput Database Systems,” in Proc. ACM SIGMOD Conf., 1987, pp. 97–103.
[68] W. W. Hsu, A. J. Smith, and H. C. Young, “Characteristics of Production Database Workloads
and the TPC Benchmarks,” IBM Systems Journal, vol. 40, no. 3, pp. 781–802, 2001.
[69] Q. Hu, D. L. Lee, and W.-C. Lee, “A Comparison of Indexing Methods for Data Broadcast
on the Air,” in Proceedings of the 12th International Conference on Information Networking
(ICOIN-12), 1998, pp. 656–659.
[70] Hughes Network Systems, DirecPC Home Page, January 2002. [Online]. Available:
http://www.direcpc.com/
[71] ——, DIRECWAY Home Page, July 2004. [Online]. Available: http://www.direcway.com/
[72] IEEE 802 LAN/MAN Standards Committee, “The Official IEEE 802.16 Working Group on Broadband Wireless Access Standards.” [Online]. Available: http://www.ieee802.org/16
[73] T. Imielinski and B. R. Badrinath, “Data Management for Mobile Computing,” SIGMOD
Record, vol. 22, no. 1, pp. 34–39, 1993.
[74] ——, “Mobile Wireless Computing: Challenges in Data Management,” CACM, vol. 37,
no. 10, pp. 18–28, 1994.
[75] T. Imielinski, S. Viswanathan, and B. R. Badrinath, “Energy Efficient Indexing on Air,” in
Proc. ACM SIGMOD Conf. 1994.
ACM Press, 1994, pp. 25–36.
[76] ——, “Power Efficient Filtering of Data on Air,” in EDBT 1994, 1994, pp. 245–258.
[77] ——, “Data on Air: Organization and Access,” IEEE Transactions on Knowledge and Data
Engineering, vol. 9, no. 3, pp. 353–372, 1997.
[78] ——, “Scheduling Data Broadcast in Asymmetric Communication Environments,” Knowledge and Data Eng., IEEE Trans., vol. 9, no. 3, pp. 353–372, 1997.
[79] StarBand Communications, Inc., “StarBand Home Page,” 2004. [Online]. Available: http://www.starband.com/
[80] Infrared Data Association — IrDA, “Serial Infrared Physical Layer Specification, ver. 1.4,”
May 2001. [Online]. Available: http://www.irda.org
[81] ——, “The Official IrDA Website,” 2004. [Online]. Available: http://www.irda.org
[82] K. Jacobs, “Concurrency Control: Transaction Isolation and Serializability in SQL92 and
Oracle7,” Oracle Corporation, Oracle White Paper Part No. A33745, July 1995.
[83] H. S. Jeon and S. H. Noh, “A Database Disk Buffer Management Algorithm Based on
Prefetching,” in Proceedings of ACM CIKM, Bethesda, Maryland, USA, 1998, pp. 167–174.
[84] D. G. Jeong and W. S. Jeon, “CDMA/TDD System for Wireless Multimedia Services with
Traffic Unbalance between Uplink and Downlink,” IEEE Journal on Selected Areas in Communications, vol. 17, no. 5, pp. 939–946, 1999.
[85] T. Johnson and D. Shasha, “2Q: A Low Overhead High Performance Buffer Management
Replacement Algorithm,” in VLDB 1994, 1994, pp. 439–450.
[86] C. E. Jones, K. M. Sivalingam, P. Agrawal, and J.-C. Chen, “A Survey of Energy Efficient
Network Protocols for Wireless Networks,” Wireless Networks, vol. 7, no. 4, pp. 343–358,
2001.
[87] S. K. Joo and T. C. Wan, “Incorporation of QoS and Mitigated TCP/IP over Satellite Links,”
in Proc. 1st Asian Int’l Mobile Computing Conference (AMOC 2000), 2000.
[88] K. L. Wu, P. S. Yu, and C. Pu, “Divergence Control for Epsilon Serializability,” in ICDE 1992, February 1992, pp. 506–515.
[89] S. Khanna and V. Liberatore, “On Broadcast Disk Paging,” in ACM STOC 1998, 1998, pp.
634–643.
[90] ——, “On Broadcast Disk Paging,” SIAM Journal on Computing, vol. 29, no. 5, pp. 1683–
1702, 2000.
[91] S.-W. Kim and H.-S. Won, “Batch-construction of B+-trees,” in Proceedings of the 2001
ACM symposium on Applied Computing, 2001, pp. 231–235.
[92] L. Kleinrock and F. Tobagi, “Packet Switching in Radio Channels: Part 1 — Carrier Sense
Multiple-access Models and their Throughput-delay Characteristics,” IEEE Trans. Commun., vol. 23, no. 12, pp. 1400–1416, 1975.
[93] D. E. Knuth, The Art of Computer Programming: Sorting and Searching, 2nd ed. Addison-Wesley, 1998, vol. 3.
[94] R. Kravets, K. Schwan, and K. Calvert, “Power-aware Communication for Mobile Computers,” in International Workshop on Mobile Multimedia Communications 1999, 1999.
[95] N. Krishnakumar and A. J. Bernstein, “High Throughput Escrow Algorithms for Replicated
Databases,” in VLDB 1992, 1992, pp. 175–186.
[96] H. T. Kung and J. T. Robinson, “On Optimistic Methods for Concurrency Control,” ACM
TODS, vol. 6, no. 2, pp. 213–226, 1981.
[97] E. R. Lassettre, “Olympic Records for Data at the 1998 Nagano Games,” in SIGMOD 1998,
L. M. Haas and A. Tiwary, Eds., 1998, p. 537.
[98] D. Lee, J. Choi, J. H. Kim, S. H. Noh, S. L. Min, Y. Cho, and C. S. Kim, “LRFU: A Spectrum
of Policies that Subsumes the Least Recently Used and Least Frequently Used Policies,”
IEEE Transactions on Computers, vol. 50, no. 12, pp. 1352–1361, 2001.
[99] D. Lee, J. Choi, J. H. Kim, S. H. Noh, S. L. Min, Y. Cho, and C. S. Kim, “On the Existence of
a Spectrum of Policies that subsumes the Least Recently Used (LRU) and Least Frequently
Used (LFU) Policies,” in ACM SIGMETRICS 1999, 1999, pp. 134–143.
[100] S. Y. Lee, M. C. Yang, and J. W. Chen, “Signature File as a Spatial Filter for Iconic Image
Database,” pp. 373–397, 1992.
[101] V. Lee, S. H. Son, and K. Lam, “On the Performance of Transaction Processing in Broadcast
Environments,” in MDA 1999, 1999, pp. 61–70.
[102] V. C. S. Lee and K. Lam, “Optimistic Concurrency Control in Broadcast Environments:
Looking Forward at the Server and Backward at the Clients,” in MDA 1999, 1999, pp. 97–
106.
[103] W. C. Lee and D. L. Lee, “Using Signature Techniques for Information Filtering in Wireless
and Mobile Environments,” DPDB, vol. 4, no. 3, pp. 205–227, 1996.
[104] S. Lu, A. Bernstein, and P. Lewis, “Correct Execution of Transactions at Different Isolation
Levels,” IEEE TKDE, vol. 16, no. 9, pp. 1070–1081, 2004.
[105] R. Ludwig, A. Gurtov, and F. Khafizov, “TCP over Second (2.5G) and Third (3G) Generation Wireless Networks,” 2003. [Online]. Available: http://www.ietf.org/rfc/rfc3481.txt?number=3481
[106] J. Martin, Communications Satellite Systems.
Prentice Hall, 1978.
[107] Maxtor Corporation, “Maxtor Atlas 10K III - Product Manual,” 2002. [Online].
Available: http://www.maxtor.com/ files/maxtor/en us/documentation/manuals/atlas10k iii
manual.pdf
[108] Microsoft Corporation, “DirectBand Network. Microsoft Smart Personal Objects Technology (SPOT),” 2005. [Online]. Available: http://www.microsoft.com/resources/spot/
[109] C. Mohan, H. Pirahesh, and R. Lorie, “Efficient and Flexible Methods for Transient Versioning of Records to Avoid Locking by Read-only Transactions,” in Proc. ACM SIGMOD Conf.,
1992, pp. 124–133.
[110] E. Mok, H. V. Leong, and A. Si, “Transaction Processing in an Asymmetric Mobile Environment,” in MDA, 1999, pp. 71–81.
[111] T. Nakajima, “Commutativity Based Concurrency Control and Recovery for Multiversion
Objects,” in International Workshop on Distributed Object Management, 1992, pp. 231–
247.
[112] J. H. Oh, K. A. Hua, and K. Prabhakara, “A New Broadcasting Technique for an Adaptive
Hybrid Data Delivery in Wireless Mobile Network Environment,” in Proc. of 19th IEEE
International Performance, Computing and Communications Conference, 2000, pp. 361–
367.
[113] J. H. Oh, K. A. Hua, and K. Vu, “An Adaptive Hybrid Technique for Video Multicast,”
in IEEE International Conference on Computer Communications and Networks, 1998, pp.
227–234.
[114] B. Oki, M. Pfluegl, A. Siegel, and D. Skeen, “The Information Bus: An Architecture for
Extensible Distributed Systems,” in 14th ACM Symposium on Operating Systems Principles,
Asheville, NC, 1993.
[115] E. J. O’Neil, P. E. O’Neil, and G. Weikum, “The LRU-K Page Replacement Algorithm for
Database Disk Buffering,” in Proc. ACM SIGMOD Conf., 1993, pp. 297–306.
[116] P. E. O’Neil, “The Escrow Transactional Method,” ACM TODS, vol. 11, no. 4, pp. 405–430,
1986.
[117] Oracle Corporation, “Concepts: 10g Release 1,” Oracle 10g Documentation, Part No.
B10743-01, December 2003.
[118] J. O’Toole and L. Shrira, “Opportunistic Log: Efficient Installation Reads in a Reliable Storage Server,” in Operating Systems Design and Implementation, 1994, pp. 39–48.
[119] M. Palmer and S. B. Zdonik, “Fido: A Cache That Learns to Fetch,” in VLDB 1991, 1991,
pp. 255–264.
[120] C. Papadimitriou, The Theory of Database Concurrency Control. Computer Science Press,
1986.
[121] S. H. Phatak and B. R. Badrinath, “Multiversion Reconciliation for Mobile Databases,” in
ICDE 1999, 1999, pp. 582–589.
[122] E. Pitoura and B. Bhargava, “Maintaining Consistency of Data in Mobile Distributed Environments,” in ICDCS 1995, 1995, pp. 404–413.
[123] E. Pitoura and P. Chrysanthis, “Exploiting Versions for Handling Updates in Broadcasting
Disks,” in VLDB, 1999, pp. 114–125.
[124] ——, “Scalable Processing of Read-Only Transactions in Broadcast Push,” in ICDCS, 1999,
pp. 432–439.
[125] E. Pitoura and G. Samaras, Data Management for Mobile Computing. Kluwer Academic Publishers, 1998, vol. 10.
[126] N. Preguiça, J. L. Martins, M. Cunha, and H. Domingos, “Reservations for Conflict Avoidance in a Mobile Database System,” in MobiSys 2003, 2003, pp. 43–56.
[127] N. M. Preguiça, C. Baquero, F. Moura, J. L. Martins, R. Oliveira, H. J. L. Domingos, J. O.
Pereira, and S. Duarte, “Mobile Transaction Management in Mobisnap,” in ADBIS-DASFAA,
2000, pp. 379–386.
[128] J. G. Proakis, Digital Communications, 4th ed. McGraw-Hill, 2000.
[129] M. B. Pursley, “The Role of Spread Spectrum in Packet Radio Networks,” Proceedings of the IEEE, vol. 75, no. 1, pp. 116–134, 1987.
[130] F. Rabitti and P. Zezula, “A Dynamic Signature Technique for Multimedia Databases,” in
Proceedings of the 13th annual international ACM SIGIR conference on Research and development in information retrieval, 1990, pp. 193–210.
[131] E. Rahm and A. Thomasian, “A New Distributed Optimistic Concurrency Control Method
and a Comparison of its Performance with Two-Phase Locking,” in Proceedings of the 10th
International Conference on Distributed Computing Systems, 1990, pp. 294–301.
[132] T. S. Rappaport, Wireless Communications: Principles and Practice, 2nd ed. Prentice-Hall, Inc., 2002.
[133] I. Rubin, “Access-control Disciplines for Multiaccess Communications Channels: Reservation and TDMA Schemes,” IEEE Trans. Inform. Theory, vol. 25, no. 5, pp. 516–526, 1979.
[134] P. Rysavy, “MMDS Struggles to Find a Foothold,” Network Computing, 2001. [Online].
Available: http://www.networkcomputing.com/1222/1222f3.html
[135] M. Sainsbury, “Mobiles on the Move,” The Australian, 2004. [Online]. Available:
http://www.theaustralian.news.com.au/printpage/0,5942,8859609,00.html
[136] H. Schwetman, CSIM Users Guide, November 2002. [Online]. Available: http://www.mesquite.com/htmls/guides.htm
[137] A. Seifert and M. H. Scholl, “Processing Read-only Transactions in Hybrid Data Delivery
Environments with Consistency and Currency Guarantees,” University of Konstanz, Tech.
Rep. 163, December 2001.
[138] ——, “Processing Read-only Transactions in Hybrid Data Delivery Environments with Consistency and Currency Guarantees,” MONET, vol. 8, no. 4, pp. 327–342, 2003.
[139] J. Shanmugasundaram, A. Nithrakasyap, R. Sivasankaran, and K. Ramamritham, “Efficient
Concurrency Control for Broadcast Environments,” in Proc. ACM SIGMOD Conf., 1999, pp.
85–96.
[140] M. Shapiro, A. I. T. Rowstron, and A.-M. Kermarrec, “Application-independent Reconciliation for Nomadic Applications,” in ACM SIGOPS European Workshop 2000, 2000, pp.
1–6.
[141] D. Shasha, F. Llirbat, E. Simon, and P. Valduriez, “Transaction Chopping: Algorithms and
Performance Studies,” ACM TODS, vol. 20, no. 3, pp. 325–363, 1995.
[142] N. Shivakumar and S. Venkatasubramanian, “Energy Efficient Indexing for Information Dissemination in Wireless Systems,” MONET, vol. 1, no. 4, pp. 433–446, 1996.
[143] S. Singh, M. Woo, and C. S. Raghavendra, “Power-Aware Routing in Mobile Ad Hoc Networks,” in Mobile Computing and Networking, 1998, pp. 181–190.
[144] A. J. Smith, “Disk Cache-Miss Ratio Analysis and Design Considerations,” ACM TOCS, vol. 3,
no. 2, pp. 161–203, 1985.
[145] R. Srinivasan, C. Liang, and K. Ramamritham, “Maintaining Temporal Coherency of Virtual
Data Warehouses,” in RTSS 1998, 1998, pp. 60–70.
[146] K. Stathatos, “Air-Caching: Adaptive Hybrid Data Delivery,” Ph.D. dissertation, University
of Maryland, College Park, Maryland, 1999.
[147] K. Stathatos, N. Roussopoulos, and J. S. Baras, “Adaptive Data Broadcast in Hybrid Networks,” in VLDB 1997, 1997, pp. 326–335.
[148] C. J. Su and L. Tassiulas, “Broadcast Scheduling for Information Distribution,” in Infocom
1997, 1997, pp. 109–117.
[149] L. Tassiulas and C. J. Su, “Optimal Memory Management Strategies for a Mobile User in
a Broadcast Data Delivery System,” IEEE Journal on Selected Areas in Communications,
vol. 15, no. 7, pp. 1226–1238, 1997.
[150] D. B. Terry, M. M. Theimer, K. Petersen, A. J. Demers, M. J. Spreitzer, and C. Hauser,
“Managing Update Conflicts in Bayou, a Weakly Connected Replicated Storage System,” in
Proceedings 15th Symposium on Operating Systems Principles (SOSP-15), 1995, pp. 172–
183.
[151] A. Tomkins, R. H. Patterson, and G. Gibson, “Informed Multiprocess Prefetching and
Caching,” in ACM SIGMETRICS 1997, 1997, pp. 100–114.
[152] K. L. Tripp, “SQL Server 2005 Beta 2 Snapshot Isolation,” 2004. [Online]. Available:
http://www.microsoft.com/technet/prodtechnol/sql/2005/SQL05B.mspx
[153] N. H. Vaidya and S. Hameed, “Scheduling Data Broadcast in Asymmetric Communication
Environments,” ACM Wireless Networks, vol. 5, no. 3, pp. 171–182, 1999.
[154] M. A. Viredaz, L. S. Brakmo, and W. R. Hamburgen, “Energy Management on Handheld
Devices,” Queue, vol. 1, no. 7, pp. 44–52, 2003.
[155] S. R. Viswanathan, “Publishing in Wireless and Wireline Environments,” Ph.D. dissertation,
Rutgers University, 1996.
[156] Vocal Technologies, Ltd., “EDGE — Enhanced Data Rate GSM,” 2001. [Online]. Available:
http://www.vocal.com/data sheets/full/edge.pdf
[157] G. D. Walborn and P. K. Chrysanthis, “Supporting Semantics-Based Transaction Processing
in Mobile Database Applications,” in Symposium on Reliable Distributed Systems, 1995, pp.
31–40.
[158] W. E. Weihl, “Data-dependent Concurrency Control and Recovery,” in PODC 1983, 1983,
pp. 63–75.
[159] ——, “Distributed Version Management for Read-only Actions,” SE, vol. 13, no. 1, pp. 55–
64, January 1987.
[160] ——, “Commutativity-Based Concurrency Control for Abstract Data Types,” IEEE Transactions on Computers, vol. 37, no. 12, pp. 1488–1505, 1988.
[161] G. Weikum and G. Vossen, Transactional Information Systems: Theory, Algorithms, and Practice of Concurrency Control and Recovery. Morgan Kaufmann, 2001.
[162] H. Garcia-Molina and G. Wiederhold, “Read-only Transactions in a Distributed Database,” ACM TODS, vol. 7,
no. 2, pp. 209–234, 1982.
[163] O. Wolfson, A. P. Sistla, S. Chamberlain, and Y. Yesha, “Updating and Querying Databases
that Track Mobile Units,” Distributed and Parallel Databases, vol. 7, no. 3, pp. 257–287,
1999.
[164] J. W. Wong, “Broadcast Delivery,” in Proceedings of the IEEE, 1988, pp. 1566–1577.
[165] World Airline Entertainment Association Internet Working Group (IWG), “Matrix
of
Service
Delivery
Options,
Version
1.0,”
2001.
[Online].
Available:
www.waea.org/tech/techdocs/off-board matrix v10.doc
[166] J. Xu, Q. Hu, D. L. Lee, and W.-C. Lee, “SAIU: An Efficient Cache Replacement Policy for
Wireless On-demand Broadcasts,” in ACM CIKM 2000, 2000, pp. 46–53.
[167] J. Xu, W.-C. Lee, and X. Tang, “Exponential Index: A Parameterized Distributed Indexing
Scheme for Data on Air,” in MobiSys 2004, 2004, pp. 153–164.
[168] G. K. Zipf, Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Addison-Wesley, 1949.