POLITECHNIKA ŁÓDZKA
WYDZIAŁ ELEKTROTECHNIKI, ELEKTRONIKI,
INFORMATYKI I AUTOMATYKI
KATEDRA INFORMATYKI STOSOWANEJ
mgr inż. Jacek Wiślicki
Ph.D. Thesis
An object-oriented wrapper
to relational databases
with query optimisation
praca doktorska
Obiektowa osłona
do relacyjnych baz danych
z uwzględnieniem optymalizacji
zapytań
Advisor:
prof. dr hab. inż. Kazimierz Subieta
Łódź 2007
To my wife and son, for their love, belief and patience…
Index of Contents
ABSTRACT
EXTENDED ABSTRACT (ROZSZERZONE STRESZCZENIE)
CHAPTER 1 INTRODUCTION
1.1 Motivation
1.2 Theses and Objectives
1.3 History and Related Works
1.4 Thesis Outline
CHAPTER 2 THE STATE OF THE ART AND RELATED WORKS
2.1 The Impedance Mismatch
2.2 Related Works
2.2.1 Wrappers and Mediators
2.2.2 ORM and DAO
2.2.3 XML Views over Relational Data
2.2.4 Applications of RDF
2.2.5 Other Approaches
2.3 The eGov-Bus Virtual Repository
2.4 Conclusions
CHAPTER 3 RELATIONAL DATABASES
3.1 Relational Optimisation Constraints
3.2 Relational Calculus and Relational Algebra
3.3 Relational Query Processing and Optimisation Architecture
3.3.1 Space Search Reduction
3.3.2 Planning
3.3.3 Size-Distribution Estimator
3.4 Relational Query Optimisation Milestones
3.4.1 System-R
3.4.2 Starburst
3.4.3 Volcano/Cascades
CHAPTER 4 THE STACK-BASED APPROACH
4.1 SBA Object Store Models
4.2 SBQL
4.2.1 SBQL Semantics
4.2.2 Sample Queries
4.3 Updateable Object-Oriented Views
4.4 SBQL Query Optimisation
4.4.1 Independent Subqueries
4.4.2 Rewriting Views and Query Modification
4.4.3 Removing Dead Subqueries
4.4.4 Removing Auxiliary Names
4.4.5 Low Level Techniques
CHAPTER 5 OBJECT-RELATIONAL INTEGRATION METHODOLOGY
5.1 General Architecture and Assumptions
5.2 Query Processing and Optimisation
5.2.1 Naive Approach vs. Optimisation
5.3 A Conceptual Example
CHAPTER 6 QUERY ANALYSIS, OPTIMISATION AND PROCESSING
6.1 Proposed Algorithms
6.1.1 Selecting Queries
6.1.2 Deleting Queries
6.1.3 Updating Queries
6.1.4 SQL Query String Generation
6.1.5 Stack-Based Query Evaluation and Result Reconstruction
6.2 Query Analysis and Optimisation Examples
6.2.1 Relational Test Schemata
6.2.2 Selecting Queries
6.2.3 Imperative Constructs
6.2.4 Multi-Wrapper and Mixed Queries
6.2.5 SBQL Optimisation over Multi-Wrapper Queries
6.3 Sample Use Cases
6.3.1 Rich Employees
6.3.2 Employees with Departments
6.3.3 Employees with Cars
6.3.4 Rich Employees with White Cars
CHAPTER 7 WRAPPER OPTIMISATION RESULTS
7.1 Relational Test Data
7.2 Optimisation vs. Simple Rewriting
7.3 Application of SBQL optimisers
CHAPTER 8 SUMMARY AND CONCLUSIONS
8.1 Prototype Limitations and Further Works
8.2 Additional Wrapper Functionalities
APPENDIX A THE EGOV-BUS PROJECT
APPENDIX B THE ODRA PLATFORM
B.1 ODRA Optimisation Framework
APPENDIX C THE PROTOTYPE IMPLEMENTATION
C.1 Architecture
C.1.1 Communication protocol
C.2 Relational Schema Wrapping
C.2.1 Example
C.2.2 Relational Schema Models
C.2.3 Result Retrieval and Reconstruction
C.3 Installation and Launching
C.3.1 CD Contents
C.3.2 Test Schemata Generation
C.3.3 Connection Configuration
C.3.4 Test Data Population
C.3.5 Schema Description Generation
C.3.6 Server
C.3.7 Client
C.4 Prototype Testing
C.4.1 Optimisation Testing
C.4.2 Sample batch files
INDEX OF FIGURES
INDEX OF LISTINGS
INDEX OF TABLES
BIBLIOGRAPHY
Abstract
This Ph.D. thesis focuses on the transparent and efficient integration of relational
databases into an object-oriented distributed database system available to top-level users
as a virtual repository. The core of the presented solution is a wrapper:
a dedicated, generic piece of software capable of interfacing between the virtual
repository structures (in the most common case, object-oriented updateable views) and
the wrapped relational database, enabling bidirectional data exchange (e.g. retrieval and
updates) with optimal query evaluation.
The idea of integrating distributed, heterogeneous, fragmented and redundant
databases dates back to the 1980s and the concept of federated
databases; nevertheless, the virtual repository approach is closer to the distributed
mediation concept from the early 1990s. Regardless of the origin, the goal of such
systems is to present final and complete business information transparently combined from
data stored in bottom-level resources, exactly matching top-level user demands and
requirements. In the most general case, the resources are any data sources and feeds,
including the most common relational databases (the focus of the thesis), object-oriented
and object-relational databases, XML and RDF data stores, Web Services, etc.
The need to integrate various resources representing completely different
paradigms and models stems from the characteristics of contemporary data and
information management systems (besides the resources' distribution, fragmentation,
replication, redundancy and heterogeneity): the software is usually written in some
high-level object-oriented programming language that is incompatible with the
resource-level query language (a phenomenon often referred to as the impedance
mismatch).
This integration process must be completely transparent so that an end user
(a human or an application) is not aware of an actual data source model and structure.
On the other hand, an object-oriented database communicating directly with such
a wrapper must be opaque: it must work as a black box, and its underlying
object-relational interface must not be directly accessible from any system element located in
the upper layers. Another feature of a wrapper is its genericity: its operation and reliability
must be completely independent of the relational resource it wraps. Furthermore, neither data
materialisation nor replication is allowed on the virtual repository side (or in any other
intermediate system module), except of course for current query results, which must be
returned to the users querying the system. Similarly, updating the wrapped
resource data from the virtual repository level with its object-oriented query language
must be ensured. Besides the transparency aspects, most effort has been devoted to
efficient optimisation procedures that enable the powerful native query optimisers of the
relational resource to work together with the object-oriented optimisation methods
applicable in the virtual repository.
The virtual repository for which the wrapper was designed relies on the stack-based
approach (SBA), the corresponding stack-based query language (SBQL) and
updateable object-oriented views. Therefore, the wrapper must be capable of
transforming object-oriented queries referring to the global schema expressed in SBQL
into SQL-optimisable relational queries, whose results are returned to the virtual
repository in the same way as actual object-oriented results.
The thesis has been developed under the eGov-Bus (Advanced eGovernment
Information Service Bus) project supported by the European Community under the
“Information Society Technologies” priority of the Sixth Framework Programme
(contract number: FP6-IST-4-026727-STP). The idea with its different aspects
(including the virtual repository it is a part of and data fragmentation and integration
issues) has been presented in over 20 research papers, e.g. [1, 2, 3, 4, 5, 6, 7].
Keywords: database, object-oriented, relational, query optimisation, virtual repository,
wrapper, SBA, SBQL
Extended Abstract (Rozszerzone streszczenie)
The number of data sources available today is enormous. Many of them are accessible
via the Internet, although they may not be public or may restrict access
to a strictly defined group of users. Such resources are distributed, heterogeneous,
fragmented and redundant. The idea, part of which has been developed and implemented
in this doctoral thesis, is to virtually integrate such resources
into a centralised, homogeneous, consistent whole free of fragmentation and redundancy,
forming a virtual repository that provides certain common
functionalities and services, including a trust infrastructure (security,
privacy, licensing, payments, etc.), Web Services, distributed transactions,
workflow management, etc.
The main goal of the presented work is the integration of heterogeneous
relational databases (resources) into an object-oriented database system in which
such resources are seen as purely object-oriented models and stores that can be
transparently queried with an object-oriented query language (in other
words, they must be indistinguishable from actual object-oriented resources).
Therefore, relational schemata must be "wrapped" with specialised
software capable of bidirectional data exchange between the RDBMS located
at the very bottom and the other elements of the system. This process must
be transparent, so that an end user (a human or an application) is in no
way aware of the actual resource model. On the other hand, an object-oriented database
communicating directly with such a wrapper must be completely opaque
(a black box), so that the wrapper is not directly accessible
from any system element located in the upper layers of the architecture.
Another feature of such a wrapper is full genericity: its operation, reliability
and performance characteristics should be completely independent of the resource
it hides. By assumption, neither materialisation nor replication of relational data
is allowed on the virtual repository side (or in any other intermediate system
module), except of course for current query results, which must be returned
to the users querying the system.
The theses of the work are defined as follows:
1. Legacy relational databases can be transparently incorporated into
an object-oriented virtual repository, and the data they hold can be
processed and updated with an object-oriented query language, indistinguishably
from purely object-oriented data, without the need for materialisation or replication.
2. Mechanisms can be developed and implemented for such a system that
enable the object-oriented optimisation of the virtual
repository to cooperate with the native optimisers of the relational resource.
The art of building object-oriented wrappers intended for integrating
heterogeneous relational resources into homogeneous object-oriented database
systems has been developed for about fifteen years. The work aims to
combine the much older theory of relational databases (whose foundations were collected
and presented by Edgar F. Codd almost forty years ago) with the relatively
young object-oriented paradigm (the notion of an "object-oriented database system"
appeared in the mid-1980s and its basic
concepts were manifested in [8], although the relevant work had started more than
ten years earlier).
The presented doctoral thesis focuses on a novel approach
to integrating heterogeneous relational resources into a distributed object-oriented
database system. These resources must be available to global users
through a global object model and an object-oriented query language, so that these users
are in no way aware of the actual resource model and storage. The developed
and implemented integration process is completely transparent and enables
bidirectional data exchange, i.e. querying the relational resource (retrieving
data matching query criteria) and updating relational data. Moreover,
the object structures located directly above the relational resource (created
directly on the basis of that resource) can be easily transformed
and filtered (through cascaded updateable object-oriented views)
so that they match the business model and the general schema
of which they are to form a part. Apart from the transparency aspects, the greatest effort
has been devoted to efficient query optimisation procedures that enable
the native optimisers of the relational resource to operate. These functionalities have been
achieved through a specialised module constituting an object-relational wrapper,
hereinafter referred to simply as the wrapper.
The prototype solution presented in the thesis is based
on the stack-based approach to query languages and databases (SBA, Stack-Based Approach),
the resulting query language (SBQL, Stack-Based Query Language) and
updateable object-oriented views, the JDBC interface, the TCP/IP protocol and the
SQL language. The implementation was written in Java™. The concept with its various
aspects (including the virtual repository of which it is a part and the issues
of data fragmentation and integration) has been presented in over 20 papers, e.g. [1,
2, 3, 4, 5, 6, 7].
The thesis has been carried out within the eGov-Bus (Advanced
eGovernment Information Service Bus) project supported by the European Community
under the "Information Society Technologies" priority of the Sixth Framework
Programme (contract number: FP6-IST-4-026727-STP).
The text of the thesis is divided into the following chapters, brief
summaries of which are given below:
Chapter 1 Introduction
The chapter presents the motivation for undertaking the subject, the theses, objectives
and assumptions of the doctoral thesis, a description of the technical solutions used, and
a brief description of the state of the art and of works related to the subject.
Chapter 2 The State of the Art and Related Works
This part of the thesis discusses the state of the art and other work carried out
worldwide aimed at linking relational databases with object-oriented systems
and programming languages. The starting point is the
"impedance mismatch" phenomenon between query languages and
programming languages, its consequences and possible solutions to the problem. In particular,
the mediator and wrapper architecture proposed by
G. Wiederhold in 1992 is covered, together with a number of solutions based on this model.
Next, various ORM (Object-Relational Mapping) and DAO (Data Access Objects)
solutions are described, aimed, respectively, at mapping existing data
stored in relational systems onto object structures available at the level
of object-oriented programming languages, and at providing persistence
of programming-language objects in (relational) databases.
Part of the chapter is also devoted to techniques for exposing
relational data through XML and RDF views. A separate subchapter
briefly describes the role of the wrapper in the overall architecture of the virtual repository
developed within the eGov-Bus project.
Chapter 3 Relational Databases
The chapter presents the fundamentals of relational systems and of relational query
optimisation, with the existing challenges and the methods developed, mainly
with reference to SQL, currently the most popular language. The main part of the chapter
concerns the architecture of an optimiser for queries expressed in SQL, operating
on the basis of relational calculus and relational algebra, as well as the most popular techniques
used in contemporary relational systems. The "milestones" in the development
of relational query optimisation, such as System-R,
Starburst and Volcano, are also mentioned.
Chapter 4 The Stack-Based Approach
The chapter concentrates on the fundamentals of the stack-based approach (SBA) and on
the object-oriented query optimisation methods currently implemented in the virtual
repository, mainly with reference to SBQL, which was
used during the work on the wrapper prototype. The chapter covers SBQL
optimisation methods based on query rewriting (static optimisation),
related to performing selections as early as possible, rewriting views
(query modification), removing dead subqueries and auxiliary names, as
well as the use of indices.
Chapter 5 Object-Relational Integration Methodology
This part of the thesis is a conceptual description of the developed and implemented
methodology for integrating relational resources into the object-oriented virtual repository.
It discusses the assumptions of the system, the way schemata are mapped
using updateable views, and the translation of object-oriented queries
into relational ones. It also introduces the basics of query processing intended
to enable both object-oriented and relational optimisation. The chapter
ends with an abstract (implementation-independent) example
of schema mapping and query optimisation.
Chapter 6 Query Analysis, Optimisation and Processing
The chapter gives a detailed discussion of the developed
and implemented query analysis and transformation algorithms, which aim
to achieve maximum system performance and reliability. Sample relational
schemata are presented together with the process of mapping them (by means of
updateable object-oriented views) onto an object-oriented schema available to
the virtual repository. The methods discussed are supported by real examples
of transformations of queries referring to these test schemata,
in which the individual stages of analysis
and optimisation are demonstrated in detail. The chapter ends with examples of further transformations
of the resulting object-oriented schema that can be applied in the upper
layers of the virtual repository.
Chapter 7 Wrapper Optimisation Results
The chapter presents and discusses the results of the optimisation
performed by the wrapper in comparison with unoptimised queries and with
queries optimised by the SBQL mechanisms. The wrapper
was tested on a relatively large set of test queries (referring
to the sample schemata introduced in the previous chapter)
that exercise various optimisation mechanisms of SQL, whose
queries are executed directly on the relational resource. Tests were
also carried out for various database sizes in order to observe how
the optimisation result depends on the number of records retrieved and the number of
joins performed.
Chapter 8 Summary and Conclusions
The chapter gathers the experience and conclusions gained while developing
the wrapper and testing the prototype. The summary of the results of the optimisation performed
by the prototype unambiguously proves the validity of the theses of the dissertation. A separate
subchapter is devoted to further work that can be carried out in order
to develop the prototype and extend its functionality.
The text of the thesis is supplemented by three appendices discussing, in turn, the
eGov-Bus project, the ODRA object-oriented database system responsible for handling
the virtual repository, within which the wrapper prototype was implemented,
and the details of the wrapper prototype implementation.
Chapter 1
Introduction
1.1 Motivation
Nowadays, we observe an enormous number of data sources. Many of them are
available via the Internet (although they may not be public or may restrict access to a limited
number of users). Such resources are distributed, heterogeneous, fragmented and
redundant. Therefore, information available to users (including governmental and
administrative offices and agencies as well as companies and industry) is incomplete
and requires much effort and usually manual (or human-interactive, computer-aided)
analysis to reveal its desired aspects. Even after such a time- and resource-consuming
process, there is no guarantee that some important pieces of information have not been
lost or overlooked.
The main idea, partially developed and implemented in the thesis, is to virtually
integrate such resources into a centralised, homogeneous, integrated, consistent and
non-redundant whole constituting a virtual repository that provides some common
functionalities and services, including a trust infrastructure (security, privacy, licensing,
payments, etc.), web services, distributed transactions, workflow management, etc. The
proposed solution focuses on relational databases, which are the most common in
currently used (and still developed) data and information management systems.
Contemporary system designers still choose them since they are mature, predictable,
stable and widely available as either commercial or free products with professional
maintenance and technical support. On the other hand, programmers tend to use
high-level object-oriented programming languages that collide with relational paradigms and
philosophy (the object-relational impedance mismatch phenomenon). Global migration
to object-oriented database systems is unlikely in the foreseeable future due to programmers’
habits, its unpredictably high costs (and time required) and the lack of commonly accepted
object-oriented standards and mature database systems (most of them are functional,
yet still experimental, prototypes). These factors reveal a strong need for
a smooth bridge between relational and object-oriented technologies and systems,
enabling effective and seamless manipulation of relational data in an object-oriented
manner and combining them with purely object-oriented data.
1.2 Theses and Objectives
The Ph.D. dissertation is focused on a novel approach to integration of heterogeneous
relational database resources in an object-oriented distributed database system. Such
resources must be available to global (top level) users via a common object data model
(provided by the virtual repository) and an object-oriented query language so that these
users are not aware of an actual resource data model and storage. Relational databases
are mapped as purely object-oriented models and stores that can be transparently
queried with an object-oriented query language. In other words, they are to be
compatible with actual object-oriented resources. Therefore, relational schemata must be
wrapped (enveloped) with some dedicated pieces of software – interfaces capable of
bidirectional data exchange between a bottom-level RDBMS and other elements of
the system. Such an interface is referred to as a relational-to-object data wrapper, or
simply a wrapper.
This integration process must be completely transparent so that an end user
(a human or an application) is not aware of an actual data source model and structure.
On the other hand, an object-oriented database communicating directly with such
a wrapper must be opaque – it must work as a black box, and its underlying
object-relational interface must not be directly available to any system element located in
upper layers. Another feature of a wrapper is its genericity: its operation and reliability
must be completely independent of the resource it wraps. Also, neither data
materialisation nor replication is allowed at the virtual repository side (or in any other
intermediate system module), except of course for current query results, which must be
returned to users querying the system.
The summarized theses are:
1. Legacy relational databases can be transparently integrated into an object-oriented
virtual repository and their data can be processed and updated with
an object-oriented query language indistinguishably from purely object-oriented
data without materialisation or replication.
2. Appropriate optimisation mechanisms can be developed and implemented for
such a system in order to enable coaction of the object-oriented virtual
repository optimisation together with native relational resource optimisers.
The prototype solution accomplishing, verifying and proving the theses has been
developed and implemented according to the modular reusable software development
methodology, where subsequent components are developed independently and
combined according to the predefined (primarily assumed) interfaces.
First, the state of the art in the field of integration of heterogeneous database and
information systems was analysed. A set of related solutions has been identified together with their
assumptions, strengths and weaknesses, considering the possibility of adapting them to the
designed solution (Chapter 2). The experience of previous works, together with the
requirements of the virtual repository and the author’s experience from commercial
projects he previously participated in, allowed designing and implementing the general
modular wrapper architecture (subchapter 5.1 General Architecture and Assumptions).
The architecture assures genericity, flexibility and reliability in terms of communication
and data transfer and it was not subjected to any substantial verification and
experiments. Based on the above, the first working prototype (without optimisation
issues, but allowing transparent and reliable access to the wrapped resource, referred to
as the naive approach) has been implemented and experimentally tested. Since this part
of the solution is independent of the virtual repository and the objectives were
clear and well defined, the fast cascaded development model has been used.
The next stage concerned thorough analysis of various optimisation methods
used in relational database systems (Chapter 3) and object-oriented database systems,
especially based on the stack-based approach (Chapter 4) implemented in the virtual
repository (subchapter 2.3 The eGov-Bus Virtual Repository). The goal was to establish
SBQL syntax patterns transformable to optimiseable SQL queries. Subsequently
improved prototypes verified with experiments have allowed achieving a set of reliable
query rewriting rules and designing the desired query optimiser. The evolutionary/spiral
development model used at this stage was implied by integration of the wrapper with
the independently developed virtual repository and its continuously changing
implementation. Also, the experiments’ results concerning the query evaluation
efficiency and reliability forced further improvement of the rewriting and optimisation
methods.
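To make the idea of SBQL patterns transformable to SQL more tangible, the fragment below sketches one deliberately simple rewriting rule in Java. It is an illustration only and not the rule set developed in the thesis: the class and method names, the accepted query shape (a collection name, the keyword where and one comparison with an integer literal) and the assumption that a collection maps to a table of the same name are all hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of one pattern-based rewriting rule; not the thesis implementation. */
class SelectionRewriteSketch {

    // Assumed query shape: <collection> where <attribute> <operator> <integer literal>
    private static final Pattern SELECTION = Pattern.compile(
            "^\\s*(\\w+)\\s+where\\s+(\\w+)\\s*(<=|>=|<>|=|<|>)\\s*(\\d+)\\s*$");

    /**
     * Returns an SQL query for the recognised pattern, or an empty result when the
     * query does not match and would have to be evaluated without this rewriting.
     */
    static Optional<String> rewrite(String objectQuery) {
        Matcher m = SELECTION.matcher(objectQuery);
        if (!m.matches()) {
            return Optional.empty();
        }
        // Assumption: collection and attribute names equal the table and column names.
        return Optional.of("SELECT * FROM " + m.group(1)
                + " WHERE " + m.group(2) + " " + m.group(3) + " " + m.group(4));
    }

    public static void main(String[] args) {
        System.out.println(rewrite("Emp where sal > 1000").orElse("(not rewritten)"));
        System.out.println(rewrite("Emp.name").orElse("(not rewritten)"));
    }
}
```

In this sketch a recognised selection is handed to the relational system as a whole, so that its native optimiser can decide how to evaluate the condition, while any other query is left untouched.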
The development process briefly described above consists mainly of an analysis of related
solutions in terms of the thesis objectives and the requirements of integration with the rest of
the virtual repository. The conclusions allowed stating primary assumptions realised in
subsequent prototypes and continuously verified with experiments. The resulting
integration process is completely transparent and it enables bidirectional data transfer,
i.e. querying a relational resource (retrieving data according to query conditions) and
updating relational data. Moreover, bottom-level object-oriented data structures based
directly on a relational resource can be easily transformed and filtered (with multiple
cascaded object-oriented updateable views, subchapter 6.3 Sample Use Cases) so that
they comply with the business model and goals of the general schema they are a part of.
Besides the transparency aspects, the most effort has been devoted to efficient
optimisation procedures enabling the use of powerful native relational resource query
optimisers. The prototype solution realising the above goals and functionalities is
implemented in the Java™ language. It is based on:
• The Stack-Based Approach (SBA), providing SBQL (Stack-Based Query
Language), a query language with the complete computational power of
regular programming languages, and updateable object-oriented views,
• JDBC, used for connecting to relational databases and manipulating their data; the
technology is supported by almost all currently used relational systems,
• TCP/IP sockets, allowing communication between distributed system components,
• SQL, used for low-level communication with wrapped databases.
1.3 History and Related Works
The art of building object-relational wrappers for integrating heterogeneous relational
database resources into homogeneous object-oriented database systems has been
developed for about fifteen years, struggling to join the much older theory of relational
databases (almost forty years since its fundamentals were presented by Edgar F. Codd)
with the relatively young object-oriented database paradigm. The term
“object-oriented database system” first appeared in the mid-eighties of the last century and the
concepts were manifested in [8]; however, the corresponding works started more than
ten years earlier.
The concept of mediators and wrappers was first formulated by Wiederhold in
[9] as a set of indications for developing future information systems. The inspiration
was the growing common demand for information induced by broadband networks and
rapid Internet development, simultaneously obstructed by inefficient, fragmented,
heterogeneous and distributed data sources. The main idea was to support
decision-making software with a flexible infrastructure (expressed in terms of mediators and
wrappers) capable of retrieving complete information without human interference.
A mediator was defined as an autonomous piece of software capable of processing data
from an underlying data source according to the general system requirements. Since
mediators were considered resource-independent, they were supplied with wrappers –
further software modules transparently interfacing between the resource and
the mediator itself. Another very important concept was that mediators are not to
be accessible in a user-friendly language, but in a communication-friendly language –
the mediation process (i.e. creating the actual information based on various mediators
and their data sources) is realised in the background and it must be reliable and efficient.
The user interface is realised by top-level applications, and it is these that should be
user-friendly.
The approach and the methodology presented in [9] had several
implementations, e.g. Pegasus (1993), Amos (1994) and Amos II (developed since
2002), and DISCO (1997), described briefly in subchapter 2.2.1 Wrappers and
Mediators. In the virtual repository, of which the described wrapper solution is a part, the
mediation process relies on updateable object-oriented views based on the Stack-Based
Approach.
There exist also many other approaches aiming to “objectify” relational data –
they are described in subchapters 2.2.2 ORM and DAO, 2.2.3 XML Views over
Relational Data, 2.2.4 Applications of RDF. Nevertheless, their application in the thesis
was not considered due to completely different assumptions and goals.
1.4 Thesis Outline
The thesis is subdivided into the following chapters:
Chapter 1 Introduction
The chapter presents the motivation for the thesis subject, the theses and the
objectives, the description of solutions employed and the short description of the state
of the art and the related works.
Chapter 2 The State of the Art and Related Works
The state of the art and the related works aiming to combine relational and
object-oriented databases and programming languages (including mediator/wrapper
approaches, ORM/DAO and XML) are briefly discussed here. Further, the wrapper
location in the overall virtual repository architecture is presented.
Chapter 3 Relational Databases
The fundamentals of relational systems and their query optimisation methods
and challenges are given, focused mainly on SQL, the most common relational query
language.
Chapter 4 The Stack-Based Approach
The stack-based approach and SBQL, the corresponding query language, are
presented, with the concept fundamentals and meaningful query examples. Also,
the SBQL optimisation methods are introduced.
Chapter 5 Object-Relational Integration Methodology
The chapter presents the concept and the assumptions of the developed and
implemented methodology for integrating relational resources into virtual repository
structures. The schema transformation procedures are given, followed by query
analysis, optimisation and processing steps. The chapter is concluded with a conceptual
example of the query processing methods applied.
Chapter 6 Query Analysis, Optimisation and Processing
The detailed description of the developed and implemented query analysis and
transformation methods is given, followed by demonstrative examples based on two
relational test schemata introduced. The chapter ends with more complex examples of
further schema transformations with object-oriented updateable views applicable in
upper levels of the virtual repository.
Chapter 7 Wrapper Optimisation Results
The chapter provides the discussion of the optimisation results for the prototype
based on a relatively large set of sample queries and various database sizes (for the
schemata presented in the previous chapter).
Chapter 8 Summary and Conclusions
The chapter presents the conclusions and the future work that can be carried out to develop
the wrapper prototype further.
The thesis text is extended with three appendices describing the eGov-Bus
project, the ODRA platform responsible for the virtual repository environment and the
wrapper prototype implementation issues.
Chapter 2
The State of the Art and Related Works
The art of building object-oriented wrappers on top of relational database systems has been
developed for years – the first papers on the topic date back to the late 1980s and were devoted to
federated databases. The motivation for the wrappers is to reduce the technical and
cultural difference between traditional relational databases and novel technologies based
on object-oriented paradigms, including analysis and design methodologies (e.g., based
on UML), object-oriented programming languages (e.g., C++, Java, C#), object-oriented
middleware (e.g., based on CORBA), object-relational databases and pure object-oriented
databases. Recently, Web technologies based on XML/RDF have also come to require
similar wrappers. Despite the strong pressure towards object-oriented and XML-oriented
technologies, people are quite happy with relational databases and there is little
probability that the market will massively switch to other data store paradigms soon
(the costs and time required for such a migration are unpredictably huge).
Unfortunately, the object-orientation has as many faces as existing systems,
languages and technologies. Thus, the number of combinations of object-oriented
options with relational systems and applications is very large. Additionally, wrappers
can have different properties: in particular, they can be proprietary to applications or generic,
they can deal with updates or be read-only, they can materialise objects on the wrapper side
or deliver purely virtual objects, they can deal with an object-oriented query language or
provide some iterative “one-object-at-a-time” API, etc. [10]. This results in a very
large number of ideas and technologies. For instance, Google reports more than
500 000 results in response to the query “object relational wrapper”.
2.1 The Impedance Mismatch
The problem of the impedance mismatch between query languages and programming
languages is well recognised and documented; see, for example, [11]. The impedance
mismatch term is used here as a mere analogy to the electrical engineering
phenomenon. It refers to a set of conceptual and technical difficulties which are often
encountered when an RDBMS is used by a program written in an object-oriented
programming language or style, particularly when objects and/or class definitions are
mapped in a straightforward way to database tables and/or relational schemata.
The basic technical differences between programming languages and query languages
(mainly SQL, the most popular one), and the resulting programming difficulties, concern [11]:
• Syntax – a programmer is obliged to use two different language styles and obey
two different grammars,
• Typology – a query language operates on types defined in a relational database
schema, while a programming language is usually based on a completely different
type system, where nothing like a relation exists. Most programming
languages are supplied with embedded static (compile-time) type checking,
while SQL does not provide it (dynamic binding),
• Semantics and language paradigms – the concepts of language semantics are
completely different. A query language is declarative (what to retrieve, not how to
retrieve it), while programming languages are imperative (how to do, instead of
what to do),
• Pragmatics – a query language frees a programmer from many data organisation
and implementation details (collection organisation, presence of indices, etc.),
while in programming languages these details must be coded explicitly,
• Binding phases and mechanisms – query languages are interpreted (late binding),
while programming languages assume early binding during compilation and
linking phases. This causes many problems, e.g. for debuggers,
• Namespaces and scoping rules – query and programming languages have their
own namespaces that can contain the same names with different meanings.
Mapping between these namespaces requires additional syntactical and semantic
tools. A programming language namespace is hierarchical and obeys stack-based
scoping rules – these rules are ignored by a query language, which causes many
inconveniences,
• Null values – databases and query languages are supplied with dedicated tools for
storing and processing null values; these means are not available in programming
languages (or null is treated completely differently from the one in a database),
• Iterative schemata – in a query language iterations are embedded in its operators’
semantics (e.g. selection, projection, join), while in a programming language
iterations must be realised explicitly with some loops (e.g. for, while, repeat).
Query result processing with a programming language requires dedicated facilities
like cursors and iterators,
• Data persistence – query languages process only persistent (physically stored)
data, while programming languages operate only on volatile (located in operating
memory) data. Combining these languages requires using dedicated language
constructs to parameterise queries with language variables, and other language and
architectural means for transmitting stored data to memory and back,
• Generic programming means – these means in a query language are based on
reflection (e.g. in dynamic SQL). Using something similar in a programming
language is usually impossible because of early binding. Other means are used
instead, e.g. higher-level functions, casting, switching to a lower language level,
polymorphism or templates.
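Three of the points above (typology and binding, iterative schemata, null values) can be illustrated with a short Java fragment. It is only a sketch under stated assumptions: the EMPLOYEE relation, the class and the method names are hypothetical, and an in-memory list stands in for a database so that the fragment needs no JDBC driver.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical in-memory stand-in for a relational table; not part of the thesis prototype. */
class ImpedanceMismatchSketch {

    /** One tuple of an assumed EMPLOYEE(name, salary) relation; salary may be SQL NULL. */
    static final class EmployeeRow {
        final String name;
        final Integer salary; // boxed, because a primitive int cannot represent NULL

        EmployeeRow(String name, Integer salary) {
            this.name = name;
            this.salary = salary;
        }
    }

    // To the Java compiler this query is only a string: its syntax and column types
    // are not checked until a database interprets it at run time (late binding).
    static final String DECLARATIVE_FORM = "SELECT name FROM employee WHERE salary > 1000";

    /** The same selection and projection written imperatively, with an explicit loop. */
    static List<String> namesEarningMoreThan(List<EmployeeRow> table, int threshold) {
        List<String> result = new ArrayList<>();
        for (EmployeeRow row : table) {
            // SQL evaluates "NULL > 1000" to UNKNOWN and silently skips the row; in Java
            // unboxing a null Integer throws NullPointerException, so the null case
            // has to be coded by hand.
            if (row.salary != null && row.salary > threshold) {
                result.add(row.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<EmployeeRow> table = Arrays.asList(
                new EmployeeRow("Smith", 1500),
                new EmployeeRow("Jones", null),
                new EmployeeRow("Brown", 900));
        System.out.println(namesEarningMoreThan(table, 1000)); // prints [Smith]
    }
}
```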
SBQL, the query language with a complete functionality of an object-oriented
programming language (described in Chapter 4) applied in the virtual repository does
not involve the impedance mismatch unless it is embedded in some programming
language.
The impedance mismatch is also very often defined in terms of relational and
object-oriented database systems and their query and programming languages,
respectively, regarding mainly their diverse philosophical and cultural aspects [12, 13].
The most important and pointed differences are found in interfaces (relational data only
vs. object-oriented “behaviour”), schema binding (much more freedom to an object
structure than in strict relational tables) and access rules (a set of predefined relational
operators vs. object-specific interfaces). To solve this mismatch problem, supporters of
each paradigm argue that the other one should be abandoned. This does not seem to
be a good choice, however, as relational systems are still extremely popular and useful in many
commercial, business, industrial, scientific and administrative systems (the situation
is unlikely to change soon), while programmers are willing to use object-oriented
languages and methodologies. Some reasonable solutions have been proposed, however,
that can minimise the impact of the impedance mismatch on the further development of
information systems. Preferably, object-oriented programmers should realise that a very
close mapping between relational and object-oriented data is erroneous and leads to
many complications. Their object-oriented code should be closer to the relational model, as
with the application of JDBC for Java (applied in the wrapper prototype implementation) or
ADO.NET [14] for C#.
2.2 Related Works
Presented below are some of the most important solutions for accessing
(mainly) relational databases with object-oriented tools (preferably in distributed
environments) developed over the last fifteen years, roughly grouped and classified by their
methodologies and main approaches. Many of them are not strictly related to the subject
of the thesis; nevertheless, their importance in the field cannot be neglected, and
some of the experience and concepts proved useful for the prototype implementation.
2.2.1 Wrappers and Mediators
The concept of wrappers and mediators for heterogeneous distributed data sources was
introduced in [9] as a vision and a set of indications for developing future information systems in
the following years (appropriate solutions were usually not yet available and were assumed to
appear within the next ten years). The idea appeared as a response to a growing demand for
data and information triggered by the evolution of fast broadband networks and still
limited by distributed, heterogeneous and inefficient data sources. The knowledge that
information exists and is accessible raises users’ expectations, while real-life
experience showing that this information is not available in a useful form and cannot
be combined with other pieces of information causes much confusion and frustration.
The main reason for designing and implementing the proposed mediation systems is the
heterogeneity of single databases, where data abstractions are completely lacking and
no common data representation is available. This issue makes combining information
from various database systems very difficult and awkward.
The main goal of this architecture was to design mediators as relatively simple
software modules transparently keeping specific data information and sharing those data
abstractions with higher-level mediators or applications. Therefore, for large networks,
mediators should be defined for primitive (bottom-level) mediators that can still mediate
for lower modules and data sources. Mediators described in [9] were provided with
some technical and administrative “knowledge” concerning how to process the data behind
them in order to achieve an effective decision process supported by a distributed, modular
and composable architecture. They must work completely automatically, without any
intervention of human experts, producing complete information from available data.
The described information processing model of the proposed mediation system
was based on existing human-based or human-interactive solutions (without any
copying or mimicking of them, however). The mediation process is introduced in order
to overcome the shortcoming of a database-application interface, where only
a communication protocol and formats are defined – the interface does not resolve
abstraction and representation problems. A proper interface must be active, and this
functionality is defined as mediation. Mediation includes all the processing needed to
make an interface work, knowledge structures driving data transformations and any
intermediate storage (if required). Mediators defined as above should be realised as
explicit modules acting as an intermediate active layer between user applications and
data resources in sharable architecture, independent of data resources. There are also
defined meta-mediators responsible for managing and enabling allocating and accessing
actual mediators and data sources.
The concept of splitting a database-application interface with a mediator layer
makes two additional interfaces appear: between a database server and a mediator and
between a mediator and a user application.
Since programmers tend to stick to one interface language and are
unwilling to switch to any other unless the current one becomes inconvenient
or ineffective, access to a mediator must be defined in a universal, high-level, extensible
and flexible language. The proposal does not suggest any particular language, only
the features recommended for it. A strong assumption is made that such
a language does not have to be user-friendly, but machine- and communication-friendly,
as it works in the background – any user interaction is defined on an application level
and is completely separated from the mediation process. This assumption allows avoiding
the problems and inadequacies introduced to SQL [15].
On the other side of a mediator, an interaction with a data source takes place.
This communication can be realised with any common interface language supported by
a resource, e.g., SQL for relational databases. A mediator covering several data sources
can also be equipped with appropriate knowledge on how to combine, retrieve and filter
data.
The most effective mediation can be achieved if a mediator serves a variety of
applications, i.e. it is sharable. On the other hand, applications can compose their tasks
based on a set of underlying mediators (unavailable information may motivate creating
new mediators).
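The layering described above (applications over mediators, mediators over other mediators or wrappers, wrappers over resources) can be sketched in Java as follows. This is a minimal illustration under assumptions, not an interface defined in [9] or in the thesis: all type and method names are hypothetical, and in-memory lists stand in for real data sources.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the layered mediator/wrapper idea; all names are illustrative. */
class MediationSketch {

    /** Talks to one resource in that resource's own language (simulated here). */
    interface Wrapper {
        List<String> fetch(String keyword);
    }

    /** Answers requests from applications or from higher-level mediators. */
    interface Mediator {
        List<String> ask(String keyword);
    }

    /** Wrapper over an in-memory list that stands in for a real data source. */
    static final class ListWrapper implements Wrapper {
        private final List<String> records;

        ListWrapper(String... records) {
            this.records = Arrays.asList(records);
        }

        @Override
        public List<String> fetch(String keyword) {
            List<String> hits = new ArrayList<>();
            for (String record : records) {
                if (record.contains(keyword)) {
                    hits.add(record);
                }
            }
            return hits;
        }
    }

    /** Bottom-level mediator: delegates to exactly one wrapper. */
    static final class SourceMediator implements Mediator {
        private final Wrapper wrapper;

        SourceMediator(Wrapper wrapper) {
            this.wrapper = wrapper;
        }

        @Override
        public List<String> ask(String keyword) {
            return wrapper.fetch(keyword);
        }
    }

    /** Higher-level mediator: combines the answers of the mediators below it. */
    static final class CombiningMediator implements Mediator {
        private final List<Mediator> lower;

        CombiningMediator(Mediator... lower) {
            this.lower = Arrays.asList(lower);
        }

        @Override
        public List<String> ask(String keyword) {
            Set<String> combined = new LinkedHashSet<>(); // keeps order, drops duplicates
            for (Mediator mediator : lower) {
                combined.addAll(mediator.ask(keyword));
            }
            return new ArrayList<>(combined);
        }
    }

    /** Builds a two-source hierarchy that any number of applications may share. */
    static Mediator sampleHierarchy() {
        Mediator registry = new SourceMediator(new ListWrapper("Kowalski/Lodz", "Nowak/Warsaw"));
        Mediator payroll = new SourceMediator(new ListWrapper("Kowalski/Lodz", "Wisniewski/Lodz"));
        return new CombiningMediator(registry, payroll);
    }

    public static void main(String[] args) {
        System.out.println(sampleHierarchy().ask("Lodz")); // prints [Kowalski/Lodz, Wisniewski/Lodz]
    }
}
```

Because the combining mediator depends only on the Mediator interface, further levels can be stacked on top of it without the application knowing how many sources lie underneath.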
The simplified mediation architecture is presented in Fig. 1.
[Figure: user applications reach mediators through a user-friendly external language; mediators communicate with other mediators and with wrappers through a communication-friendly internal language; wrappers access resources through the resource language]
Fig. 1 Mediation system architecture
The general mediation concept presented briefly above has had several
implementations; the most important are described shortly in the following subsections.
Its intentions and architectural aspects are also reflected in the virtual repository idea
(subsection 2.3 The eGov-Bus Virtual Repository).
2.2.1.1 Pegasus
Pegasus [16, 17] was a prototype multidatabase (federated) system. Its aim was to
integrate heterogeneous resources into a common object-oriented data model. Pegasus
accessed foreign systems with mapper and translator modules. Mappers were
responsible for mapping a foreign schema into a Pegasus schema, while translators
converted top-level queries expressed in an object-oriented query language
(HOSQL¹, an Iris query language [18, 19]) into native queries applicable to an integrated
resource.
A mapping procedure generated a Pegasus schema covering a resource so that it
was compatible with a common database model (a function realised by object-oriented
updateable views in the thesis) and defined data and operations available from
a resource. Dedicated mappers and translators were designed for different resources and
data models, including relational databases, and available as separate modules
(mapper/translator pairs).
Mapping of relational databases served three different cases: with both primary
and foreign keys given, without foreign keys and without primary keys [20]. In the first
case a user-defined Pegasus type was created for each relation that has a simple primary
key. Then for each user-defined type and for every non-key attribute in the associated
relation, a function was created with the type as argument and the attribute as result. For
relations with a composite primary key, multi-argument functions were created with the
primary key fields as arguments and the attribute as result. No type creation was needed
in this mapping. If an attribute in the mapped relation was a foreign key, it was replaced
with the type corresponding to that key. If no foreign keys were available, user-defined
types could not be created, since the attributes that refer to a primary key field
cannot be substituted by its associated type. However, a functional view of the
underlying relations was created in a manner similar to the previous case. In the last
case (no primary key information), as functional dependencies of any kind were not
known, only a predicate function was created for each relation, where all the attributes
of the relation formed the arguments of the function. A predicate function took one
or more arguments and returned a boolean result.
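The mapping rules described above can be summarised with a small Java sketch. It is a simplification made for illustration, not Pegasus code: the textual output format, the class and the method names are hypothetical, only primary key information is taken into account, and the replacement of foreign key attributes by types is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of the three mapping cases; the output notation is invented. */
class PegasusMappingSketch {

    /** Lists the types and functions that the described rules would create for a relation. */
    static List<String> map(String relation, List<String> attributes, List<String> primaryKey) {
        List<String> created = new ArrayList<>();
        if (primaryKey.isEmpty()) {
            // No primary key known: a single predicate function over all attributes.
            created.add("function " + relation + "(" + String.join(", ", attributes) + ") -> Boolean");
            return created;
        }
        String argument;
        if (primaryKey.size() == 1) {
            // Simple primary key: a user-defined type serves as the function argument.
            created.add("type " + relation);
            argument = relation;
        } else {
            // Composite primary key: no type, the key fields become the arguments.
            argument = String.join(", ", primaryKey);
        }
        for (String attribute : attributes) {
            if (!primaryKey.contains(attribute)) {
                created.add("function " + attribute + "(" + argument + ") -> " + attribute);
            }
        }
        return created;
    }

    public static void main(String[] args) {
        System.out.println(map("Employee", Arrays.asList("id", "name", "salary"),
                Arrays.asList("id")));
        System.out.println(map("Assignment", Arrays.asList("emp", "project", "hours"),
                Arrays.asList("emp", "project")));
        System.out.println(map("Log", Arrays.asList("time", "message"),
                Collections.emptyList()));
    }
}
```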
An input query expressed in terms of a Pegasus schema with HOSQL was parsed
into an intermediate syntax tree (F-tree) with functional expressions (calls, variables,
literals) of HOSQL as nodes. The F-tree was then transformed into a corresponding
B-tree with catalogue information – for foreign schema queries (e.g., targeting a relational
database) this information included all necessary connection parameters (a database
name, a network address, etc.). If such a query aimed at a single foreign system, it was
optimised by Pegasus and sent directly to the resource (a minimum of processing at
the Pegasus side). In the case of a query aimed at multiple foreign schemata, its parts were
grouped by destination in order to reduce the number of resource invocations
(optimisation issues), a process which resulted in a D-tree structure. Based on statistical
information, Pegasus performed a cost-based optimisation (join order, join methods, join
sites, intermediate data routing and buffering, etc.) within such a D-tree. Once
optimised, a query was translated into a foreign resource query language and executed.
¹ HP AllBase SQL
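The grouping step mentioned above, in which query parts are collected per destination so that each resource is invoked once, can be sketched in Java as follows. The fragment shows only the grouping idea and not the D-tree structure or the cost-based optimisation of Pegasus; the class names, the resource names and the query parts are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of grouping query parts by destination; not Pegasus code. */
class DestinationGroupingSketch {

    /** A fragment of a query together with the resource expected to evaluate it. */
    static final class QueryPart {
        final String destination;
        final String expression;

        QueryPart(String destination, String expression) {
            this.destination = destination;
            this.expression = expression;
        }
    }

    /** Collects the parts per destination, preserving the order of first appearance. */
    static Map<String, List<String>> groupByDestination(List<QueryPart> parts) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (QueryPart part : parts) {
            grouped.computeIfAbsent(part.destination, key -> new ArrayList<>())
                   .add(part.expression);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<QueryPart> parts = Arrays.asList(
                new QueryPart("hr_db", "emp.name"),
                new QueryPart("sales_db", "order.total"),
                new QueryPart("hr_db", "emp.salary"));
        Map<String, List<String>> grouped = groupByDestination(parts);
        // Three parts are served by two resource invocations instead of three.
        System.out.println(parts.size() + " parts, " + grouped.size() + " invocations: " + grouped);
    }
}
```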
2.2.1.2 Amos
Amos², another early work devoted to building object-oriented views over relational
databases and querying these databases with object-oriented query languages (based on
the Pegasus experience and the mediator approach [9]), has been described in [21]. In
the system a user interacts with a set of object-oriented views that translate input queries
into a relational query (or a set of queries) executed on a covered resource.
A returned result is converted back (possibly composed from a set of partial results of
single relational queries) to an object-oriented form and presented to a user. Such a view,
besides covering a relational resource, was also capable of storing its own purely
object-oriented data and methods. Therefore input queries can transparently combine
relational and object references. One of the strengths of these views was query
optimisation based on an object and relational algebra and calculus. The most important
weakness was a lack of support for relational data updates. Object-oriented views were
regarded as a means to integrate a relational resource into multidatabase (federated)
systems based on an object-oriented common data model (CDM) [22]. This approach
also assumed applying views in the general case of mapping any heterogeneous
component database into such a system, an approach also reflected in the thesis
virtual repository architecture. The prototype has been implemented with the Amos [23]
object-oriented database system over the Sybase RDBMS.
An object-oriented view was designed over an existing relational database to
express its semantics in terms of an object-oriented data model. At this mapping stage
it was assumed that a relational database was directly mappable to an object-oriented
model; otherwise, a set of relational views (or other modifications) was required for
a relational resource to enable such mapping. After this mapping procedure
a system was ready for querying. A query evaluation methodology for the described
system was developed as an extension to the conventional methodology [24], in which
query plans are compared by cost, with a translator introduced. A translator was a middleware
piece of software responsible for replacing predicates during views’ expansion with
their base relational expressions (in general: one-to-one mapping between relational
tuples and Amos objects).
² Active Mediator Object System
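The role of such a translator, i.e. replacing view-level predicates with their base relational expressions during view expansion, can be hinted at with the following Java fragment. It is a rough sketch and not the Amos translator: the predicate notation, the mapping table and all names are hypothetical, and plain textual substitution is used only to keep the example short.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of view expansion by substitution; not the Amos implementation. */
class ViewExpansionSketch {

    /** Assumed mapping from view-level function calls to base relational expressions. */
    static Map<String, String> sampleMapping() {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("salary(e)", "EMPLOYEE.SALARY");
        mapping.put("name(e)", "EMPLOYEE.NAME");
        return mapping;
    }

    /**
     * Replaces every mapped call with its relational expression. Textual replacement
     * is a simplification: a real translator would operate on a parsed query, since
     * substring matching would also rewrite a call such as surname(e).
     */
    static String expand(String predicate, Map<String, String> baseExpressions) {
        String expanded = predicate;
        for (Map.Entry<String, String> entry : baseExpressions.entrySet()) {
            expanded = expanded.replace(entry.getKey(), entry.getValue());
        }
        return expanded;
    }

    public static void main(String[] args) {
        System.out.println(expand("salary(e) > 1000 and name(e) = 'Smith'", sampleMapping()));
        // prints EMPLOYEE.SALARY > 1000 and EMPLOYEE.NAME = 'Smith'
    }
}
```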
2.2.1.3 Amos II
Amos II [25, 26], the current continuation of the Amos project briefly described above,
is a prototype object-oriented peer mediator system with a functional data model
and a functional query language, AmosQL, based on the wrapper/mediator concept
introduced in [9]. The solution enables interoperation between heterogeneous
autonomous distributed database systems. Transparent views used in Amos II mediators
enable covering other mediators, wrapping data sources and native Amos objects in
a modular and composable way. The Amos II data manager and its query processor
enable defining additional data types and operators with some programming languages
(e.g., Java or C).
Amos II is based on a mediator/wrapper approach [9] with mediator peers
communicating over the Internet, rejecting a centralised architecture with a single server
responsible for the resource integration process and for translating data into a CDM. Each
mediator peer works as a virtual database with data abstraction and a query language.
Applications access data from distributed data sources through queries to views in
mediator peers. Logical composition of mediators is achieved when multidatabase
views in mediators are defined in terms of views, tables, and functions in other
mediators or data sources.
Due to an assumed heterogeneity of integrated data sources, Amos II mediators
can be supplied with one or more wrappers processing data from covered resources,
e.g., ODBC relational databases, XML repositories, CAD systems, Internet search
engines. In terms of Amos II, a wrapper is a specialised facility for query processing
and data translation between an external resource and the rest of the system. It
contains an interface to a resource and information on how to efficiently translate and
process queries targeting its resource. In the Amos II architecture, external peers are
also regarded as external resources with a dedicated wrapper and special query
optimisation methods based on the distribution, capabilities, costs, etc. of the different
peers [27].
Amos II optimises AmosQL queries prior to their execution.
This procedure is based on an object calculus and an object algebra. First, a query is
compiled and decomposed into algebra expressions. The object calculus is expressed in
an internal simple logic-based language called ObjectLog [28], which is an object-oriented
dialect of Datalog [29, 30]. AmosQL optimisation rules rely on its functional
and multidatabase properties. Distributed multidatabase queries are decomposed into
local queries executed on an appropriate peer (load balancing is also taken into
account). At each peer, a cost optimiser based on statistical estimates is further applied.
Finally, the optimised algebra expression is interpreted to produce the final result of the query.
Fig. 2 Distributed mediation in Amos II [26]
Since Amos II is a distributed system, it introduces a distributed mediation
mechanism, where peers communicate and interoperate over the TCP/IP protocol. The
mediation process is illustrated in Fig. 2 where an application accesses data from two
different mediators over heterogeneous distributed resources. Peer communication is
depicted with thick lines (the arrows indicate peers acting as servers), and the dotted
lines show a communication process with a name server to register a peer and to obtain
information on a peer group. The name server stands for a mediator server keeping a
simple meta-schema of a peer group (e.g., names, locations, etc.; a schema of each peer
is kept and maintained at its side locally). The information in the name server is
managed without explicit operator intervention; its content is managed through
messages from autonomous mediator peers. Mediator peers usually communicate
directly without involving the name server, which is used only when a new mediator peer
connection is established. A peer can communicate directly with any other peer within
its group (Fig. 2, however, shows communication between different topological peer
levels). The optimal communication topology is established by the optimisation
process for a particular query (individual peer optimisers can also exchange data and
schema information to produce an optimised execution plan of a query).
2.2.1.4 DISCO
DISCO³ [31, 32] was designed as a prototype heterogeneous distributed database based
on underlying data sources according to the mediator/wrapper paradigm [9]. It addressed
common problems of such environments, i.e. inconsistencies and
incompatibilities of data sources and unavailability and disappearance of resources.
Moreover, much attention was paid to cost-based query optimisation among wrappers.
Besides regular databases, file systems and information retrieval systems (multimedia
and search engines), DISCO was oriented towards WWW resources (e.g., HTML
documents). Its main application was the search and integration of data stored in distributed
heterogeneous resources, providing uniform and optimised
information access based on a common declarative query language.
In a mediator-based architecture, end users interacted with an application which
in turn accessed a uniform representation of underlying resources with a SQL-like
declarative query language. These resources were covered by mediators encapsulating
a representation of multiple data sources for this query language. Mediators could be
developed independently or combined, which made it possible to deal with the
complexities of distributed data sources. Queries entering mediators, expressed with an
algebraic language supporting relational operations, were transformed into subqueries
distributed to appropriate data sources. Wrappers interfacing between mediators and
actual data sources represented resources as structured views. They accepted
(sub)queries from mediators and translated them into an appropriate query language of
a particular wrapped resource. In the opposite direction, query results were reformatted so that
they were acceptable to the wrappers’ mediators.
Each type of a data source required implementing a dedicated wrapper.
A wrapper had to contain a precise definition of data source’s capabilities so that
a subset of the algebraic language supported by a wrapper could be chosen (resource
heterogeneity). On registration of a wrapper by a mediator, the wrapper transmitted this
subset description so that it could be supported in the mediator. This information was
automatically incorporated by the mediator into its query transformation process.
³ Distributed Information Search COmponent
As mentioned above, cost-based optimisation was one of the most important
issues in DISCO. As in the case of defining a language subset available via a wrapper,
optional cost information was also included at a wrapper implementation stage. This
information concerned selected or all of the algebraic operations supported by
a wrapper. Again, cost information was transmitted to a mediator when registering
a wrapper. This wrapper-specific cost information overrode the general cost information
used by a mediator so that a more accurate cost model could be used. DISCO’s
cost-based optimiser was used to produce the best possible query evaluation plan.
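The registration step described above can be illustrated with a minimal sketch. It is not DISCO's actual interface: the class names, the operator names and the cost figures are assumptions made only for this example. A wrapper declares the subset of algebraic operations it supports and, optionally, its own costs; the mediator stores both and lets wrapper-specific costs override its generic ones.

```python
from dataclasses import dataclass, field

# Hypothetical generic costs a mediator assumes when a wrapper supplies none.
GENERIC_COST = {"scan": 100.0, "select": 10.0, "project": 5.0, "join": 500.0}


@dataclass
class Wrapper:
    name: str
    supported_ops: set                          # subset of the mediator's algebra
    costs: dict = field(default_factory=dict)   # optional, may cover only some operations


class Mediator:
    def __init__(self):
        self.capabilities = {}
        self.cost_model = {}

    def register(self, wrapper):
        """Record the wrapper's operator subset and its effective cost model."""
        self.capabilities[wrapper.name] = set(wrapper.supported_ops)
        # Wrapper-specific figures override the generic ones, operator by operator.
        self.cost_model[wrapper.name] = {**GENERIC_COST, **wrapper.costs}

    def can_push_down(self, wrapper_name, op):
        """True if the operation may be delegated to the wrapped source."""
        return op in self.capabilities[wrapper_name]

    def cost(self, wrapper_name, op):
        return self.cost_model[wrapper_name][op]


mediator = Mediator()
mediator.register(Wrapper("files", {"scan"}))
mediator.register(Wrapper("rdb", {"scan", "select", "project", "join"},
                          costs={"select": 2.0}))
```

In this sketch the file wrapper can only be scanned, so any selection over it would have to be evaluated by the mediator, while the relational wrapper accepts pushed-down selections at its own declared cost.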
The problem of resource unavailability/disappearance was solved on a level of
partial query results’ composition. A partial answer to a query being a part of a final
result was produced by the currently available data sources. It also contained a query
representing finished or unfinished parts of the answer. When a previously unavailable
data source became accessible, a partial answer could be resubmitted as a new query to
obtain a final answer to the original query.
Wrapper-mediator interaction occurred in two phases: first – when a wrapper
was registered by a mediator (a mediator could register various wrappers), second –
during query processing. During registration, a wrapper's local schema, its
query processing capabilities and specific cost information were supplied to a mediator.
A mediator contained a global schema defined by its administrator (this global schema
was seen by applications) and a set of views defining how to connect the global schema to
local schemata. During the query processing phase, a query from an application was
passed to a mediator transforming it into a plan consisting of subqueries and
a composition query (information how to produce a final result for a mediator). This
plan was optimised with respect to information provided by wrappers at the registration
stage. Then, the resulting plan was executed by issuing subqueries to appropriate
available wrappers that evaluated them on their underlying resources and returned
partial results. In the ideal case, when all the wrappers were available, the mediator
combined their partial answers according to the composition query and returned a final
result to the application. If some wrappers were unavailable, however, the mediator
returned only a partial answer. The application could extract from it some information
depending on which wrappers were accessible.
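The handling of unavailable sources can be sketched as follows. This is an illustration only, not DISCO code: the function names are hypothetical, subqueries are modelled as plain predicates, and the composition query is reduced to concatenating partial results. The point is that the answer carries both the rows obtained so far and the subqueries that could not be evaluated, so that the latter can be resubmitted later.

```python
class Unavailable(Exception):
    """Raised by a wrapper whose data source cannot be reached."""


def make_wrapper(data, up=True):
    """Build a toy wrapper over an in-memory list (hypothetical helper)."""
    def run(predicate):
        if not up:
            raise Unavailable
        return [row for row in data if predicate(row)]
    return run


def evaluate(subqueries, wrappers):
    """Run each subquery on its wrapper; return (rows, pending subqueries)."""
    rows, pending = [], {}
    for source, subquery in subqueries.items():
        try:
            rows.extend(wrappers[source](subquery))
        except Unavailable:
            pending[source] = subquery   # unfinished part of the answer
    return rows, pending


def is_adult(row):
    return row["age"] >= 18


subqueries = {"a": is_adult, "b": is_adult}
wrappers = {"a": make_wrapper([{"name": "Ann", "age": 30}]),
            "b": make_wrapper([{"name": "Bob", "age": 40}], up=False)}

partial, pending = evaluate(subqueries, wrappers)            # source "b" is down
wrappers["b"] = make_wrapper([{"name": "Bob", "age": 40}])   # "b" becomes accessible
rest, still_pending = evaluate(pending, wrappers)            # resubmit the unfinished part
final = partial + rest
```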
2.2.1.5 ORDAWA
ORDAWA⁴ [33] was an international project whose aim was to develop techniques for
the integration and consolidation of different external data sources in an object–relational
data warehouse. The detailed issues consisted of the construction and maintenance of
materialised relational and object-oriented views, index structures, query
transformations and optimisations, and techniques of data mining.
The ORDAWA architecture [33] is realised as a customised wrapper/mediator
approach. The object-oriented views with wrappers constitute mediators transforming
data from bottom-level resources. The integrator also works as a top-level mediator
based on the underlying mediators.
Any further description of ORDAWA is out of the scope of this thesis, since it
assumes data materialisation, an approach that has been strictly rejected here; it is
mentioned just as another implementation of the wrapper/mediator approach.
2.2.2 ORM and DAO
ORM⁵ [34, 35, 36, 37] is a programming technique for converting data between
incompatible type systems in databases and object-oriented programming languages. In
effect, this creates a “virtual object database” which can be used from within the
programming language.
In common relational DBMS, stored values are usually primitive scalars, e.g.,
strings or numeric values. Their structure (tables and columns) creates some logic that
in object-oriented programming languages is commonly realised with complex
variables. ORM aims to map these primitive relational data onto complex constructs
accessed and processed from an object-oriented programming language. The core of the
approach is to enable bidirectional transparent transformations between persistent data
(stored in a database) with their semantics and transient objects of a programming
language, so that data can be retrieved from a database and stored there as a result of
programme execution. An object-relational mapping implementation needs to
systematically and predictably choose which relational tables to use and generate the
necessary SQL code.
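A minimal sketch of this idea is given below, using Python's standard sqlite3 module. It does not reproduce any of the products discussed in this chapter; the Employee class, the table name derived from the class name and the save/load helpers are assumptions made for the example. The mapper chooses the table and columns from the class definition and generates the SQL, so the calling code handles only objects.

```python
import sqlite3
from dataclasses import dataclass, fields


@dataclass
class Employee:            # hypothetical class, mapped to table "employee"
    id: int
    name: str
    salary: float


def table_of(cls):
    """Mapping rule assumed here: the table is named after the class."""
    return cls.__name__.lower()


def save(conn, obj):
    """Generate and run an INSERT from the object's fields."""
    cols = [f.name for f in fields(obj)]
    sql = "INSERT INTO {} ({}) VALUES ({})".format(
        table_of(type(obj)), ", ".join(cols), ", ".join("?" for _ in cols))
    conn.execute(sql, [getattr(obj, c) for c in cols])


def load(conn, cls, key):
    """Generate and run a SELECT, then rebuild a transient object from the row."""
    cols = [f.name for f in fields(cls)]
    sql = "SELECT {} FROM {} WHERE id = ?".format(", ".join(cols), table_of(cls))
    row = conn.execute(sql, (key,)).fetchone()
    return cls(*row) if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
save(conn, Employee(1, "Smith", 2500.0))
restored = load(conn, Employee, 1)
```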
⁴ Object–Relational Data Warehousing System
⁵ Object-Relational Mapping
Unfortunately, most ORM implementations are not very efficient. The reasons
for this are usually extra mapping operations and memory consumption occurring in
an additional middleware layer introduced between a DBMS and an end-user
application.
On the other hand, DAO⁶ [38, 39] intends to map programming language objects
to database structures and persist them there. Originally the concept was developed for
Java and most applications are created for this language. However, an early attempt to
build a DAO architecture, referred to as an “object wrapper” (the term DAO appeared
later), was shown in [40], while a general architecture and constraints for building
object-oriented business applications over relational data stores with persistence issues
were extensively described and discussed in [41, 42, 43].
The most important differences between ORM and DAO, with their advantages
and disadvantages, are very well depicted in [44]. The crucial distinction appears at the
very beginning of design: in ORM a database schema already exists and application
code is created according to it (objects-to-tables mapping), whereas for DAO application
logic is established first and then appropriate database persistence structures are developed
(tables-to-objects mapping). However, in real-life implementations both approaches are very
often mixed in favour of API or application usability and features.
The most common and important approaches to ORM and DAO are briefly described
below, mainly for Java (due to the prototype implementation technology);
a more complete list of ORM solutions for different platforms and languages can be
found in [45].
2.2.2.1 DBPL Gateway
An early project concerning the ORM approach was DBPL⁷ [46], a Modula-2
[47] extension provided with a gateway to relational databases (Ingres [48] and Oracle
[49]). The API extension concerned a new bulk data type constructor “relation”,
persistence and high-level relational expressions (queries) based on the nested relational
calculus, maintaining strong typing and orthogonality. The gateway itself was
a mechanism enabling regular DBPL programmes to access relational databases, i.e. no
SQL statement strings were embedded in the code, which made access to databases
transparent. DBPL constructs referring to relational data were automatically
converted to corresponding SQL expressions and statements referring to relational
tables. The motivation for designing such a solution was the well-known impedance
mismatch between programming languages and query languages, the misleading
user-friendliness of query languages (mainly SQL), and a bottom-up evolution of query
languages resulting in awkward products like Oracle PL/SQL.
⁶ Data Access Object
⁷ DataBase Programming Languages
A user worked on relational data as on persistent DBPL collections transparently
mapped onto relational structures. SQL queries resulting from background
transformations were optimised by SQL optimisers, therefore the general performance
was acceptably good. The conceptual and implementation details of the gateway are
described in detail in [46].
2.2.2.2 EOF
EOF⁸ [50], developed by NeXT Software, Inc. (acquired by Apple Computer in 1997),
is often considered the best ORM implementation. However, the overall impact of
EOF on the development of this technology was rather small, as it was tightly integrated with
OpenStep [51], NeXT’s object-oriented API specification for operating systems,
realised within OPENSTEP. EOF was also implemented as a part of WebObjects, the
first object-oriented Web Application Server. Currently, it provides a background
technology for Apple’s e-commerce solutions, e.g., iTunes Music Store.
Apple’s EOF provides two current implementations: with Objective-C [52]
(Apple Developer Tools) and with Java (WebObjects 5.2). EOF has also been used as
a basis for the open-source Apache Cayenne described briefly below.
2.2.2.3 Apache Cayenne
Cayenne [53] is an open-source ORM implementation for Java, inspired by EOF and
provided with remote services support. It is capable of transparent binding of one or
more database schemata to Java objects, managing transactions, SQL generation,
performing joins, sequences, etc. Its Remote Object Persistence solution enables
persisting Java objects with native XML serialisation or via Web Services; such
persistent objects can be passed to non-Java clients, e.g., Ajax-supporting [54] browsers.
Java object generation in Cayenne is based on the Velocity [55] templating engine and
the process can be managed with a GUI application.
⁸ Enterprise Objects Framework
2.2.2.4 IBM JDBC wrapper
A simple ORM solution proposed by IBM [56] is based on JDBC⁹. Relational structures
(databases, tables, rows, row sets and results) are mapped onto instances of appropriate
Java classes. Particular structures are accessible by their names or SQL-like predicates
given as string parameters to Java methods (including constructors).
This simplified approach again enables working with relational databases from
within a programming language.
2.2.2.5 JDO
JDO¹⁰ [57, 58] is a specification of Java object persistence. The main application of JDO
is persisting Java objects in a database rather than accessing a database with
a programming language (as described in the above approaches); JDO is a DAO
approach. Object persistence is defined in external XML metafiles, which may have
vendor-specific extensions. Currently, JDO vendors offer several options for
persistence, e.g., to RDBMS, to OODBMS, to files [59]. JDO is integrated with Java EE
in several ways. First of all, the vendor implementation may be provided as a JEE
Connector. Secondly, JDO may work in the context of JEE¹¹ transaction services.
The most common supporter and implementer of JDO is Apache JDO [60]; other
commercial and non-commercial implementations (e.g., Apache OJB [61], XORM [62],
Speedo [63]) are listed in [64].
2.2.2.6 EJB
The EJB¹² [65, 66, 67] specification intends to provide a standard way to implement the
back-end “business” code typically found in enterprise applications (as opposed to
“front-end” user-interface code). Such code was frequently found to reproduce the same
types of problems, and solutions to these problems were often repeatedly
re-implemented by programmers. Enterprise Java Beans were intended to handle such
common concerns as persistence, transactional integrity, and security in a standard way,
leaving programmers free to concentrate on the particular program at hand.
⁹ Java DataBase Connectivity
¹⁰ Java Data Objects
¹¹ Java Enterprise Edition
¹² Enterprise Java Beans
Following EJB2, the EJB3 specification covered issues of persistence; however,
there were some crucial differences from the JDO specification. As a result, there were
several implementations of JDO [64] while EJB3 was still under development. The
situation became yet more unfavourable for EJB, as a new standard for Java persistence
was being introduced (JPA, described below) and persistence was excluded from the
EJB3 core.
2.2.2.7 JPA
JPA¹³ [68, 69] is specified in a separate document within the EJB specification [66]. The
package javax.persistence does not require the EJB container and therefore it can
be applied to a J2SE¹⁴ environment (still required by JDO), according to the POJO¹⁵
concept. JPA differs from JDO even more. It is a classical ORM standard rather than
a specification of transparent object persistence independent of the technology of the
underlying data store.
2.2.2.8 Hibernate
Hibernate [70] is a DAO solution for Java and .NET [71] (NHibernate). Similarly to
JDO, Hibernate is not a plain ORM solution, as its main application is persisting
programming language objects. It allows developing persistent classes according to the
object-oriented paradigm, including association, inheritance, polymorphism,
composition, and collections. As for querying, Hibernate is provided with its own
portable query language (HQL), an extension to SQL; it also supports native
SQL and an object-oriented Criteria and Example API (QBC and QBE). Therefore data
source optimisation can be applied.
¹³ Java Persistence API
¹⁴ Java 2 Standard Edition
¹⁵ Plain Old Java Object
2.2.2.9 Torque
Torque [72] is an ORM for Java. It does not use the common approach of reflection to
access user-provided classes; instead, Java code and classes are generated automatically
(including data objects) based on an XML relational schema description (generated or
created manually). This XML schema can also be used to generate and execute SQL
scripts for creating a database schema. In the background Torque transparently uses
JDBC and database-specific implementation details, therefore a Torque-based
application is completely independent of a database: Torque does not use any
DBMS-specific features and constructs.
2.2.2.10 CORBA
CORBA¹⁶ [73, 74] is a standard defined by the OMG¹⁷ [75] enabling software
components written in multiple computer languages and running on multiple computers
to work together. Due to CORBA’s application in distributed systems, object
persistence (in terms of DAO) seems a very important issue, unfortunately not defined in
the standard itself. Some attempts have been made to achieve persistence in
object-oriented databases (e.g., [76]) and relational ones (e.g., [77]). Nevertheless, these
solutions seem somewhat exotic and are not very popular.
A detailed discussion of persistence and ORM issues for CORBA is contained
in [78].
2.2.3 XML Views over Relational Data
Using XML¹⁸ [79] for processing relational databases in object-oriented systems can be
regarded as an extension to ORM, but its technique is completely different from the
ones mentioned above. The concept of presenting relational data as XML appeared because
XML can reflect relational structures easily (using a single document or multiple
documents), is portable and can be further queried (for example with XQuery [80, 81])
or processed in an object-oriented manner.
The most straightforward approach aims to build XML views over relational
databases (over 1 000 000 Google hits for “xml relational view”), preferably with data
updating capabilities. There are many works devoted to this subject with successful
implementations (far fewer realising relational data updates, however).
¹⁶ Common Object Request Broker Architecture
¹⁷ Object Management Group
¹⁸ eXtensible Markup Language
A dedicated solution for designing XML views over relational and object-relational
data sources is XPERANTO¹⁹ [82, 83, 84] by IBM. The goal of XPERANTO
was to provide XML views with XQuery querying capability, with no need to use SQL,
i.e. without the user’s knowledge of the actual data model. The system translated XML-based
queries into SQL requests executed on its resource, received regular SQL results
and transformed them into XML documents presented to an end-user application.
XPERANTO’s intention was to push as much query evaluation as possible to
the resource level, so that its SQL optimisers could work and XML-level processing was
minimised. Another similar project was XTABLES [85].
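The division of work described above (selection evaluated by the relational engine, XML used only for tagging the result) can be sketched with Python's standard sqlite3 and xml.etree.ElementTree modules. The sketch makes no claim about XPERANTO's internals and performs no XQuery translation; the book table, the element names and the books_since function are assumptions made for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, year INTEGER);
INSERT INTO book VALUES (1, 'Databases', 1999), (2, 'Views', 2005);
""")


def books_since(conn, year):
    """Evaluate the selection in SQL, then tag the result rows as XML."""
    cur = conn.execute(
        "SELECT id, title, year FROM book WHERE year >= ? ORDER BY id", (year,))
    root = ET.Element("books")
    for book_id, title, pub_year in cur:
        elem = ET.SubElement(root, "book", id=str(book_id))
        ET.SubElement(elem, "title").text = title
        ET.SubElement(elem, "year").text = str(pub_year)
    return root


doc = books_since(conn, 2000)
xml_text = ET.tostring(doc, encoding="unicode")
```

Only the rows that satisfy the condition leave the database, so the XML layer never filters data itself.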
Updating relational data from XML views was solved in UXQuery [86, 87],
based on a subset of XQuery [80, 81] used for building views so that an update to the
view could be unambiguously translated to a set of updates on the underlying relational
database, assuming that certain key and foreign key constraints held. Another
implementation [88] of updateable XML views for relational databases was created for
the CoastBase [89, 90] project within the IST programme of the European Commission
(contract no: IST-1999-11406). A concept of triggers for updating was introduced in
Quark [91] (built over IBM DB2 [92]).
2.2.4 Applications of RDF
Alternative approaches based on RDF²⁰ [93] and its triples (the subject-predicate-object concept) have also been adapted for accessing relational resources in an object-oriented manner.
2.2.4.1 SWARD
SWARD²¹ [94, 95] (a part of the Amos II project) is a wrapper to relational databases
realised with the RDF format and technologies. A user defines a wrapper exporting
a chosen part of a database as a view in terms of the RDF metadata model. This view,
automatically generated from two simple mapping tables, can be queried with either
SQL or RDQL [96].
¹⁹ Xml Publishing of Entities, Relationships, ANd Typed Objects
²⁰ Resource Description Framework
²¹ Semantic Web Abridged Relational Databases
SWARD is a system based on a virtual repository concept enabling scalable
SQL queries to RDF views of large relational databases storing government documents
and life event data. Such a view enables querying both relational metadata and stored
data, which supports scalable access to large repositories. An RDF view of a relational
database is defined as a large disjunctive query. For such queries an optimisation is
critical not only concerning data access time but also the time to perform the query
optimisation itself. SWARD is supplied with novel query optimisation techniques
based on query rewriting and compile time evaluation of subexpressions enabling
execution of real-world queries to RDF views of relational databases.
The process of mapping relational structures to RDF maps tables to classes
and columns to properties, according to a given ontology. A relational database is
represented as a single table of RDF triples called a universal property view (UPV).
The UPV is internally defined as a union of property views, each representing one
exported column in a relational database (a content view) and a representation of
relational metadata (a schema view). A generation of UPV is automatic, provided a user
specifies for a given relational database and ontology a property mapping table
declaring how to map exported relational columns to properties of the ontology.
Further, a user specifies a class mapping table declaring URIs of exported relational
tables.
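The construction of the content part of a UPV can be illustrated with a short sketch. It is not SWARD's implementation and omits the schema view; the person table, the example.org URIs and the layout of the two mapping tables are assumptions made for the example. Each exported column contributes one SELECT producing (subject, predicate, object) rows, and the view is the union of these property views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
INSERT INTO person VALUES (1, 'Ann', 'Lodz'), (2, 'Bob', 'Uppsala');
""")

# Hypothetical property mapping table: (table, key column, exported column, property URI).
PROPERTY_MAP = [
    ("person", "id", "name", "http://example.org/onto#name"),
    ("person", "id", "city", "http://example.org/onto#city"),
]
# Hypothetical class mapping table: table -> URI prefix identifying its rows.
CLASS_MAP = {"person": "http://example.org/person/"}


def upv_sql():
    """Build the content view: one SELECT per exported column, combined with UNION ALL."""
    parts = [
        "SELECT '{}' || {} AS subject, '{}' AS predicate, {} AS object FROM {}".format(
            CLASS_MAP[table], key, prop, col, table)
        for table, key, col, prop in PROPERTY_MAP
    ]
    return " UNION ALL ".join(parts)


conn.execute("CREATE VIEW upv AS " + upv_sql())
triples = conn.execute(
    "SELECT subject, object FROM upv WHERE predicate = ? ORDER BY subject",
    ("http://example.org/onto#city",)).fetchall()
```

Because the view is a union over all exported columns, a query that fixes the predicate touches only one branch in principle, which is why rewriting such disjunctive queries matters for performance.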
2.2.4.2 SPARQL
SPARQL²² [97, 98, 99] is an RDF query language and protocol for the Semantic Web. There
are some works aiming to use its paradigms for accessing relational data (e.g., [100,
101, 102, 103, 104]); however, the concept and its real-life applications are currently
difficult to evaluate. It is mentioned here as another approach to the issue.
²² SPARQL Protocol and RDF Query Language (a recursive acronym)
2.2.5 Other Approaches
There is also a set of other approaches to the issue of integration of object-oriented
systems with relational data stores that cannot be classified into any of the above
categories. Some of them are presented below.
2.2.5.1 ICONS
The ICONS²³ project [105] was supported by the European Community’s “Information
Society Technology” research programme and realised under the Fifth Framework Programme
(contract no: IST-2001-32429). The ICONS project focused on bringing together advanced
research results, technologies and standards into a coherent, web-based system architecture
in order to develop and further exploit a knowledge-based multimedia
content management platform. It integrated and extended known results from the AI and
database management fields, combined with advanced features of the emerging
information architecture technologies. As a result of ICONS, a prototype solution was
developed and implemented in the Structural Fund Project Knowledge Portal [106, 107]
published on the Internet.
The overall project objective was to integrate and extend the existing research
results and standards in the area of knowledge representation as well as integration of
pre-existing, heterogeneous information sources. The knowledge representation
research covered such paradigms as logic (disjunctive Datalog [29, 30]), semantic nets
and conceptual modelling (UML [108] semantic data models and the RDF standard
[93]), as well as the procedural knowledge represented by directed graphs (WfMC
[109]). The ICONS prototype managed an XML-based multimedia content repository,
storing complex information objects and/or representations (proxies) of external
information resources such as pre-existing, heterogeneous databases, information
processing system outputs, and Web pages, as well as the corresponding domain
ontologies.
²³ Intelligent CONtent management System
ICONS assumed using existing data and knowledge resources, including
heterogeneous databases, legacy information processing systems and Web information
sources. The overall integration architecture with information management and
processing methodologies [110, 111, 112] is completely out of the scope of the thesis,
except for techniques of accessing heterogeneous database systems. ICONS assumed a
simplified object-oriented model having virtual non-nested objects with repeated
attributes and UML-like association links used for navigation among objects. Both
repeated attributes and links were derived from the primary-foreign key dependencies in
the relational database by a parameterization utility. The ICONS repository was
available as an API to Java (without use of JDBC). However, because Java has no query
capabilities, all the programming had to be done through sequential scanning of
collections of objects. Obviously, this gave no chance to the SQL optimiser, hence
initially the performance was extremely bad. It was improved by some extensions, for
instance, by special methods with conditions as parameters that were mapped into SQL
where clauses, but in general the experience was disappointing.
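The difference between the two access styles can be shown with a small sketch. It does not reproduce the ICONS API; the project table and both function names are assumptions made for the example. The first function scans all rows and filters them in the host language, so the database sees only a full-table read; the second passes the condition as parameters and maps it onto an SQL WHERE clause, which leaves the selection to the relational engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project (id INTEGER PRIMARY KEY, region TEXT, budget REAL)")
conn.executemany("INSERT INTO project VALUES (?, ?, ?)",
                 [(1, "north", 10.0), (2, "south", 250.0), (3, "north", 900.0)])


def scan_filter(conn, predicate):
    """Client-side filtering: every row is fetched, then tested in the host language."""
    rows = conn.execute("SELECT id, region, budget FROM project ORDER BY id")
    return [row for row in rows if predicate(row)]


def pushed_filter(conn, column, op, value):
    """Condition given as parameters and mapped onto an SQL WHERE clause."""
    if column not in {"id", "region", "budget"} or op not in {"=", "<", ">"}:
        raise ValueError("unsupported condition")
    sql = "SELECT id, region, budget FROM project WHERE {} {} ? ORDER BY id".format(
        column, op)
    return conn.execute(sql, (value,)).fetchall()


slow = scan_filter(conn, lambda row: row[2] > 100.0)
fast = pushed_filter(conn, "budget", ">", 100.0)
```

Both calls return the same rows; only the second gives the SQL optimiser a condition it can use, for instance together with an index on the filtered column.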
2.2.5.2 SDO
SDO²⁴ [113, 114, 115] are a new approach to a programming model unifying data
access and manipulation techniques for various data source types. According to SDO,
programming procedures, tools, frameworks and resulting applications should therefore be
“resource-insensitive”, which considerably improves design, coding and
maintenance.
SDO are implemented as a graph-based framework similar to EMF²⁵ [116] (its
detailed description can be found in [115] and is out of the scope of the thesis). The
framework defines data mediator services (DMS; not declared in the
specification [117], however), which are responsible for interactions with data sources
(e.g., JDBC, EJB entities [65], XML [79]). Data sources in terms of SDO are not
restricted to persistent databases, and they have their own data formats. A data source
is accessed directly only by an appropriate DMS, not by the application itself (as it works
only with data objects in data graphs provided by a DMS). These data objects are
the fundamental components of the framework corresponding to service data objects in
the specification [117]. Data objects are generic and provide a common view of
structured data built by a DMS. While a JDBC DMS, for instance, needs to know about
the persistence technology (for example, relational databases) and how to configure and
access it, SDO clients need not know anything about it. Data objects hold their “data” in
properties. Data objects provide convenience creation and deletion methods (like
createDataObject() with various signatures and delete()) and reflective methods to get
their types (instance class, name, properties, and namespaces). Data objects are linked
together and contained in data graphs.
²⁴ Service Data Objects
²⁵ Eclipse Modelling Framework
2.3 The eGov-Bus Virtual Repository
One of the goals of the eGov-Bus project (described in Appendix A) is to expose all
the data as a virtual repository whose schema is shown in Fig. 3. The central part of the
system is the ODRA database server, the only component accessible for the top-level
users and applications, presenting the virtual repository as a global schema. The server
virtually integrates data from the underlying resources, available here only as SBQL
view definitions (described in subchapter 4.3 Updateable Object-Oriented Views),
based on a global integration schema defined by the system administrator and a global
index keeping resource-specific information (e.g. data fragmentation, redundancy, etc.).
The integration schema is another SBQL view (or a set of views) combining the data
from the particular resources (also virtually represented as views, as mentioned)
according to the predefined procedures and the global index contents. These contents
must determine resource location and its role in the global schema available to the top-level
users. The virtual representation of a resource is referred to as a contributory view, i.e.
another SBQL view covering it and transforming its data into a form compliant with the
global schema. A contributory view must comply with the global view; actually it is
defined as its subset.
The user of the repository sees data exposed by the systems integrated by means
of the virtual repository through the global integration view. The main role of the
integration view is to hide complexities of mechanisms involved in access to local data
sources. The view implements CRUD behaviour which can be augmented with logic
responsible for dealing with horizontal and vertical fragmentation, replication, network
failures, etc. Thanks to the declarative nature of SBQL, these complex mechanisms can
often be expressed in one line of code. The repository has a highly decentralised
architecture. In order to get access to the integration view, clients do not send queries to
any centralised location in the network. Instead, every client possesses its own copy of
the global view, which is automatically downloaded from the integration server after
successful authentication to the repository. A query executed on the integration view is
to be optimised using such techniques as rewriting, pipelining, global indexing and
global caching.
Fig. 3 eGov-Bus virtual repository architecture [205]
The currently considered bottom-level resources can be relational databases
(being the thesis focus), RDF resources, Web services applications and XML
documents. Each type of such a resource requires an appropriate wrapper capable of
communicating with both the upper ODRA database (described in Appendix B) and the
resource itself (except for XML documents currently imported into the system). Such
a wrapper works according to the early concept originally proposed in [9], while the
contributory views are used as mediators performing appropriate data transformations
and standing for external (however visible only within the system, not seen above the
virtual repository) resources’ representations.
Since SBQL views can be created only over SBA-compliant resources, the
wrapper must be accessible in this way. This goal is achieved by utilising a regular
ODRA instance whose metadata (schema) are created by the underlying wrapper based
on some resource description, but without any physical data. The data are retrieved by
the wrapper only when some query arrives and its result is to be returned (after
transformations performed by the contributory view) to the upper system components.
Similarly, the data disappears when it is not needed anymore (e.g. when a transaction
finishes or dies).
2.4 Conclusions
The wide selection of various solutions presented above contains both historical
approaches with their implementations (focused mainly on the mediation concept) and
contemporary techniques of integration of relational resources into modern object-oriented data processing systems. Actually, the issue of object-relational integration
discussed in the dissertation can be divided into two separate, but still overlapping in
many fields, parts. The first one is focused on how to integrate heterogeneous
distributed and redundant resources into one consistent global structure (here: the virtual
repository), which is the mediation process. The other part of the problem is how to
access the resources so that they can transparently contribute to the global structure with
their data, which should be regarded as wrapping.
The mediation procedure does not rely on bare resources but their wrappers,
instead. Such a wrapped resource is provided with a common interface accessible to
mediators and its actual nature can be neglected in upper-level mediation procedures.
The mediation and integration procedures must rely on some effective resource
description language used for building low-level schemata combined into top-level
schema available to users and the only one seen by them. In the thesis, this feature is
implemented by updateable object-oriented views based on SBQL. SBQL is further
employed for querying resources constituting the virtual repository.
The large set of solutions possibly applicable for wrapping relational resources
can be referred to as ORM/DAO. Unfortunately, their relatively poor performance
(confirmed by the author’s previous experience and additional preliminary tests not
included in the thesis) does not allow building an efficient wrapper for the virtual
repository. Again, any intermediate data materialisation (to XML, for example) was
rejected as unnecessary and contradictory to the eGov-Bus project requirements.
Instead, a dedicated client-server architecture was designed and effectively implemented,
based on relatively low-level wrapper-resource communication using the resource's native query
language.
Chapter 3
Relational Databases
The fundamentals of current relational DBMSs were defined in [118] and [119]
(System-R [120, 121]). The idea presented in [118] was to create a simple, logical,
value-based data model where data were stored in tuples with named attributes and sets
of tuples constituted relations. This made writing queries easier (compared to
pre-relational systems) and independent of data storage, but it also introduced serious
problems with evaluation efficiency and an urgent need for query optimisation.
Relational query optimisation techniques, developed for almost 40 years, are based both
on static syntactic transformations (relational calculus and algebra) and on storage
information on data structure and organisation (data pages, IO operations, various
indices, etc.), resulting in various execution plans. The aim of a relational optimiser is to
examine all the plans (in practice the number is usually limited to the most promising
and straightforward ones to minimise optimisation time and resource consumption) and
to choose the cheapest one (again, time and resources are considered). However, to
simplify the process, a common practice is just to reject the worst plans and choose
one of the remaining ones. An overview of relational optimisation approaches can be found
in [122, 123, 124].
Below, there is an overview of the most common relational optimisation
approaches for single databases. Distribution issues are omitted as these are not
considered in the thesis; however, they are well documented and elaborated for relational
systems (e.g., in [125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138,
139, 140, 141, 142]).
3.1 Relational Optimisation Constraints
Query optimisation tries to solve the problem of a complete lack of efficiency of query
evaluation when handling powerful operations offered by database systems (appearing
especially in case of content-based access to data) by integrating a large number of
techniques and strategies, ranging from logical transformations of queries to
the optimisation of access paths and the storage of data on the file system level [122].
The economic principle requires that optimisation procedures either attempt to
maximise the output for a given number of resources or to minimise the resource usage
for a given output. Query optimisation tries to minimise the response time for a given
query language and mix of query types in a given system environment. This general
goal allows a number of different operational objective functions. The response time
goal is reasonable only under the assumption that user time is the most important
bottleneck resource. Otherwise, direct cost minimisation of technical resource usage can
be attempted. Fortunately, both objectives are largely complementary; when goal
conflicts arise, they are typically resolved by assigning limits to the availability of
technical resources (e.g., those of main memory buffer space). [122]
The main factors considered during relational query optimisation are [122]:
• Communication cost – the cost of transmitting data from the site where they are
stored to the sites where computations are performed and results are presented;
these costs are composed of costs for the communication line, which are usually
related to the time the line is open, and costs for the delay in processing caused by
transmission; the latter, which is more important for query optimisation, is often
assumed to be a linear function of the amount of data transmitted,
• Secondary storage access cost – the cost of (or time for) loading data pages from
secondary storage into main memory; this is influenced by the amount of data to
be retrieved (mainly by the size of intermediate results), the clustering of data on
physical pages, the size of the available buffer space, and the speed of the devices
used,
• Storage cost – the cost of occupying secondary storage and memory buffers over
time; storage costs are relevant only if storage becomes a system bottleneck and if
it can be varied from query to query,
• Computation cost – the cost for (or time of) using the central processing unit
(CPU).
3.2 Relational Calculus and Relational Algebra
The relational calculus (in fact there are two calculi: the tuple relational calculus and the
domain relational calculus) is non-procedural and refers to quasi-natural-language
expressions used for declaratively formulating SQL26 queries and statements (the
relational calculus was also used for Datalog [29, 30]); on the other hand, the relational
algebra is used for procedural transformations of SQL expressions. Both the calculus
and the algebra are the core parts of the relational model. The relational algebra and the
relational calculus are logically equivalent.
The tuple relational calculus (introduced by Codd [118, 143, 144]) is a notation
for defining a result of a query through description of its properties. The representation
of a query in relational calculus consists of two parts: a target list and a selection
expression [122]. Together with the relational algebra, the calculus constituted the basis
for SQL (formerly also for the already forgotten QUEL27). However, the relational
model has never been implemented completely complying with its mathematical
foundations. The relational model theory defines a domain (or a data type), a tuple
(an ordered multiset of attributes) being an ordered pair of a domain and a value and
a relation variable (relvar) standing for an ordered pair of a domain and a name
(a relation header), and a relation defined as a set of tuples. In relational databases
a relation is reflected in a table and a tuple corresponds to a row (a record). The tuple
calculus involves atomic values (atoms), operators, formulae and queries. Its complete
description is out of the scope of the thesis and can be found in [118]. The domain
relational calculus was proposed in [145] as a declarative approach to query languages.
It uses the same operators as the tuple calculus (and quantifiers) and it is meant for
expressing queries as formulae. Again, its detailed description is omitted as it can be
found in [145].
26 Structured Query Language
27 QUEry Language
The relational algebra [118, 146] is based on the mathematical logic and set
theory and it is equivalent to the domain calculus. The relational algebra was applied as
a basis for a set of query languages, e.g., ISBL, Tutorial D, Rel, and SQL. However, in
case of SQL, as it does not comply strictly with the theory (tables cannot be regarded as
real relations), its affiliation to the relational algebra is not very close, which causes
difficulties in some applications, e.g., in query optimisers. Because a relation is
interpreted as the extension of some predicate, each operator of a relational algebra has
a counterpart in predicate calculus. There is a set of primitive operations defined in
the algebra, i.e. the selection σ, the projection π, the Cartesian product × (the cross
product or the cross join), the set union ∪, the set difference –, and the rename ρ
(introduced later). All other operators, including the set intersection, the division and the
natural join are expressed in terms of the above primitive ones. In terms of the
optimisation, the relational algebra can express each query as a tree, where the internal
nodes are operators, leaves are relations and subtrees are subexpressions. Such trees are
transformed to their semantically equivalent forms, where the average sizes of the
relations yielded by subexpressions in the tree are smaller than they were before the
optimisation (e.g., avoiding calculating cross products). Further, the algebra
optimisation aims to minimise the number of evaluations of a single subexpression (as its
result can be calculated once and used for evaluating other (sub)expressions).
The relational algebra optimisation basic techniques referring to selections, projections,
cross products, etc. will be presented in further sections.
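The effect of such a tree transformation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: relations are modelled as lists of dictionaries, the sample data and attribute names (e_surname, d_name, etc.) are invented, and the join is expressed through the primitive operators as a selection over a Cartesian product. The sketch compares a tree that selects after the join with an equivalent tree in which the selection is pushed below the join.

```python
# Relations are modelled as lists of dicts (attribute name -> value).
# All data and attribute names below are invented for illustration.

def select(rel, pred):
    """Selection: keep the tuples satisfying the predicate."""
    return [t for t in rel if pred(t)]

def project(rel, attrs):
    """Projection: keep the listed attributes and drop duplicate tuples."""
    seen, out = set(), []
    for t in rel:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def product(r, s):
    """Cartesian product (attribute names are assumed to be disjoint)."""
    return [{**a, **b} for a in r for b in s]

def join(r, s, pred):
    """A join built from primitives: a selection over a Cartesian product."""
    return select(product(r, s), pred)

employees = [
    {"e_surname": "Smith", "e_salary": 1500, "e_dept": 1},
    {"e_surname": "Jones", "e_salary": 1000, "e_dept": 1},
    {"e_surname": "Kowalski", "e_salary": 2000, "e_dept": 2},
]
departments = [
    {"d_id": 1, "d_name": "Sales"},
    {"d_id": 2, "d_name": "IT"},
]

def same_dept(t):
    return t["e_dept"] == t["d_id"]

def well_paid(t):
    return t["e_salary"] > 1200

# Original tree: join first, select afterwards.
late = select(join(employees, departments, same_dept), well_paid)
# Equivalent tree: the selection is pushed below the join.
early = join(select(employees, well_paid), departments, same_dept)

result = project(early, ["e_surname", "d_name"])
# Sizes of the intermediate Cartesian products in both trees.
size_late = len(product(employees, departments))
size_early = len(product(select(employees, well_paid), departments))
print(result, size_late, size_early)
```

Both trees yield the same tuples, but the second one builds a smaller intermediate relation, which is the kind of gain the algebraic transformations aim at.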
3.3 Relational Query Processing and Optimisation Architecture
Fig. 4 Query flow through a RDBMS [124]
Fig. 4 shows a general architecture of query processing in relational systems
[124]. The particular component blocks denote:
• Query parser – no optimisation performed, checks the validity of the query and
then translates it into an internal form, usually a relational calculus expression or
another equivalent form,
• Query optimiser – examines all algebraic expressions that are equivalent to the
given query and chooses the one that is estimated to be the cheapest,
• Code generator or interpreter – transforms the access plan generated by the
optimiser into calls to the query processor,
• Query processor – actually executes the optimised query.
Queries can be divided into so-called ad hoc (interactive) queries, issued by an end
user or an end-user application, and stored (embedded) ones, “hardcoded” in an
application and compiled with it (this refers mainly to low-level programming languages,
where queries are not stored as common SQL strings). Ad hoc queries go through the first
three steps shown in Fig. 4 each time they are passed to a RDBMS, whereas embedded queries
can go through them only once and then be stored in a database in their optimised form
for further use (and called at runtime). Nevertheless, the general optimisation process for
both kinds of queries can be regarded as the same [124].
Fig. 5 presents abstract relational query optimiser architecture. As stated above,
query optimisation can be divided into two steps:
• Rewriting – based on its calculus and algebra,
• Planning – based on physical data storage.
Fig. 5 Abstract relational query optimiser architecture [124]
A rewriter, the only module responsible for static transformations, relies on
query syntax and rewrites the query to a semantically equivalent form that should be more
efficient than the original one (and certainly no worse). The basic rewriter’s tasks
(executed to enable further processing and actual evaluation) consist of substituting
view calls with their definitions, flattening out nested queries, etc. All the
transformations are declarative and deterministic; no physical structure or DBMS-specific
features are taken into account. A rewritten query is passed for further
optimisation to a planner; it is also possible that the rewriter produces more than
one semantically equivalent query form and sends all of them. The rewriting procedure
for relational databases is often considered an advanced optimisation method and is
rarely implemented [124] – this approach is probably caused by SQL irregularities and
difficulties in its transformations.
The planner is responsible for cost estimation of different query execution plans.
The goal is to choose the cheapest one, which is usually impossible (database statistics
may be out-of-date, the number of plans to consider and evaluate can be too large,
etc.). Nevertheless, the planner attempts to find the best of the possible plans by
applying a search strategy based on examining the space of execution plans. The space
is determined by two modules of the cost optimiser: an algebraic space and a method-structure space. Costs of different plans are evaluated by a cost model and a size-distribution estimator.
The algebraic space module is used for determining orders of actions for a query
execution (semantics is preserved but performance may differ). These actions are
usually represented as relational algebra formulae and/or syntactical trees. Due to an
algorithmic nature of objects generated by this module, the overall planning stage
operates at a procedural level. On the other hand, the method-structure space module is
responsible for determining implementation choices for execution of each action
sequence generated previously. Its action is related to available join methods (e.g.,
nested loops, merge scans, hash joins), existence and cost of auxiliary data structures
(e.g., indices that can be persistent or built on-the-fly), elimination of duplicates and
other RDBMS-specific features depending on its implementation and storage form. This
module produces all corresponding execution plans specifying implementations of each
algebraic operator and use of any index possible.
The cost model depicted in Fig. 5 specifies arithmetic formulae used for
estimating costs of possible execution plans (for each join type, index type, etc.).
Algorithms and formulae applied are simplified (in order to limit an optimisation cost
itself) and based on certain assumptions concerning buffer management, CPU load,
number of IO operations, sequential and random IO operations, etc. Input parameters
for these calculations are mainly a size of a buffer pool for each step (determined by a
RDBMS for each query) and sizes of relations with data distribution (provided by
a size-distribution estimator). The estimator module specifies how sizes (and possibly
frequency distributions of attribute values) of database relations, indices and
intermediate results are determined. It also determines if and what statistics are to be
maintained in database catalogues.
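A minimal sketch of what such simplified formulae might look like is given below. The formulae are common textbook approximations that count page IO operations only; they are not taken from [124], and the page counts, the number of sort passes and all function names are assumptions made for this illustration.

```python
def nested_loops_cost(outer_pages, inner_pages):
    """Page-oriented nested loops: the outer relation is read once and the
    inner relation is read once for every page of the outer one."""
    return outer_pages + outer_pages * inner_pages

def merge_scan_cost(left_pages, right_pages, sort_passes=2):
    """Merge scan: sort both inputs (every pass reads and writes each page),
    then read both inputs once in a synchronized scan."""
    sorting = 2 * sort_passes * (left_pages + right_pages)
    return sorting + left_pages + right_pages

def cheapest_method(pages_r, pages_s):
    """Estimate every candidate implementation and return the cheapest one."""
    candidates = {
        "nested loops, R outer": nested_loops_cost(pages_r, pages_s),
        "nested loops, S outer": nested_loops_cost(pages_s, pages_r),
        "merge scan": merge_scan_cost(pages_r, pages_s),
    }
    name = min(candidates, key=candidates.get)
    return name, candidates[name]

# Hypothetical relation sizes, in pages.
tiny = cheapest_method(2, 50)
large = cheapest_method(1000, 500)
print(tiny, large)
```

Under these assumptions nested loops win when one operand occupies only a couple of pages, while the merge scan wins for two large relations. A real cost model would also account for buffer management, CPU load and available indices, as described above.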
3.3.1 Space Search Reduction
The search space for optimisation depends on the set of algebraic transformations that
preserve equivalence and the set of physical operators supported in an optimiser [122].
The following section discusses methods based on the relational algebra used for
reducing the search space size. As stated above, a query can be transformed into
a syntax tree, where leaves are relations (tables in case of SQL) and nodes are algebraic
operators (selections σ, projections π and joins). For multitable queries (i.e. the ones
referring to more than one table resulting in joins) a query tree can be created in
different ways corresponding to an order of subexpressions’ evaluation. A simple query
can result in a few completely different trees, while for more complex ones a number of
possible trees can be enormous. Therefore an appropriate search strategy is applied in
order to minimise optimisation costs. A search space is usually limited by the
restrictions [124] described briefly in the following subsections.
3.3.1.1 Selections and Projections
Selections and projections are processed on-the-fly and almost never generate
intermediate relations. Selections are processed as relations are accessed for the first
time. Projections are processed as the results of other operators are generated. This is
irrelevant for queries without joins; for join queries it implies that all operations are
performed as a part of join execution. This restriction eliminates only suboptimal query
trees, since separate processing of selections and projections incurs additional costs.
Hence, the algebraic space module specifies alternative query trees with join operators
only, selections and projections being implicit. A set of alternative joins is determined
by their commutativity:
R1 join R2 ≡ R2 join R1
and the associativity:
(R1 join R2) join R3 ≡ R1 join (R2 join R3)
Due to them, the optimiser can determine which joins are to be executed as inner, and
which as outer ones (based on already calculated results of inner joins). The number of
possible join combinations is the factorial of the number of relations (N!), therefore some
additional restrictions are usually introduced to minimise the search space, e.g.,
the restriction on cross products given below.
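How quickly this number grows can be checked with a short sketch. It counts join orders only (permutations of the relations), ignoring tree shapes and join methods; the relation names are placeholders.

```python
from itertools import permutations
from math import factorial

def join_orders(relations):
    """All orders in which the relations can be joined one after another."""
    return list(permutations(relations))

orders = join_orders(["R1", "R2", "R3"])

# Number of join orders for a few query sizes (N relations -> N! orders).
growth = {n: factorial(n) for n in (2, 5, 10, 15)}
print(len(orders), growth)
```

Three relations give six orders, while fifteen relations already give more than 10^12, which is why enumerating the whole space is not practical.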
Consider a query aiming to retrieve surnames of employees who earn more than
1200 with names of their departments (the query’s target is the test schema presented in
subchapter 6.2 Query Analysis and Optimisation Examples):
select employees.surname, departments.name
from employees, departments
where employees.department_id = departments.id and
      employees.salary > 1200
The join is established between employees and departments on their
department_id and id columns, respectively. The resulting possible syntactical trees
(generated by the algebraic space module shown in Fig. 5 above) are presented in Fig.
6. The plan resulting from the tree complies with point 1 – an index scan of
employees finds tuples satisfying the selection on employees.salary on-the-fly and the
join is performed only on them, moreover, the projection of the result occurs as join
tuples are generated.
[Figure: three alternative syntax trees, numbered 1 to 3, over the employees and departments relations, built from the projection π, the selection σ employees.salary > 1200 and the join on employees.department_id = departments.id]
Fig. 6 Possible syntax trees for a sample query for selections and projections
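The on-the-fly processing of selections and projections can be sketched with Python generators. The sketch rests on several assumptions: the data are invented, the attribute names follow the sample query, a sequential scan stands in for the index scan mentioned above, and the join is a plain nested loops join. No intermediate relation is materialised: the selection is applied while employees is being read and the projection is applied as each join tuple is produced.

```python
def scan(relation, predicate=lambda t: True):
    """Read a stored relation, applying the selection as tuples are accessed."""
    for t in relation:
        if predicate(t):
            yield t

def nested_loops_join(outer, inner, condition, out_attrs):
    """Join a stream of outer tuples with a stored inner relation; the
    projection is applied as each join tuple is generated."""
    for o in outer:
        for i in inner:
            if condition(o, i):
                joined = {**o, **i}
                yield tuple(joined[a] for a in out_attrs)

# Invented sample data; keys mimic the qualified column names of the query.
employees = [
    {"employees.surname": "Smith", "employees.salary": 1500, "employees.department_id": 1},
    {"employees.surname": "Jones", "employees.salary": 1000, "employees.department_id": 1},
    {"employees.surname": "Kowalski", "employees.salary": 2000, "employees.department_id": 2},
]
departments = [
    {"departments.id": 1, "departments.name": "Sales"},
    {"departments.id": 2, "departments.name": "IT"},
]

rows = list(nested_loops_join(
    scan(employees, lambda e: e["employees.salary"] > 1200),
    departments,
    lambda e, d: e["employees.department_id"] == d["departments.id"],
    ["employees.surname", "departments.name"],
))
print(rows)
```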
3.3.1.2 Cross Products
Cross products are never formed, unless the query itself asks for them. Relations are
combined always through joins in the query, which eliminates suboptimal join trees
typically resulting from cross products. Exceptions to this restriction can occur when
cross product relations are very small and time for its calculation is negligibly short.
For visualisation of the restriction a more complex query should be taken into
account – again, it refers to the test schema and aims to retrieve all employees’
surnames with names of their departments and departments’ locations:
select employees.surname, departments.name, locations.name
from employees, departments, locations
where departments.location_id = locations.id
The syntax trees for this query are shown in Fig. 7. The tree containing an
unconditional join does not satisfy the restriction, as it involves a cross product.
[Figure: three alternative syntax trees, numbered 1 to 3, over the employees, departments and locations relations, topped by the projection π employees.surname, departments.name, locations.name; the join conditions shown are departments.location_id = locations.id and employees.department_id = departments.id, and one tree contains an unconditional join]
Fig. 7 Possible syntax trees for a sample query for cross products
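The restriction can be sketched as a simple filter over join orders. The sketch assumes that two join predicates are available, one between employees and departments and one between departments and locations (the former is an assumption added for the illustration), and it treats an order as acceptable only if every newly added relation is linked by a predicate to a relation joined earlier.

```python
from itertools import permutations

def needs_cross_product(order, predicates):
    """True if some relation in the order has no join predicate linking it
    to any of the relations joined before it."""
    joined = {order[0]}
    for rel in order[1:]:
        if not any(frozenset((rel, other)) in predicates for other in joined):
            return True
        joined.add(rel)
    return False

# Assumed join graph: employees - departments - locations.
PREDICATES = {
    frozenset(("employees", "departments")),
    frozenset(("departments", "locations")),
}
RELATIONS = ["employees", "departments", "locations"]

allowed = [o for o in permutations(RELATIONS)
           if not needs_cross_product(o, PREDICATES)]
rejected = [o for o in permutations(RELATIONS)
            if needs_cross_product(o, PREDICATES)]
print(allowed, rejected)
```

Of the six possible orders, the two that join employees directly with locations are rejected, because no predicate connects these relations.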
3.3.1.3 Join Tree Shapes
This restriction is based on shapes of join trees and it is omitted in some RDBMSs (e.g.,
Ingres, DB2-Client/Server) – the inner operand of each join is a database relation,
never an intermediate result. Its description is given below, based on an illustrative
example.
The test schema is not complex enough to explain this restriction, so an
additional relation (table) should be introduced, e.g., countries related to locations with
id and country_id columns, respectively. A sample query retrieves all employees’
surnames with their departments’ names, locations’ names (cities) for the departments
and locations’ countries:
select employees.surname, departments.name, locations.name, countries.name
from employees, departments, locations, countries
where departments.location_id = locations.id and
      locations.country_id = countries.id
Again, possible syntax trees for the query are presented in Fig. 8 (without cross
products). Only one of the three trees complies with the restriction; the other two do not, as they
contain at least one join with an intermediate result as the inner relation. Trees
satisfying the restriction are referred to as left-deep, those with an outer
relation always being a database relation are called right-deep, while trees with
at least one join between two intermediate results are called bushy (in general,
left-deep and right-deep trees are called linear).
The restriction is more heuristic than the previous ones and in some cases it
might eliminate the optimal plan, but it has been claimed that most often the optimal
left-deep tree is not much more expensive than the optimal tree overall since [124]:
• Having original database relations as inner ones increases the use of any pre-existing indices,
• Having intermediate relations as outer ones allows sequences of nested loops joins
to be executed in a pipelined fashion.
Both index usage and pipelining reduce the cost of join trees. Moreover, the last
restriction significantly reduces the number of alternative join trees, to O(2^N) for many queries
with N relations (from N! previously). Hence, the algebraic space module of the typical
query optimiser (provided it is implemented) specifies only join trees that are left-deep
[124].
[Figure: three alternative syntax trees, numbered 1 to 3, joining the employees, departments, locations and countries relations in different shapes, topped by the projection π employees.surname, departments.name, locations.name, countries.name]
Fig. 8 Possible syntax trees for a sample query for tree shapes
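The tree shapes named above can be told apart mechanically. In the sketch below a join tree is written as a nested pair (outer, inner) and a database relation as a string; this encoding and the function names are assumptions made for the illustration, and the sample trees are not claimed to match the numbering of Fig. 8.

```python
def is_leaf(tree):
    """A leaf is a database relation, written as a string."""
    return isinstance(tree, str)

def joins(tree):
    """Yield the (outer, inner) operand pair of every join in the tree."""
    if is_leaf(tree):
        return
    outer, inner = tree
    yield outer, inner
    yield from joins(outer)
    yield from joins(inner)

def shape(tree):
    """Classify a join tree as bushy, left-deep, right-deep or linear."""
    pairs = list(joins(tree))
    if any(not is_leaf(o) and not is_leaf(i) for o, i in pairs):
        return "bushy"        # some join combines two intermediate results
    if all(is_leaf(i) for _, i in pairs):
        return "left-deep"    # every inner operand is a database relation
    if all(is_leaf(o) for o, _ in pairs):
        return "right-deep"   # every outer operand is a database relation
    return "linear"           # no bushy join, but operand sides are mixed

left_deep = ((("employees", "departments"), "locations"), "countries")
right_deep = ("employees", ("departments", ("locations", "countries")))
bushy = (("employees", "departments"), ("locations", "countries"))
print(shape(left_deep), shape(right_deep), shape(bushy))
```

A join of two database relations satisfies both the left-deep and the right-deep condition; the sketch reports it as left-deep.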
3.3.2 Planning
The planner module (Fig. 5) is responsible for analysing the set of alternative query
plans developed by the algebraic space and the method-structure space modules in order
to find “the cheapest” one. This decision process is supported by the cost model and
the size-distribution estimator. Different search strategies are described in
the following subsections; many interesting alternative approaches to the problem (e.g., based on
genetic programming or artificial intelligence) are omitted here,
however. Their summary with short descriptions and references is available in [124].
3.3.2.1 Dynamic Programming Algorithms
The dynamic programming approach (originally proposed for System-R [119]) is currently
implemented in most commercial RDBMSs. It is based mainly on a dynamic
exhaustive search algorithm constructing a set of possible query trees complying with
the restrictions described above (any tree recognised as suboptimal is rejected) [124].
The central notion of the algorithm is that of an interesting order. A merge-scan join
method, which is very often “suggested” by the method-structure module, sorts the two input
relations on their join attributes prior to executing the join itself and then merges
them with a synchronized scan. However, if
any input relation is sorted already (e.g., due to sorting by previous merge-scan joins or
some B+-tree index action), the sorting step can be skipped for this relation. In such
a situation, costs of two partial plans cannot be evaluated and compared well if a sort
order is not considered. An apparently more expensive partial plan can appear more
beneficial if it generates a sorted result that can be used for evaluation of some
subsequent merge-scan join execution. Therefore any partial plan that produces a sorted
result must be treated specially and its possible influence for a general query plan
examined.
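A small numeric sketch illustrates why the sort order matters. All costs are invented, and the function is a hypothetical helper rather than a part of any real optimiser: it adds the cost of a subsequent merge-scan join to a partial plan and skips the sorting step when the plan already delivers its result in the required order.

```python
def cost_with_merge_scan(plan, join_attr, sort_cost, merge_cost):
    """plan: (cost of the partial plan, attribute its result is sorted on or None).
    Returns the total cost after a merge-scan join on join_attr; sorting is
    skipped when the result is already sorted on that attribute."""
    cost, sorted_on = plan
    if sorted_on != join_attr:
        cost += sort_cost
    return cost + merge_cost

unordered_plan = (100, None)            # cheaper on its own, no useful order
ordered_plan = (120, "department_id")   # dearer, but sorted on the join attribute

total_unordered = cost_with_merge_scan(
    unordered_plan, "department_id", sort_cost=50, merge_cost=30)
total_ordered = cost_with_merge_scan(
    ordered_plan, "department_id", sort_cost=50, merge_cost=30)
print(total_unordered, total_ordered)
```

With these numbers the apparently more expensive partial plan leads to the cheaper overall plan, so it cannot be discarded on the basis of its own cost alone.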
The dynamic programming algorithm can be described in the following steps
[124]:
Step 1: For each relation in the query, all possible ways to access it, i.e., via all
existing indices and including the simple sequential scan, are obtained
(accessing an index takes into account any query selection on the index key
attribute). These partial (single-relation) plans are partitioned into equivalence
classes based on any interesting order in which they produce their result.
An additional equivalence class is formed by the partial plans whose results are
in no interesting order. Estimates of the costs of all plans are obtained from the
cost model module, and the cheapest plan in each equivalence class is retained
for further consideration. However, the cheapest plan of the no-order
equivalence class is not retained if it is not cheaper than all other plans.
Step 2: For each pair of relations joined in the query, all possible ways to evaluate their
join using all relation access plans retained after Step 1 are obtained.
Partitioning and pruning of these partial (two-relation) plans proceeds as
above.
…
Step i: For each set of i – 1 relations joined in the query, the cheapest plans to join
them for each interesting order are known from the previous step. In this step,
for each such set, all possible ways to join one more relation with it without
creating a cross product are evaluated. For each set of i relations, all generated
(partial) plans are partitioned and pruned as before.
…
Step N: All possible plans to answer the query (the unique set of N relations joined in
the query) are generated from the plans retained in the previous step. The
cheapest plan is the final output of the optimiser, to be used to process the
query.
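The steps above can be sketched in a strongly simplified form. The sketch below builds left-deep plans bottom-up and never forms a cross product, but it ignores interesting orders and access paths entirely, uses the sum of intermediate result sizes as its only cost measure and assumes a connected join graph. The cardinalities, selectivities and the function name are invented for the illustration.

```python
from fractions import Fraction
from itertools import combinations

def dp_join_order(card, sel):
    """card: relation -> cardinality; sel: frozenset({a, b}) -> join selectivity.
    Returns (cost, order) of the cheapest left-deep plan without cross products."""
    rels = sorted(card)
    # best[S] = (cost, result size, join order) for the set S of relations
    best = {frozenset([r]): (0, card[r], (r,)) for r in rels}
    for size in range(2, len(rels) + 1):
        for subset in map(frozenset, combinations(rels, size)):
            for r in sorted(subset):
                rest = subset - {r}
                if rest not in best:
                    continue  # rest itself cannot be joined without a cross product
                preds = [sel[frozenset((r, s))] for s in rest
                         if frozenset((r, s)) in sel]
                if not preds:
                    continue  # adding r here would create a cross product
                cost, rows, order = best[rest]
                out_rows = rows * card[r]
                for p in preds:
                    out_rows *= p
                candidate = (cost + out_rows, out_rows, order + (r,))
                # Pruning: keep only the cheapest plan for each set of relations.
                if subset not in best or candidate[0] < best[subset][0]:
                    best[subset] = candidate
    cost, _, order = best[frozenset(rels)]
    return cost, order

# Invented statistics for a three-relation query.
card = {"employees": 1000, "departments": 50, "locations": 10}
sel = {
    frozenset(("employees", "departments")): Fraction(1, 50),
    frozenset(("departments", "locations")): Fraction(1, 10),
}
cost, order = dp_join_order(card, sel)
print(cost, order)
```

Extending the sketch towards the algorithm described above would mean keeping one retained plan per interesting order for every set of relations instead of a single plan.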
This algorithm guarantees finding the cheapest (optimal) plan from all the ones
compliant with the search space restrictions presented above. It often avoids
enumerating all plans in the space by being able to dynamically prune suboptimal parts
of the space as partial plans are generated. In fact, although in general still exponential,
there are query forms for which it only generates O(N^3) plans [147]. An illustrative
example of the dynamic programming algorithm can be found in [124].
The possibilities offered by the method-structure space in addition to those of
the algebraic space result in an extraordinary number of alternatives that the optimiser
must search through. The memory requirements and running time of dynamic
programming grow exponentially with query size (i.e. a number of joins) in the worst
case since all viable partial plans generated in each step must be stored to be used in the
next one. In fact, many modern systems place a limit on the size of queries that can be
submitted (usually about fifteen joins), because for larger queries the optimiser crashes
due to its very high memory requirements. Nevertheless, most queries seen in practice
involve fewer than ten joins, and the algorithm has proved to be very effective in such
contexts. It is considered the standard in query optimisation search strategies. [124]
3.3.2.2 Randomised Algorithms
Dynamic programming algorithms are not capable of analysing relatively large and
complex queries (memory consumption issues and required limitations to a number of
joins mentioned above). In order to overcome these inconveniences, some randomised
algorithms were proposed – some of them are briefly described below.
The most important class of these optimisation algorithms is based on plan
transformations instead of the plan construction of dynamic programming, and includes
algorithms like simulated annealing, iterative improvement and two-phase optimisation.
These algorithms are generic and they can be applied to various optimisation issues.
They operate on graphs whose nodes represent alternative execution plans with
associated cost in order to find a node with a minimum overall cost. Such algorithms
perform random walks through a graph – the nodes that can be reached in one move
from a node S are the neighbours of S. A move is called uphill (downhill, respectively) if the
cost of the source node is lower (higher, respectively) than the cost of the destination
node. A node is a global minimum if it has the lowest cost among all nodes. It is a local
minimum if, in all paths starting at that node, any downhill move comes after at least
one uphill move. [124]
The iterative improvement algorithm (II) [148, 149, 150] performs a large
number of local optimisations. Each one starts at a random node and repeatedly accepts
random downhill moves until it reaches a local minimum. II returns the local minimum
with the lowest cost found. Simulated annealing (SA) performs a continuous random
walk accepting downhill moves always and uphill moves with some probability, trying
to avoid being caught in a high cost local minimum [151, 152, 153]. This probability
decreases as time progresses and eventually becomes zero, at which point execution
stops. Like II, SA returns the node with the lowest cost visited. The two-phase
optimisation (2PO) algorithm is a combination of II and SA [153]. In phase 1, II is run
for a small period of time, i.e., a few local optimisations are performed. The output of
that phase (the best local minimum found) is the initial node of the next phase. In phase
2, SA is run starting from a low probability for uphill moves. Intuitively, the algorithm
chooses a local minimum and then searches the area around it, still being able to move
in and out of local minima, but practically unable to climb up very high hills. [124]
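The three algorithms can be sketched over a toy plan space. Everything below is an assumption made for illustration: a plan is a join order, a move swaps two adjacent relations, the cost is the sum of intermediate result sizes computed from invented statistics, a local minimum is approximated as a node with no cheaper neighbour, and the temperature schedule of SA is arbitrary.

```python
import math
import random

CARD = {"employees": 1000, "departments": 50, "locations": 10, "countries": 5}
SEL = {
    frozenset(("employees", "departments")): 1 / 50,
    frozenset(("departments", "locations")): 1 / 10,
    frozenset(("locations", "countries")): 1 / 5,
}

def plan_cost(order):
    """Toy cost: the sum of the sizes of all intermediate join results."""
    rows, total, joined = CARD[order[0]], 0.0, [order[0]]
    for rel in order[1:]:
        rows *= CARD[rel]
        for other in joined:
            rows *= SEL.get(frozenset((rel, other)), 1.0)
        total += rows
        joined.append(rel)
    return total

def neighbours(order):
    """Plans reachable in one move: swap two adjacent relations."""
    result = []
    for i in range(len(order) - 1):
        n = list(order)
        n[i], n[i + 1] = n[i + 1], n[i]
        result.append(tuple(n))
    return result

def iterative_improvement(relations, cost, rng, starts=5):
    """Several local optimisations, each from a random node; downhill moves only."""
    best = None
    for _ in range(starts):
        node = tuple(rng.sample(relations, len(relations)))
        while True:
            downhill = [n for n in neighbours(node) if cost(n) < cost(node)]
            if not downhill:
                break  # no cheaper neighbour: treated as a local minimum
            node = rng.choice(downhill)
        if best is None or cost(node) < cost(best):
            best = node
    return best

def simulated_annealing(start, cost, rng, temperature, cooling=0.9, moves=20):
    """Random walk: downhill moves are always accepted, uphill moves with a
    probability that shrinks as the temperature drops."""
    node = best = start
    while temperature > 1e-3:
        for _ in range(moves):
            candidate = rng.choice(neighbours(node))
            delta = cost(candidate) - cost(node)
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                node = candidate
                if cost(node) < cost(best):
                    best = node
        temperature *= cooling
    return best

def two_phase(relations, cost, rng):
    """Phase 1: a short II run; phase 2: SA from its result, low temperature."""
    local_minimum = iterative_improvement(relations, cost, rng, starts=2)
    return simulated_annealing(local_minimum, cost, rng, temperature=10.0)

rng = random.Random(0)
relations = list(CARD)
ii_plan = iterative_improvement(relations, plan_cost, rng)
two_phase_plan = two_phase(relations, plan_cost, rng)
print(ii_plan, plan_cost(ii_plan), two_phase_plan, plan_cost(two_phase_plan))
```

Since SA keeps track of the cheapest node visited, the result of the second phase is never worse than the local minimum it starts from.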
3.3.3 Size-Distribution Estimator
The size-distribution estimator module (Fig. 5) is responsible for estimation of sizes of
(sub)query results and frequency distributions of values in attributes of these results.
Although distributions of frequencies can be generalised as combinations of arbitrary
numbers of attributes, most RDBMSs deal with frequency distributions of individual
attributes only, because considering all possible combinations of attributes is very
expensive (the attribute value independence assumption).
Histograms, the most commonly implemented method for estimating query result sizes
and frequency distributions (used, e.g., in DB2, Informix, Ingres, Sybase and
Microsoft SQL Server), are described below; other approaches also exist
(e.g., [154, 155, 156, 157, 158, 159]). [124]
3.3.3.1 Histograms
In a histogram on attribute a of relation R, the domain of a is partitioned into buckets,
and a uniform distribution is assumed within each bucket. That is, for any bucket b in
the histogram, if a value v_i ∈ b, then the frequency f_i of v_i is approximated by
$$\frac{1}{|b|} \sum_{v_j \in b} f_j$$
where |b| denotes the number of attribute values in bucket b.
A histogram with a single bucket generates the same approximate frequency for all
attribute values. Such a histogram is called trivial and corresponds to making the
uniform distribution assumption over the entire attribute domain. In principle, any
arbitrary subset of an attribute's domain may form a bucket and not necessarily
consecutive ranges of its natural order. [124]
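The bucket approximation can be illustrated with a short sketch. It assumes that a bucket is given by an arbitrary mapping from attribute values to bucket identifiers (the hypothetical parameter `bucket_of`) and that |b| counts only the distinct values actually present in the data; both are assumptions made for the illustration.

```python
from collections import Counter


def build_histogram(values, bucket_of):
    """Return {bucket id: (sum of frequencies, number of distinct values)}."""
    buckets = {}
    for value, freq in Counter(values).items():
        total, count = buckets.get(bucket_of(value), (0, 0))
        buckets[bucket_of(value)] = (total + freq, count + 1)
    return buckets


def approx_frequency(buckets, bucket_of, value):
    """Uniform assumption inside a bucket: average frequency of its values."""
    total, count = buckets[bucket_of(value)]
    return total / count


values = [1, 1, 1, 1, 2, 3, 3, 10]          # true frequencies: 1->4, 2->1, 3->2, 10->1

trivial = lambda v: 0                        # a single bucket for the whole domain
two_buckets = lambda v: 0 if v < 5 else 1    # an arbitrary two-bucket partition

trivial_hist = build_histogram(values, trivial)
split_hist = build_histogram(values, two_buckets)

trivial_estimate = approx_frequency(trivial_hist, trivial, 1)      # 8 / 4
split_estimate = approx_frequency(split_hist, two_buckets, 10)     # 1 / 1
```

With the trivial histogram every value receives the same estimate, whereas the two-bucket partition isolates the value 10 and estimates its frequency exactly.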
There are various classes of histograms that systems use or researchers have
proposed for estimation. Most of the earlier prototypes, and still some of the
commercial RDBMSs, use trivial histograms, i.e., make the uniform distribution
assumption [119]. That assumption, however, rarely holds in real data and estimates
based on it usually have large errors [160, 161]. Excluding trivial ones, the histograms
that are typically used belong to the class of equi-width histograms [162]. In those, the
number of consecutive attribute values or the size of the range of attribute values
associated with each bucket is the same, independent of the frequency of each attribute
value in the data. Since these histograms store a lot more information than trivial
histograms (they typically have 10-20 buckets), their estimations are much better. Some
other histogram classes have also been proposed (e.g., equi-height or equi-depth
histograms [162, 163] or multidimensional histograms [164]); however, they are not
implemented in any RDBMS. [124]
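An equi-width histogram and a range estimate based on it might look as follows. This is a sketch only: the function names, the numeric domain [lo, hi] and the linear interpolation inside a partially covered bucket are assumptions introduced for the illustration.

```python
def equi_width_histogram(values, lo, hi, n_buckets):
    """Count tuples per bucket; every bucket spans (hi - lo) / n_buckets."""
    width = (hi - lo) / n_buckets
    counts = [0] * n_buckets
    for v in values:
        idx = min(int((v - lo) / width), n_buckets - 1)   # clamp v == hi into the last bucket
        counts[idx] += 1
    return counts, width


def estimate_less_than(counts, width, lo, x):
    """Estimated number of tuples with value < x, assuming uniformity inside buckets."""
    total = 0.0
    for i, c in enumerate(counts):
        b_lo = lo + i * width
        b_hi = b_lo + width
        if x >= b_hi:
            total += c                          # bucket lies entirely below x
        elif x > b_lo:
            total += c * (x - b_lo) / width     # bucket is partially covered
    return total


uniform_counts, width = equi_width_histogram(list(range(10)), 0, 10, 5)
uniform_estimate = estimate_less_than(uniform_counts, width, 0, 5)

skewed_counts, _ = equi_width_histogram([0] * 8 + [9] * 2, 0, 10, 5)
skewed_estimate = estimate_less_than(skewed_counts, width, 0, 1)
```

For the uniformly distributed data the estimate matches the true count (5), while for the skewed data the estimate for `x = 1` is 4 although 8 tuples qualify, which illustrates why the frequency-independent bucket boundaries can still lead to noticeable errors.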
It has been proved that serial histograms are the optimal ones [165, 166, 167].
A histogram is serial if frequencies of attribute values associated with each bucket are
either all greater or all less than the frequencies of the attribute values associated with
any other bucket (buckets of serial histogram group frequencies that are close to each
other with no interleaving). Identifying the optimal histogram among all serial ones
takes exponential time in the number of buckets. Moreover, since there is usually no
order-correlation between attribute values and their frequencies, storage of serial
histograms essentially requires a regular index that will lead to the approximate
frequency of every individual attribute value. Because of all these complexities, the
class of end-biased histograms has been introduced. In those, some number of the
highest frequencies and some number of the lowest frequencies in an attribute are
explicitly and accurately maintained in separate individual buckets, and the remaining
(middle) frequencies are all approximated together in a single bucket. End-biased
histograms are serial since their buckets group frequencies with no interleaving.
Identifying the optimal end-biased histogram, however, takes only slightly over linear
time in the number of buckets. Moreover, end-biased histograms require little storage,
since usually most of the attribute values belong in a single bucket and do not have to be
stored explicitly. Finally, in several experiments it has been shown that most often the
errors in the estimates based on end-biased histograms are not too far from the
corresponding (optimal) errors based on serial histograms. Thus, as a compromise
between optimality and practicality, it has been suggested that the optimal end-biased
histograms should be used in real systems. [124]
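The structure of an end-biased histogram can be sketched as below. The sketch does not search for the optimal end-biased histogram; it simply builds one for given numbers of highest and lowest frequencies (the hypothetical parameters `n_high` and `n_low`, assumed not to exceed the number of distinct values together).

```python
from collections import Counter


def end_biased_histogram(values, n_high, n_low):
    """Keep the n_high highest and n_low lowest frequencies exactly; average the rest."""
    ranked = sorted(Counter(values).items(), key=lambda kv: kv[1], reverse=True)
    high = ranked[:n_high]
    low = ranked[len(ranked) - n_low:]
    middle = ranked[n_high:len(ranked) - n_low]
    exact = dict(high + low)                  # values stored in individual buckets
    middle_avg = sum(f for _, f in middle) / len(middle) if middle else 0.0
    return exact, middle_avg


def estimate(exact, middle_avg, value):
    """Exact frequency for explicitly stored values, bucket average otherwise."""
    return exact.get(value, middle_avg)


values = ["a"] * 50 + ["b"] * 5 + ["c"] * 4 + ["d"] * 3 + ["e"]
exact, middle_avg = end_biased_histogram(values, n_high=1, n_low=1)
```

Only the values "a" and "e" are stored explicitly; all remaining values share one bucket, which is why such a histogram needs little storage.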
3.4 Relational Query Optimisation Milestones
The general (and the most common) relational query optimisation methods and
approaches have been developed over almost 40 years. A few historical “milestone”
implementations of the relational model are listed below, with emphasis on their query
optimisers.
3.4.1 System-R
System-R was the first implementation of the relational model (developed between 1972
and 1981), whose solutions and experience are still regarded as fundamental for many
currently used relational databases (including commercial ones); its complete
bibliography is available at [121].
System-R introduced a very important method for optimising
select-project-join (SPJ) queries (a notion that also covers conjunctive queries), i.e.
queries where multiple joins and multiple join attributes are involved (non-star join) and
also group by, order by and distinct clauses. The search space for the System-R
optimiser in the context of an SPJ query consists of operator trees that correspond to
linear sequences of join operations (described in section Join Tree Shapes). Such
sequences are logically equivalent because of associative and commutative properties of
joins. A join operator can use either the nested loop or sort-merge implementation. Each
scan node can use either an index scan (using a clustered or non-clustered index) or
a sequential scan. Finally, predicates are evaluated as early as possible. [122]
The cost model realised in System-R relied on [122]:
•
A set of statistics maintained on relations and indexes, e.g., a number of data
pages in a relation, a number of pages in an index, number of distinct values in
a column,
•
Formulae to estimate selectivity of predicates and to project the size of the output
data stream for every operator node; for example, the size of the output of a join
was estimated by taking the product of the sizes of the two relations and then
applying the joint selectivity of all applicable predicates,
•
Formulae to estimate the CPU and I/O costs of query execution for every
operator; these formulae took into account the statistical properties of its input
data streams, existing access methods over the input data streams, and any
available order on the data stream (e.g., if a data stream was ordered, then the cost
of a sort-merge join on that stream may be significantly reduced); in addition, it
was also checked if the output data stream would have any order.
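The join size formula from the second item above can be written down directly. In this sketch the joint selectivity is computed as a product of individual predicate selectivities (an independence assumption), and the equi-join selectivity 1 / max(number of distinct values) is the estimate commonly attributed to System-R; function names and numbers are hypothetical.

```python
def equijoin_selectivity(distinct_r, distinct_s):
    """Commonly cited estimate for R.a = S.b, assuming uniformly distributed values."""
    return 1.0 / max(distinct_r, distinct_s)


def estimate_join_size(size_r, size_s, selectivities):
    """Product of the input sizes scaled by the joint selectivity of all predicates."""
    joint = 1.0
    for s in selectivities:
        joint *= s                 # independence of predicates is assumed here
    return size_r * size_s * joint


# Hypothetical statistics: |R| = 1000, |S| = 200, 50 and 200 distinct join values,
# plus one additional predicate with selectivity 0.1.
join_sel = equijoin_selectivity(50, 200)
estimated_size = estimate_join_size(1000, 200, [join_sel, 0.1])
```

With these numbers the estimate is about 100 tuples, compared with 200 000 tuples for the unrestricted Cartesian product.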
The query planning algorithm of the System-R optimiser used the concept of
dynamic programming with interesting orders (described in section 3.3.2.1 Dynamic
Programming Algorithms). The System-R optimiser was novel and innovative;
however, it did not generalise beyond join ordering.
3.4.2 Starburst
The Starburst project [168] was developed at IBM Almaden [169]
between 1984 and 1992 (and eventually became DB2). For optimisation purposes it
used a structural representation of the SQL query, called a Query Graph Model (QGM),
throughout the lifecycle of optimisation. In the QGM, a box represented
a query block and labelled arcs between boxes represented table references across
blocks. Each box contained information on the predicate structure as well as on whether
the data stream was ordered. In the query rewrite phase of optimisation [170], rules
were used to transform a QGM into another equivalent QGM. These rules were
modelled as pairs of arbitrary functions – the first one checked the condition for
applicability and the second one enforced the transformation. A forward chaining rule
engine governed the rules. Rules might be grouped in rule classes and it was possible to
tune the order of evaluation of rule classes to focus search. Since any application of a
rule resulted in a valid QGM, any set of rule applications guaranteed query equivalence
(assuming rules themselves were valid). The query rewrite phase did not have the cost
information available. This forced this module to either retain alternatives obtained
through rule application or to use the rules in a heuristic way (and thus compromise
optimality). [122]
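The idea of a rewrite rule as a pair of functions (an applicability check and a transformation) driven by a forward chaining engine can be sketched as follows. The sketch does not model the QGM itself: a query is represented by hypothetical nested tuples, and the single example rule (merging two adjacent selections) is an illustrative choice.

```python
def merge_selects_condition(node):
    """Applicable when a selection is applied directly to another selection."""
    return (len(node) == 3 and node[0] == "select"
            and isinstance(node[2], tuple) and node[2][0] == "select")


def merge_selects_action(node):
    """Replace two stacked selections by one selection with a conjunction."""
    _, outer_pred, (_, inner_pred, child) = node
    return ("select", ("and", outer_pred, inner_pred), child)


# A rule is a pair: (condition function, transformation function).
RULES = [(merge_selects_condition, merge_selects_action)]


def rewrite(node, rules):
    """Fire applicable rules until none applies (children are rewritten first)."""
    if not isinstance(node, tuple):
        return node
    node = tuple(rewrite(part, rules) for part in node)
    for condition, action in rules:
        if condition(node):
            return rewrite(action(node), rules)   # re-examine the transformed tree
    return node


query = ("select", "a > 1",
         ("select", "b < 2",
          ("select", "c = 3", ("table", "R"))))
rewritten = rewrite(query, RULES)
```

Every rule application yields a valid query representation, so the result of any sequence of applications remains equivalent to the input as long as each rule is valid; no cost information is consulted, which mirrors the heuristic nature of the rewrite phase.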
In the second phase of query optimisation (plan optimisation) an execution plan
(operator tree) was chosen for a given QGM. In Starburst, the physical operators (called
LOLEPOPs, LOw LEvel Plan OPerators) were combined in a variety of ways to implement
higher-level operators. Such combinations were expressed in a grammar production-like
language [171]. The realisation of a higher-level operation was expressed by its derivation in
terms of the physical operators. In computing such derivations, comparable plans that
represented the same physical and logical properties but had higher costs were pruned.
Each plan had a relational description corresponding to the algebraic expression it
represented, an estimated cost, and physical properties (e.g., order). These properties
were propagated as plans were built bottom-up. Thus, each physical operator was
associated with a function showing the effect of that operator on each of the above
properties. The join planner in this system was similar to that of System-R
(described above). [122]
3.4.3 Volcano/Cascades
The Volcano [172] and the Cascades extensible architecture [173] evolved from Exodus
[174] (their current “incarnation” is MS SQL Server). In these systems, two kinds of
rules were used universally to represent the knowledge of search space:
the transformation rules mapped an algebraic expression into another and
the implementation rules mapped an algebraic expression into an operator tree.
The rules might have conditions for applicability. Logical properties, physical
properties and costs were associated with plans. The physical properties and the cost
depended on the algorithms used to implement operators and their input data streams. For
efficiency, Volcano/Cascades used dynamic programming in a top-down way
(memoization). When presented with an optimisation task, it checked whether the task
had already been accomplished by looking up its logical and physical properties in the
table of plans that had been optimised in the past. Otherwise, it applied a logical
transformation rule, an implementation rule, or used an enforcer to modify properties of
the data stream. At every stage, it used the promise of an action to determine the next
move. The promise parameter was programmable and reflected cost parameters.
The Volcano/Cascades framework differed from Starburst in its approach to planning:
these systems did not use two distinct optimisation phases because all transformations
were algebraic and cost-based and the mapping from algebraic to physical operators
occurred in a single step. Further, instead of applying rules in a forward chaining
fashion, as in the Starburst query rewrite phase, Volcano/Cascades applied a goal-driven
application of rules. [122]
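Top-down dynamic programming with a table of already optimised tasks can be sketched as below. This is a strongly simplified illustration: the only task is join ordering, a single uniform join selectivity is assumed, the cost is the sum of intermediate result sizes, and physical properties, enforcers and the promise mechanism are not modelled. All names and numbers are hypothetical.

```python
from itertools import combinations


def optimise(relations, sizes, selectivity, memo):
    """Return (cost, result size, plan) for joining `relations`, using the memo table."""
    goal = frozenset(relations)
    if goal in memo:                       # task already solved: reuse the stored plan
        return memo[goal]
    if len(goal) == 1:
        (name,) = goal
        memo[goal] = (0.0, float(sizes[name]), name)
        return memo[goal]
    best = None
    names = sorted(goal)
    for k in range(1, len(names)):
        for left in combinations(names, k):
            right = goal - frozenset(left)
            l_cost, l_size, l_plan = optimise(left, sizes, selectivity, memo)
            r_cost, r_size, r_plan = optimise(right, sizes, selectivity, memo)
            out_size = l_size * r_size * selectivity
            total = l_cost + r_cost + out_size     # cost = sum of intermediate result sizes
            if best is None or total < best[0]:
                best = (total, out_size, (l_plan, r_plan))
    memo[goal] = best
    return best


sizes = {"A": 1000, "B": 10, "C": 100}
memo = {}
cost, size, plan = optimise(sizes.keys(), sizes, 0.01, memo)
```

The optimisation starts from the goal (joining all relations) and descends to subgoals on demand; each subgoal is solved once and then looked up in the memo table. For the sample statistics the cheapest plan joins B and C first.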
Chapter 4
The Stack-Based Approach
The Stack-Based Approach (SBA) is a formal methodology for both query and
programming languages. Its query language (SBQL, described below) has been
designed based on concepts well known from programming languages. The main SBA
idea is that there is no clear and final boundary between programming languages and
query languages. Therefore, there should arise a theory consistently describing both
aspects. SBA offers a complete conceptual and semantic foundation for querying and
programming with queries, including programmes with abstractions (e.g., procedures,
functions, classes, types, methods, views).
SBA defines language semantics with an abstract implementation method, i.e. an
operational specification that requires definitions of all runtime data structures followed
by unambiguous definitions of behaviour of any language construct on these structures.
SBA introduces three such abstract structures:
•
An object store,
•
An environment stack (ENVS),
•
A result stack (QRES).
Any operator used in queries (e.g., a selection, a projection, a join, quantifiers)
must be precisely described with these three abstract structures, without referring to
classical notions and theories of relational and object algebras.
As stated above, SBA is based on a classic programming language mechanism
modified with required extensions. The fundamental syntactic decision was the
unification of a query language with a programming language, which results in
expressing query languages as a form of programming languages. Hence, SBA does not
distinguish between simple expressions (e.g., 2 + 2, (x + y) * x) and complex queries
like Employee where salary = 1000 or (Employee where salary = (x + y) * z).surname.
All these expressions can be used in any imperative constructs, as arguments of
procedures, functions or methods, as a function returned value, etc.
Any SBA object must be represented by the following features:
•
A unique internal identifier (OID, an object identifier),
•
A name used for accessing an object; it does not have to be unique,
•
It can contain a value (a simple object), other object(s) (a complex object) or
a reference to another object (a pointer object).
4.1 SBA Object Store Models
SBA is defined for a general object store model. Because various object models
introduce a lot of incompatible notions, SBA assumes some families of object store
models which are enumerated M0, M1, M2 and M3, in order of increasing
complexity. The simplest is M0, which covers relational, nested-relational and XML-oriented databases. M0 assumes hierarchical objects with no limitations concerning
nesting of objects and collections. M0 covers also binary links (relationships) between
objects referred to as pointers. Higher-level store models introduce classes and static
inheritance (M1), object roles and dynamic inheritance (M2), and encapsulation (M3).
4.2 SBQL
SBA defines the Stack-Based Query Language (SBQL), which plays the same role for
SBA as relational algebra does for the relational model (nevertheless, SBQL is much more
powerful). The language has been precisely designed from a practical point of view and
it is completely neutral with respect to data models. It can be successfully applied to
relational and object databases, XML, RDF, and others, since its basic assumption is
that it operates on data structures, not on models. Therefore, once a reflection of
appropriate structures on SBA abstract structures is defined, a precise definition of
a language operating on them appears. This completely changes the approach
to defining a language for a new data form: instead of defining a new language,
one has to state how these data are to be processed with SBQL.
SBQL semantics is based on a name binding space paradigm. Any name in
a query is bound with a runtime being (a persistent object, a procedure, a procedure
parameter, etc.), according to a current name space. Another popular solution from
programming languages employed for SBQL is that a naming scope is defined by an
environment stack searched from top to bottom. Due to differences between
a programming environment and databases, this concept has been extended:
the stack does not contain the data themselves but pointers to them; moreover, many
objects can be bound to a single name (in this way a uniform treatment of collections
and single objects is achieved).
The detailed assumptions are as follows:
•
Each name is bound to a runtime being (an object, an attribute, a procedure,
a view, etc.),
•
All data are processed in the same manner, regardless of whether they are persistent or
transient,
•
Procedures' results belong to the same category as query results, which implies
that they can be arbitrarily combined and processed,
•
All semantic and syntactic constructs are the same for both objects and their
subobjects (complete object relativism).
4.2.1 SBQL Semantics
The language is defined by the composition rule:
1. Let Q be a set of all queries.
2. If q ∈ Q (q is a query) and σ is a unary operator, then σ(q) ∈ Q.
3. If q1 ∈ Q and q2 ∈ Q (q1 and q2 are queries) and µ is a binary operator, then
µ(q1, q2) ∈ Q.
Due to these assumptions, SBQL queries can be easily decomposed into subqueries
down to atomic ones (names, operators, literals). A result returned by a query is
calculated by eval(q), where q ∈ Q. SBQL semantics is defined with the following
notions:
•
Binders,
•
Binding names on ENVS,
•
Keeping temporary results on QRES,
•
eval function for algebraic and nonalgebraic operators.
4.2.1.1 Name binding
A binder is a construct that associates an object's name with its identifier. For
an arbitrary object named n with an identifier i, a binder is defined as n(i). A binder
concept can be generalised so that n(x) means that x can be an identifier, but also
a literal or a complex structure, including a procedure pointer. The ENVS stack
(separate from an object store) contains sections consisting of binders' collections. On
startup, ENVS contains a single section with query root objects' binders. Any object
name n appearing in a query is bound on ENVS. Since the stack is searched in a top-down direction (i.e. starting from the newest sections), its sections are searched for n(ij)
binders; the search stops at the first section containing a match (if no match is found
in a section, the next lower section is searched). A binding result (provided binders
hold only object identifiers) can be a single object identifier, a collection of
identifiers or an empty collection.
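Name binding on ENVS can be sketched in a few lines. The representation is an assumption made for the illustration: the stack is a list of sections (the last one being the top), a section is a list of binders, and a binder n(i) is a pair (name, identifier); the identifiers are hypothetical.

```python
def bind(envs, name):
    """Search ENVS from the top section down; return all matches of the first hit."""
    for section in reversed(envs):
        matches = [value for n, value in section if n == name]
        if matches:
            return matches
    return []                                   # nothing bound: empty collection


# Base section with binders to root objects.
envs = [[("Employee", "i1"), ("Employee", "i2"), ("Department", "i3")]]
all_employees = bind(envs, "Employee")

# A new section on top, e.g. the interior of one Employee object.
envs.append([("surname", "i11"), ("salary", "i12")])
salary = bind(envs, "salary")
still_visible = bind(envs, "Department")        # found in the lower section
unknown = bind(envs, "budget")
```

A name present in a higher section hides binders of the same name in lower sections, because the search does not continue below the first section containing a match.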
4.2.1.2 Operators and eval Function
The eval function pushes its result on QRES. For an arbitrary query q, the eval
definition is:
•
If q is a string denoting a simple value (literal), eval returns this string value,
•
If q is an object name, ENVS is searched top-down for binders q(ij); all objects
bound by the name q are returned,
•
If q is a name of a procedure, this procedure is called, i.e. a new section with
procedure parameters, a local environment and a return point is pushed on ENVS;
next, the procedure body is executed (it may result in pushing some result on
QRES); when finished, the section is popped from ENVS,
•
If q is in a form of q1 ∆ q2 (∆ is an algebraic operator), q1 is evaluated and its
result is pushed onto QRES, then q2 is evaluated and its result is pushed onto
QRES; then ∆ is evaluated on the two elements popped from QRES and its result is
finally pushed on QRES,
•
If q is in a form of q1 Θ q2 (Θ is a nonalgebraic operator), q2 is evaluated multiple
times for each result returned by q1 in context of this result; a final result is pushed
onto QRES; the procedure can be described as follows:
For each ij returned by eval(q1) do:
■
Push on ENVS a representation of ij object's contents (subobjects'
binders),
■
Evaluate q2 and store its partial result,
■
Pop from ENVS a representation of ij object's contents,
Push all partial results onto QRES.
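The evaluation scheme above can be condensed into a toy interpreter. It is a sketch under several assumptions that are not part of SBA itself: queries are nested tuples, results are Python lists, only literals, names, a few algebraic operators, where and the navigational dot are supported, procedures are omitted, and the result of q1 is popped from QRES before the loop starts. The store contents and all identifiers are hypothetical.

```python
import operator
from typing import NamedTuple


class Ref(NamedTuple):
    """An object identifier (kept distinct from plain literal values)."""
    oid: str


# Hypothetical object store: identifier -> (name, simple value or list of sub-object refs).
STORE = {
    Ref("i1"): ("Employee", [Ref("i2"), Ref("i3")]),
    Ref("i2"): ("surname", "Kowalski"),
    Ref("i3"): ("salary", 1500),
    Ref("i4"): ("Employee", [Ref("i5"), Ref("i6")]),
    Ref("i5"): ("surname", "Nowak"),
    Ref("i6"): ("salary", 1000),
}
ALGEBRAIC = {">": operator.gt, "=": operator.eq, "+": operator.add}


def binders(refs):
    return [(STORE[r][0], r) for r in refs]


def nested(item):
    """Binders to sub-objects of a complex object; empty for anything else."""
    if isinstance(item, Ref) and isinstance(STORE[item][1], list):
        return binders(STORE[item][1])
    return []


def deref(item):
    return STORE[item][1] if isinstance(item, Ref) else item


def bind(envs, name):
    for section in reversed(envs):              # top-down search, first hit wins
        found = [value for n, value in section if n == name]
        if found:
            return found
    return []


def eval_query(q, envs, qres):
    kind = q[0]
    if kind == "lit":
        qres.append([q[1]])
    elif kind == "name":
        qres.append(bind(envs, q[1]))
    elif kind == "alg":                          # q1 ∆ q2: no new ENVS section
        _, op, q1, q2 = q
        eval_query(q1, envs, qres)
        eval_query(q2, envs, qres)
        right, left = qres.pop(), qres.pop()
        qres.append([ALGEBRAIC[op](deref(l), deref(r)) for l in left for r in right])
    else:                                        # q1 Θ q2 with Θ in {"where", "dot"}
        _, q1, q2 = q
        eval_query(q1, envs, qres)
        partial = []
        for item in qres.pop():
            envs.append(nested(item))            # open the interior of the current object
            eval_query(q2, envs, qres)
            result = qres.pop()
            envs.pop()
            if kind == "where":
                partial.extend([item] if result == [True] else [])
            else:
                partial.extend(result)
        qres.append(partial)


envs = [binders([Ref("i1"), Ref("i4")])]
qres = []
# (Employee where salary > 1200).surname
query = ("dot",
         ("where", ("name", "Employee"), ("alg", ">", ("name", "salary"), ("lit", 1200))),
         ("name", "surname"))
eval_query(query, envs, qres)
surnames = [deref(r) for r in qres.pop()]
```

The only difference between the two operator families in this sketch is visible in the last branch: a nonalgebraic operator pushes a new section on ENVS for each element returned by q1, so that names in q2 (here salary and surname) are bound in the context of that element.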
The main differences between nonalgebraic and algebraic operators are their
influence on ENVS and whether subqueries are evaluated in the context of other
subqueries. The common algebraic operators are:
•
Arithmetic operators (+, -, *, /, etc.),
•
Logical operators (and, or, not, etc.),
•
Alias operator (as),
•
Grouping operator (groupas).
Nonalgebraic operators are represented by:
•
Navigational dot (.),
•
Selection (where),
•
Join (join),
•
Quantifiers (all, exists).
4.2.2 Sample Queries
The following queries (with explanation of their semantics) refer to the relational
wrapper test schema described in subchapter 6.2.1. Each query is supplied with its
syntax tree (without typechecking, for simplification). They are valid wrapper queries,
i.e. they are expressed with wrapper views' names:
Example 1: Retrieve surnames and names of employees earning more than 1200
(Employee where salary > 1200).(surname, name);
Fig. 9 Sample SBQL query syntax tree for example 1
Example 2: Retrieve first names of employees named Kowalski earning less than 2000
(Employee where surname = "Kowalski" and salary < 2000).name;
Example 3: Retrieve surnames of employees and names of departments of employees named Nowak
((Employee where surname = "Nowak") as e join e.worksIn.Department as d join d.isLocatedIn.Location as l).(e.surname, l.name);
Example 4: Retrieve surnames of employees and cities their departments are located in
(Employee as e join e.worksIn.Department as d join d.isLocatedIn.Location as l).(e.surname, l.name);
Example 5: Retrieve surnames and birth dates of employees named Kowalski working in the production department
(Employee where surname = "Kowalski" and worksIn.Department.name = "Production").(surname, birthDate);
Example 6: Retrieve surnames and birth dates of employees named Kowalski working in Łódź city
(Employee where surname = "Kowalski" and worksIn.Department.isLocatedIn.Location.name = "Łódź").(surname, birthDate);
Example 7: Retrieve the sum of salaries of employees named Kowalski working in Łódź city
sum((Employee where surname = "Kowalski" and worksIn.Department.isLocatedIn.Location.name = "Łódź").salary);
4.3 Updateable Object-Oriented Views
In databases, a view means an arbitrarily defined image of stored data. In
distributed applications (e.g., Web applications) views can be used for resolving
incompatibilities between heterogeneous data sources, enabling their integration [175,
176, 216], which corresponds to mediation described in subchapter 2.2.1 Wrappers and
Mediators. Database views are divided into materialised ones (representing copies of
selected data) and virtual ones (standing only for definitions of data that can be accessed
by calling such a view). A typical view definition is a procedure that can be invoked
from within a query. One of the most important features of database views is their
transparency, which means that a user issuing a query need not distinguish between
a view and actual stored data (he or she need not be aware of using views); therefore
the data model and the syntax of a query language for views must conform to the ones for
physical data. Views should be characterised by the following features [177, 214]:
•
Customisation, conceptualisation, encapsulation – a user (programmer) receives
only data that are relevant to his/her interests and in a form that is suitable for
his/her activity; this facilitates users’ productivity and supports software quality
through decreasing probability of errors; views present the external layer in the
three-layered architecture (commonly referred to as the ANSI/SPARC
architecture [178]; ANSI stands for American National Standards Institute, SPARC for
Standards Planning And Requirements Committee),
•
Security, privacy, autonomy – views give the possibility to restrict user access to
relevant parts of database,
•
Interoperability, heterogeneity, schema integration, legacy applications – views
enable the integration of distributed/heterogeneous databases, allowing
understanding and processing alien, legacy or remote databases according to
a common, unified schema,
•
Data independence, schema evolution – views enable the users to change physical
and logical database organisation and schema without affecting already written
applications.
The idea of updateable object views [214] relies on augmenting the definition of
a view with the information on users’ intents with respect to updating operations. Only
the view definer is able to express the semantics of view updating. To achieve it, a view
definition is divided into two parts. The first part is the functional procedure, which maps
stored objects into virtual objects (similarly to SQL). The second part contains
redefinitions of generic operations on virtual objects. These procedures express the
users’ intents with respect to update, delete, insert and retrieve operations performed on
virtual objects. A view definition usually contains definitions of subviews, which are
defined by the same rule, according to the relativism principle. Because a view
definition is a regular complex object, it may also contain other elements, such as
procedures, functions, state objects, etc. The above assumptions and SBA semantics
allow achieving the following properties [210]:
•
Full transparency of views – after defining the view, users use the virtual objects in
the same way as stored objects,
•
Views are automatically recursive and (as procedures) can have parameters.
The first part of a view definition has the form of a functional procedure named
virtual objects. It returns entities called seeds that unambiguously identify virtual
objects (usually seeds are OIDs of stored objects). Seeds are then (implicitly) passed as
parameters of procedures that overload operations on virtual objects. These operations
are determined in the other part of the view definition. Four generic operations
can be performed on virtual objects:
•
delete – removes the given virtual object,
•
retrieve (dereference) – returns the value of the given virtual object,
•
insert – puts an object being a parameter inside the given virtual object,
•
update – modifies the value of the given virtual object according to a parameter
(a new value).
Definitions of these overloading operations are procedures that are performed on
stored objects. In this way the view definer can take full control over all operations that
should happen on stored objects in response to update of the corresponding virtual
object. If some overloading procedure is not defined, the corresponding operation on
virtual objects is forbidden. The procedures have fixed names, respectively on_delete,
on_retrieve, on_new, and on_update. All procedures, including the function supplying
seeds of virtual objects, are defined in SBQL and may be arbitrarily complex.
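The two-part structure of a view definition can be mimicked in a general-purpose language. The following is not SBQL and not the ODRA implementation; it is a hypothetical sketch in which the seed-producing function and the overloading procedures are ordinary Python callables, and an undefined procedure makes the corresponding operation forbidden.

```python
class View:
    """A virtual-object view: a seed-producing function plus optional overloading procedures."""

    def __init__(self, virtual_objects, **procedures):
        self.virtual_objects = virtual_objects     # returns seeds of virtual objects
        self.procedures = procedures               # e.g. on_retrieve=..., on_update=...

    def perform(self, procedure, seed, *args):
        handler = self.procedures.get(procedure)
        if handler is None:
            raise PermissionError(procedure + " is not defined, so the operation is forbidden")
        return handler(seed, *args)                # the seed is passed implicitly


# Hypothetical stored objects, keyed by their identifiers.
employees = {1: {"surname": "Kowalski", "salary": 1500},
             2: {"surname": "Nowak", "salary": 1000}}


def set_surname(seed, new_value):
    employees[seed]["surname"] = new_value


rich_employee = View(
    virtual_objects=lambda: [oid for oid, e in employees.items() if e["salary"] > 1200],
    on_retrieve=lambda seed: employees[seed]["surname"],
    on_update=set_surname,
)

seeds = rich_employee.virtual_objects()
before = rich_employee.perform("on_retrieve", seeds[0])
rich_employee.perform("on_update", seeds[0], "Kowalska")
after = rich_employee.perform("on_retrieve", seeds[0])
```

Since no on_delete procedure is supplied, an attempt to delete a virtual object of this view is rejected, while updates are translated by the view definer's procedure into an update of the stored object.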
4.4 SBQL Query Optimisation
Issues of query optimisation for SBQL have been discussed and analysed in depth in [179] and
in over 20 research papers (e.g., [180, 181, 182, 183, 184, 185, 186]). The most important
SBA optimisation techniques currently implemented in the virtual repository (ODRA) are
summarised below, with a short description of general object-oriented query
optimisation given as a background.
The principal goals of object-oriented query optimisation do not differ from the
ones for relational queries. The process aims to produce a semantically equivalent form
of an input query that promises better (and certainly not worse) evaluation efficiency.
However, the techniques for reaching this goal are much more diverse than the relational
ones, which corresponds directly to the variety of object-oriented models and approaches.
Since object-oriented query languages were developed in a world strongly imbued with relational
paradigms, similar solutions were produced also for them, including various relational
algebras (e.g., [187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200]);
their description is out of the scope of the thesis, as SBQL does not rely on any algebra.
Regardless of an algebra used, any object-oriented optimisation, similarly to
the relational one, aims to:
•
Avoid evaluating Cartesian products, which however is less important than in
relational systems (in object-oriented databases relational joins are replaced with
object links or pointers),
•
Evaluate selections and projections as early as possible.
Moreover, some query transformations can be performed, e.g. concerning the order
of operations (expensive operations should be performed after cheaper ones, to let them
operate on smaller collections) and identifying constant and common expressions (they
can be calculated only once and their result used many times, also in other queries
evaluated in parallel). Finally, some low-level techniques (referring to physical data
organisation, e.g. data files, accessing through indices, etc.) are also utilised in object-oriented database systems.
A general SBQL query processing flow diagram and basic system modules are
shown in Fig. 16. The SBQL optimiser, in contrast to the relational one, relies mainly
on static query transformations, in particular various rewriting methods that are easy to
implement (due to the SBQL regularity), very efficient and reliable. These static
transformations are performed directly on a query syntax tree, without application of
any object-oriented algebra as an intermediate query representation. The static query
analysis begins with two basic steps:
•
Type-checking (each name and each operator are checked according to a database
schema in order to determine if a query is syntactically correct),
•
Generation of a query syntax tree on which further rewriting (actual optimisation)
is performed.
Fig. 16 Architecture of query processing in SBQL [179]
The static optimisation is performed with auxiliary compile-time data structures
corresponding to runtime structures:
•
Static QRES (corresponding to a runtime QRES) used for modelling query results’
accumulation,
•
Static ENVS (corresponding to a runtime ENVS) used for modelling opening
sections and binding operations,
•
Metabase (corresponding to an object store, but kept unchanged during
optimisation), standing for a description (model) of a database schema; this model
contains information on object structures and types, procedure names with
a return type and argument types, auxiliary structures (e.g., indices), possibly
database statistics, etc.
Static stacks are only approximations of runtime ones, hence their content and
structure are different. The main concept used by the static query analysis is a signature
corresponding to runtime objects and ENVS’s sections – a signature stands for
a definition of a runtime being it substitutes. All the static operations are performed with
signatures only, e.g. queries (and subqueries) are assumed to return only single
signatures reflecting their result types (signatures can be also complex, e.g. in case of
structures, but they do not reflect quantities of actually returned query result’s objects).
A signature for a (sub)query is determined by its semantics, i.e.:
•
If a query is a literal (integer, real, string, boolean, date), a signature corresponds
to this literal’s primitive type (a value signature: integer signature, real signature,
string signature, boolean signature, date signature, respectively),
•
If a query is a name, a signature is created by taking a value from a static ENVS
(static binding procedure of name) – a reference signature,
•
If a query is q as name, a binder signature is created (a name for an actual
signature of q, corresponding to a named result),
•
If a query is q1 ∆ q2 (∆ is an algebraic operator), a final signature is a composition
of subquery signatures according to the ∆ type inference rules (e.g. some
coercion),
•
If a query is q1 Θ q2 (Θ is a nonalgebraic operator), a final signature is
a composition of subquery signatures built according to the Θ semantics, e.g.
a structure signature for the , (comma) operator.
The static stack’s operations are executed by the static nested and static eval
procedures, corresponding to the runtime ones, respectively.
Some of the methods presented below are adapted from the relational optimisers
(e.g., pushing selections), some are designed from scratch for SBQL purposes and can
be applied only due to its regularity and specific features.
4.4.1 Independent Subqueries
In SBQL a subquery is called independent if it can be evaluated outside a loop (implied
by nonalgebraic operators) it is called from within (which corresponds to the relational
optimisation principle of evaluating selections as early as possible). The method for
determining if a subquery is independent consists of analysing in which sections of the
static ENVS the names in the subquery are bound (in SBQL, any nonalgebraic operator opens
its own scope on ENVS and each name is bound in some section of the stack). If none of the
names is bound in a scope opened by the nonalgebraic operator currently being
evaluated, then the subquery is independent and it can be evaluated earlier than
implied by the original containing query. In order to perform this analysis, a nonalgebraic
operator is assigned the number of the section it opens, and each name in the query is
assigned two numbers: one for the ENVS size (number of existing sections) when the
name binding occurs, the other one for the number of the section where the name is
bound.
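The numbering scheme described above can be sketched in Python as follows. The `BoundName` record and the `is_independent` function are hypothetical names, and the section numbers in the usage note are assigned by hand purely for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BoundName:
    name: str
    stack_size: int   # number of ENVS sections when the binding occurs
    bound_in: int     # number of the section in which the name is bound

def is_independent(names, operator_section):
    """A subquery is independent of a nonalgebraic operator if none of its
    names is bound in the section opened by that operator."""
    return all(n.bound_in != operator_section for n in names)
```

For example, in doctorR where salary = min((doctorR where specialty = "cardiology").salary), assume the base section has number 1 and the outer where opens section 2. The names of the min(...) argument are then bound in sections 1 and 3 only, so the argument is independent of the outer where, while the outer salary is bound in section 2 and is not.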
The query syntax tree is then modified so that the subquery can be evaluated as
early as possible. The independent subqueries method is divided into two variants:
pushing out and factoring out, described below.
4.4.1.1 Factoring out
The factoring out technique is inspired by the relational method for optimising nested
subqueries, but here it is much more general. The general rule for factoring out is
formulated as follows. Given a (sub)query of a form
q1 Θ q2
where Θ is a nonalgebraic operator and q2 is expressed as:
α1 q3 α2
where q3 is a syntactically correct subquery connected to the rest of q2 by arbitrary
operators. Then the query is:
q1 Θ (α1 q3 α2)
If q3 is independent of the operator Θ and of other nonalgebraic operators (if any are
present) whose scopes are on ENVS above the scope opened by Θ, then q3 can be factored out
and the general query will be (an auxiliary name x is introduced):
(q3 as x).(q1 Θ (α1 x α2))
This holds if q3 returns a single result; if its result can be more numerous (the general
case), the groupas operator should be applied:
(q3 groupas x).(q1 Θ (α1 x α2))
There are still some limitations when an independent subquery cannot be
factored out. The main exception occurs if the independent subquery consists of a single
name expression. The first reason (which can be neglected, as explained here) is that the
number of evaluations of such a factored out expression is the same as without factoring
out, while some additional operations for binding an auxiliary name are invoked. However,
since the costs of these operations are negligibly small, this is not considered an issue
(similarly, a factored out expression might be evaluated multiple times if each operator it
is independent of performs one operation in the loop; again, this optimisation does not
improve the query evaluation, but it does not deteriorate it considerably either). This is
not true if a factored out name is a procedure call; in this case it can indeed be evaluated
only once, which should improve the general query performance.
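The factoring out rule can be expressed as a small syntax tree rewrite. The sketch below is only an illustration: the tuple encoding of queries and the `replace` and `factor_out` helpers are hypothetical, and the independence of q3 is assumed to have been established beforehand.

```python
def replace(tree, target, replacement):
    """Return a copy of tree in which every occurrence of target is replaced."""
    if tree == target:
        return replacement
    if isinstance(tree, tuple):
        return tuple(replace(child, target, replacement) for child in tree)
    return tree

def factor_out(query, q3, aux_name, single_result=False):
    """Rewrite q1 Θ (α1 q3 α2) into (q3 as x).(q1 Θ (α1 x α2)),
    or into the groupas variant when q3 may return many results."""
    naming = "as" if single_result else "groupas"
    rewritten = replace(query, q3, ("name", aux_name))
    return (".", (naming, q3, aux_name), rewritten)
```

With q3 being the min(...) subquery of doctorR where salary = min((doctorR where specialty = "cardiology").salary), the call `factor_out(query, q3, "aux0", single_result=True)` yields the tree of (min(...) as aux0).(doctorR where salary = aux0).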
4.4.1.2 Pushing out
The pushing out technique is based on a distributivity property of some operators.
A nonalgebraic operator Θ is distributive if a query:
(q1 union q2) Θ q3
is equivalent to:
(q1 Θ q3) union (q2 Θ q3)
In SBQL the where, . (dot) and join operators are distributive. For them, the following
associativity rules can be derived:
(q1.q2).q3 ⇔ q1.(q2.q3)
(q1 join q2) join q3 ⇔ q1 join (q2 join q3)
(q1 join q2).q3 ⇔ q1 join (q2.q3)
(q1 join q2) where q3 ⇔ q1 join (q2 where q3)
The distributivity property allows a new optimisation method: pushing
a selection before a join (also known in relational systems). If a whole predicate is not
independent of where, but has a form:
p1 and p2 and … and pk-1 and pk and pk+1 and … and pn-1 and pn
and some subpredicate pk is independent, the predicate can be transformed into:
p1 and p2 and … and pk-1 and pk+1 and … and pn-1 and pn
provided pk is pushed so that its selection is evaluated only once.
Pushing out selections is similar to factoring out – it can be applied to a set of
nonalgebraic operators the predicate is independent of and the predicate can be
arbitrarily complex (including procedures’ invocations). However, there is a substantial
difference between these techniques – a factored out expression is evaluated after some
nonalgebraic operator Θ it is dependent on pushes its sections (opens its scope) on the
ENVS, while in case of pushing out an independent predicate is evaluated before such
an operator opens its scope (a pushed out predicate is pushed before the operator it is
dependent on).
Pushing out can be applied if all nonalgebraic operators a predicate is
independent of are distributive, and only if it is connected to the rest of its container
predicate with the and operator (otherwise factoring out could be applied).
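The effect of pushing a selection before a join can be shown on plain Python collections. This is a relational-style analogy rather than SBQL semantics: the join is modelled as a nested loop, and the function names and sample data are hypothetical.

```python
def join_then_select(left, right, predicates):
    """(left join right) where (p1 and p2 and ...): every conjunct is
    evaluated for every pair."""
    return [(l, r) for l in left for r in right
            if all(p(l, r) for p in predicates)]

def select_pushed(left, right, left_only_predicates, remaining_predicates):
    """((left where pk) join right) where (remaining conjuncts): pk depends
    only on the left operand, so it is evaluated once per left element."""
    filtered = [l for l in left if all(p(l) for p in left_only_predicates)]
    return [(l, r) for l in filtered for r in right
            if all(p(l, r) for p in remaining_predicates)]

doctors = [{"id": 10, "specialty": "cardiology"},
           {"id": 11, "specialty": "oncology"}]
patients = [{"surname": "Smith", "doctor_id": 10},
            {"surname": "Jones", "doctor_id": 11},
            {"surname": "Smith", "doctor_id": 11}]

original = join_then_select(
    doctors, patients,
    [lambda d, p: d["specialty"] == "cardiology",
     lambda d, p: d["id"] == p["doctor_id"]])
pushed = select_pushed(
    doctors, patients,
    [lambda d: d["specialty"] == "cardiology"],
    [lambda d, p: d["id"] == p["doctor_id"]])
```

Both variants return the same pairs, but the pushed variant evaluates the specialty condition twice (once per doctor) instead of six times (once per pair).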
4.4.2 Rewriting Views and Query Modification
SBQL updateable views are saved as stored functional database procedures. Their
execution (on invocation from a query) can be performed with a common
macro-substitution technique; however, SBQL also uses a much better solution: query
modification.
A basic view execution means that when a view name is encountered in a query,
its body is executed and the result is just pushed onto QRES (possibly for further
processing as a sub-result of the query). However, if a view body is macro-substituted
for its name in a query, new optimisation possibilities open, as a nested query from the
view becomes a regular subquery subject to all SBQL rules and optimisation techniques.
The names in the resulting subquery are bound like other names in the query and
therefore the whole query can be rewritten regardless of whether some name is entered in
its textual form or macro-substituted for a view name (including nested view invocations).
Views that can be macro-substituted must be characterized by the following
features:
• They do not create their own local environments (i.e. they operate only on global
data),
• They are dynamically bound,
• The only expression in the body is a single query.
The query modification is slightly more constrained for parameterized views
(i.e. views with arguments). A basic method for passing arguments (parameters) to the
procedure (here: a view) is call-by-value (or call-by-reference). When using
macro-substitution, call-by-name should be used instead. This means that when
macro-substituting a view body, its formal parameters must be replaced with the actual
parameters. The resulting query can then be optimised in a regular way. The limitation for
using this technique is that a view cannot have side effects; precisely, its execution
cannot affect its actual parameters in any way.
The view modification method can be applied to any stored procedure, not only
a view, provided it obeys the limitations listed above.
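Macro-substitution with call-by-name parameter passing can be sketched as a recursive tree rewrite. The encoding below is hypothetical (a view invocation is a ("call", name, args...) node and a view is a pair of formal parameters and a single-query body); it assumes non-recursive, side-effect free views and ignores name capture.

```python
def substitute(tree, bindings):
    """Call-by-name: replace formal parameter names with actual argument trees."""
    if isinstance(tree, tuple):
        if tree[0] == "name" and tree[1] in bindings:
            return bindings[tree[1]]
        return tuple(substitute(child, bindings) for child in tree)
    return tree

def expand_views(tree, views):
    """Replace every view invocation with the view body (nested ones included)."""
    if not isinstance(tree, tuple):
        return tree
    if tree[0] == "call" and tree[1] in views:
        params, body = views[tree[1]]
        args = [expand_views(arg, views) for arg in tree[2:]]
        inlined = substitute(body, dict(zip(params, args)))
        return expand_views(inlined, views)   # handles nested view invocations
    return tuple(expand_views(child, views) for child in tree)

# Hypothetical view: RichDoctor(threshold) = Doctor where salary > threshold
views = {
    "RichDoctor": (("threshold",),
                   ("where", ("name", "Doctor"),
                    (">", ("name", "salary"), ("name", "threshold")))),
}
query = (".", ("call", "RichDoctor", ("literal", 5000)), ("name", "surname"))
expanded = expand_views(query, views)
```

After the expansion the view body is an ordinary part of the query tree, so the rewriting rules described in this subchapter apply to it without any special treatment.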
4.4.3 Removing Dead Subqueries
A dead subquery is a part of a query whose execution does not influence the final result
and, of course, it is not used as an intermediate result for evaluating a (sub)query it is
contained within. Therefore evaluation of such a dead subquery only consumes time and
resources and should be avoided, which is performed by the method described. Dead
subqueries occur in user-typed queries and they are very often introduced as a result of
macro-substitution of views.
The procedure for identification of a dead subquery is based on excluding
subqueries that directly or indirectly contribute to the final result of the query. Queries
that can contain dead parts are those with navigation operators and/or quantifiers, as
these operators (projection and quantifiers) can use only a part of their left operand
(a subquery), ignoring the rest of it, which becomes dead. Other nonalgebraic operators
in SBQL (e.g., a navigational join or selection) do not have this property as they always
consume whole operand queries’ results. The where operator might use only a part of its
left subquery; however, it always determines the result type and therefore no part of the
left operand can be considered dead.
However, some dead subqueries cannot be removed. This rare situation occurs if
removing a dead part affects the quantity of returned objects (the size of the result); such
a subquery is called partially dead. Another interesting case is that removing one dead
subquery could make another subquery dead; therefore the removal process should be
repeated until all dead parts are removed.
4.4.4 Removing Auxiliary Names
Auxiliary names (aliases) can be introduced to a query by a programmer in order to
make it clearer or to access precisely its particular parts (subquery results) in other parts.
They are also introduced automatically from views’ macro-substitution (as they exist in
views’ definitions). An auxiliary name (resulting in a binder) is processed by projection
onto an alias, an operation which in many cases turns out to be unnecessary (a projected
result can be accessed directly), so a new opportunity for optimisation appears. The
operation of removing unnecessary auxiliary names is not straightforward, as it can change
a query result.
An auxiliary name n can be removed only if:
• A result of evaluating n is consumed by a dot operator,
• It is used only for navigation, i.e. a direct nonalgebraic operator using n is dot and
a result of evaluating n is not used for any other nonalgebraic operator.
Moreover, an auxiliary name cannot be removed if this operation would make
names in the “uncovered” subquery (previously “hidden” under the alias) bind in other
sections than in the original query with the auxiliary name.
4.4.5 Low Level Techniques
A set of low level query optimisation techniques exists in object-oriented and relational
query optimisers. In general, they rely mainly on some physical features of a database
system (e.g. data storage and organisation). These techniques can be applied on both
data access and query transformation (e.g., introducing index invocations) stages.
4.4.5.1 Indexing
Indices are auxiliary (redundant) database structures stored at the server side,
accelerating access to particular data according to given criteria. In general, an index is
a two-column table where the first column consists of unique key values and the other one
holds non-key values, which in most cases are object references. Key values are used as
an input for index search procedures. As a result, such a procedure returns suitable
non-key values from the same table row. Keys are usually collections of distinct values of
specific attributes (they can be also some complex structures or results of some queries
or procedures) of database objects (dense indices) or represent ranges of these values
(range indices).
The current ODRA implementation supports indices based on the linear hashing
[201] structure, which can be easily extended to its distributed version SDDS³¹ [202] in
order to optimally utilise data grid computational resources. In addition to the key types
mentioned earlier (dense and range), an enumerated type was introduced to improve
multiple key indexing. ODRA supports local indexing, which ensures index
transparency by providing a mechanism (the optimisation framework) to automatically
utilise an index before query evaluation and therefore to take advantage of indices
(distributed indexing is under development).
³¹ Scalable Distributed Data Structure
The index optimisation procedure (substituting an index call for a particular
subquery) is performed during static query rewriting. Such an index invocation is
a regular procedure (function) with parameters (corresponding to index key values) that
could be used in a regular query (because of index transparency features, index
functions’ names are rejected at query typechecking). The function accesses index data
according to given or calculated key value(s) and pushes found object identifiers onto
QRES for further regular processing.
An index call is substituted as the left where operand so that other predicates (if
any) are evaluated on a considerably smaller collection (sometimes by orders of magnitude)
returned from fast index execution. A cost model is used in order to determine which
index is the most efficient (its selectivity is the issue) for a given query and what
predicate can be replaced with an index function call.
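The role of an index call as the left operand of where can be sketched with a dense index over an in-memory object store. A Python dictionary stands in for the linear hashing structure here, and the function names and sample objects are hypothetical.

```python
from collections import defaultdict

def build_dense_index(objects, key):
    """Two-column structure: key value -> list of object identifiers."""
    index = defaultdict(list)
    for oid, obj in objects.items():
        index[obj[key]].append(oid)
    return index

def select_with_index(objects, index, key_value, other_predicates):
    """The index call yields a small candidate collection; the remaining
    predicates are evaluated only on these candidates."""
    candidates = index.get(key_value, [])
    return [oid for oid in candidates
            if all(p(objects[oid]) for p in other_predicates)]

doctors = {
    1: {"surname": "Smith", "salary": 5000},
    2: {"surname": "Jones", "salary": 7000},
    3: {"surname": "Smith", "salary": 9000},
}
surname_index = build_dense_index(doctors, "surname")
well_paid_smiths = select_with_index(
    doctors, surname_index, "Smith", [lambda d: d["salary"] > 6000])
```

Here the salary predicate is evaluated for two candidates instead of all three objects; on a realistic store the reduction depends on the index selectivity, which is what the cost model mentioned above has to estimate.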
Chapter 5
Object-Relational Integration Methodology
5.1 General Architecture and Assumptions
Fig. 3 (page 43) presents the architecture of the eGov-Bus virtual repository and the
place taken by the described wrapper. Fig. 17 presented below shows a much more
general view of a virtual repository with its basic functional elements. The regarded
resources can be any data and service sources, however here only relational databases
are shown for simplification.
[Figure: layered diagram. From top to bottom: global clients 1 and 2; global infrastructures
(security, authentication, transaction, indices, workflow, web services); data and services
global virtual store (virtual data integration and presentation), operating in the
object-oriented model and query language; wrappers 1 and 2, each providing schema mapping,
query recognition and rewriting, communication and transport, result reconstruction and
schema reading; RDBMS 1 and 2, operating in the relational model and query language.]
Fig. 17 Virtual repository general architecture
The virtual repository provides global clients with required functionalities, e.g.
the trust and security infrastructure, communication mechanisms, etc. It is also
responsible for integration and management of virtual data and presentation of
the global schema. A set of wrappers (marked with red) interfaces between the virtual
repository and the resources. The basic wrapper functionalities (shown in blue boxes)
refer to:
• Providing communication and data transportation means (both between
the wrapper and the virtual repository and between the wrapper and the resource),
• Enabling, preferably automated, relational schema reading,
• Mapping a relational schema to the object-oriented model,
• Analysing object-oriented queries so that appropriate results are retrieved from
relational resources.
Another view on the virtual repository with its wrappers is presented in Fig. 18,
where schema integration stages are shown (colours correspond to the mediation
architecture shown in Fig. 1, page 25).
[Figure: schema integration stages. From top to bottom: global clients 1 and 2 use the
global schema (object-oriented business model); the global virtual store, with the global
infrastructures, holds the global views and the integration schema, maintained by the
administrator/designer; contributory views 1 and 2 expose contributory schemata over
wrappers 1 and 2 (object-oriented relational model representation, M0); RDBMS 1 and 2 hold
local schemata 1 and 2 (relational model).]
Fig. 18 Schema integration in the virtual repository
Contributory views present relational schemata as simple M0 object-oriented
models (subchapter 4.1 SBA Object Store Models). These views are parts of the global
schema provided by the system administrator/designer, i.e. they must obey names and
object structures used in the integration schema. The integration schema is responsible
for combining local schemata (according to known fragmentation rules and ontologies)
into the global schema presented to global users and the only one available at the top of
the virtual repository. In the simplest case, the integration schema can be used as the
global schema; it can also be further modified to match the virtual repository
requirements (in the most general case separate global schemata can be applied
according to clients’ requirements and access rights). The integration schema and its
mapping onto the global schema are expressed with the global views in the virtual
repository.
In this architecture wrappers are responsible for reading local schemata of
wrapped resources and presenting them as object-oriented schemata (simple M0
models). These object-oriented schemata are further enveloped by appropriate
contributory views.
5.2 Query Processing and Optimisation
Besides the schema mapping and translation (so that relational resources become
available in the virtual repository), the wrapper is responsible for the query processing.
The schematic query processing diagram is presented in Fig. 19 (schema models shown
in light green correspond to Fig. 18 above).
[Figure: query processing flow. A global query (SBQL), expressed in the object-oriented
business model, passes through the parser and type checker, yielding the front-end SBQL
syntax tree. The external wrapper (updateable views and query modification) and the
rewriting optimiser produce the back-end SBQL syntax tree, expressed in the object-oriented
relational model representation (M0). The internal wrapper converts SBQL subqueries into
equivalent SQL using relational schema information; the SBQL interpreter issues dynamic SQL
(ODBC, JDBC, ADO, ...) to the RDBMS (relational model).]
Fig. 19 Query processing schema
The global ad-hoc query (referring to the global schema) issued by one of the
global clients is regularly parsed and type-checked, which results in the front-end syntax
tree (still expressed in terms of the global schema). This syntax tree is macro-substituted
with the views’ definitions (corresponding to the global, integration and contributory
schemata shown in Fig. 18) and can be submitted to query modification procedures³².
The back-end SBQL syntax tree is extremely large due to the macro-substitution
applied and it refers to the primary objects, i.e. the ones exposed directly by the
wrapper. Here the SBQL rewriting optimisers (regular SBQL optimisation, subchapter
4.4) can be applied together with the internal wrapper rewriter. The wrapper query
rewriter analyses the syntax tree to find expressions corresponding to “relational”
names, i.e. names corresponding to relational tables and columns. The analysis
procedure is based on the relational schema information available to the wrapper’s
back-end, the metabase and expressions’ signatures provided by the type checker
(details in Chapter 6 Query Analysis, Optimisation and Processing).
³² The views must obey some simple rules for the query modification to be applied
(subchapter 4.4.2), which must be ensured by the system administrator.
If such names are found, their SBQL subqueries are substituted with
corresponding dynamic SQL expressions (execute immediately) to be evaluated in the
wrapped resource. According to the naive approach, each relational name should be
substituted with simple SQL: select * from R, where all records are retrieved and
processed in the virtual repository for the desired result. This approach, although always
correct and reliable, is extremely inefficient and it introduces undesired data
transportation and materialisation. Therefore, a strong need for optimisation appears,
so that as much of the evaluation load as possible is pushed down to the wrapped
relational database, where powerful query optimisers can work (details discussed in the
following subsection).
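The difference between the naive substitution and a pushed-down query can be observed with any relational engine. The sketch below uses Python's sqlite3 module with an in-memory database as a stand-in for the wrapped resource; the table contents and function names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patientR (id INTEGER PRIMARY KEY, name TEXT,
                           surname TEXT, doctor_id INTEGER);
    INSERT INTO patientR VALUES (1, 'Ann', 'Smith', 10),
                                (2, 'Bob', 'Jones', 11),
                                (3, 'Eve', 'Brown', 10);
""")

def naive_doctor_ids(conn, surname):
    """Naive approach: fetch the whole table, then filter and project
    outside the database."""
    rows = conn.execute("select * from patientR").fetchall()
    result = [row[3] for row in rows if row[2] == surname]
    return result, len(rows)          # result and number of rows transferred

def pushed_down_doctor_ids(conn, surname):
    """Optimised approach: selection and projection are evaluated by the RDBMS."""
    rows = conn.execute(
        "select doctor_id from patientR where surname = ?", (surname,)
    ).fetchall()
    return [row[0] for row in rows], len(rows)
```

Both functions return the same identifiers, but the naive one transfers and materialises every record of patientR, while the pushed-down one transfers only the matching values.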
The general wrapper query optimisation and evaluation procedure consists of the
following steps:
1. Query modification is applied to all view invocations in a query, which are
macro-substituted with seed definitions of the views. If an invocation is preceded by the
dereference operator, instead of the seed definition, the corresponding on_retrieve
function is used (analogically, on_navigate for virtual pointers). The effect is
a very large SBQL query referring to the M0 version of the relational model
available at the back-end.
2. The query is rewritten according to static optimisation methods defined for SBQL
such as removing dead sub-queries, factoring out independent sub-queries, pushing
expensive operators (e.g. joins) down in the syntax tree, removing unused auxiliary
names, etc. The resulting query is SBQL-optimised, but still no SQL optimisation is
applied.
3. According to the available information about the relational schema, the back-end
wrapper's mechanisms analyse the SBQL query in order to recognise patterns
representing SQL-optimiseable queries. Then, execute immediately clauses are
issued and executed by the resource driver interface (e.g. JDBC).
4. The results returned by execute immediately are pushed onto the SBQL result stack
as collections of structures, which are then used for regular SBQL query evaluation.
5.2.1 Naive Approach vs. Optimisation
As stated above, the naive implementation can be always applied, but it gives no chance
for relational optimisers to work and all processing is executed by the virtual repository
mechanism.
SBQL expressions (subqueries) resulting from step 2 in the above procedure
should be further analysed by the internal wrapper in order to find the largest possible
subqueries (patterns) transformable to (preferably) optimiseable³³ SQL. The wrapper
optimisation goals are as follows:
• Reduce amount of data retrieved and materialised (in the most favourable case
only final results can be retrieved).
• Minimise processing by the virtual repository (in the most favourable case
retrieved results match exactly the original query intention).
The wrapper optimisation is much more challenging than the simple rewriting
applied in the naive approach presented in the previous chapter. The first issue is that
many SBQL operators and expressions do not have relational counterparts (e.g. as,
groupas), while the semantics of some others differ (e.g. the assignment operator in SBQL
is not macroscopic). Hence, the set of SQL-transformable SBQL operators has been first
isolated. According to the optimisation goals, the wrapper optimiser should attempt to
find the largest SBQL subquery that can be expressed in equivalent SQL. Therefore
the following search order has been assumed:
• Aggregate functions,
• Joins,
• Selections,
• Names (corresponding to relational tables).
³³ Although SQL optimisers (subchapter 3.3 Relational Query Processing and Optimisation
Architecture) are transparent, one can assume when they actually act, e.g. when evaluating
joins, selections over indexed columns, etc.
This order is caused by possible complex expression forms and arguments
(internal expressions), i.e. aggregate functions can be evaluated over joins or selections
and joins can contain additional selection conditions. In case of joins, selections and
table names, also another optimisation is applied since projections can be found and
only desired relational columns retrieved.
The actual wrapper query processing algorithms are discussed in Chapter 6,
followed by comprehensive examples, while results of application of these algorithms
are given in Chapter 7.
5.3 A Conceptual Example
This subsection presents an abstract conceptual (implementation independent)
example of the presented relational schema wrapping and query processing approach.
First, consider a simple two-table relational schema (Fig. 20). This medical database
contains information on patients (the patientR table) and doctors (doctorR); “R” stands
for “relational” and it is introduced just to make the example clearer. Each patient
is treated by some doctor, which is realised with the primary-foreign key relationship on
doctorR.id and patientR.doctor_id columns. Besides the primary keys there are
non-unique (secondary) indices on patientR.surname and doctorR.surname columns.
doctorR: id (PK), name, surname, salary, specialty
patientR: id (PK), name, surname, doctor_id (FK)
Fig. 20 Relational schema for the conceptual example
This schema is imported by the wrapper and primary objects corresponding to
relational tables and columns are created, i.e. a relational table is a complex object
whose subobjects correspond to the table’s columns (with their primitive data types);
the relational names are preserved for these objects (one-to-one mapping). This simple
object-oriented schema is ready for querying, but still relational constraints are not
reflected. Therefore it is enveloped with object-oriented updateable views. The resulting
object-oriented schema is shown in Fig. 21; the primary-foreign key relationship is
realised with the isTreatedBy virtual pointer.
Patient: id, name, surname, isTreatedBy (virtual pointer to Doctor)
Doctor: id, name, surname, salary, specialty
Fig. 21 Object-oriented view-based schema for the conceptual example
The simplified code of the enveloping views with no updates defined (just
retrieval and navigation where necessary) is presented in Listing 1.
Listing 1 Simplified updateable views for the conceptual example
view DoctorDef {
  virtual objects Doctor: record {d: doctorR;}[0..*] {
    return (doctorR) as d;
  }
  /* on_retrieve skipped for Doctor */
  view idDef {
    virtual objects id: record {_id: doctorR.id;} {
      return d.id as _id;
    }
    on_retrieve: integer {
      return deref(_id);
    }
  }
  view nameDef {
    virtual objects name: record {_name: doctorR.name;} {
      return d.name as _name;
    }
    on_retrieve: string {
      return deref(_name);
    }
  }
  view surnameDef {
    virtual objects surname: record {_surname: doctorR.surname;} {
      return d.surname as _surname;
    }
    on_retrieve: string {
      return deref(_surname);
    }
  }
  view salaryDef {
    virtual objects salary: record {_salary: doctorR.salary;} {
      return d.salary as _salary;
    }
    on_retrieve: real {
      return deref(_salary);
    }
  }
  view specialtyDef {
    virtual objects specialty: record {_specialty: doctorR.specialty;} {
      return d.specialty as _specialty;
    }
    on_retrieve: string {
      return deref(_specialty);
    }
  }
}
view PatientDef {
  virtual objects Patient: record {p: patientR;}[0..*] {
    return (patientR) as p;
  }
  /* on_retrieve skipped for Patient */
  view idDef {
    virtual objects id: record {_id: patientR.id;} {
      return p.id as _id;
    }
    on_retrieve: integer {
      return deref(_id);
    }
  }
  view nameDef {
    virtual objects name: record {_name: patientR.name;} {
      return p.name as _name;
    }
    on_retrieve: string {
      return deref(_name);
    }
  }
  view surnameDef {
    virtual objects surname: record {_surname: patientR.surname;} {
      return p.surname as _surname;
    }
    on_retrieve: string {
      return deref(_surname);
    }
  }
  view isTreatedByDef {
    virtual objects isTreatedBy: record {_isTreatedBy: patientR.doctor_id;} {
      return p.doctor_id as _isTreatedBy;
    }
    on_retrieve: integer {
      return deref(_isTreatedBy);
    }
    on_navigate: Doctor {
      return Doctor where id = _isTreatedBy;
    }
  }
}
The next paragraphs discuss the query processing steps with the corresponding
textual forms and visualised syntax trees.
Consider an object-oriented query referring to this object-oriented view-based schema, aiming to retrieve surnames of doctors treating
patients named Smith, whose salary is equal to the minimum salary of cardiologists (Fig. 22):
((Patient where surname = "Smith").isTreatedBy.Doctor as doc where doc.salary = min((Doctor where specialty =
"cardiology").salary)).doc.surname;
Fig. 22 Conceptual example input query syntax tree
The first step in the query processing is introducing implicit deref calls where necessary (Fig. 23):
(((((((Patient where (deref(surname) = "Smith")) . isTreatedBy) . Doctor)) as doc where (deref((doc . salary)) =
min(deref(((Doctor where (deref(specialty) = "cardiology")) . salary))))) . doc) . surname);
Fig. 23 Conceptual example query syntax tree with dereferences
Then, macro-substitute deref calls with on_retrieve and on_navigate definitions for virtual objects and virtual pointers, respectively. Substitute
all view invocations with the queries from seed definitions (the syntax tree illustration is skipped due to its large size). This step allows the query
modification procedures (subsection 4.4.2 Rewriting Views and Query Modification) to be applied in the next steps, thanks to the simple
single-command views’ bodies:
((((((((patientR) as p where ((((p . surname)) as _surname . deref(_surname)) = "Smith")) . ((p . doctor_id)) as _isTreatedBy) .
((doctorR) as d where ((((d . id)) as _id . deref(_id)) = deref(_isTreatedBy))))) as doc where (((doc . ((d . salary)) as
_salary) . deref(_salary)) = min(((((doctorR) as d where ((((d . specialty)) as _specialty . deref(_specialty)) = "cardiology"))
. ((d . salary)) as _salary) . deref(_salary))))) . doc) . ((d . surname)) as _surname);
Now, remove auxiliary names where possible (i.e. p, d, _surname, _isTreatedBy, _id, _salary, _specialty); the views’ definitions
macro-substituted in the previous step are now regular parts of the original query and syntactical transformations can be applied (Fig. 24):
((((((doctorR where (deref(id) = ((patientR where (deref(surname) = "Smith")) . deref(doctor_id))))) as doc where ((doc .
deref(salary)) = min(((doctorR where (deref(specialty) = "cardiology")) . deref(salary))))) . doc) . surname)) as _surname;
Fig. 24 Conceptual example query syntax tree after removing auxiliary names
Apply SBQL optimisation methods. Here two independent subqueries can be found (for evaluating the minimum cardiologists’ salary and
for finding patients named Smith). They are pulled in front of the query and their results are given auxiliary names aux0 and aux1 (Fig. 25):
(((((min(((doctorR where (deref(specialty) = "cardiology")) . deref(salary)))) as aux0 . ((((((patientR where (deref(surname) =
"Smith")) . deref(doctor_id))) as aux1 . (doctorR where (deref(id) = aux1)))) as doc where ((doc . deref(salary)) = aux0))) .
doc) . surname)) as _surname;
Fig. 25 Conceptual example query syntax tree after SBQL optimisation
On this query form the wrapper analysis and wrapper optimisation are performed. Based on the available relational model information,
the following SQL queries invoked with execute immediately can be created (Fig. 26):
exec_immediately("select min(salary) from doctorR where specialty = 'cardiology'") as aux0 . exec_immediately("select doctor_id
from patientR where surname = 'Smith'") as aux1 . exec_immediately("select surname from doctorR where salary = '" + aux0 + "' and
id = '" + aux1 + "'") as _surname;
Fig. 26 Conceptual example query syntax tree after wrapper optimisation
Each of the SQL queries will be executed in the relational resource with application of
indices, where available. Minimal processing is required by the virtual repository:
the partial results returned from the first two execute immediately calls are pushed
onto stacks as they parameterise the last SQL query. The last execute immediately call
retrieves the final result matching the original query semantics.
Please note that the above transformations are valid only if there is exactly one
patient named Smith; nevertheless the idea holds in more general cases, as proved by
more complex examples.
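The chain of three SQL queries from the conceptual example can be replayed against an in-memory SQLite database using Python's sqlite3 module. The rows below are invented for illustration, the function names are hypothetical, and the sketch passes aux0 and aux1 as bound parameters instead of concatenating them into the SQL text as in the form shown above; as in the text, exactly one patient named Smith is assumed.

```python
import sqlite3

def make_demo_db():
    """Build the two-table schema of Fig. 20 with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE doctorR (id INTEGER PRIMARY KEY, name TEXT, surname TEXT,
                              salary REAL, specialty TEXT);
        CREATE TABLE patientR (id INTEGER PRIMARY KEY, name TEXT, surname TEXT,
                               doctor_id INTEGER REFERENCES doctorR(id));
        CREATE INDEX doctor_surname ON doctorR(surname);
        CREATE INDEX patient_surname ON patientR(surname);
        INSERT INTO doctorR VALUES (10, 'Greg', 'House', 5000.0, 'cardiology'),
                                   (11, 'Mary', 'Grey', 7000.0, 'cardiology'),
                                   (12, 'Tom', 'Wilson', 5000.0, 'oncology');
        INSERT INTO patientR VALUES (1, 'Ann', 'Smith', 10),
                                    (2, 'Bob', 'Jones', 11);
    """)
    return conn

def run_conceptual_example(conn):
    # aux0: minimum salary of cardiologists
    aux0 = conn.execute(
        "select min(salary) from doctorR where specialty = 'cardiology'"
    ).fetchone()[0]
    # aux1: doctor of the (single) patient named Smith
    aux1 = conn.execute(
        "select doctor_id from patientR where surname = 'Smith'"
    ).fetchone()[0]
    # final query, parameterised with both partial results
    rows = conn.execute(
        "select surname from doctorR where salary = ? and id = ?", (aux0, aux1)
    ).fetchall()
    return [row[0] for row in rows]
```

With the sample rows, the first query yields 5000.0, the second yields 10, and the last one returns the surname of doctor 10, whose salary equals that minimum.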
Chapter 6
Query Analysis, Optimisation and Processing
The algorithms presented in this chapter refer to step 3 in the query
optimisation and evaluation procedure described above (subchapter 5.2 Query
Processing and Optimisation, page 89). They assume that the input SBQL query is
view-rewritten (macro-substitution and query modification applied) and type-checked.
6.1 Proposed Algorithms
The algorithm shown in Fig. 27 is applied for the query analysis. First, the query
tree is checked for relational names. If there are none, the analysis stops: the query does
not refer to the wrapper and it should not be modified.
Otherwise, the tree is checked if it is an update or a delete query. If so, a check
whether the wrapper optimisation can be applied is performed. Currently only range
expressions are detected at this stage, since they do not have counterparts in SQL and
pointing to some particular object for updating/deleting cannot be translated. If such an
expression is found as the operation target object, an error is returned and the algorithm
stops. In all other cases, the delete/update tree is rewritten (the detailed procedure is
presented in subsections 6.1.2 and 6.1.3, respectively) and returned.
If the query is a selection (neither update nor delete), possibly SQL-transformable
patterns are searched for (aggregate functions, joins, selections, table names). For each
target pattern all matches are found starting from the tree root so that the most external
ones are processed first (i.e. the ones corresponding to the largest subqueries). Each
match is checked if it can be optimised (i.e. if all subexpressions of the corresponding
tree branch are recognised and transformable). If so, the transformation is applied
(details in subsection 6.1.1). Otherwise, the tree branch corresponding to the current
match is returned unoptimised for further analysis and processing.
Fig. 27 Query analysis algorithm
At the beginning of the query analysis and transformation a copy of the tree is made.
The copy is returned instead of an optimised tree if some unexpected error occurs (e.g.
an unsupported expression signature) when some branches have already been modified.
This ensures that query evaluation is always correct, although in such a case it is
executed without the wrapper optimisation (the naive approach is applied).
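The control flow for selecting queries, including the fallback to the tree copy, can be sketched as follows. This is only an illustration in Python, not the prototype's code: the list-based tree representation, the RELATIONAL, TARGETS and KNOWN sets, and the names leaves, recognised, optimise and analyse are all assumptions made for the example. The update/delete branch of Fig. 27 is left out.

```python
import copy

RELATIONAL = {"$employees", "$salary", "$surname"}  # assumed wrapper names
TARGETS = {"dot", "join", "where"}                  # simplified target patterns
KNOWN = TARGETS | {">", "<", "=", "and"}            # operators the sketch accepts


class Unsupported(Exception):
    """Stands for an unexpected error such as an unsupported signature."""


def leaves(tree):
    """Yield every leaf of a tree written as [operator, child, ...]."""
    if isinstance(tree, list):
        for child in tree[1:]:
            yield from leaves(child)
    else:
        yield tree


def recognised(tree):
    """True if all operators are known and all leaves are relational
    names or numeric literals."""
    if isinstance(tree, list):
        return tree[0] in KNOWN and all(recognised(c) for c in tree[1:])
    return tree in RELATIONAL or isinstance(tree, (int, float))


def optimise(tree, rewrite):
    """Rewrite, in place, the outermost recognised target branches first."""
    if not isinstance(tree, list):
        return tree
    if tree[0] not in KNOWN:
        raise Unsupported(tree[0])
    if tree[0] in TARGETS and recognised(tree):
        return rewrite(tree)
    for i in range(1, len(tree)):
        tree[i] = optimise(tree[i], rewrite)
    return tree


def analyse(tree, rewrite):
    """Return the optimised tree, or an untouched copy if anything fails."""
    if not any(leaf in RELATIONAL for leaf in leaves(tree)):
        return tree                    # the query does not refer to the wrapper
    backup = copy.deepcopy(tree)       # made before any branch is modified
    try:
        return optimise(tree, rewrite)
    except Unsupported:
        return backup
```

Because the root is tested before its children, the largest transformable branch is rewritten as a whole; smaller branches are considered only when an enclosing one cannot be transformed.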
6.1.1 Selecting Queries
The following algorithm is applied to each wrapper-optimiseable subquery (tree branch)
found according to the analysis procedure described above (Fig. 27).
Fig. 28 Selecting query processing algorithm
Selecting queries are optimised according to the algorithm shown in Fig. 28. The
first step is remembering the input (sub)query signature for further use. The query is
then checked to determine whether it is an aggregate function (if so, the function type
is remembered). The next steps are finding the table names invoked and the selection
conditions; if either cannot be established (e.g. in case of unsupported signatures), an
error is returned and the procedure stops. Next, projections (column names) are found.
Projected names are not required for most queries (although projections can seriously
limit the amount of data retrieved and materialised), but they can be required by some
aggregate functions (e.g. min, max, avg). The final steps consist of building a SQL
query string (subsection 6.1.4) based on the analysis performed and enveloping the
query string in the appropriate SBQL expression for regular stack-based evaluation
(subsection 6.1.5). The expression is given back the signature of the original input
query and is also provided with the description of the result to be reconstructed. The
description is established on the basis of the original signature and it enables pushing
onto the stack a result matching the original query semantics.
If this procedure returns an error, the overall query is processed in the
unoptimised form.
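The "Build SQL query" step can be illustrated with a minimal string-based sketch. The prototype delegates SQL generation to zql; the build_select function below, its parameters and its error messages are assumptions introduced only to show how table names, selection conditions, projections and an optional aggregate function combine into one query string.

```python
def build_select(tables, conditions, projections=(), aggregate=None):
    """Assemble a SQL select string from the results of the analysis steps."""
    if not tables:
        raise ValueError("table names could not be recognised")
    if aggregate and len(projections) > 1:
        raise ValueError("an aggregate function takes at most one column here")
    # Without projections all columns are retrieved.
    columns = ", ".join(projections) if projections else "*"
    if aggregate:
        columns = f"{aggregate}({columns})"
    sql = f"select {columns} from {', '.join(tables)}"
    if conditions:
        sql += " where " + " AND ".join(f"({c})" for c in conditions)
    return sql
```

For a single condition the generated string has the same shape as the optimised query of Example 1 in subsection 6.2.2; the nesting of parentheses for compound conditions is not reproduced by this sketch.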
6.1.2 Deleting Queries
The following algorithm is applied only if the query is recognised as deleting in the
analysis process illustrated in Fig. 27.
Processing deleting queries is similar to processing selecting ones, since the delete
operators in SBQL and SQL are similar (in both languages they are macroscopic); the
algorithm is presented in Fig. 29. The input query tree is checked for the table name
from which the deletion should be performed and for selection conditions pointing to
the records to be deleted; again, an error is returned if either of these fails. There is no
need to remember the input query signature, since in SBQL the delete operator does
not return a value (an empty result). However, since the common practice in relational
databases is to return the number of records affected by the deletion, an integer
signature (and the corresponding result pattern) is assigned to the final SBQL
expression. This mismatch between regular SBQL deleting and wrapper deleting does
not affect the query semantics, as it does not enforce any further modifications by the
programmer, and the additional information might be useful in some cases.
If this procedure returns an error, it is propagated and returned instead of the
optimised query, since there is no other way of expressing deletes, as was possible for
selects.
Fig. 29 Deleting query processing algorithm
6.1.3 Updating Queries
The following algorithm is applied only if the query is recognised as updating in the
analysis process illustrated in Fig. 27.
Selecting and deleting queries are processed similarly; updates in SBQL, however,
differ substantially from those in SQL. The semantics of the assignment operator (:=)
in SBQL requires a single object as the left-hand side expression, whereas in SQL the
update operation is macroscopic. Further, the SBQL update returns a reference to the
affected object, which cannot be realised directly in SQL.
Due to these differences, the algorithm shown in Fig. 30 has been designed.
First, the procedure analyses the left-hand side expression of the assignment operator to
find the table and its column to be updated, and again selection conditions are detected
(in case of an unrecognised expression, an error is returned and the procedure ends).
Then the right-hand side expression is isolated to be used as the update value.
Fig. 30 Updating query processing algorithm
According to the selection conditions found, the aggregate count function is
realised in SQL in order to check the number of rows to be affected (to comply with
the SBQL assignment semantics). The other SQL expression is the actual update
constructed according to the analysis results. Both SQL query strings are embedded in
the SBQL IfThen expression, which first evaluates the count check and then performs
the SQL update (provided the count check result is 1). No operation is performed
otherwise.
The original reference signature is not available from the SQL evaluation and it
is replaced with the integer value signature: on query evaluation the number of
affected rows is returned (only the values 0 and 1 are possible).
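The count check followed by a conditional update can be tried out against any relational database. The sketch below uses Python's sqlite3 module purely for illustration; the function wrapper_update and its parameters are assumptions. In the wrapper itself both SQL strings are embedded in an SBQL IfThen expression rather than executed by host-language code.

```python
import sqlite3


def wrapper_update(conn, table, column, value, condition):
    """Update a row only if the condition selects exactly one row.

    Table, column and condition are interpolated into the strings for
    brevity; they are assumed to come from the query analysis, not from
    untrusted input.
    """
    count_sql = f"select count(*) from {table} where {condition}"
    update_sql = f"update {table} set {column} = ? where {condition}"
    (matches,) = conn.execute(count_sql).fetchone()
    if matches != 1:
        return 0  # IfThen: no operation unless exactly one row matches
    return conn.execute(update_sql, (value,)).rowcount
```

The returned integer corresponds to the integer value signature described above: it is 1 when the single matching row was updated and 0 when the count check prevented the update.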
Changing the original typological information (an integer value instead of a
reference) might unfortunately cause some problems (detected already at the query
typechecking stage, not at runtime). Some approaches to overcoming this issue were
considered, such as performing an additional selection after the actual update in order
to return the reference. Such behaviour would not be very useful, however, since the
reference to the same object (realising a wrapped relational column) is dynamically
assigned on each selection and retrieval and it does not provide any constant mapping
between the object-oriented and relational stores.
Similarly to deleting queries, if this procedure returns an error, it is propagated
and returned instead of the optimised query, since there is no other way of expressing
updates, as was possible for selects.
6.1.4 SQL Query String Generation
Each of the presented algorithms (for selecting, deleting or updating queries) requires
building a SQL query string based on the conditions returned from the appropriate
analysis process. The actual SQL generation algorithm is irrelevant (in the prototype
implementation zql [203] is employed); nevertheless, some issues should be pointed
out here.
Problems arise if the analysed SBQL query is a mixed query, i.e. it involves not only
relational names (referring to tables and columns) and literal values (strings, integers,
etc.) but also pure object-oriented expressions, e.g. names not referring to the wrapper
schema, procedure calls, or SBQL-specific expressions (e.g. now(), random()). Such
expressions cannot simply be substituted into SQL query strings, as they must first be
evaluated on stacks. Hence, the algorithm presented in Fig. 31 is proposed. This
procedure should be performed around the actual query analysis, with its results
restored in the final SQL query strings, but for simplicity it is omitted from the
diagrams presented in Fig. 27, Fig. 28, Fig. 29 and Fig. 30.
Fig. 31 SQL generation and processing algorithm for mixed queries
The input SBQL syntax tree is searched for “forbidden” expressions, i.e. the
ones that cannot be mapped onto the relational schema or embedded in SQL strings.
These expressions are extracted from the tree and their branches are replaced with
unique dummy expressions (e.g. string expressions) with appropriate signatures. The
original expressions are stored with the corresponding dummy ones for future
reconstruction. This modified tree is submitted to the analysis and transformations
described in the previous subsections. The resulting tree is searched for SQL query
strings. Each SQL string is searched for the dummy expressions and split into two
substrings at each match. The substrings are then concatenated with the original
expression corresponding to the dummy one (using the regular SBQL + operator). The
tree branch corresponding to the SQL query is type-checked again to introduce the
typological information and modifications required for the stack-based evaluation. The
final SQL query appears at runtime, when all restored SBQL subexpressions have been
evaluated on stacks and the final string can be concatenated (also in a stack-based
manner).
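The dummy-expression technique can be sketched in a few lines. The code below is an illustration under stated assumptions: expressions are plain strings rather than syntax tree branches, and the names mask, split_sql and concatenate as well as the dummy format are invented for the example. The runtime concatenation stands in for the SBQL + operator.

```python
import re


def mask(expressions, is_forbidden):
    """Replace forbidden expressions with unique dummies; keep the originals."""
    stored, masked = {}, []
    for expr in expressions:
        if is_forbidden(expr):
            dummy = f"__dummy{len(stored)}__"
            stored[dummy] = expr
            masked.append(dummy)
        else:
            masked.append(expr)
    return masked, stored


def split_sql(sql, stored):
    """Split a generated SQL string at each dummy expression."""
    if not stored:
        return [("sql", sql)]
    pattern = "(" + "|".join(re.escape(d) for d in stored) + ")"
    parts = []
    for piece in re.split(pattern, sql):
        if piece in stored:
            parts.append(("expr", stored[piece]))  # original expression restored
        elif piece:
            parts.append(("sql", piece))           # literal SQL substring
    return parts


def concatenate(parts, evaluate):
    """At runtime, evaluate restored expressions and join all substrings."""
    return "".join(text if kind == "sql" else str(evaluate(text))
                   for kind, text in parts)
```

The final SQL string exists only after concatenate has run, which mirrors the statement that the query appears at runtime once the restored subexpressions have been evaluated.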
6.1.5 Stack-Based Query Evaluation and Result Reconstruction
As stated above, SQL query strings executed with dynamic SQL must be enveloped
with appropriate SBQL expressions providing typological information so that the
stack-based evaluation can be performed. These SBQL expressions are referred to as
execsql and they have to be provided with additional information, i.e. how to process
the returned relational results so that the original SBQL query semantics is unchanged.
This information has to be passed to the runtime environment, where the expression
signatures available during the analysis process do not exist anymore. Another piece of
information to be supplied for the wrapper query evaluation is a unique identifier of
the particular wrapper, since (as shown in Fig. 17) the virtual repository integrates
many resources.
During the evaluation of an SBQL query (see footnote 34), when an execsql expression
is encountered, its arguments (the SQL query, the result pattern, the wrapper identifier)
are pushed onto stacks. If the SQL query is to be evaluated separately, regular
expression evaluation procedures are applied first. When the string is ready, it is sent
to the appropriate wrapper pointed to by the identifier. The returned relational results
are processed according to the result pattern and pushed onto the query result stack. If
the execsql expression is a subexpression of a larger query, the results can be processed
further. If not, they are returned directly to the client.
In the prototype implementation (Appendix C), the wrapper identifier denotes
the corresponding database module name, while the result patterns are expressed as
simple string expressions.
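The three execsql arguments and their roles can be illustrated as follows. This is a deliberately simplified sketch: the WRAPPERS registry and the execsql function are assumptions, a sqlite3 connection stands in for a wrapper, and the result pattern is reduced to a hypothetical list of binder names instead of the string patterns used by the prototype.

```python
import sqlite3

WRAPPERS = {}  # assumed registry: wrapper identifier -> database connection


def execsql(sql, binder_names, wrapper_id):
    """Send the SQL string to the identified wrapper and shape each
    returned row according to the (simplified) result pattern."""
    conn = WRAPPERS[wrapper_id]
    return [dict(zip(binder_names, row)) for row in conn.execute(sql)]
```

A usage example would register a connection under an identifier such as "admin.wrapper1" and call execsql with a query string and the names under which the returned columns should be bound.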
6.2 Query Analysis and Optimisation Examples
The examples of query processing performed by the wrapper presented below are based
on two relational schemata: “employees” and “cars” described in the following
subsection.
34 The actual evaluation is performed on the compiled query byte code; this step is skipped in the
procedure description for simplification.
6.2.1 Relational Test Schemata
The first schema (Fig. 32) presents data on the employees of a company. The main
table (employees) contains personal data and is related by a one-to-many relation to
the departments table, which in turn is related to the locations table. The other schema
(Fig. 33) presents data on cars. The main table (cars) contains car data and is related by
a one-to-many relation to the models table, which in turn is related to the makes table.
By assumption, the schemata can be maintained on separate machines and they
are only logically related by the employees.id and cars.owner_id columns (an employee
can own a car); therefore more general and real-life wrapper actions can be simulated
and analysed.
employees: id (PK), name, surname, sex, salary, info, birth_date, department_id (FK)
departments: id (PK), name, location_id (FK)
locations: id (PK), name
Fig. 32 The "employees" test relational schema
cars: id (PK), owner_id, model_id (FK), year, colour
models: id (PK), name, make_id (FK)
makes: id (PK), name
Fig. 33 The "cars" test relational schema
Besides the primary keys (with appropriate unique indices), secondary (non-unique)
indices are maintained on the employees.surname, employees.salary, employees.sex,
departments.name and locations.name columns.
The schemata are automatically wrapped to simple internal object-oriented
schemata where a table corresponds to a complex object, while a column corresponds to
its simple subobject with a corresponding primitive data type. The applied naming
convention uses original table and column names prefixed with the $ character (e.g.
“employees” results in “$employees”). The complete generation procedure is described
in Appendix C (subchapter Relational Schema Wrapping) and any intermediate
implementation-dependent steps are skipped in this chapter.
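The naming convention can be shown with a short sketch. The actual generation procedure is the one described in Appendix C; the function wrap_schema, the dictionary-based schema description and the TYPE_MAP from SQL types to primitive types are assumptions made only for this illustration.

```python
# Assumed mapping from SQL column types to primitive object types.
TYPE_MAP = {"integer": "integer", "varchar": "string",
            "double": "real", "date": "date"}


def wrap_schema(tables):
    """Map {table: {column: sql_type}} onto $-prefixed complex objects
    whose simple subobjects carry the corresponding primitive type."""
    return {
        f"${table}": {f"${column}": TYPE_MAP[sql_type]
                      for column, sql_type in columns.items()}
        for table, columns in tables.items()
    }
```

Applied to the locations table, for instance, this yields a $locations object with $id and $name subobjects.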
These simple object-oriented models are then enveloped with administrator-designed
views (regarded as contributory schema views) realising virtual pointers responsible for
primary-foreign key pairs. The views can also control data access (e.g. in the example
they disallow updating primary keys and virtual pointers; see the commented code
blocks). The code of the views used in the example is presented in Listing 2. The
views transform the relational schemata into the object-oriented models shown in
Fig. 34 (for simplification, the fields corresponding to virtual pointers are not shown,
although on_retrieve procedures are defined).
Employee (id, name, surname, sex, salary, info, birthDate); worksIn *► Department
Department (id, name); isLocatedIn *► Location
Location (id, name)
Car (id, colour, year); isOwnedBy *► Employee; isModel *► Model
Model (id, name); isMake *► Make
Make (id, name)
Fig. 34 The resulting object-oriented schema
Listing 2 Code of views for the test schemata
view EmployeeDef {
  virtual objects Employee: record { e: employees; }[0..*] {
    return (employees) as e;
  }
  on_retrieve: record { id: integer; name: string; surname: string;
                        sex: string; salary: real; info: string;
                        birthDate: date; worksIn: integer; } {
    return ( deref(e.id) as id, deref(e.name) as name,
             deref(e.surname) as surname, deref(e.sex) as sex,
             deref(e.salary) as salary, deref(e.info) as info,
             deref(e.birth_date) as birthDate,
             deref(e.department_id) as worksIn );
  }
  on_delete {
    delete e;
  }
  view idDef {
    virtual objects id: record { _id: employees.id; } {
      return e.id as _id;
    }
    on_retrieve: integer {
      return deref(_id);
    }
    /* do not update the primary key */
  }
  view nameDef {
    virtual objects name: record { _name: employees.name; } {
      return e.name as _name;
    }
    on_retrieve: string {
      return deref(_name);
    }
    on_update(newName: string) {
      _name := newName;
    }
  }
  view surnameDef {
    virtual objects surname: record { _surname: employees.surname; } {
      return e.surname as _surname;
    }
    on_retrieve: string {
      return deref(_surname);
    }
    on_update(newSurname: string) {
      _surname := newSurname;
    }
  }
  view sexDef {
    virtual objects sex: record { _sex: employees.sex; } {
      return e.sex as _sex;
    }
    on_retrieve: string {
      return deref(_sex);
    }
    on_update(newSex: string) {
      _sex := newSex;
    }
  }
  view salaryDef {
    virtual objects salary: record { _salary: employees.salary; } {
      return e.salary as _salary;
    }
    on_retrieve: real {
      return deref(_salary);
    }
    on_update(newSalary: real) {
      _salary := newSalary;
    }
  }
  view infoDef {
    virtual objects info: record { _info: employees.info; } {
      return e.info as _info;
    }
    on_retrieve: string {
      return deref(_info);
    }
    on_update(newInfo: string) {
      _info := newInfo;
    }
  }
  view birthDateDef {
    virtual objects birthDate: record { _birthDate: employees.birth_date; } {
      return e.birth_date as _birthDate;
    }
    on_retrieve: date {
      return deref(_birthDate);
    }
    on_update(newBirthDate: date) {
      _birthDate := newBirthDate;
    }
  }
  view worksInDef {
    virtual objects worksIn: record { _worksIn: employees.department_id; } {
      return e.department_id as _worksIn;
    }
    on_retrieve: integer {
      return deref(_worksIn);
    }
    /* do not update the virtual pointer */
    on_navigate: Department {
      return Department where id = _worksIn;
    }
  }
}
view DepartmentDef {
  virtual objects Department: record { d: departments; }[0..*] {
    return (departments) as d;
  }
  on_retrieve: record { id: integer; name: string; isLocatedIn: integer; } {
    return ( deref(d.id) as id, deref(d.name) as name,
             deref(d.location_id) as isLocatedIn );
  }
  on_delete {
    delete d;
  }
  view idDef {
    virtual objects id: record { _id: departments.id; } {
      return d.id as _id;
    }
    on_retrieve: integer {
      return deref(_id);
    }
  }
  view nameDef {
    virtual objects name: record { _name: departments.name; } {
      return d.name as _name;
    }
    on_retrieve: string {
      return deref(_name);
    }
    on_update(newName: string) {
      _name := newName;
    }
  }
  view isLocatedInDef {
    virtual objects isLocatedIn: record { _isLocatedIn:
                                          departments.location_id; } {
      return d.location_id as _isLocatedIn;
    }
    on_retrieve: integer {
      return deref(_isLocatedIn);
    }
    on_navigate: Location {
      return Location where id = _isLocatedIn;
    }
  }
}
view LocationDef {
  virtual objects Location: record { l: locations; }[0..*] {
    return (locations) as l;
  }
  on_retrieve: record { id: integer; name: string; } {
    return ( deref(l.id) as id, deref(l.name) as name );
  }
  on_delete {
    delete l;
  }
  view idDef {
    virtual objects id: record { _id: locations.id; } {
      return l.id as _id;
    }
    on_retrieve: integer {
      return deref(_id);
    }
  }
  view nameDef {
    virtual objects name: record { _name: locations.name; } {
      return l.name as _name;
    }
    on_retrieve: string {
      return deref(_name);
    }
    on_update(newName: string) {
      _name := newName;
    }
  }
}
view CarDef {
  virtual objects Car: record { c: cars; }[0..*] {
    return (cars) as c;
  }
  on_retrieve: record { id: integer; isOwnedBy: integer; isModel: integer;
                        colour: string; year: integer; } {
    return ( deref(c.id) as id, deref(c.owner_id) as isOwnedBy,
             deref(c.model_id) as isModel, deref(c.colour) as colour,
             deref(c.year) as year );
  }
  on_delete {
    delete c;
  }
  view idDef {
    virtual objects id: record { _id: cars.id; } {
      return c.id as _id;
    }
    on_retrieve: integer {
      return deref(_id);
    }
  }
  view isOwnedByDef {
    virtual objects isOwnedBy: record { _isOwnedBy: cars.owner_id; } {
      return c.owner_id as _isOwnedBy;
    }
    on_retrieve: integer {
      return deref(_isOwnedBy);
    }
    on_navigate: Employee {
      return Employee where id = _isOwnedBy;
    }
  }
  view isModelDef {
    virtual objects isModel: record { _isModel: cars.model_id; } {
      return c.model_id as _isModel;
    }
    on_retrieve: integer {
      return deref(_isModel);
    }
    on_navigate: Model {
      return Model where id = _isModel;
    }
  }
  view colourDef {
    virtual objects colour: record { _colour: cars.colour; } {
      return c.colour as _colour;
    }
    on_retrieve: string {
      return deref(_colour);
    }
    on_update(newColour: string) {
      _colour := newColour;
    }
  }
  view yearDef {
    virtual objects year: record { _year: cars.year; } {
      return c.year as _year;
    }
    on_retrieve: integer {
      return deref(_year);
    }
    on_update(newYear: integer) {
      _year := newYear;
    }
  }
}
view ModelDef {
  virtual objects Model: record { m: models; }[0..*] {
    return (models) as m;
  }
  on_retrieve: record { id: integer; isMake: integer; name: string; } {
    return ( deref(m.id) as id, deref(m.make_id) as isMake,
             deref(m.name) as name );
  }
  on_delete {
    delete m;
  }
  view idDef {
    virtual objects id: record { _id: models.id; } {
      return m.id as _id;
    }
    on_retrieve: integer {
      return deref(_id);
    }
  }
  view isMakeDef {
    virtual objects isMake: record { _isMake: models.make_id; } {
      return m.make_id as _isMake;
    }
    on_retrieve: integer {
      return deref(_isMake);
    }
    on_navigate: Make {
      return Make where id = _isMake;
    }
  }
  view nameDef {
    virtual objects name: record { _name: models.name; } {
      return m.name as _name;
    }
    on_retrieve: string {
      return deref(_name);
    }
    on_update(newName: string) {
      _name := newName;
    }
  }
}
view MakeDef {
  virtual objects Make: record { m: makes; }[0..*] {
    return (makes) as m;
  }
  on_retrieve: record { id: integer; name: string; } {
    return ( deref(m.id) as id, deref(m.name) as name );
  }
  on_delete {
    delete m;
  }
  view idDef {
    virtual objects id: record { _id: makes.id; } {
      return m.id as _id;
    }
    on_retrieve: integer {
      return deref(_id);
    }
  }
  view nameDef {
    virtual objects name: record { _name: makes.name; } {
      return m.name as _name;
    }
    on_retrieve: string {
      return deref(_name);
    }
    on_update(newName: string) {
      m.name := newName;
    }
  }
}
The following examples present subsequent query forms corresponding to
the substantial processing steps performed by the implemented prototype:
• Raw – the syntactically correct ad-hoc query from a client,
• Typechecked – the query after the typological control (dereferences and type casts
introduced where necessary), submitted for the view macro-substitution and
the query rewriting steps,
• View-rewritten – the query after the view macro-substitution and the query
modification, ready for the wrapper analysis and optimisation; this query form
refers directly to the lowest level relational names recognisable by the wrapper
(relational names prefixed with “$”); this form of the query is re-typechecked,
• Optimised – the wrapper-optimised query form, where the best possible
optimisation was performed based on the relational model available,
• Simply-rewritten – the query corresponding to the naive wrapper action.
The simply-rewritten query forms are not shown for imperative queries, as they are
inapplicable there. Similarly, for the multi-wrapper queries, completely optimised
queries are not available due to the current virtual repository limitations (the
justification for this wrapper behaviour is given in subsection 6.2.4 prior to
the corresponding examples).
The visualised syntax trees are provided for simple queries only; they are skipped for
more complex queries due to their excessive sizes.
6.2.2 Selecting Queries
Example 1: Retrieve surnames and names of employees earning more than 1200
Raw:
(Employee where salary > 1200).(surname, name);
Fig. 35 Raw (parsed) query syntax tree for example 1
Typechecked:
((Employee where (deref(salary) > (real)(1200))) . (surname , name))
Fig. 36 Typechecked query syntax tree for example 1
View-rewritten:
(((($employees) as _$employees) as e where (((e . (_$employees . $salary)) . deref(_VALUE)) >
(real)(1200))) . (((e . (_$employees . $surname))) as _surname , ((e . (_$employees . $name))) as
_name))
Fig. 37 View-rewritten query syntax tree for example 1
Optimised:
execsql("select employees.surname, employees.name from employees where (employees.salary > 1200)",
"<0 | | | none | struct <1 $employees | $surname | _surname | none | binder 1> <1 $employees | $name
| _name | none | binder 1> 0>", "admin.wrapper1")
Fig. 38 Optimised query syntax tree for example 1
Simply-rewritten:
((((execsql("select employees.info, employees.department_id, employees.surname, employees.salary,
employees.id, employees.sex, employees.name, employees.birth_date from employees", "<0 $employees |
| | none | ref 0>", "admin.wrapper1")) as _$employees) as e where (((e . (_$employees . $salary)) .
deref(_VALUE)) > (real)(1200))) . (((e . (_$employees . $surname))) as _surname , ((e . (_$employees
. $name))) as _name))
Based on the query form provided by the view-rewriter, which performs macro-substitution of the views’ definitions and the query modification
steps (Fig. 37), the wrapper optimiser searches the syntax tree for patterns (expressions) transformable into SQL-optimiseable subqueries (finally
enveloped with execsql expressions). The largest such expression in the query is where (marked with the red ellipse in Fig. 37) with the
corresponding syntax tree branch. After this expression has been analysed for selection conditions, the projected columns are established on the
basis of the right-hand side of the root dot expression signature (the comma expression marked with the blue ellipse in Fig. 37, involving
$surname and $name, which correspond to relational columns).
The optimised expression contains a single execsql expression where only surnames and names of employees (projection) earning more
than 1200 (selection) are retrieved, which exactly matches the initial query intention. No processing is required from the virtual repository; the
result patterns included in the execsql allow construction of a valid SBQL result compliant with the primary query signature.
In the unoptimised form (Fig. 39) the SQL query retrieves all records and the actual processing (selection and projection) is performed
entirely by the virtual repository. The wrapper rewriting simply replaces the $employees name expression recognised as
a relational table name (the corresponding black ellipses in Fig. 37 and Fig. 39).
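The difference between the two query forms can be reproduced on a small scale. The sketch below uses Python's sqlite3 module with invented sample rows; the function names and the reduced column set are assumptions, and client-side Python code stands in for the processing done by the virtual repository.

```python
import sqlite3


def demo_connection():
    """Create a small in-memory employees table (sample rows are invented)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table employees "
                 "(id integer primary key, name text, surname text, salary real)")
    conn.executemany("insert into employees values (?, ?, ?, ?)",
                     [(1, "Jan", "Kowalski", 1500.0),
                      (2, "Anna", "Nowak", 1000.0),
                      (3, "Piotr", "Smith", 2500.0)])
    return conn


def optimised(conn):
    """Selection and projection are evaluated by the relational database."""
    sql = ("select employees.surname, employees.name from employees "
           "where (employees.salary > 1200)")
    return conn.execute(sql).fetchall()


def simply_rewritten(conn):
    """All records are fetched; selection and projection happen on the client."""
    sql = ("select employees.id, employees.name, employees.surname, "
           "employees.salary from employees")
    rows = conn.execute(sql).fetchall()
    return [(surname, name) for _id, name, surname, salary in rows
            if salary > 1200]
```

Both functions return the same pairs, but the second one transfers every record of the table before filtering, which is the cost the wrapper optimisation avoids.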
Fig. 39 Simply-rewritten query syntax tree for example 1
Example 2: Retrieve first names of employees named Kowalski earning less than 2000
Raw:
(Employee where surname = "Kowalski" and salary < 2000).name;
Fig. 40 Raw (parsed) query syntax tree for example 2
Typechecked:
((Employee where ((deref(surname) = "Kowalski") and (deref(salary) < (real)(2000)))) . name)
Fig. 41 Typechecked query syntax tree for example 2
View-rewritten:
(((($employees) as _$employees) as e where ((((e . (_$employees . $surname)) . deref(_VALUE)) =
"Kowalski") and (((e . (_$employees . $salary)) . deref(_VALUE)) < (real)(2000)))) . ((e .
(_$employees . $name))) as _name)
Fig. 42 View-rewritten query syntax tree for example 2
Optimised:
execsql("select employees.name from employees where ((employees.surname = 'Kowalski') AND
(employees.salary < 2000))", "<0 $employees | $name | _name | none | binder 0>", "admin.wrapper1")
Fig. 43 Optimised query syntax tree for example 2
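Simply-rewritten:
((((execsql("select employees.info, employees.department_id, employees.surname, employees.salary,
employees.id, employees.sex, employees.name, employees.birth_date from employees", "<0 $employees |
| | none | ref 0>", "admin.wrapper1")) as _$employees) as e where ((((e . (_$employees . $surname))
. deref(_VALUE)) = "Kowalski") and (((e . (_$employees . $salary)) . deref(_VALUE)) <
(real)(2000)))) . ((e . (_$employees . $name))) as _name)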
Similarly to the previous example, the largest optimiseable subquery found is the where expression (the red ellipse in Fig. 42). This
expression is analysed for selection conditions; here the compound and condition, directly mappable to the SQL AND operator, is recognised.
The projection, found by navigating up to the tree root (the dot expression), reveals only a single column denoted by $name (contained within
the unary as expression, the blue ellipse). Again, the resulting SQL string carries all query evaluation conditions.
The unoptimised query form (Fig. 44) is developed exactly as in the previous example, by replacing the $employees name expression
recognised as a relational table name (the corresponding black ellipses in Fig. 42 and Fig. 44).
Fig. 44 Simply-rewritten query syntax tree for example 2
Example 3: Retrieve surnames of employees and names of their departments
Raw:
(Employee as e join e.worksIn.Department as d).(e.surname, d.name);
Typechecked:
(((Employee) as e join (((e . worksIn) . Department)) as d) . ((e . surname) , (d . name)))
View-rewritten:
((((($employees) as _$employees) as e) as e join (((e . ((e . (_$employees . $department_id))) as
_worksIn) . ((($departments) as _$departments) as d where (((d . (_$departments . $id)) .
deref(_VALUE)) = (_worksIn . deref(_VALUE)))))) as d) . ((e . ((e . (_$employees . $surname))) as
_surname) , (d . ((d . (_$departments . $name))) as _name)))
Optimised:
execsql("select employees.surname, departments.name from employees, departments where
(departments.id = employees.department_id)", "<0 | | | none | struct <1 $employees | $surname |
_surname | none | binder 1> <1 $departments | $name | _name | none | binder 1> 0>",
"admin.wrapper1")
Simply-rewritten:
(((((execsql("select employees.info, employees.department_id, employees.surname, employees.salary,
employees.id, employees.sex, employees.name, employees.birth_date from employees", "<0 $employees |
| | none | ref 0>", "admin.wrapper1")) as _$employees) as e) as e join (((e . ((e . (_$employees .
$department_id))) as _worksIn) . (((execsql("select departments.name, departments.location_id,
departments.id from departments", "<0 $departments | | | none | ref 0>", "admin.wrapper1")) as
_$departments) as d where (((d . (_$departments . $id)) . deref(_VALUE)) = (_worksIn .
deref(_VALUE)))))) as d) . ((e . ((e . (_$employees . $surname))) as _surname) , (d . ((d .
(_$departments . $name))) as _name)))
The raw query does not introduce explicit selection conditions; they appear after macro-substituting the view definitions and applying the
query modification, and they originate from the virtual pointers’ on_navigate procedures. The largest expression recognised by the wrapper
analyser is the join (with relational names $employees and $departments as arguments). Join conditions are found (corresponding to
a primary-foreign key relation in the relational schema), and finally projections are established.
The resulting SQL performs the join and retrieves only the requested column values; no processing is required from the virtual repository.
In the case of the unoptimised query, two SQL selects are executed, retrieving all records from the employees and departments tables. The
expensive join has to be evaluated by the virtual repository.
Example 4: Retrieve surnames of employees and cities their departments are located in
Raw:
(Employee as e join e.worksIn.Department as d join d.isLocatedIn.Location as l).(e.surname, l.name);
Typechecked:
((((Employee) as e join (((e . worksIn) . Department)) as d) join (((d . isLocatedIn) . Location))
as l) . ((e . surname) , (l . name)))
View-rewritten:
(((((($employees) as _$employees) as e) as e join (((e . ((e . (_$employees . $department_id))) as
_worksIn) . ((($departments) as _$departments) as d where (((d . (_$departments . $id)) .
deref(_VALUE)) = (_worksIn . deref(_VALUE)))))) as d) join (((d . ((d . (_$departments .
$location_id))) as _isLocatedIn) . ((($locations) as _$locations) as l where (((l . (_$locations .
$id)) . deref(_VALUE)) = (_isLocatedIn . deref(_VALUE)))))) as l) . ((e . ((e . (_$employees .
$surname))) as _surname) , (l . ((l . (_$locations . $name))) as _name)))
Optimised:
execsql("select employees.surname, locations.name from employees, locations, departments where
((departments.id = employees.department_id) AND (locations.id = departments.location_id))", "<0 | |
| none | struct <1 $employees | $surname | _surname | none | binder 1> <1 $locations | $name | _name
| none | binder 1> 0>", "admin.wrapper1")
Simply-rewritten:
((((((execsql("select employees.info, employees.department_id, employees.surname, employees.salary,
employees.id, employees.sex, employees.name, employees.birth_date from employees", "<0 $employees |
| | none | ref 0>", "admin.wrapper1")) as _$employees) as e) as e join (((e . ((e . (_$employees .
$department_id))) as _worksIn) . (((execsql("select departments.name, departments.location_id,
departments.id from departments", "<0 $departments | | | none | ref 0>", "admin.wrapper1")) as
_$departments) as d where (((d . (_$departments . $id)) . deref(_VALUE)) = (_worksIn .
deref(_VALUE)))))) as d) join (((d . ((d . (_$departments . $location_id))) as _isLocatedIn) .
(((execsql("select locations.id, locations.name from locations", "<0 $locations | | | none | ref
0>", "admin.wrapper1")) as _$locations) as l where (((l . (_$locations . $id)) . deref(_VALUE)) =
(_isLocatedIn . deref(_VALUE)))))) as l) . ((e . ((e . (_$employees . $surname))) as _surname) , (l
. ((l . (_$locations . $name))) as _name)))
As in the previous example, the join conditions appear only after the view definitions are macro-substituted. The analysis procedure is the
same, except that three relational tables joined over primary-foreign key pairs are recognised. The optimised query executes a single SQL query
that evaluates the join and retrieves only the required columns. The unoptimised (simply-rewritten) query retrieves all records from the three
tables with three separate SQL queries, and the joins and the projection are then evaluated by the virtual repository.
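The difference between the two evaluation strategies can be illustrated outside ODRA. The following is a minimal Python sketch, assuming an in-memory SQLite database in place of the wrapped resource; the function names, the reduced column set and the sample rows are hypothetical and are not part of the wrapper implementation. It only shows that a single pushed-down join and a client-side join over three full table scans return the same pairs.

```python
import sqlite3

def make_db():
    # Hypothetical miniature of the "employees" test schema (subset of columns).
    con = sqlite3.connect(":memory:")
    con.executescript("""
        create table locations (id integer primary key, name text);
        create table departments (id integer primary key, name text, location_id integer);
        create table employees (id integer primary key, surname text, department_id integer);
        insert into locations values (1, 'Łódź'), (2, 'Kraków');
        insert into departments values (10, 'Production', 1), (20, 'Retail', 2);
        insert into employees values (100, 'Kowalski', 10), (101, 'Nowak', 20), (102, 'Mazur', 10);
    """)
    return con

def optimised(con):
    # One SQL query: the join and the projection are evaluated by the database.
    sql = ("select employees.surname, locations.name "
           "from employees, locations, departments "
           "where ((departments.id = employees.department_id) "
           "AND (locations.id = departments.location_id))")
    return sorted(con.execute(sql).fetchall())

def simply_rewritten(con):
    # Three separate SQL queries; the join and the projection happen on the client side.
    employees = con.execute("select id, surname, department_id from employees").fetchall()
    departments = con.execute("select id, name, location_id from departments").fetchall()
    locations = con.execute("select id, name from locations").fetchall()
    result = []
    for _eid, surname, dep_id in employees:
        for did, _dname, loc_id in departments:
            if did != dep_id:
                continue
            for lid, lname in locations:
                if lid == loc_id:
                    result.append((surname, lname))
    return sorted(result)
```

Both functions return the same list, but the second one transfers every row of all three tables before any filtering takes place.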
Example 5: Retrieve surnames and birth dates of employees named Kowalski working in the production department
Raw:
(Employee where surname = "Kowalski" and worksIn.Department.name = "Production").(surname,
birthDate);
Typechecked:
((Employee where ((deref(surname) = "Kowalski") and (deref(((worksIn . Department) . name)) =
"Production"))) . (surname , birthDate))
View-rewritten:
"Kowalski") and ((((((e . (_$employees . $department_id))) as _worksIn . ((($departments) as
deref(_VALUE))))) . (d . (_$departments . $name))) . deref(_VALUE)) = "Production"))) . (((e .
(_$employees . $surname))) as _surname , ((e . (_$employees . $birth_date))) as _birthDate))
Optimised:
execsql("select employees.surname, employees.birth_date from employees, departments where
((employees.surname = 'Kowalski') AND ((departments.name = 'Production') AND (departments.id =
employees.department_id)))", "<0 | | | none | struct <1 $employees | $surname | _surname | none |
binder 1> <1 $employees | $birth_date | _birthDate | none | binder 1> 0>", "admin.wrapper1")
. deref(_VALUE)) = "Kowalski") and ((((((e . (_$employees . $department_id))) as _worksIn .
(((execsql("select departments.name, departments.location_id, departments.id from departments", "<0
$departments | | | none | ref 0>", "admin.wrapper1")) as _$departments) as d where (((d .
(_$departments . $id)) . deref(_VALUE)) = (_worksIn . deref(_VALUE))))) . (d . (_$departments .
$name))) . deref(_VALUE)) = "Production"))) . (((e . (_$employees . $surname))) as _surname , ((e .
(_$employees . $birth_date))) as _birthDate))
The raw query contains two explicit selection conditions; a third one, corresponding to the primary-foreign key relationship, arises when
the on_navigate procedure of the worksIn virtual pointer is macro-substituted. The relational tables are detected, and the selection conditions
and projected columns are established. The optimised query executes a single SQL query that exactly matches the original intention, while the
unoptimised one again has to be evaluated by the virtual repository mechanisms.
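How the detected tables, projected columns and selection conditions end up in one SQL string can be sketched as follows. This is a minimal Python illustration, not the wrapper's actual generator: the helper names nest_and and build_select are hypothetical, and the right-nested AND layout is simply copied from the optimised queries shown in these examples.

```python
def nest_and(conditions):
    """Right-nest conditions as ((c1) AND ((c2) AND (c3)))."""
    if not conditions:
        raise ValueError("at least one condition is required")
    head = "(" + conditions[0] + ")"
    if len(conditions) == 1:
        return head
    return "(" + head + " AND " + nest_and(conditions[1:]) + ")"

def build_select(columns, tables, conditions):
    """Assemble a select statement from projected columns, tables and conditions."""
    return ("select " + ", ".join(columns)
            + " from " + ", ".join(tables)
            + " where " + nest_and(conditions))

# Two explicit conditions plus the implicit primary-foreign key condition.
sql = build_select(
    ["employees.surname", "employees.birth_date"],
    ["employees", "departments"],
    ["employees.surname = 'Kowalski'",
     "departments.name = 'Production'",
     "departments.id = employees.department_id"],
)
```

With these arguments the sketch reproduces the SQL text of the optimised query of this example.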
Example 6: Retrieve surnames and birth dates of employees named Kowalski working in Łódź city
Raw:
(Employee where surname = "Kowalski" and worksIn.Department.isLocatedIn.Location.name =
"Łódź").(surname, birthDate);
Typechecked:
((Employee where ((deref(surname) = "Kowalski") and (deref(((((worksIn . Department) . isLocatedIn)
. Location) . name)) = "Łódź"))) . (surname , birthDate))
View-rewritten:
"Kowalski") and ((((((((e . (_$employees . $department_id))) as _worksIn . ((($departments) as
deref(_VALUE))))) . ((d . (_$departments . $location_id))) as _isLocatedIn) . ((($locations) as
_$locations) as l where (((l . (_$locations . $id)) . deref(_VALUE)) = (_isLocatedIn .
deref(_VALUE))))) . (l . (_$locations . $name))) . deref(_VALUE)) = "Łódź"))) . (((e . (_$employees
. $surname))) as _surname , ((e . (_$employees . $birth_date))) as _birthDate))
Optimised:
execsql("select employees.surname, employees.birth_date from employees, locations, departments where
((employees.surname = 'Kowalski') AND ((locations.name = 'Łódź') AND ((departments.id =
employees.department_id) AND (locations.id = departments.location_id))))", "<0 | | | none | struct
<1 $employees | $surname | _surname | none | binder 1> <1 $employees | $birth_date | _birthDate |
none | binder 1> 0>", "admin.wrapper1")
. deref(_VALUE)) = "Kowalski") and ((((((((e . (_$employees . $department_id))) as _worksIn .
(((execsql("select departments.name, departments.location_id, departments.id from departments", "<0
$departments | | | none | ref 0>", "admin.wrapper1")) as _$departments) as d where (((d .
(_$departments . $id)) . deref(_VALUE)) = (_worksIn . deref(_VALUE))))) . ((d . (_$departments .
$location_id))) as _isLocatedIn) . (((execsql("select locations.id, locations.name from locations",
"<0 $locations | | | none | ref 0>", "admin.wrapper1")) as _$locations) as l where (((l .
(_$locations . $id)) . deref(_VALUE)) = (_isLocatedIn . deref(_VALUE))))) . (l . (_$locations .
$name))) . deref(_VALUE)) = "Łódź"))) . (((e . (_$employees . $surname))) as _surname , ((e .
(_$employees . $birth_date))) as _birthDate))
The navigation expressed in the raw query with virtual pointers reveals two additional selection conditions (primary-foreign key pairs)
once the pointers’ on_navigate procedures are macro-substituted. These conditions are recognised together with the explicit ones, and the
optimised query relies on a single SQL query that exactly matches the original semantics.
Example 7: Retrieve the sum of salaries of employees named Kowalski working in Łódź city
Raw:
sum((Employee where surname = "Kowalski" and worksIn.Department.isLocatedIn.Location.name =
"Łódź").salary);
Typechecked:
sum(deref(((Employee where ((deref(surname) = "Kowalski") and (deref(((((worksIn . Department) .
isLocatedIn) . Location) . name)) = "Łódź"))) . salary)))
View-rewritten:
sum(((((($employees) as _$employees) as e where ((((e . (_$employees . $surname)) . deref(_VALUE)) =
"Kowalski") and ((((((((e . (_$employees . $department_id))) as _worksIn . ((($departments) as
deref(_VALUE))))) . ((d . (_$departments . $location_id))) as _isLocatedIn) . ((($locations) as
_$locations) as l where (((l . (_$locations . $id)) . deref(_VALUE)) = (_isLocatedIn .
deref(_VALUE))))) . (l . (_$locations . $name))) . deref(_VALUE)) = "Łódź"))) . (e . (_$employees .
$salary))) . deref(_VALUE)))
Optimised:
execsql("select sum(employees.salary) from employees, locations, departments where
((employees.surname = 'Kowalski') AND ((locations.name = 'Łódź') AND ((departments.id =
employees.department_id) AND (locations.id = departments.location_id))))", "<0 $employees | $salary
| | real | value 0>", "admin.wrapper1")
Simply-rewritten: sum((((((execsql("select employees.info, employees.department_id, employees.surname,
employees.salary, employees.id, employees.sex, employees.name, employees.birth_date from employees",
"<0 $employees | | | none | ref 0>", "admin.wrapper1")) as _$employees) as e where ((((e .
(_$employees . $surname)) . deref(_VALUE)) = "Kowalski") and ((((((((e . (_$employees .
$department_id))) as _worksIn . (((execsql("select departments.name, departments.location_id,
deref(_VALUE))))) . ((d . (_$departments . $location_id))) as _isLocatedIn) . (((execsql("select
locations.id, locations.name from locations", "<0 $locations | | | none | ref 0>",
"admin.wrapper1")) as _$locations) as l where (((l . (_$locations . $id)) . deref(_VALUE)) =
(_isLocatedIn . deref(_VALUE))))) . (l . (_$locations . $name))) . deref(_VALUE)) = "Łódź"))) . (e .
(_$employees . $salary))) . deref(_VALUE)))
The analysis procedure is similar to that of the previous example, although the largest transformable expression found is now the sum
aggregate function. Again, the selection conditions (the explicit ones and the implicit ones introduced by the macro-substituted on_navigate
procedures) are expressed in a single SQL query, which evaluates the aggregate function in the wrapped resource environment.
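The effect of pushing the aggregate down can be sketched in Python. The sketch assumes an in-memory SQLite database as a stand-in for the wrapped resource; the function names, the sample rows and the use of bound parameters (instead of literals embedded in the SQL text, as in the example above) are assumptions made for illustration only.

```python
import sqlite3

def make_db():
    # Hypothetical miniature of the test schema with just the columns needed here.
    con = sqlite3.connect(":memory:")
    con.executescript("""
        create table locations (id integer primary key, name text);
        create table departments (id integer primary key, location_id integer);
        create table employees (id integer primary key, surname text,
                                salary real, department_id integer);
        insert into locations values (1, 'Łódź'), (2, 'Kraków');
        insert into departments values (10, 1), (20, 2);
        insert into employees values
            (100, 'Kowalski', 1200.0, 10),
            (101, 'Kowalski', 800.0, 10),
            (102, 'Kowalski', 5000.0, 20),
            (103, 'Nowak', 1500.0, 10);
    """)
    return con

def sum_in_database(con, surname, city):
    # The aggregate, the selections and the joins are all evaluated by the database.
    sql = ("select sum(employees.salary) from employees, locations, departments "
           "where ((employees.surname = ?) AND ((locations.name = ?) AND "
           "((departments.id = employees.department_id) AND "
           "(locations.id = departments.location_id))))")
    return con.execute(sql, (surname, city)).fetchone()[0]

def sum_in_repository(con, surname, city):
    # All rows are fetched first; filtering, joining and summing happen on the client side.
    employees = con.execute("select surname, salary, department_id from employees").fetchall()
    departments = dict(con.execute("select id, location_id from departments").fetchall())
    locations = dict(con.execute("select id, name from locations").fetchall())
    return sum(salary for s, salary, dep in employees
               if s == surname and locations[departments[dep]] == city)
```

In the first variant a single value crosses the database boundary; in the second, every row of the three tables does. Note that the two variants differ for an empty selection in this sketch: SQL sum yields NULL (None in Python), while the Python sum yields 0.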
6.2.3 Imperative Constructs
Example 1: Delete employees named Nowak
Raw:
delete Employee where surname = "Nowak";
Fig. 45 Raw (parsed) query syntax tree for example 1 (imperative query)
Typechecked:
delete((Employee where (deref(surname) = "Nowak")))
Fig. 46 Typechecked query syntax tree for example 1 (imperative query)
View-rewritten:
delete(((((($employees) as _$employees) as e where (((e . (_$employees . $surname)) . deref(_VALUE))
= "Nowak")) . e) . _$employees))
Fig. 47 View-rewritten query syntax tree for example 1 (imperative query)
Optimised:
execsql("delete from employees where (employees.surname = 'Nowak')", "", "admin.wrapper1")
Fig. 48 Optimised query syntax tree for example 1 (imperative query)
As shown in the analysis procedure described in subsection 6.1.2, the processing of deleting queries is similar to that of selecting ones,
but only selections are detected (projections are not applicable).
Example 2: Set salaries of all employees to 1000
Raw:
Employee.salary := 1000;
Typechecked:
((Employee . salary) := (real)(1000))
Fig. 50 Typechecked query syntax tree for example 2 (imperative query)
View-rewritten:
(((($employees) as _$employees) as e . ((e . (_$employees . $salary))) as _salary) . (((real)(1000))
as newSalary . (_salary . ((newSalary) as new$salary . (_VALUE := new$salary)))))
Optimised:
if ((execsql("select COUNT(*) from employees", "", "admin.wrapper1") = 1)) then (execsql("update
employees set salary=1000", "", "admin.wrapper1"))
Processing updating queries (subsection 6.1.3) requires introducing an additional check corresponding to the microscopic assignment
operator in SBQL. Therefore, based on the common selection conditions, the number of rows to be modified is checked first (the red ellipse in
Fig. 52), and then the actual update is conditionally executed (the blue ellipse).
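The count-then-update pattern of the optimised form can be sketched in Python. This is a minimal illustration assuming SQLite as the relational back end; the function guarded_update and its boolean return value are hypothetical, since the optimised SBQL expression only shows the if ... then part and no alternative branch.

```python
import sqlite3

def guarded_update(con, table, assignment, where=None):
    """Run the update only if exactly one row satisfies the selection condition."""
    # The same (optional) condition is used in the check query and in the update.
    suffix = "" if where is None else " where (" + where + ")"
    count = con.execute("select COUNT(*) from " + table + suffix).fetchone()[0]
    if count == 1:
        con.execute("update " + table + " set " + assignment + suffix)
        return True
    return False  # hypothetical: signals that the update was skipped

con = sqlite3.connect(":memory:")
con.executescript("""
    create table employees (id integer primary key, salary real);
    insert into employees values (1, 800.0), (2, 1500.0);
""")
# Exactly one employee earns less than 1000, so this update is executed.
updated = guarded_update(con, "employees", "salary=1200", "employees.salary < 1000")
```

Called without a condition on this two-row table, the function skips the update, because the count differs from 1.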
Example 3: Set salaries of all employees who earn less than 1000 to 1200
Raw:
(Employee where salary < 1000).salary := 1200;
Typechecked:
(((Employee where (deref(salary) < (real)(1000))) . salary) := (real)(1200))
Fig. 54 Typechecked query syntax tree for example 3 (imperative query)
View-rewritten:
((((($employees) as _$employees) as e where (((e . (_$employees . $salary)) . deref(_VALUE)) <
(real)(1000))) . ((e . (_$employees . $salary))) as _salary) . (((real)(1200)) as newSalary .
(_salary . ((newSalary) as new$salary . (_VALUE := new$salary)))))
Optimised:
if ((execsql("select COUNT(*) from employees where (employees.salary < 1000)", "", "admin.wrapper1")
= 1)) then (execsql("update employees set salary=1200 where (employees.salary < 1000)", "",
"admin.wrapper1"))
The query presented in this example introduces explicit selection conditions. These conditions are reflected in both the check query
(count, the red ellipse in Fig. 56) and the actual update (the blue ellipse).
6.2.4 Multi-Wrapper and Mixed Queries
Multi-wrapper queries are queries that invoke more than one wrapper instance, whereas
mixed queries combine both “relational” and pure object-oriented objects
and expressions. Such queries are unlikely to occur in the assumed wrapper
application and virtual repository architecture, since any query entering a wrapper
will refer only to that wrapper. If, however, a subquery does not refer to the target
wrapper, it must be evaluated separately and its result substituted into the wrapper query
for local rewriting and evaluation; the partial results are then returned and
stack-processed into the final result. The implementation allows such queries to be
executed; however, wrapper optimisation is not always performed, and in some cases only
simple rewriting is applied. Of course, this does not exclude the application of native
SBQL optimisers, e.g., those based on the independent subqueries method, which also
considerably improve the overall performance (unfortunately, more operations must then
be executed by the virtual repository instead of the wrapped resources’ engines).
Multi-wrapper and mixed queries were tested using both test
schemata (subchapter 6.2.1), where the employees’ and cars’ data are stored in separate
relational databases and integrated into the virtual repository by separate
wrappers. These schemata are logically connected by columns of the employees and cars
tables (this information should be used by the global administrator/designer when
creating the global view and integration rules, which is reflected in the sample views in
Listing 2).
In the optimal environment, multi-wrapper queries should be processed in
a single-wrapper context, similarly to mixed ones. This means that global
multi-wrapper queries should be decomposed by the virtual repository mechanisms and sent
to the appropriate wrappers. This ensures that relational names corresponding to the local
wrapper are recognised correctly and the other ones (evaluated by other wrappers) are
regarded as external. Such queries can be transformed in the same way as mixed ones, i.e.
expressions evaluated by non-local wrappers can simply be used for parametrising SQL
query strings.
Example 1: Retrieve all employees with ID value equal to the variable idTest
Raw:
Employee where id = idTest;
Fig. 57 Raw (parsed) query syntax tree for example 1 (mixed query)
Typechecked:
(Employee where (deref(id) = deref(idTest)))
Fig. 58 Typechecked query syntax tree for example 1 (mixed query)
View-rewritten:
((($employees) as _$employees) as e where (((e . (_$employees . $id)) . deref(_VALUE)) =
deref(idTest)))
Fig. 59 View-rewritten query syntax tree for example 1 (mixed query)
Optimised:
execsql((("select employees.info, employees.department_id, employees.surname, employees.salary,
employees.id, employees.sex, employees.name, employees.birth_date from employees where (employees.id
= '" + (string)(deref(idTest))) + "')"), "<0 $employees | | e | none | binder 0>", "admin.wrapper1")
Fig. 60 Optimised query syntax tree for example 1 (mixed query)
The SQL query string is evaluated as the concatenation of ordinary SQL query substrings with the pure SBQL expression
(string)(deref(idTest)). The final (evaluated) form is executed by the wrapper.
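The concatenation step can be mirrored in a few lines of Python. This is only a sketch: build_mixed_sql is a hypothetical name, and id_test stands for the value of the SBQL variable idTest evaluated outside the wrapper.

```python
def build_mixed_sql(id_test):
    """Splice an externally evaluated value into the constant SQL substrings."""
    prefix = ("select employees.info, employees.department_id, employees.surname, "
              "employees.salary, employees.id, employees.sex, employees.name, "
              "employees.birth_date from employees where (employees.id = '")
    suffix = "')"
    # str() plays the role of the (string) cast in the optimised SBQL expression.
    return prefix + str(id_test) + suffix

sql = build_mixed_sql(7)
```

The sketch follows the string concatenation shown in the example above. In a general-purpose application, bound query parameters are usually preferred over splicing values into SQL text, because they avoid quoting problems and SQL injection.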
Example 2: Retrieve surname and salary of the employee whose id is equal to the value returned from an SBQL procedure procedure()
Raw:
(Employee where id = procedure(1)).(surname, salary);
Fig. 61 Raw (parsed) query syntax tree for example 3 (mixed query)
Typechecked:
((Employee where (deref(id) = procedure(1))) . (surname , salary))
Fig. 62 Typechecked (parsed) query syntax tree for example 3 (mixed query)
View-rewritten:
(((($employees) as _$employees) as e where (((e . (_$employees . $id)) . deref(_VALUE)) =
procedure(1))) . (((e . (_$employees . $surname))) as _surname , ((e . (_$employees . $salary))) as
_salary))
Fig. 63 View-rewritten query syntax tree for example 3 (mixed query)
Optimised:
execsql((("select employees.surname, employees.salary from employees where (employees.id = '" +
(string)(procedure(1))) + "')"), "<0 | | | none | struct <1 $employees | $surname | _surname | none
| binder 1> <1 $employees | $salary | _salary | none | binder 1> 0>", "admin.wrapper1")
Fig. 64 Optimised query syntax tree for example 3 (mixed query)
Example 3: Retrieve cars owned by employees whose surname is equal to the variable surnameTest
Raw:
(cars as c where c.owner_id in (employees as e where e.surname = surnameTest).(e.id)).c;
Typechecked:
(((cars) as c where (deref((c . owner_id)) in deref((((employees) as e where (deref((e . surname)) =
deref(surnameTest))) . (e . id))))) . c)
View-rewritten:
(((($cars) as _$cars) as c where (((c . (_$cars . $owner_id)) . deref(_VALUE)) in ((((($employees)
as _$employees) as e where (((e . (_$employees . $surname)) . deref(_VALUE)) = deref(surnameTest)))
. (e . (_$employees . $id))) . deref(_VALUE)))) . c)
“Optimised”:
((((execsql("select cars.owner_id, cars.year, cars.colour, cars.id, cars.model_id from cars", "<0
$cars | | | none | ref 0>", "admin.wrapper2")) as _$cars) as c where (((c . (_$cars . $owner_id)) .
deref(_VALUE)) in (((((execsql("select employees.info, employees.department_id, employees.surname,
employees.salary, employees.id, employees.sex, employees.name, employees.birth_date from employees",
"<0 $employees | | | none | ref 0>", "admin.wrapper1")) as _$employees) as e where (((e .
(_$employees . $surname)) . deref(_VALUE)) = deref(surnameTest))) . (e . (_$employees . $id))) .
deref(_VALUE)))) . c)
Example 4: Retrieve a string composed of a make name, a model name and a car production year for employees whose surname is equal to the
variable surnameTest
Raw:
(((Car as c where c.isOwnedBy in (Employee as e where e.surname = surnameTest).(e.id)) join
c.isModel.Model as m) join m.isMake.Make as mm).(mm.name + " " + m.name + " " + c.year);
Typechecked:
(((((Car) as c where (deref((c . isOwnedBy)) in deref((((Employee) as e where (deref((e . surname))
= deref(surnameTest))) . (e . id))))) join (((c . isModel) . Model)) as m) join (((m . isMake) .
Make)) as mm) . ((((deref((mm . name)) + " ") + deref((m . name))) + " ") + (string)(deref((c .
year)))))
View-rewritten:
((((((($cars) as _$cars) as c) as c where (((c . (c . (_$cars . $owner_id))) . deref(_VALUE)) in
(((((($employees) as _$employees) as e) as e where (((e . (e . (_$employees . $surname))) .
deref(_VALUE)) = deref(surnameTest))) . (e . ((e . (_$employees . $id))) as _id)) . (_id .
deref(_VALUE))))) join (((c . ((c . (_$cars . $model_id))) as _isModel) . ((($models) as _$models)
as m where (((m . (_$models . $id)) . deref(_VALUE)) = (_isModel . deref(_VALUE)))))) as m) join
(((m . ((m . (_$models . $make_id))) as _isMake) . ((($makes) as _$makes) as m where (((m . (_$makes
. $id)) . deref(_VALUE)) = (_isMake . deref(_VALUE)))))) as mm) . ((((((mm . (m . (_$makes .
$name))) . deref(_VALUE)) + " ") + ((m . (m . (_$models . $name))) . deref(_VALUE))) + " ") +
(string)(((c . (c . (_$cars . $year))) . deref(_VALUE)))))
“Optimised”:
(((((((execsql("select cars.owner_id, cars.year, cars.colour, cars.id, cars.model_id from cars", "<0
$cars | | | none | ref 0>", "admin.wrapper2")) as _$cars) as c) as c where (((c . (c . (_$cars .
$owner_id))) . deref(_VALUE)) in ((((((execsql("select employees.info, employees.department_id,
employees.surname, employees.salary, employees.id, employees.sex, employees.name,
employees.birth_date from employees", "<0 $employees | | | none | ref 0>", "admin.wrapper1")) as
_$employees) as e) as e where (((e . (e . (_$employees . $surname))) . deref(_VALUE)) =
deref(surnameTest))) . (e . ((e . (_$employees . $id))) as _id)) . (_id . deref(_VALUE))))) join
(((c . ((c . (_$cars . $model_id))) as _isModel) . (((execsql("select models.name, models.make_id,
models.id from models", "<0 $models | | | none | ref 0>", "admin.wrapper2")) as _$models) as m where
(((m . (_$models . $id)) . deref(_VALUE)) = (_isModel . deref(_VALUE)))))) as m) join (((m . ((m .
(_$models . $make_id))) as _isMake) . (((execsql("select makes.id, makes.name from makes", "<0
$makes | | | none | ref 0>", "admin.wrapper2")) as _$makes) as m where (((m . (_$makes . $id)) .
deref(_VALUE)) = (_isMake . deref(_VALUE)))))) as mm) . ((((((mm . (m . (_$makes . $name))) .
deref(_VALUE)) + " ") + ((m . (m . (_$models . $name))) . deref(_VALUE))) + " ") + (string)(((c . (c
. (_$cars . $year))) . deref(_VALUE)))))
Unfortunately, the queries presented in examples 3 and 4 cannot be completely optimised, as no single-wrapper context can be established
and the names used are not recognised correctly. Query decomposition should be performed by the virtual repository, so that the appropriate
subqueries can be analysed by the corresponding wrappers and methods similar to those for mixed queries applied. Nevertheless, the situation
is not hopeless, as SBQL optimisation can still be applied, as shown in the following subsection.
6.2.5 SBQL Optimisation over Multi-Wrapper Queries
The examples of multi-wrapper queries shown in subchapter 6.2.4 are only simply
rewritten, since in the current ODRA implementation the mechanisms responsible for query
decomposition and distributed processing are still under development (simulating
a decomposition mechanism is pointless until these mechanisms are verified by real-life
tests). However, in such cases the SBQL optimisers still apply and can considerably improve
the overall performance. The following examples refer to the multi-wrapper schema
(employees and cars) and involve extremely expensive joins; therefore the method of
independent subqueries is employed effectively.
The SBQL code is simplified as view rewriting and wrapper rewriting steps are
not shown, so that only wrapper-related names from views exist (in the actual
processing these names are macro-substituted with their view definition and then
wrapper rewriting is applied).
The results of the SBQL optimisation over simply-rewritten wrapper queries are
presented in subchapter 7.3 Application of SBQL optimisers.
Example 1: Retrieve cars owned by employees named Nowak
Raw:
(Car as c where c.isOwnedBy in (Employee as e where e.surname = "Nowak").(e.id)).c;
Fig. 65 Raw query syntax tree for example 1 (multi-wrapper query)
Typechecked:
(((Car) as c where (deref((c . isOwnedBy)) in deref((((Employee) as e where (deref((e . surname)) =
"Nowak")) . (e . id))))) . c)
Fig. 66 Typechecked query syntax tree for example 1 (multi-wrapper query)
SBQL-optimised:
(((deref((((Employee) as e where (deref((e . surname)) = "Nowak")) . (e . id)))) groupas $aux0 .
((Car) as c where (deref((c . isOwnedBy)) in $aux0))) . c)
Fig. 67 SBQL-optimised query syntax tree for example 1 (multi-wrapper query)
The subquery recognised as an independent one within the type-checked query (the red ellipse in Fig. 66) is pushed out of the rest of the
query (the red ellipse in Fig. 67). A similar operation is illustrated in the next (slightly more complex) example.
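The gain from factoring out an independent subquery can be illustrated with a small Python analogy. It is a sketch only: plain lists of dictionaries stand in for the Car and Employee objects, the function names are hypothetical, and the counter merely records how many times the inner subquery is evaluated.

```python
def naive(cars, employees, surname, counter):
    # The inner subquery is re-evaluated for every car, as in the typechecked query.
    result = []
    for c in cars:
        counter["evaluations"] += 1
        ids = [e["id"] for e in employees if e["surname"] == surname]
        if c["owner_id"] in ids:
            result.append(c)
    return result

def factored(cars, employees, surname, counter):
    # The subquery does not depend on c, so it is evaluated once and named
    # (the role played by "groupas $aux0" in the SBQL-optimised query).
    counter["evaluations"] += 1
    aux0 = {e["id"] for e in employees if e["surname"] == surname}
    return [c for c in cars if c["owner_id"] in aux0]

cars = [{"id": 1, "owner_id": 100}, {"id": 2, "owner_id": 101}, {"id": 3, "owner_id": 100}]
employees = [{"id": 100, "surname": "Nowak"}, {"id": 101, "surname": "Kowalski"}]
naive_counter = {"evaluations": 0}
factored_counter = {"evaluations": 0}
naive_result = naive(cars, employees, "Nowak", naive_counter)
factored_result = factored(cars, employees, "Nowak", factored_counter)
```

Both variants return the same cars; the naive one evaluates the employee subquery once per car, the factored one exactly once.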
Example 2: Retrieve a string composed of a make name, a model name and a car production year for employees whose surname is Nowak
Raw:
(((Car as c where c.isOwnedBy in (Employee as e where e.surname = "Nowak").(e.id)) join
c.isModel.Model as m) join m.isMake.Make as mm).(mm.name + " " + m.name + " " + c.year);
Fig. 68 Raw query syntax tree for example 2 (multi-wrapper query)
Typechecked:
(((((Car) as c where (deref((c . isOwnedBy)) in deref((((Employee) as e where (deref((e . surname))
= "Nowak")) . (e . id))))) join (((c . isModel) . Model)) as m) join (((m . isMake) . Make)) as mm)
. ((((deref((mm . name)) + " ") + deref((m . name))) + " ") + (string)(deref((c . year)))))
Fig. 69 Typechecked query syntax tree for example 2 (multi-wrapper query)
SBQL-optimised:
(((((deref((((Employee) as e where (deref((e . surname)) = "Nowak")) . (e . id)))) groupas $aux0 .
((Car) as c where (deref((c . isOwnedBy)) in $aux0))) join (((c . isModel) . Model)) as m) join (((m
. isMake) . Make)) as mm) . ((((deref((mm . name)) + " ") + deref((m . name))) + " ") +
(string)(deref((c . year)))))
Fig. 70 SBQL-optimised query syntax tree for example 2 (multi-wrapper query)
6.3 Sample Use Cases
The above examples present the simplest wrapper usage, in which only basic views
realising relational models are applied. Since the main application of the wrapper is to
contribute to the virtual repository, the wrapped relational schema must be transparently
accessible in the global schema (the global view) according to the virtual repository
model, via a contributory view (or a set of cascaded views responsible for the integration
and global schemata, Fig. 18). This subchapter describes some possible views referring to the
test schemata (query optimisation procedures are skipped due to the complexity of the
corresponding examples; nevertheless, the presented rules and procedures still hold).
The views below rely directly on the views shown in Listing 2, which
corresponds to the schema translation within the virtual repository.
6.3.1 Rich Employees
The following SBQL view presents rich employees, i.e. employees who earn more than
2000:
Listing 3 SBQL view code for retrieving “rich employees”
view RichEmployeeDef {
  virtual objects RichEmployee: record { e: Employee; }[0..*] {
    return (Employee where salary > 2000) as e;
  }
  on_retrieve: record { fullname: string; sex: string; salary: real; } {
    return ((deref(e.name) + " " + deref(e.surname)) as fullname,
      deref(e.sex) as sex, deref(e.salary) as salary);
  }
  view fullnameDef {
    virtual objects fullname: record { _fullname: string; } {
      return (deref(e.name) + " " + deref(e.surname)) as _fullname;
    }
    on_retrieve: string {
      return _fullname;
    }
  }
  view sexDef {
    virtual objects sex: record { _sex: Employee.sex; } {
      return e.sex as _sex;
    }
    on_retrieve: string {
      return deref(_sex);
    }
  }
  view salaryDef {
    virtual objects salary: record { _salary: Employee.salary; } {
      return e.salary as _salary;
    }
    on_retrieve: real {
      return deref(_salary);
    }
  }
}
Queries issued to this view could be:
1. Retrieve the number of rich employees:
count(RichEmployee);
2. Retrieve the minimum salary of rich employees:
min(RichEmployee.salary);
3. Retrieve full names of rich employees earning 5000:
(RichEmployee where salary = 5000).fullname;
4. Retrieve the sum of salaries of rich employees named Jan Kowalski (see footnote 35):
sum((RichEmployee where fullname = "Jan Kowalski").salary);
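What the view and the four queries compute can be mimicked in Python over a handful of rows. This is an analogy only, assuming plain dictionaries in place of Employee objects; the function rich_employees and the sample data are hypothetical.

```python
def rich_employees(employees):
    """Mimic the view: keep employees earning more than 2000, expose three attributes."""
    return [
        {"fullname": e["name"] + " " + e["surname"], "sex": e["sex"], "salary": e["salary"]}
        for e in employees
        if e["salary"] > 2000
    ]

employees = [
    {"name": "Jan", "surname": "Kowalski", "sex": "M", "salary": 5000.0},
    {"name": "Anna", "surname": "Nowak", "sex": "F", "salary": 1500.0},
    {"name": "Jan", "surname": "Kowalski", "sex": "M", "salary": 2500.0},
    {"name": "Ewa", "surname": "Mazur", "sex": "F", "salary": 2000.0},
]
rich = rich_employees(employees)

count_rich = len(rich)                                                 # query 1
min_salary = min(r["salary"] for r in rich)                            # query 2
names_5000 = [r["fullname"] for r in rich if r["salary"] == 5000]      # query 3
sum_jan = sum(r["salary"] for r in rich if r["fullname"] == "Jan Kowalski")  # query 4
```

In query 4 the condition is placed on the concatenated full name, so it refers to a computed value rather than to a single stored column.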
6.3.2 Employees with Departments
This view presents selected employees data with names of departments they work in
(the join over the virtual pointer is used):
Listing 4 SBQL view code for retrieving employees with their departments
view EmployeeDepartmentDef {
  virtual objects EmployeeDepartment: record { e: Employee; d: Department; }[0..*] {
    return Employee as e join e.worksIn.Department as d;
  }
  on_retrieve: record { fullname: string; salary: real; department: string; } {
    return ((deref(e.name) + " " + deref(e.surname)) as fullname,
      deref(e.salary) as salary, deref(d.name) as department);
  }
  view fullnameDef {
    virtual objects fullname: record { _fullname: string; } {
      return (deref(e.name) + " " + deref(e.surname)) as _fullname;
    }
    on_retrieve: string {
      return _fullname;
    }
  }
  view salaryDef {
    virtual objects salary: record { _salary: real; } {
      return deref(e.salary) as _salary;
    }
    on_retrieve: real {
      return _salary;
    }
  }
  view departmentDef {
    virtual objects department: record { _department: string; } {
      return deref(d.name) as _department;
    }
    on_retrieve: string {
      return deref(d.name);
    }
  }
}
35 Please note that query 4 in subsection 6.3.1 cannot be optimised by the wrapper, since its selection
condition is evaluated over a complex expression (string concatenation), an operation that is not
available in standard SQL.
Queries issued to this view could be:
1. Retrieve a string composed of the employee’s full name and the department’s name:
EmployeeDepartment.(fullname + "/" + department);
2. Retrieve the minimum salary of employees working in the production department:
min((EmployeeDepartment where department =
"Production").salary);
6.3.3 Employees with Cars
This view presents selected employees with their cars data (joins over virtual pointers
applied, data retrieved from separate relational schemata represented by different
wrappers):
Listing 5 SBQL view code for retrieving employees with their cars
view EmployeeCarDef {
  virtual objects EmployeeCar: record { e: Employee; c: Car; ma: Make; mo: Model; }[0..*] {
    return Car as c join c.isOwnedBy.Employee as e join c.isModel.Model as mo
      join mo.isMake.Make as ma;
  }
  on_retrieve: record { fullname: string; salary: real; make: string;
      model: string; colour: string; year: integer; } {
    return ((deref(e.name) + " " + deref(e.surname)) as fullname,
      deref(e.salary) as salary, deref(ma.name) as make,
      deref(mo.name) as model, deref(c.colour) as colour,
      deref(c.year) as year);
  }
  view fullnameDef {
    virtual objects fullname: record { _fullname: string; } {
      return (deref(e.name) + " " + deref(e.surname)) as _fullname;
    }
    on_retrieve: string {
      return _fullname;
    }
  }
  view salaryDef {
    virtual objects salary: record { _salary: real; } {
      return deref(e.salary) as _salary;
    }
    on_retrieve: real {
      return _salary;
    }
  }
  view makeDef {
    virtual objects make: record { _make: string; } {
      return deref(ma.name) as _make;
    }
    on_retrieve: string {
      return _make;
    }
  }
  view modelDef {
    virtual objects model: record { _model: string; } {
      return deref(mo.name) as _model;
    }
    on_retrieve: string {
      return _model;
    }
  }
  view colourDef {
    virtual objects colour: record { _colour: string; } {
      return deref(c.colour) as _colour;
    }
    on_retrieve: string {
      return _colour;
    }
  }
  view yearDef {
    virtual objects year: record { _year: integer; } {
      return deref(c.year) as _year;
    }
    on_retrieve: integer {
      return _year;
    }
  }
}
Queries issued to this view could be:
1. Retrieve full names of employees earning more than 2000 with colours of their cars:
(EmployeeCar where salary > 2000).(fullname, colour);
2. Retrieve full names and salaries of employees driving Hondas produced in 2007:
(EmployeeCar where make = "Honda" and year = 2007).(fullname,
salary);
6.3.4 Rich Employees with White Cars
This view presents selected data of rich employees owning white cars together with
selected cars’ data:
Listing 6 SBQL view code for retrieving rich employees with white cars
view RichEmployeeWhiteCarDef {
  virtual objects RichEmployeeWhiteCar: record { e: Employee; c: Car; ma: Make; mo: Model; }[0..*] {
    return (Car where colour = "white") as c join (c.isOwnedBy.Employee where
      salary > 2000) as e join c.isModel.Model as mo join mo.isMake.Make as ma;
  }
  on_retrieve: record { fullname: string; salary: real; make: string;
      model: string; year: integer; } {
    return ((deref(e.name) + " " + deref(e.surname)) as fullname,
      deref(e.salary) as salary, deref(ma.name) as make,
      deref(mo.name) as model, deref(c.year) as year);
  }
  view fullnameDef {
    virtual objects fullname: record { _fullname: string; } {
      return (deref(e.name) + " " + deref(e.surname)) as _fullname;
    }
    on_retrieve: string {
      return _fullname;
    }
  }
  view salaryDef {
    virtual objects salary: record { _salary: real; } {
      return deref(e.salary) as _salary;
    }
    on_retrieve: real {
      return _salary;
    }
  }
  view makeDef {
    virtual objects make: record { _make: string; } {
      return deref(ma.name) as _make;
    }
    on_retrieve: string {
      return _make;
    }
  }
  view modelDef {
    virtual objects model: record { _model: string; } {
      return deref(mo.name) as _model;
    }
    on_retrieve: string {
      return _model;
    }
  }
  view yearDef {
    virtual objects year: record { _year: integer; } {
      return deref(c.year) as _year;
    }
    on_retrieve: integer {
      return _year;
    }
  }
}
Queries issued to this view could be:
1. Retrieve the minimum salary of employees driving Hondas produced after 2005:
min((RichEmployeeWhiteCar where make = "Honda" and year >
2005).salary);
2. Retrieve count of white Honda Civic cars owned by rich employees:
count(RichEmployeeWhiteCar where make = "Honda" and model =
"Civic");
Chapter 7
Wrapper Optimisation Results
The wrapper optimisation results are average values collected from 10 consecutive
measurements performed on the test schemata (subchapter 6.2.1) populated with
random data with the distributions presented below.
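The averaging procedure can be sketched as a small Python harness. It is an illustration only and not the test bench actually used: the function average_time is hypothetical, and run_query stands for any callable that submits a query and waits for its result.

```python
import time
from statistics import mean

def average_time(run_query, repetitions=10):
    """Return the mean wall-clock time in seconds of consecutive runs of run_query."""
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return mean(samples)

# Usage with a stand-in workload instead of a real query.
calls = []
avg_seconds = average_time(lambda: calls.append(sum(range(1000))))
```

A monotonic high-resolution clock such as time.perf_counter is used here so that the samples are not affected by system clock adjustments.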
The measurements were taken in the following environment (a single machine):
Table 1 Optimisation testbench configuration
Property                    Value
Processor                   Intel Mobile Core 2 Duo T7400, 2.16 GHz, 4 MB cache
RAM                         2048 MB
HDD                         200 GB, 5400 rpm
OS                          MS Windows XP SP2, 32 bit
JVM                         Sun JRE SE 1.6.0_02-b06
ODRA server Java heap       1024 MB
Wrapper server Java heap    512 MB
RDBMS                       PostgreSQL 8.2
Due to relatively high memory consumption on the ODRA side (both for the
object store and the stack-based processing), the tests were unfortunately limited to
a maximum population of 1000 employees; greater values require disc memory
swapping, which causes misleading results.
Since the measurements were taken on a single machine, network throughput
and traffic delays are not considered (only local loopback communication is used). In
real-life applications, however, they can seriously affect the overall query evaluation time,
especially when transferring huge amounts of data (e.g., in the case of unoptimised queries).
The results may differ for various data distributions; they become more
repeatable as the number of employee records increases and their data distribution gets
closer to the assumed one. The results may also depend on the RDBMS used.
7.1 Relational Test Data
The data distribution in the “employees” schema is presented in the following figures.
Employees' birth dates (not shown below) are distributed randomly between 1950-01-01
and 1987-12-31, and sexes are equally distributed. The employees.info column contains
random text (lorem ipsum) between 200 bytes and 20 kilobytes long.
Fig. 71 Department’s location distribution (axes: location, probability; locations: Warszawa, Łódź, Kraków, Wrocław, Poznań, Gdańsk, Szczecin)
Fig. 72 Employee’s department distribution (axes: department, probability; departments: production, retail, wholesale, research, public relations, customer service, human, security)
Fig. 73 Employee’s salary distribution (axes: salary, probability; salaries: 5000, 2000, 1500, 1200, 800, 500)
Fig. 74 Employee’s info’s length distribution (axes: info's length, probability; lengths: 20 kB, 10 kB, 9 kB, 7 kB, 5 kB, 2 kB, 1 kB, 500 B, 200 B)
Fig. 75 Female employee’s first name distribution (probability per first name, labels
as plotted: ANNA, MARIA, KATARZYNA, MAŁGORZAT, AGNIESZKA, KRYSTYNA, BARBARA, EWA,
ELŻBIETA, ZOFIA, JANINA, TERESA, JOANNA, MAGDALENA, MONIKA, JADWIGA, DANUTA, IRENA,
HALINA, HELENA, BEATA, ALEKSANDR, MARTA, DOROTA, MARIANNA, GRAŻYNA, JOLANTA,
STANISŁAW, IWONA, KAROLINA, BOŻENA, URSZULA, JUSTYNA, RENATA, ALICJA, PAULINA,
SYLWIA, NATALIA, WANDA, AGATA, ANETA, IZABELA, EWELINA, MARZENA, WIESŁAWA,
GENOWEFA, PATRYCJA, KAZIMIERA, EDYTA, STEFANIA)
Fig. 76 Female employee’s surname distribution (probability per surname, labels as
plotted: NOWAK, KOWALSKA, WIŚNIEWSKA, WÓJCIK, KOWALCZYK, KAMIŃSKA, LEWANDOWSKA,
ZIELIŃSKA, SZYMAŃSKA, WOŹNIAK, DĄBROWSKA, KOZŁOWSKA, JANKOWSKA, MAZUR,
WOJCIECHOWSKA, KWIATKOWSKA, KRAWCZYK, PIOTROWSKA, KACZMAREK, GRABOWSKA,
PAWŁOWSKA, MICHALSKA, ZAJĄC, KRÓL, JABŁOŃSKA, WIECZOREK, NOWAKOWSKA, WRÓBEL,
MAJEWSKA, OLSZEWSKA, STĘPIEŃ, JAWORSKA, MALINOWSKA, ADAMCZYK, NOWICKA, GÓRSKA,
DUDEK, PAWLAK, WITKOWSKA, WALCZAK, RUTKOWSKA, SIKORA, BARAN, MICHALAK, SZEWCZYK,
OSTROWSKA, TOMASZEWSKA, PIETRZAK, JASIŃSKA, WRÓBLEWSKA)
Fig. 77 Male employee’s first name distribution (probability per first name, labels as
plotted: JAN, ANDRZEJ, PIOTR, KRZYSZTOF, STANISŁAW, TOMASZ, PAWEŁ, JÓZEF, MARCIN,
MAREK, MICHAŁ, GRZEGORZ, JERZY, TADEUSZ, ADAM, ŁUKASZ, ZBIGNIEW, RYSZARD, DARIUSZ,
HENRYK, MARIUSZ, KAZIMIERZ, WOJCIECH, ROBERT, MATEUSZ, MARIAN, RAFAŁ, JACEK, JANUSZ,
MIROSŁAW, MACIEJ, SŁAWOMIR, JAROSŁAW, KAMIL, WIESŁAW, ROMAN, WŁADYSŁAW, JAKUB,
ARTUR, ZDZISŁAW, EDWARD, MIECZYSŁAW, DAMIAN, DAWID, PRZEMYSŁAW, SEBASTIAN,
CZESŁAW, LESZEK, DANIEL, WALDEMAR)
Fig. 78 Male employee’s surname distribution (probability per surname, labels as
plotted: NOWAK, KOWALSKI, WIŚNIEWSKI, WÓJCIK, KOWALCZYK, KAMIŃSKI, LEWANDOWSKI,
ZIELIŃSKI, WOŹNIAK, SZYMAŃSKI, DĄBROWSKI, KOZŁOWSKI, JANKOWSKI, MAZUR,
WOJCIECHOWSKI, KWIATKOWSKI, KRAWCZYK, KACZMAREK, PIOTROWSKI, GRABOWSKI, ZAJĄC,
PAWŁOWSKI, KRÓL, MICHALSKI, WRÓBEL, WIECZOREK, JABŁOŃSKI, NOWAKOWSKI, MAJEWSKI,
STĘPIEŃ, OLSZEWSKI, JAWORSKI, MALINOWSKI, DUDEK, ADAMCZYK, PAWLAK, GÓRSKI,
NOWICKI, SIKORA, WALCZAK, WITKOWSKI, BARAN, RUTKOWSKI, MICHALAK, SZEWCZYK,
OSTROWSKI, TOMASZEWSKI, PIETRZAK, ZALEWSKI, WRÓBLEWSKI)
In the “cars” schema, makes are selected randomly from the following set:
Audi, Ford, Honda, Mitsubishi, Toyota and Volkswagen. For each make a random
model is selected (the list of models is shown in the table below). The production
year is a random value between 1990 and 2007.
Table 2 Test data for cars
Make          Models
Audi          A3, A4, A6, A08, TT
Ford          Escort, Focus, GT, Fusion, Galaxy
Honda         Accord, Civic, Prelude
Mitsubishi    3000GT, Diamante, Eclipse, Galant
Toyota        Camry, Celica, Corolla, RAV4, Yaris
Volkswagen    Golf, Jetta, Passat
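The car data described above could be generated along the following lines. This is only a sketch: the makes, models and year range come from Table 2 and the surrounding text, while the uniform selection, the `random_car` helper and the fixed seed are assumptions made for illustration.

```python
import random

# Makes and models copied from Table 2.
MODELS = {
    "Audi": ["A3", "A4", "A6", "A08", "TT"],
    "Ford": ["Escort", "Focus", "GT", "Fusion", "Galaxy"],
    "Honda": ["Accord", "Civic", "Prelude"],
    "Mitsubishi": ["3000GT", "Diamante", "Eclipse", "Galant"],
    "Toyota": ["Camry", "Celica", "Corolla", "RAV4", "Yaris"],
    "Volkswagen": ["Golf", "Jetta", "Passat"],
}


def random_car(rng):
    """Return a hypothetical (make, model, year) record."""
    make = rng.choice(sorted(MODELS))   # uniform choice is an assumption
    model = rng.choice(MODELS[make])    # model drawn from the chosen make only
    year = rng.randint(1990, 2007)      # both bounds are inclusive
    return make, model, year


rng = random.Random(0)  # fixed seed so that the sample is repeatable
cars = [random_car(rng) for _ in range(100)]
```

Drawing the model only from the list of the selected make keeps every generated record consistent with the table.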
7.2 Optimisation vs. Simple Rewriting
A set of representative queries is presented below with their raw SBQL forms and
the corresponding SQL query strings embedded in the optimised queries (intermediate
transformations with examples are described in the previous chapter). Each query is
accompanied by a plot of the average unoptimised and average optimised execution times
for 10, 50, 100, 500 and 1000 employee records, together with the optimisation gain (the
ratio of the evaluation times). The reference (unoptimised) queries are realised with the
naive wrapper approach, i.e. all processing is performed by the virtual repository on
materialised data retrieved from the wrapped relational databases. Short comments
explain the optimisation gain for particular queries where needed.
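The gain plotted for each query can be computed as sketched below. The sketch assumes that the gain is the average reference time divided by the average optimised time; the function name and the timings are made up for illustration and are not measurements from this chapter.

```python
from statistics import mean


def optimisation_gain(reference_ms, optimised_ms):
    """Ratio of the average reference time to the average optimised time."""
    return mean(reference_ms) / mean(optimised_ms)


# Made-up timings in milliseconds, for illustration only.
reference_runs = [4000.0, 4200.0, 3800.0]
optimised_runs = [400.0, 420.0, 380.0]

gain = optimisation_gain(reference_runs, optimised_runs)
```

A gain above 1 means that the optimised query was evaluated faster on average than the reference one.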
Please note that the following optimisation results do not rely on referring to
infrequent column values, which would unnaturally reduce the amount of data
transported and materialised (e.g. the surname “Nowak” is the most probable value
regardless of an employee's sex).
Query 1: Retrieve surnames of all employees
SBQL: Employee.surname;
SQL: select employees.surname from employees
Fig. 79 Evaluation times and optimisation gain for query 1
The optimisation gain for this simple query results from the projection
applied: no undesired data is transported and materialised.
Query 2: Retrieve employees earning more than 1200
SBQL: Employee where salary > 1200;
SQL: select employees.info, employees.department_id,
     employees.surname, employees.salary, employees.id,
     employees.sex, employees.name, employees.birth_date from
     employees where (employees.salary > 1200)
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
In this query the selection is performed over the indexed salary column. No
projection is applied, however: all columns of the selected rows are retrieved and
materialised, which matches the original query semantics.
Query 3: Retrieve surnames of employees earning more than 1200
SBQL: (Employee where salary > 1200).surname;
SQL: select employees.surname from employees where (employees.salary
     > 1200)
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
Query 3 adds a projection to the selection used in the previous case. This
increases the gain about 10 times (for 1000 employees) compared to query 2; a
similar projection was applied in query 1 with a corresponding gain value.
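The difference between the naive evaluation and the pushed-down selection and projection of query 3 can be sketched as follows. SQLite (via Python's standard `sqlite3` module) stands in for the wrapped RDBMS, and the table contents are made up; only the SQL string of the optimised form follows the one shown above, with an `order by` added to make the result order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table employees (id integer primary key, name text, "
    "surname text, salary integer, info text)"
)
conn.executemany(
    "insert into employees values (?, ?, ?, ?, ?)",
    [
        (1, "JAN", "NOWAK", 800, "x" * 1000),
        (2, "ANNA", "KOWALSKA", 1500, "y" * 1000),
        (3, "PIOTR", "KOWALSKI", 2000, "z" * 1000),
    ],
)

# Naive approach: materialise every column of every row,
# then select and project on the client side.
all_rows = conn.execute(
    "select id, name, surname, salary, info from employees order by id"
).fetchall()
naive = [row[2] for row in all_rows if row[3] > 1200]

# Optimised approach: selection and projection are evaluated by the database.
pushed = [
    row[0]
    for row in conn.execute(
        "select employees.surname from employees "
        "where (employees.salary > 1200) order by employees.id"
    )
]
```

Both variants yield the same surnames, but the second one never transfers the long `info` column or the rows rejected by the selection.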
Query 4: Retrieve employees whose surname is Kowalski
SBQL: Employee where surname = "Kowalski";
SQL: […] employees where (employees.surname = 'Kowalski')
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
Query 5: Retrieve first names of employees whose surname is Kowalski
SBQL: (Employee where surname = "Kowalski").name;
SQL: select employees.name from employees where (employees.surname =
     'Kowalski')
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
Query 5 combines the selection over the indexed surname column with
a single-column projection.
Query 6: Retrieve the employee with id equal 1
SBQL: Employee where id = 1;
SQL: […] employees where (employees.id = 1)
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
In this query only the selection is performed, but the unique index corresponding
to the primary key on the employees table is used and only one record is retrieved and
materialised.
Query 7: Retrieve first name of the employee with id equal 1
SBQL: (Employee where id = 1).name;
SQL: select employees.name from employees where (employees.id = 1)
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
Query 7 also introduces the projection, which slightly increases the gain
compared to query 6.
Query 8: Retrieve all employees named Kowalski earning more than 1200
SBQL: Employee where salary > 1200 and surname = "Kowalski";
SQL: […] employees where ((employees.salary > 1200) AND
     (employees.surname = 'Kowalski'))
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
The optimisation gain for query 8 is based on the selections over the indexed
columns surname and salary.
Query 9: Retrieve first names of employees named Kowalski earning more than 1200
SBQL: (Employee where salary > 1200 and surname = "Kowalski").name;
SQL: select employees.name from employees where ((employees.salary >
     1200) AND (employees.surname = 'Kowalski'))
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
Compared to query 8 (with the same selection conditions), the projection is also
applied here. Therefore, the gain increases further.
Query 10: Retrieve employees named Kowalski earning more than 1200 or named
Nowak
SBQL: Employee where salary > 1200 and surname = "Kowalski" or
     surname = "Nowak";
SQL: […] employees where (((employees.salary > 1200) AND
     (employees.surname = 'Kowalski')) OR (employees.surname =
     'Nowak'))
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
The gain is relatively low compared to the previous queries containing selections
over indexed columns. This is caused by the alternative (or) used in the selection:
the number of retrieved records is large (according to the test data distribution).
Query 11: Retrieve first names of employees named Kowalski earning more than 1200
or named Nowak
SBQL: (Employee where salary > 1200 and surname = "Kowalski" or
     surname = "Nowak").name;
SQL: select employees.name from employees where (((employees.salary
     > 1200) AND (employees.surname = 'Kowalski')) OR
     (employees.surname = 'Nowak'))
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
The projection introduced in query 11 increases the gain a few times
compared to query 10, which uses the same selection conditions.
Query 12: Retrieve employees named Kowalski or Nowak and earning more than 1200
SBQL: Employee where salary > 1200 and (surname = "Kowalski" or
     surname = "Nowak");
SQL: […] employees where ((employees.salary > 1200) AND
     ((employees.surname = 'Kowalski') OR (employees.surname =
     'Nowak')))
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
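Queries 10 and 12 differ only in how the conditions are grouped, which the parentheses in the generated SQL make explicit. The following sketch uses Python, whose `and` also binds more tightly than `or`, to show the effect on a made-up set of employees; the predicate names are hypothetical.

```python
employees = [
    {"surname": "Kowalski", "salary": 2000},
    {"surname": "Kowalski", "salary": 800},
    {"surname": "Nowak", "salary": 500},
    {"surname": "Nowak", "salary": 5000},
    {"surname": "Mazur", "salary": 5000},
]


def query10(e):
    # Grouped as in the SQL of query 10: (salary AND Kowalski) OR Nowak.
    return e["salary"] > 1200 and e["surname"] == "Kowalski" or e["surname"] == "Nowak"


def query12(e):
    # Grouped as in the SQL of query 12: salary AND (Kowalski OR Nowak).
    return e["salary"] > 1200 and (e["surname"] == "Kowalski" or e["surname"] == "Nowak")


q10 = [e for e in employees if query10(e)]
q12 = [e for e in employees if query12(e)]
```

In this sample the first form accepts every Nowak regardless of salary, so it returns more records than the second one.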
Query 13: Retrieve surnames and salaries of employees named Kowalski or Nowak and
earning more than 1200
SBQL: (Employee where salary > 1200 and (surname = "Kowalski" or
     surname = "Nowak")).(surname, salary);
SQL: select employees.surname, employees.salary from employees where
     ((employees.salary > 1200) AND ((employees.surname =
     'Kowalski') OR (employees.surname = 'Nowak')))
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
Query 14: Retrieve employees named Kowalski whose salaries are between 800 and
2000
SBQL: ((Employee where surname = "Kowalski") where salary > 800)
     where salary < 2000;
SQL: […] employees where (((employees.surname = 'Kowalski') AND
     (employees.salary > 800)) AND (employees.salary < 2000))
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
Query 14 exploits the logical correspondence between nested SBQL where operators
and SQL AND operators. The resulting selection refers to the indexed columns surname
and salary.
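The correspondence between nested selections and a conjunction, used in query 14, can be checked on a small example. This is an illustrative sketch with made-up data, not the rewriting code of the wrapper.

```python
employees = [
    {"surname": "Kowalski", "salary": 500},
    {"surname": "Kowalski", "salary": 1200},
    {"surname": "Kowalski", "salary": 2000},
    {"surname": "Nowak", "salary": 1500},
]

# Nested selections, mirroring the three nested SBQL where operators.
step1 = [e for e in employees if e["surname"] == "Kowalski"]
step2 = [e for e in step1 if e["salary"] > 800]
nested = [e for e in step2 if e["salary"] < 2000]

# A single conjunction, mirroring the AND chain in the generated SQL.
flat = [
    e
    for e in employees
    if (e["surname"] == "Kowalski" and e["salary"] > 800) and e["salary"] < 2000
]
```

Each nested filter only narrows the result of the previous one, so the two forms select the same records.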
Query 15: Retrieve first names of employees named Kowalski whose salaries are
between 800 and 2000
SBQL: (((Employee where surname = "Kowalski") where salary > 800)
     where salary < 2000).name;
SQL: select employees.name from employees where (((employees.surname
     = 'Kowalski') AND (employees.salary > 800)) AND
     (employees.salary < 2000))
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
The additional projection increases the gain compared to query 14.
Query 16: Retrieve employees with departments they work in
SBQL: Employee join worksIn.Department;
SQL: […] employees.sex, employees.name, employees.birth_date,
     departments.name, departments.location_id, departments.id from
     employees, departments where (departments.id =
     employees.department_id)
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
The gain in query 16 results from the join being evaluated by the relational resource
(the primary-foreign key relationship is applied). The relatively low value is caused by
the large amount of data retrieved and materialised (all records from the employees table
with the corresponding records from the departments table).
Query 17: Retrieve surnames of employees and names of departments they work in
SBQL: (Employee as e join e.worksIn.Department as d).(e.surname,
     d.name);
SQL: select employees.surname, departments.name from employees,
     departments where (departments.id = employees.department_id)
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
The gain increases compared to query 16 due to the projection applied.
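The effect of delegating the join of queries 16 and 17 to the relational resource can be sketched as follows. SQLite stands in for the wrapped RDBMS and the rows are made up; the client-side nested loop is only an assumed model of the naive evaluation on materialised data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table departments (id integer primary key, name text);
    create table employees (id integer primary key, surname text,
                            department_id integer references departments(id));
    insert into departments values (1, 'Production'), (2, 'Retail');
    insert into employees values (1, 'NOWAK', 1), (2, 'KOWALSKI', 2), (3, 'MAZUR', 1);
""")

# Naive approach: materialise both tables, then join them on the client side.
employees = conn.execute("select id, surname, department_id from employees").fetchall()
departments = conn.execute("select id, name from departments").fetchall()
naive = sorted(
    (surname, dept_name)
    for (_, surname, dept_id) in employees
    for (d_id, dept_name) in departments
    if d_id == dept_id
)

# Optimised approach: the join condition and projection travel in the SQL string.
pushed = sorted(
    conn.execute(
        "select employees.surname, departments.name from employees, departments "
        "where (departments.id = employees.department_id)"
    ).fetchall()
)
```

The results are identical, but the pushed-down form returns only the joined and projected pairs instead of two complete tables.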
Query 18: Retrieve surname and department’s name of the employee with id equal 1
SBQL: ((Employee where id = 1) as e join e.worksIn.Department as
     d).(e.surname, d.name);
SQL: […] departments where ((employees.id = 1) AND (departments.id =
     employees.department_id))
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
The join conditions and the projection are the same as in query 17. However, the
join is performed by the relational resource only for the single employees record
indicated by the primary key column. Therefore the query is evaluated faster and the
amount of data retrieved and materialised decreases substantially. The unoptimised
evaluation time also decreases considerably, because the selection limits the number
of joins.
Query 19: Retrieve surname and department’s name of the employee named Nowak
SBQL: ((Employee where surname = "Nowak") as e join
     e.worksIn.Department as d).(e.surname, d.name);
SQL: […] departments where ((employees.surname = 'Nowak') AND
     (departments.id = employees.department_id))
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
As in query 18, an extra selection condition is given here. The gain is
slightly lower, as the index on the surname column is not unique and more records are
joined and retrieved. The surname selection also improves the unoptimised times
(although less than the unique index in query 18).
Query 20: Retrieve employees with departments they work in and departments’
locations
SBQL: Employee join worksIn.Department join isLocatedIn.Location;
SQL: […] employees.sex, employees.name, employees.birth_date,
     departments.name, departments.location_id, departments.id,
     locations.id, locations.name from employees, departments,
     locations where ((departments.id = employees.department_id) AND
     (locations.id = departments.location_id))
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
The relatively low gain has the same explanation as in query 16: the triple
join is evaluated without any additional selections (apart from the primary-foreign
key dependencies).
Query 21: Retrieve surnames of employees with names of the locations their
departments are located in
SBQL: (Employee as e join e.worksIn.Department as d join
     d.isLocatedIn.Location as l).(e.surname, l.name);
SQL: select employees.surname, locations.name from employees,
     locations, departments where ((departments.id =
     employees.department_id) AND (locations.id =
     departments.location_id))
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
The gain increases substantially compared to query 20 due to the projection
realised in SQL.
Query 22: Retrieve surnames and names of locations where employees named Nowak
work
SBQL: ((Employee where surname = "Nowak") as e join
     e.worksIn.Department as d join d.isLocatedIn.Location as
     l).(e.surname, l.name);
SQL: […] locations, departments where (((employees.surname = 'Nowak')
     AND (departments.id = employees.department_id)) AND
     (locations.id = departments.location_id))
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
The gain is slightly lower than in query 21, although the additional selection
over the indexed surname column is introduced. This is because the selection also
limits the number of join operations evaluated by the virtual repository.
Query 23: Retrieve surname and name of location where the employee with id equal 1
works
SBQL: ((Employee where id = 1) as e join e.worksIn.Department as d
     join d.isLocatedIn.Location as l).(e.surname, l.name);
SQL: […] locations, departments where (((employees.id = 1) AND
     (departments.id = employees.department_id)) AND (locations.id =
     departments.location_id))
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
As in query 22, the gain does not increase, although the selection
with the unique index is introduced. The explanation is the same. Please note that
the evaluation times for the unoptimised query are much shorter than previously.
Query 24: Retrieve the number of employees
SBQL: count(Employee);
SQL: select COUNT(*) from employees
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
Query 24 uses the aggregate function that can be completely evaluated by the
relational resource. No data is actually retrieved and materialised except for the function
result (just a number).
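The saving obtained by evaluating an aggregate function in the relational resource can be illustrated with a short sketch. SQLite stands in for the wrapped RDBMS, the rows are made up, and the `rows_transferred_*` variables are hypothetical counters introduced only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table employees (id integer primary key, surname text)")
conn.executemany(
    "insert into employees (surname) values (?)",
    [("NOWAK",), ("KOWALSKI",), ("NOWAK",), ("MAZUR",)],
)

# Naive approach: every row is transferred and counted on the client side.
materialised = conn.execute("select id, surname from employees").fetchall()
naive_count = len(materialised)
rows_transferred_naive = len(materialised)

# Optimised approach: only the aggregate result crosses the boundary.
result = conn.execute("select COUNT(*) from employees").fetchall()
pushed_count = result[0][0]
rows_transferred_pushed = len(result)
```

The count is the same in both cases, while the optimised form transfers a single row regardless of the table size.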
Query 25: Retrieve the number of employees named Kowalski
SBQL: count(Employee where surname = "Kowalski");
SQL: select COUNT(*) from employees where (employees.surname =
     'Kowalski')
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
The gain is slightly higher than in query 24, as the aggregate function argument
additionally contains the selection over the indexed surname column.
Query 26: Retrieve the average salary of employees named Kowalski
SBQL: avg((Employee where surname = "Kowalski").salary);
SQL: select avg(employees.salary) from employees where
     (employees.surname = 'Kowalski')
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
The gain is very similar to that in query 25: the selection conditions are the same,
although another aggregate function is used.
Query 27: Retrieve the sum of salaries of employees earning less than 2000
SBQL: sum((Employee where salary < 2000).salary);
SQL: select sum(employees.salary) from employees where
     (employees.salary < 2000)
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
Query 28: Retrieve employees working in the production department
SBQL: Employee where worksIn.Department.name = "Production";
SQL: […] employees, departments where ((departments.name = 'Production')
     AND (departments.id = employees.department_id))
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
The join is evaluated with the additional selection condition. Therefore the gain
is higher than in the similar query 16, where only the primary-foreign key
relationship was used.
Query 29: Retrieve surnames and birth dates of employees in the production department
SBQL: (Employee where worksIn.Department.name =
     "Production").(surname, birthDate);
SQL: select employees.surname, employees.birth_date from employees,
     departments where ((departments.name = 'Production') AND
     (departments.id = employees.department_id))
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
The gain improves compared to query 28 as the projection is introduced.
Query 30: Retrieve surname and birth date of the employee with id equal 1 working in
the production department
SBQL: (Employee where id = 1 and worksIn.Department.name =
     "Production").(surname, birthDate);
SQL: […] departments where ((employees.id = 1) AND ((departments.name =
     'Production') AND (departments.id = employees.department_id)))
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
Again, the gain increases, as the selection over the uniquely indexed column is
applied in addition to the one present in query 29. Please note that the evaluation
times for the unoptimised query are much shorter than previously.
Query 31: Retrieve surnames and birth dates of employees named Kowalski working in
the production department
SBQL: (Employee where surname = "Kowalski" and
     worksIn.Department.name = "Production").(surname, birthDate);
SQL: […] departments where ((employees.surname = 'Kowalski') AND
     ((departments.name = 'Production') AND (departments.id =
     employees.department_id)))
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
The selection condition uses the non-uniquely indexed surname column. Hence
the gain is slightly lower than for query 30.
Query 32: Retrieve employees whose department is located in Łódź
SBQL: Employee where worksIn.Department.isLocatedIn.Location.name =
     "Łódź";
SQL: […] employees, locations, departments where ((locations.name =
     'Łódź') AND ((departments.id = employees.department_id) AND
     (locations.id = departments.location_id)))
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
The high gain for query 32 is caused by the complete evaluation of the triple join
with the additional selection by the relational database. Furthermore, a projection is
applied, as only employees records are retrieved and materialised.
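The path expression of query 32 can be contrasted with the corresponding triple join in a small sketch. SQLite stands in for the wrapped RDBMS, the rows are made up, and only the surname column is projected to keep the example short; the dictionary lookups are an assumed model of navigation over materialised data, not the actual virtual repository code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table locations (id integer primary key, name text);
    create table departments (id integer primary key, name text, location_id integer);
    create table employees (id integer primary key, surname text, department_id integer);
    insert into locations values (1, 'Łódź'), (2, 'Kraków');
    insert into departments values (1, 'Production', 1), (2, 'Retail', 2);
    insert into employees values (1, 'NOWAK', 1), (2, 'KOWALSKI', 2), (3, 'MAZUR', 1);
""")

# Naive approach: materialise all three tables and navigate on the client side.
employees = conn.execute(
    "select id, surname, department_id from employees order by id"
).fetchall()
dept_location = dict(conn.execute("select id, location_id from departments").fetchall())
location_name = dict(conn.execute("select id, name from locations").fetchall())
naive = [
    surname
    for (_, surname, dept_id) in employees
    if location_name[dept_location[dept_id]] == "Łódź"
]

# Optimised approach: a single SQL query with the triple join and the selection.
pushed = [
    row[0]
    for row in conn.execute(
        "select employees.surname from employees, locations, departments "
        "where ((locations.name = 'Łódź') AND "
        "((departments.id = employees.department_id) AND "
        "(locations.id = departments.location_id))) order by employees.id"
    )
]
```

Only the employees whose department is located in Łódź leave the database in the optimised form, whereas the naive form has to fetch every row of all three tables first.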
Query 33: Retrieve surnames and birth dates of employees whose department is located
in Łódź
SBQL: (Employee where worksIn.Department.isLocatedIn.Location.name = […]
SQL: […] locations, departments where ((locations.name = 'Łódź') AND
     ((departments.id = employees.department_id) AND (locations.id =
     departments.location_id)))
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
The gain increases compared to query 32, since the explicit projection is
introduced (undesired employees columns are not retrieved).
Query 34: Retrieve surname and birth date of the employee with id equal 1 whose
department is located in Łódź
SBQL: (Employee where id = 1 and
     worksIn.Department.isLocatedIn.Location.name = […]
SQL: […] locations, departments where ((employees.id = 1) AND
     ((locations.name = 'Łódź') AND ((departments.id = […]
     departments.location_id))))
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
Here, the gain is much lower than in the previous queries: the selection with the
unique index considerably improves the relational query evaluation time, but it also
limits the number of joins performed by the virtual repository. The evaluation times of
the unoptimised query are again improved.
Query 35: Retrieve surnames and birth dates of employees named Kowalski whose
department is located in Łódź
SBQL: (Employee where surname = "Kowalski" and
     worksIn.Department.isLocatedIn.Location.name = […]
SQL: […] locations, departments where ((employees.surname = 'Kowalski')
     AND ((locations.name = 'Łódź') AND ((departments.id = […]
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
Query 36: Retrieve the number of employees working in the production department
SBQL: count(Employee where worksIn.Department.name = "Production");
SQL: select COUNT(*) from employees, departments where […]
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
The aggregate function is evaluated over the double join directly in the relational
resource without any data materialisation – only the function result (a number) is
retrieved.
Query 37: Retrieve the number of employees named Kowalski working in the
production department
SBQL: count(Employee where surname = "Kowalski" and
     worksIn.Department.name = "Production");
SQL: select COUNT(*) from employees, departments where
     ((employees.surname = 'Kowalski') AND ((departments.name =
     'Production') AND (departments.id = employees.department_id)))
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
The gain is lower than for query 36 because the additional selection condition
limits the number of navigations (via virtual pointers) evaluated by the virtual
repository. The evaluation times for the unoptimised query decrease substantially
compared to the previous case.
Query 38: Retrieve the sum of salaries of employees in the production department
SBQL: sum((Employee where worksIn.Department.name =
     "Production").salary);
SQL: select sum(employees.salary) from employees, departments where […]
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
The gain arises from the aggregate function over the join evaluated completely
by the relational database.
Query 39: Retrieve the number of employees working in Łódź
SBQL: count(Employee where
     worksIn.Department.isLocatedIn.Location.name = "Łódź");
SQL: select COUNT(*) from employees, locations, departments where […]
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
The gain increases compared to query 38 because the evaluation of the triple join
for the aggregate function is avoided in the virtual repository.
Query 40: Retrieve the sum of salaries of employees working in Łódź
SBQL: sum((Employee where
     worksIn.Department.isLocatedIn.Location.name = "Łódź").salary);
SQL: select sum(employees.salary) from employees, locations,
     departments where ((locations.name = 'Łódź') AND
     ((departments.id = employees.department_id) AND (locations.id = […]
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
The query and the gain are similar to those in the previous example, although another
aggregate function is used.
Query 41: Retrieve the sum of salaries of employees named Kowalski working in Łódź
SBQL: sum((Employee where surname = "Kowalski" and
     worksIn.Department.isLocatedIn.Location.name = "Łódź").salary);
SQL: select sum(employees.salary) from employees, locations,
     departments where ((employees.surname = 'Kowalski') AND […]
[Figure: average reference and optimised evaluation times (ms) and optimisation gain vs. number of employees]
The additional selection limits the number of joins to be performed by the virtual
repository; therefore the gain is lower than in query 40. The evaluation times of
the unoptimised query decrease substantially.
7.3 Application of SBQL optimisers
As presented in subchapters 6.2.4 and 6.2.5, in the current virtual repository
implementation some wrapper-oriented queries are not optimally rewritten, but they can
be optimised with SBQL methods. Due to the very large memory consumption
necessary for the evaluation of unoptimised queries, the tests were performed only on 10,
50 and 100 records of employees and corresponding cars, so that disk swapping
could be avoided.
The plots compare average raw and SBQL-optimised query evaluation times and
present the average evaluation time ratio (the optimisation gain). For simplicity,
the wrapper execsql expressions, which are substituted for each wrapper-related
name and unconditionally retrieve all records, are not shown. Unfortunately, with so
few data points the shape of the gain curve is not informative; it is also affected by
the low numbers of records, which do not reflect the assumed data distribution.
Query 1: Retrieve cars owned by employees named Nowak
Raw SBQL: (Car as c where c.isOwnedBy in (Employee as e where
     e.surname = "Nowak").(e.id)).c;
SBQL optimised: (((deref((((Employee) as e where (deref((e . surname)) =
     "Nowak")) . (e . id)))) groupas $aux0 . ((Car) as c where
     (deref((c . isOwnedBy)) in $aux0))) . c)
Fig. 120 Evaluation times and optimisation gain for query 1 (SBQL optimisation)
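In the optimised form of query 1 the subquery over employees, which does not depend on the car being tested, is evaluated once and stored under the auxiliary name $aux0. The following sketch imitates this factoring out on made-up data; the `nowak_ids` helper and the call counter are hypothetical and serve only to show how often the subquery is evaluated.

```python
employees = [
    {"id": 1, "surname": "Nowak"},
    {"id": 2, "surname": "Kowalski"},
    {"id": 3, "surname": "Nowak"},
]
cars = [
    {"make": "Audi", "owner": 1},
    {"make": "Ford", "owner": 2},
    {"make": "Honda", "owner": 3},
]

calls = {"subquery": 0}  # counts evaluations of the independent subquery


def nowak_ids():
    """Independent subquery: ids of the employees named Nowak."""
    calls["subquery"] += 1
    return [e["id"] for e in employees if e["surname"] == "Nowak"]


# Raw form: the subquery is re-evaluated for every car.
raw = [c for c in cars if c["owner"] in nowak_ids()]
raw_evaluations = calls["subquery"]

# Optimised form: the subquery result is computed once (like $aux0) and reused.
calls["subquery"] = 0
aux0 = nowak_ids()
optimised = [c for c in cars if c["owner"] in aux0]
optimised_evaluations = calls["subquery"]
```

The result is unchanged, but the number of subquery evaluations no longer grows with the number of cars.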
Query 2: Retrieve a string composed of a make name, a model name and a car
production year for employees whose surname is Nowak
Raw SBQL: (((Car as c where c.isOwnedBy in (Employee as e where
SBQLoptimised:
e.surname = "Nowak").(e.id)) join c.isModel.Model as m)
join m.isMake.Make as mm).(mm.name + " " + m.name + " " +
c.year);
(((((deref((((Employee) as e where (deref((e . surname)) =
"Nowak")) . (e . id)))) groupas $aux0 . ((Car) as c where
(deref((c . isOwnedBy)) in $aux0))) join (((c . isModel) .
Model)) as m) join (((m . isMake) . Make)) as mm) .
((((deref((mm . name)) + " ") + deref((m . name))) + " ") +
(string)(deref((c . year)))))
ref. avg. time
70000
opt. avg. time
gain
40,00
35,00
60000
30,00
50000
20,00
gain
time [ms]
25,00
40000
30000
15,00
20000
10,00
10000
5,00
0
0
50
no. of em ployees
0,00
100
Fig. 121 Evaluation times and optimisation gain for query 2 (SBQL optimisation)
The lower gain (compared to query 1) is caused by the remaining join, which is not
rewritten according to the independent subquery algorithm and has to be evaluated in
an unoptimised form.
Chapter 8
Summary and Conclusions
The objectives assumed have been accomplished and the theses stated in the presented
Ph.D. dissertation have been proved true:
1. Legacy relational databases can be transparently integrated into an object-oriented
virtual repository and their data can be processed and updated with an object-oriented query language indistinguishably from purely object-oriented data, without
materialisation or replication.
The designed and implemented relational schema import procedure allows generic,
automated and completely transparent integration of any number of legacy relational
databases into the virtual repository structures. Relational schemata are presented,
accessible and processed indistinguishably from real object-oriented data. An imported
relational schema is enveloped with updateable object-oriented views defined in SBQL,
the language used for the virtual repository management and maintenance and available as the top-level end-user interface language. Therefore, a wrapped relational database is processed
transparently, like any other virtual repository resource. The actual distinction from
object-oriented data occurs at the wrapper level, below the covering views, i.e. no
intermediate virtual repository stage is “aware” of the actual data source.
The wrapper, responsible for interfacing between the virtual repository and the
relational database, analyses SBQL queries for relational names and the (sub)queries
involving them. The largest possible such (sub)queries are transformed into
special SBQL expressions that redirect SQL query strings to the wrapper during SBQL query
evaluation. The wrapper sends SQL queries to the wrapped database and returns SBQL results to the query processing engine, so that the results are further processed like other
object-oriented data (or returned directly to the end user). The wrapper query rewriter
also allows imperative SBQL constructs, so relational databases can be updated with
the object-oriented query language according to the SBQL semantics.
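The round trip summarised above (an SQL string sent to the resource, rows returned and rebuilt as objects) can be sketched as follows. This is only an illustration under stated assumptions: Python's sqlite3 module with an in-memory database stands in for the JDBC-accessed legacy resource, the name execsql is borrowed from the wrapper expressions mentioned in the thesis but its Python signature is hypothetical, and plain dictionaries stand in for SBQL result objects.

```python
import sqlite3

# In-memory SQLite database standing in for the wrapped legacy resource.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, surname TEXT)")
conn.executemany(
    "INSERT INTO employees (id, surname) VALUES (?, ?)",
    [(1, "Nowak"), (2, "Kowalski"), (3, "Nowak")],
)


def execsql(connection, sql, params=()):
    """Send an SQL string to the resource and rebuild each row as a
    dictionary keyed by column name (a stand-in for SBQL result objects)."""
    cursor = connection.execute(sql, params)
    columns = [d[0] for d in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# The selection is pushed down to SQL, so no filtering remains at the object side.
rows = execsql(
    conn,
    "SELECT id FROM employees WHERE surname = ? ORDER BY id",
    ("Nowak",),
)
```

The point of the sketch is that the condition travels inside the SQL string, so the resource does the filtering and the object side only receives the matching rows.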
2. Appropriate optimisation mechanisms can be developed and implemented for such
a system in order to enable the object-oriented virtual repository
optimisation to work together with native relational resource optimisers.
At the query rewriting stage, the wrapper aims to find SBQL query patterns
corresponding to SQL-optimiseable queries that can be efficiently evaluated by the
underlying relational database, so that minimum processing is required at the SBQL
side. Currently implemented patterns correspond to aggregate functions, joins, where
selections and projections; however, all optimisation-related resource information is
available to the wrapper. Therefore, a cost model could be provided to support still
more efficient decision making. Besides transforming SBQL (sub)queries into
their SQL counterparts, efficient SBQL optimisation is available during query
processing. This optimisation includes the necessary procedures, like view rewriting (query
modification methods) required for the wrapper optimiser, but also those that considerably
improve SBQL query evaluation, e.g. the ones based on independent subqueries. Hence, after
SQL-oriented optimisation, the whole SBQL query can still be optimised so that the
overall performance is increased.
8.1 Prototype Limitations and Further Works
The current wrapper implementation proves the theses and demonstrates the developed
optimisation methods. The prototype ensures the complete functionality of the virtual
repository based on both relational resources and object-oriented ones (including other
integrated external data sources, e.g. Web Services, XML, etc.). However, some
improvements can be introduced.
The minor one concerns a completely automated primary wrapper view
generation process – the current one creates only basic objects corresponding to the
relational columns and tables with primary on_retrieve, on_update and on_delete
procedures. The improved view generator should generate also virtual pointers
(on_navigate procedures) corresponding to relational primary-foreign key constraints.
This feature is partially implemented, but it does not allow multi-column keys and
multiple foreign keys for a single table, and it was skipped in the examples provided.
For the most general case, a relational database schema with table relations must be
expressed with additional object-oriented views by a system administrator (the
contributory and integration views), which approach has been used in the examples
presented in the thesis.
The current considerations and implementation do not allow creating relational
data. The SQL insert statements are irregular and they do not obey the general language
syntax. Nevertheless, extending the wrapper views with on_create procedures and
recognising the corresponding expressions seems possible, although rather challenging.
Yet another improvement, substantial to the virtual repository operation, and not
directly related to the wrapper itself, is processing distributed queries and decomposing
them to appropriate resources. This complex and sophisticated feature is still under
development for the virtual repository and it will be tested with the presented wrapper
in the future. In the thesis, simple wrapper distribution functionalities were discussed
with examples in subchapter 6.2.4 Multi-Wrapper and Mixed Queries. Distributed query
decomposition and processing features are out of the scope of the thesis, however.
8.2 Additional Wrapper Functionalities
The wrapper designed and implemented for the thesis is focussed on relational
databases. During eGov-Bus works, however, it has been extended with additional
functionalities allowing integration of other types of resources, not necessarily available
with JDBC and SQL. The flexible and modular wrapper prototype architecture allowed
implementing wrapper modes dedicated to SD-SQL databases and SWARD
repositories.
SD-SQL36 databases [204] operate on database stored procedures responsible for
transparent distribution of data between separate database instances together with
automated managing and querying these instances. Queries targeting SD-SQL resources
are not ordinary SQL strings, but they are the appropriate stored procedure calls – the
corresponding extension has been introduced to the wrapper query generator. Query
strings and results are processed by JDBC, as in case of regular relational databases.
SWARD (mentioned in subchapter 2.2.4.1), being a part of the Amos II project
(subchapter 2.2.1.3), required extending the wrapper with an extra interface enabling
communication with this data source, instead of the standard JDBC. Further, an RDQL
generator was implemented, instead of the standard SQL one. Therefore, the presented
wrapper can also be used for accessing other RDF resources (probably with minor
modifications), provided they can be queried with RDQL (this language is currently regarded as
obsolete). The existing RDQL query generator can also be extended into a SPARQL
query generator without much effort.
36 Scalable Distributed SQL
For either SD-SQL or SWARD, the wrapper action is the same as in the case of
regular relational databases; the only difference is in the resource-language-specific
query strings (preferably optimised) generated and sent to the resource. Result retrieval
and reconstruction procedures are generic and they did not require modifications.
Appendix A
The eGov-Bus Project
The thesis has been accomplished under the eGov-Bus project [205], which is an
acronym for the Advanced eGovernment Information Service Bus project supported by
the European Community under "Information Society Technologies" priority of the
Sixth Framework Programme (contract number: FP6-IST-4-026727-STP). The project
is a 24-month international research project aiming at designing foundations of a system
providing citizens and businesses with improved access to virtual public services, which
are based on existing national eGovernment services and which support cross-border
“life events”. The project participants are:
• Rodan Systems SA (Poland),
• Centre de Recherche en Informatique Appliquée – Universite Paris Dauphine (France),
• Europäisches EMIC Innovations Center GmbH (Germany),
• Department of Information Technology, Uppsala University (Sweden),
• Polish-Japanese Institute for Information Technology (Poland),
• Axway Software (France),
• Zentrum für Sichere Informationstechnologie (Austria),
• Ministry of Interior and Administration (MSWiA) (Poland).
The overall eGov-Bus project objective is to research, design and develop
technology innovations which will create and support a software environment that
provides user-friendly, advanced interfaces to support “life events” of citizens and
businesses – administration interactions involving many different government
organisations within the European Union. The “life-events” model organises services
and allows users to access services in a user-friendly and seamless manner, by hiding
the functional fragmentation and the organisational complexity of the public sector.
This approach transforms governmental portals into virtual agencies, which cluster
functions related to the customer’s everyday life, regardless of the responsible agency or
branch of government. Such virtual agencies offer single points of entry to multiple
governmental agencies (European, national, regional and local) and provide citizens and
businesses with the opportunity to interact easily and seamlessly with several public
agencies. “Life-events” lead to a series of transactions between users (citizens and
enterprises) and various public sector organisations, often crossing traditional
department boundaries. There are substantial information needs and service needs for
the user that can span a range of organisations and be quite complicated. An example of
a straightforward life event “moving house” within only one country such as Poland
may require complex interaction with a number of Government information systems.
The detailed objectives are:
1. Create adaptable process management technologies by enabling virtual services to
be combined dynamically from the available set of eGovernment functions.
2. Improve effective usage of advanced web service technologies by eGovernment
functions by means of service-level agreements, an audit trail, semantic
representations, better availability and performance.
3. Exploit and integrate current and ongoing research results in the area of natural
language processing to provide user-friendly, customisable interfaces to the eGov-Bus.
4. Organise currently available web services according to the specific life-event
requirements, creating a comprehensive workflow process that provides clear
instructions for end users and allows them to personalise services as required.
5. Research a secure, non-repudiable audit trail for combined Web services by
promoting qualified electronic signature technology.
6. Support a virtual repository of data sources required by life-event processes,
including meta-data, declarative rules, and procedural knowledge about governing
life-events categories.
7. Provide these capabilities based on a highly available, distributed and secure
architecture that makes use of existing systems.
Generally citizens and businesses will profit from more accessible public
services. The following concrete benefits will be achieved:
• Improved public services for citizens and businesses,
• Easier access to cross-border services and therefore a closer European Union,
• Improved quality of life and quality of communication,
• Reduced red tape and thus an increase in productivity.
To accomplish these challenging objectives, eGov-Bus researches advances in
business process and Web service technologies. Virtual repositories provide data
abstraction, and a security service framework ensures adequate levels of data protection
and information security. Multi-channel interfaces allow citizens easy access using their
preferred interface.
The eGov-Bus architecture is to comprise three distinct classes of software
components, namely the newly developed features resulting from the project research
and development effort, the modified and extended pre-existing software components
(either proprietary software components licensed to the project by the project partners or
open software), and the pre-existing information system features. The latter category
pertains to the eGovernment information systems to be rendered interoperable with the
use of the eGov-Bus prototype as well as to the middleware software components such
as workflow management engines.
Appendix B
The ODRA Platform
ODRA (Object Database for Rapid Application development) [206] is a prototype
object-oriented application development environment currently being constructed at the
Polish-Japanese Institute of Information Technology under the eGov-Bus project. Its
aim is to design a next generation development tool for future database application
programmers. The tool is based on SBQL. The SBQL execution environment consists
of a virtual machine, a main memory DBMS and an infrastructure supporting
distributed computing.
The main goal of the ODRA project is to develop new paradigms of database
application development. This goal can be reached by increasing the level of abstraction
at which a programmer works, through the application of a new, universal, declarative
programming language, together with its distributed, database-oriented and object-oriented execution environment. Such an approach provides functionality common to
a variety of popular technologies (such as relational/object databases, several types of
middleware, general purpose programming languages and their execution
environments) in a single universal, easy to learn, interoperable and effective to use
application programming environment. The principal ideas implemented in order to
achieve this goal are the following:
1. Object-oriented design. Despite the principal role of object-oriented ideas in
software modelling and in programming languages, these ideas have not yet succeeded
in the field of databases. The ODRA approach is different from current ways of
perceiving object databases, represented mostly by the ODMG standard [207] and
database-related Java technologies (e.g., [208, 209]). The system is built upon the
SBA methodology ([210, 211]). This allows introducing into database programming
all the popular object-oriented mechanisms (like objects, classes, inheritance,
polymorphism, encapsulation), as well as some mechanisms previously unknown
(like dynamic object roles [212, 213] or interfaces based on database views [214,
215]).
2. Powerful query language extended to a programming language. The most
important feature of ODRA is SBQL, an object-oriented query and programming
language. SBQL differs from programming languages and from well-known query
languages, because it is a query language with the full computational power of
programming languages. SBQL alone makes it possible to create fully fledged
database-oriented applications. A chance to use the same very-high-level language
for most database application development tasks may greatly improve
programmers’ efficiency, as well as software stability, performance and
maintenance potential.
3. Virtual repository as a middleware. In a networked environment it is possible to
connect several hosts running ODRA. All systems tied in this manner can share
resources in a heterogeneous and dynamically changing, but reliable and secure
environment. This approach to distributed computing is based on object-oriented
virtual updatable database views [216]. Views are used as wrappers (or mediators)
on top of local servers, as a data integration facility for global applications, and as
customisers that adapt global resources to the needs of particular client applications.
This technology can be perceived as a contribution to distributed databases,
Enterprise Application Integration (EAI), Grid Computing and Peer-To-Peer
networks.
The distributed nature of contemporary information systems requires highly
specialised software facilitating communication and interoperability between
applications in a networked environment. Such software is usually referred to as
middleware and is used for application integration. ODRA supports information-oriented and service-oriented application integration. The integration can be achieved
through several techniques known from research on distributed/federated databases. The
key feature of ODRA-based middleware is the concept of transparency. Due to this
transparency, many complex technical details of the distributed data/service environment
need not be taken into account in application code. ODRA supports the following
transparency forms:
• Transparency of updating made from the side of a global client,
• Transparency of distribution and heterogeneity,
• Transparency of data fragmentation,
• Transparency of data/service redundancies and replications,
• Transparency of indexing,
• etc.
These forms of transparency have not been solved to a satisfactory degree by
current technologies. For example, Web Services support only transparency of location
and transparency of implementation. Transparency is achieved in ODRA through the
concept of a virtual repository (Fig. 1). The repository seamlessly integrates distributed
resources and provides a global view on the whole system, allowing one to utilise
distributed software resources (e.g., databases, services, applications) and hardware
(processor speed, disk space, network, etc.). It is responsible for the global
administration and security infrastructure, global transaction processing, communication
mechanisms, ontology and metadata management. The repository also facilitates data
access by several redundant data structures (global indexes, global caches, replicas), and
protects data against random system failures.
A user of the repository sees data exposed by the systems integrated by means of
the virtual repository through a global integration view. The main role of the integration
view is to hide complexities of mechanisms involved in access to local data sources.
The view implements a CRUD behaviour which can be augmented with logic
responsible for dealing with horizontal and vertical fragmentation, replication, network
failures, etc. Thanks to the declarative nature of SBQL, these complex mechanisms can
often be expressed in one line of code. The repository has a highly decentralised
architecture. In order to get access to the integration view, clients do not send queries to
any centralised location in the network. Instead, every client possesses its own copy of
the global view, which is automatically downloaded from the integration server after
successful authentication to the repository. A query executed on the integration view is
to be optimised using such techniques as rewriting, pipelining, global indexing and
global caching.
Local sites are fully autonomous, which means it is not necessary to change
them in order to make their content visible to the global user of the repository. Their
content is visible to global clients through a set of contributory views which must
conform to the integration view (be a subset of it). Non-ODRA data sources are
available to global clients through a set of wrappers, which map data stored in them to
the canonical object model assumed for ODRA. There are wrappers developed for
several popular databases, languages and middleware technologies.
Despite their diversity, they can all be made available to global users of the
repository. A global user may not only query local data sources, but also update their
content using SBQL. Instead of exposing raw data, the repository designer may decide
to expose only procedures. Calls to such procedures can be executed synchronously and
asynchronously. Together with SBQL’s support for semistructured data, this feature
enables a document-oriented interaction, which is characteristic of current technologies
supporting Service Oriented Architecture (SOA).
B.1 ODRA Optimisation Framework37
In terms of ODRA, an optimiser means any mechanism transforming a query into
a semantically equivalent form, not only directly for a better performance but also to
enable other optimisers to work. For example, some optimisers may require specific
transformations (e.g., macro-substituting view or procedure definitions) to be executed
first. These operations do not improve performance themselves, but subsequently
applied optimisations do (e.g. based on independent subquery methods).
A common name is used for all these mechanisms because all of them work in a similar
way and they are served by the same framework.
The ODRA optimisation framework allows defining an optimisation sequence, i.e.
a collection of subsequent optimisers influencing query evaluation (there is also
a possibility to turn any optimisation off) for arbitrary performance tuning. Such
a sequence is a session variable and does not affect other sessions. The supported and
implemented optimisers are (the code name used in a sequence definition is given in parentheses):
• None (none) – no optimisation performed,
• Independent subquery methods (independent),
• Dead subquery removal (dead),
• Union-distributive (union) – parallel execution of some distributive queries,
• Wrapper rewriting (wrapperrewrite) – simple relational wrapper query rewriting (the naive approach),
• Wrapper optimisation (wrapperoptimize) – relational wrapper query optimisation,
• Procedure rewriting (rewrite) – macro-substituting procedure calls for query modification,
• View rewriting (viewrewrite) – macro-substituting view calls for query modification,
• Indices (index) – substituting some subqueries with appropriate index calls.
37 SBQL optimisation techniques have been described in detail in subchapter 4.4
The default optimisation sequence is empty (no optimisation is performed),
which is correct for all queries (but not efficient, of course). Therefore the optimisers to
be used should be put in the current optimisation sequence, e.g., for relational wrapper
queries the minimum optimisation sequence is view rewriting and wrapper rewriting38.
Please notice that the sequence order is also important, as the simple wrapper
rewriting would not succeed if views used in a query were not macro-substituted with
their definitions beforehand. This sequence can be further expanded with some other
optimisers, e.g., removing dead subqueries and using independent subquery methods.
Similarly, for an optimised execution of a relational wrapper query the minimum
sequence is view rewriting and wrapper optimisation. This sequence can also be
extended with some other optimisers.
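The role of the sequence order can be shown with a toy model. In the Python sketch below a query is reduced to a list of names, and the two optimisers are deliberately trivial functions. Only the code names viewrewrite and wrapperrewrite come from the list above; the view definition, the execsql(...) marker and the function bodies are assumptions made for illustration and do not reproduce the ODRA implementation.

```python
# Hypothetical view definitions: view name -> relational name it envelops.
VIEW_DEFINITIONS = {"Employee": "$employees"}


def viewrewrite(query):
    """Macro-substitute view names with their definitions."""
    return [VIEW_DEFINITIONS.get(name, name) for name in query]


def wrapperrewrite(query):
    """Replace relational ($-prefixed) names with a marker for an SQL call;
    names still hidden behind views are left untouched."""
    return [f"execsql({name[1:]})" if name.startswith("$") else name
            for name in query]


OPTIMISERS = {"viewrewrite": viewrewrite, "wrapperrewrite": wrapperrewrite}


def optimise(query, sequence):
    """Apply the optimisers named in the session's sequence, in order."""
    for code_name in sequence:
        query = OPTIMISERS[code_name](query)
    return query


query = ["Employee"]
right_order = optimise(query, ["viewrewrite", "wrapperrewrite"])
wrong_order = optimise(query, ["wrapperrewrite", "viewrewrite"])
no_optimisation = optimise(query, [])
```

With the views substituted first, the wrapper rewriter sees the relational name and can replace it. In the reverse order it finds nothing to rewrite, and the relational name is only exposed afterwards.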
Preferably, a default optimisation sequence should contain the most common
optimisers (or just the ones required for a proper execution of an arbitrary query). The
other promising solution is to set the most appropriate sequence dynamically, based on
the semantics of a particular query. However, these functionalities are out of the scope of the
thesis and they may be realised in further virtual repository development.
38 Actually there is no need to use the simple wrapper rewriter preceded with the view rewriter. In the
current ODRA implementation unoptimised (naive) wrapper calls are compiled in views’ binary code
and any wrapper-related query is evaluated correctly (but with no optimisation) even if no
optimiser is defined in the current sequence.
Appendix C
The Prototype Implementation
The following subchapters are focused on various implementation issues and solutions
developed and implemented for the wrapper prototype.
C.1 Architecture
The wrapper is implemented in a client-server architecture, schematically presented in Fig.
122, together with the fundamental virtual repository components.
[Diagram labels: Virtual repository, Object-oriented views, resource (three), ODRA, Wrapper client, Wrapper server, JDBC client, RDBMS]
Fig. 122 Wrapper architecture
The client is embedded in the ODRA instance (described briefly in Appendix B),
while the multithreaded server is an independent module located between ODRA
and the wrapped relational database. The internal wrapper communication is realised
with a dedicated protocol (described below) over TCP/IP sockets, while
communication with the resource relies on the embedded JDBC client.
C.1.1 Communication protocol
The client-server communication (based on a simple internal text protocol) is
established on a server listener port. Listing 7 shows a sample
communication procedure (single-thread, only one client connected) seen at the server
side, while Listing 8 presents the same procedure at the client side. The example
presents a simple procedure of retrieving the metabase from the wrapper server (the
strings are URL-encoded to preserve non-ASCII characters; the example presents
decoded values).
Listing 7 Client-server communication example (server side)
SBQL wrapper server thread #889677296
listening...
connected from /127.0.0.1
<- hello
-> hello
-> SBQL wrapper server thread #[email protected]
-> SBQL wrapper server is running under Java Service Wrapper
-> Big thanks to Tanuki Software <http://wrapper.tanukisoftware.org>
-> request.identify
<- identity: admin
-> welcome [email protected]
-> ready
<- send.metabase
-> data.ready: XSD prepared
<- get.transfer.port
-> transfer.port: 3131
<- get.data.length
-> data.length: 3794
<- send.data
-> sending.data
<- transfer finished in 16 ms
<- transfer rate 231 kB/s
<- data.received
-> want.another
<- bye
-> bye
Listing 8 Client-server communication example (client side)
Connecting the server...
Connecting 'localhost' on port 2000 (1 of 10)...
Connection established...
admin -> hello
admin <- hello
admin <- SBQL wrapper server thread #[email protected]
admin <- SBQL wrapper server is running under Java Service Wrapper
admin <- Big thanks to Tanuki Software <http://wrapper.tanukisoftware.org>
admin <- request.identify
admin -> identity: admin
admin <- welcome [email protected]
admin <- ready
admin -> send.metabase
admin <- data.ready: XSD prepared
admin -> get.transfer.port
admin <- transfer.port: 3131
admin -> get.data.length
admin <- data.length: 3794
admin -> send.data
admin <- sending.data
admin -> transfer finished in 16 ms
admin -> transfer rate 231 kB/s
admin -> data.received
admin <- want.another
admin -> bye
admin <- bye
All the communication protocol commands and messages for the server and the
client with their meanings are shown in the tables below. Please notice that there are
still some miscellaneous messages interpreted neither by the client nor by the server;
they just contain some useful information that can be used for debugging.
Besides the communication protocol, binary data transfer is opened on another
port whenever it is needed. In the sample communication procedure presented in the
listings above, the port is dynamically assigned in response to the client’s get.transfer.port
command and used for sending a serialised XSD document.
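A line-based text exchange of this kind can be sketched in a few lines of Python. The sketch below is not the prototype's code: it covers only a shortened session (greeting, identification, bye) with message names taken from the protocol tables, uses a connected socket pair instead of a listener port, leaves out the separate binary transfer port, and assumes percent-encoding via urllib.parse as the URL-encoding scheme. The function names are hypothetical.

```python
import socket
import threading
from urllib.parse import quote, unquote


def send_line(sock_file, text):
    # One message per line; URL-encoding keeps non-ASCII characters intact.
    sock_file.write(quote(text) + "\n")
    sock_file.flush()


def read_line(sock_file):
    return unquote(sock_file.readline().rstrip("\n"))


def serve(sock):
    """Server side of a shortened session: greeting, identification, bye."""
    with sock, sock.makefile("rw", encoding="ascii", newline="\n") as f:
        if read_line(f) != "hello":
            send_line(f, "error")
            return
        send_line(f, "hello")
        send_line(f, "request.identity")
        identity = read_line(f)              # e.g. "identity: admin"
        send_line(f, "welcome " + identity.partition(": ")[2])
        send_line(f, "ready")
        if read_line(f) == "bye":
            send_line(f, "bye")


def run_client(sock, name):
    """Client side; returns a transcript in the style of the client listing."""
    transcript = []
    with sock, sock.makefile("rw", encoding="ascii", newline="\n") as f:
        def say(text):
            send_line(f, text)
            transcript.append("-> " + text)

        def hear():
            transcript.append("<- " + read_line(f))

        say("hello")
        hear()                               # hello
        hear()                               # request.identity
        say("identity: " + name)
        hear()                               # welcome <name>
        hear()                               # ready
        say("bye")
        hear()                               # bye
    return transcript


# A connected socket pair replaces the TCP listener port in this sketch.
server_end, client_end = socket.socketpair()
server_thread = threading.Thread(target=serve, args=(server_end,))
server_thread.start()
transcript = run_client(client_end, "admin")
server_thread.join()
```

Keeping every message on its own encoded line means that both sides can read with a plain readline call, and a debugging message that neither side interprets does not break the framing.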
Table 3 Wrapper protocol server commands and messages
Command/message | Parameter(s) | Meaning
bye | - | Finish the communication session
close | - | Close the session immediately, currently unused
data.length | Data length in bytes | Length of data to be sent
data.ready | Variant information on what data is ready | Data is prepared and ready for retrieval
error | Error code according to the WrapperException caught | Error indication
hello | - | Start the communication session
ready | - | The server is ready for client requests
reject | - | Rejects a client request or identity, currently unused
request.identity | - | Asks the client for its name (for client authentication), currently only a database user name is retrieved
request.mode | - | Asks the client for establishing its mode, currently supported modes are: SQL (1), SD-SQL (2) and SWARD/RDF (3)
sending.data | - | Data sending started
transfer.port | Port number | Returns to the client a port number for binary data transfer
want.another | - | Asks the client if another request will be sent in this communication session
Table 4 Wrapper protocol client commands and messages
Command/message | Parameter(s) | Meaning
hello | - | Start the communication session
bye | - | Finish the communication session
close | - | Close the session immediately, currently unused
identity | Client identity string | Send the client identity string or name used for authentications, currently only a database user name is sent
get.transfer.port | - | Asks the server for a port number for receiving binary data
query | Query string | Sends a query string for evaluation in the wrapper resource
send.data | - | Tells the server to start transferring binary data
get.data.length | - | Asks the server for binary data length in bytes
data.received | - | Tells the server binary data is received and the socket can be released
send.database | - | Requests a database model
send.metabase | - | Requests a metabase XSD document
mode | Mode number | Sends the client mode, currently supported modes are: SQL (1), SD-SQL (2) and SWARD/RDF (3)
C.2 Relational Schema Wrapping
The very first step, performed only once (unless the wrapped resource schema changes),
is the generation of an XML document containing a description of the wrapped resource.
The schema description is required by the wrapper; however, its details are skipped for now
(the generation procedure and the document structure are described in the following
parts of the appendix).
A virtual repository wrapper is instantiated when a wrapper module is created.
A wrapper module is a regular database module, but it asserts that all the names within
the module are “relational” (i.e. imported from a relational schema) except for
automatically generated views referring to these names (this procedure is described in
the following paragraphs).
A wrapper instance contains a wrapper client capable of communication with the
appropriate server. All the wrapper instances are stored in a global (static) session
object and are available by a wrapper module name (it is a simple way to check whether
a module is a wrapper module: just test if the current session contains a wrapper for
the module name). Thus, once a wrapper is created, it is available to any session
(including the ones initialised in the future), as its module is.
Each wrapper instance is provided with its local transient auto-expandable data
store so that intermediate query results do not affect the actual ODRA data store and
they do not interfere with other objects. This means that a wrapper creates its own
transient store for each session. The store is destroyed (its memory is released) when the
session closes; similarly, all stores for a particular wrapper are destroyed when this
wrapper module is deleted from a database.
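The bookkeeping described in the two paragraphs above (wrappers found by module name, one transient store per session, stores released on session close or module deletion) can be modelled as follows. The class and method names are hypothetical, Python lists stand in for the transient data stores, and creating a store lazily on first use is an assumption of this sketch rather than a documented property of the prototype.

```python
class Wrapper:
    """Toy wrapper instance keeping one transient store per session."""

    def __init__(self, module_name):
        self.module_name = module_name
        self._stores = {}  # session id -> list of intermediate results

    def store_for(self, session_id):
        # Assumption: the store is created on first use within a session.
        return self._stores.setdefault(session_id, [])

    def close_session(self, session_id):
        # Closing a session releases only that session's store.
        self._stores.pop(session_id, None)

    def store_count(self):
        return len(self._stores)


class WrapperPool:
    """Global registry of wrappers, keyed by wrapper module name."""

    def __init__(self):
        self._wrappers = {}

    def add(self, module_name):
        self._wrappers[module_name] = Wrapper(module_name)
        return self._wrappers[module_name]

    def is_wrapper_module(self, module_name):
        # A module is a wrapper module if a wrapper is registered for it.
        return module_name in self._wrappers

    def delete_module(self, module_name):
        # Deleting the module drops the wrapper together with all its stores.
        self._wrappers.pop(module_name, None)


pool = WrapperPool()
wrapper = pool.add("wrapper1")
wrapper.store_for("session-1").append("intermediate result")
wrapper.store_for("session-2")
stores_before = wrapper.store_count()
wrapper.close_session("session-1")
stores_after = wrapper.store_count()
```

Because the stores hang off the wrapper object, removing the wrapper from the registry is enough to release every session's store at once.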
The CLI39 command for creating a wrapper module is:
add module <modulename> as wrapper on <host>:<port>
where <modulename> is a name of a module to add, <host> is a wrapper server host
(IP or name), and <port> is its listener port. The command is parsed and sent to the
ODRA server, where a module is created as a submodule of the current one and a new
wrapper is instantiated. Once it is ready, it is stored in the current session wrapper pool under
the global name of the created module.
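Parsing such a command is straightforward; a minimal Python sketch with a regular expression is given below. The accepted module name characters, the host pattern (which does not cover IPv6 literals) and the function name parse_add_wrapper are assumptions of the sketch, not the actual ODRA CLI grammar; the sample module name is made up.

```python
import re

# Pattern for: add module <modulename> as wrapper on <host>:<port>
ADD_WRAPPER = re.compile(
    r"^\s*add\s+module\s+(?P<module>\w+)\s+as\s+wrapper\s+on\s+"
    r"(?P<host>[^\s:]+):(?P<port>\d+)\s*$"
)


def parse_add_wrapper(command):
    """Return (module name, host, port) or raise ValueError."""
    match = ADD_WRAPPER.match(command)
    if match is None:
        raise ValueError("not an 'add module ... as wrapper' command: " + command)
    port = int(match.group("port"))
    if not 0 < port < 65536:
        raise ValueError("port out of range: %d" % port)
    return match.group("module"), match.group("host"), port


parsed = parse_add_wrapper("add module employees_db as wrapper on localhost:2000")
```

The returned triple carries everything the server side needs to create the submodule and to initialise the wrapper client for the given host and listener port.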
A wrapper instantiation consists of the following steps:
1. Initialise a local client,
2. Test communication with the server,
3. Retrieve and store locally a programmatic model,
4. Retrieve XSD and create a metabase,
5. Create primary views enveloping “relational” metaobjects.
Whenever a server process needs a wrapper (e.g., for query rewriting or
execution), it retrieves it from a current session with a global name of a module for
which a request is served. A wrapper client is used whenever a communication with the
server is required, e.g., when a “relational” query is executed.
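The lookup by a global module name can be pictured as a static registry shared by all sessions. The sketch below is a simplification under assumptions: WrapperRegistrySketch, WrapperStub and all method names are hypothetical, and the five instantiation steps listed above are reduced to storing the server address.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of a global (static) wrapper pool; not ODRA code.
final class WrapperRegistrySketch {
    /** Placeholder for a wrapper instance bound to one wrapper server. */
    static final class WrapperStub {
        final String host;
        final int port;

        WrapperStub(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    // Shared by all sessions and keyed by the global module name.
    private static final Map<String, WrapperStub> WRAPPERS = new ConcurrentHashMap<>();

    static void register(String globalModuleName, String host, int port) {
        WRAPPERS.put(globalModuleName, new WrapperStub(host, port));
    }

    /** A module is a wrapper module if a wrapper is registered under its name. */
    static boolean isWrapperModule(String globalModuleName) {
        return WRAPPERS.containsKey(globalModuleName);
    }

    static WrapperStub lookup(String globalModuleName) {
        WrapperStub wrapper = WRAPPERS.get(globalModuleName);
        if (wrapper == null) {
            throw new IllegalStateException("No wrapper for module " + globalModuleName);
        }
        return wrapper;
    }

    /** Called when the wrapper module is deleted from the database. */
    static void unregister(String globalModuleName) {
        WRAPPERS.remove(globalModuleName);
    }
}
```

A concurrent map is used here only because the registry is assumed to be reachable from several sessions at once; the text does not state how the prototype synchronises access.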
C.2.1 Example
The relational schema used below is described in subchapter 6.2.1 Relational Test
Schemata; for clarity of the example, it is presented again in Fig. 123.
³⁹ Command Line Interface, the ODRA basic text console
employees: id (PK), name, surname, sex, salary, info, birth_date, department_id (FK)
departments: id (PK), name, location_id (FK)
locations: id (PK), name
Fig. 123 Wrapped legacy relational schema
The wrapper client retrieves the schema description from its server and creates
appropriate metadata in the ODRA metabase (a one-to-one mapping is applied: each table is
represented as a single complex object). The corresponding object-oriented schema is
presented in Fig. 124.
$employees
    $id
    $name
    $surname
    $sex
    $salary
    $info
    $birth_date
    $department_id
$departments
    $id
    $name
    $location_id
$locations
    $id
    $name
Fig. 124 Lowest-level object-oriented wrapper schema
This schema import procedure employs the native ODRA XSD/XML importer. The
drawback of this solution is that the importer creates complex objects (to allow
storing XML annotations) and the actual values are pushed down to subobjects named
_VALUE (they appear in the thesis in examples of queries with macro-substituted
views). Therefore some primitive encapsulation is required at this stage to shield
users from this strictly implementation-dependent feature. Names used in this schema
(regarded as relational ones in query analysis and optimisation procedures, subchapter
6.2 Query Analysis and Optimisation Examples) are simply the names of relational tables
and columns prefixed with “$”. This ensures that they are not available in ad-hoc
queries, as such names are not valid identifiers recognised by the ODRA SBQL parser.
Thus, the metaobjects with $-prefixed names are covered by automatically
generated views (Fig. 125) referring to the original relational names of wrapped tables
and columns. This is the final automatically generated stage of the wrapped
relational schema. It can already be queried or covered by a set of views (subchapters
6.2.1 Relational Test Schemata and 6.3 Sample Use Cases) so that it can contribute to
the global schema of the virtual repository.
employees
    id
    name
    surname
    sex
    salary
    info
    birth_date
    department_id
department
    id
    name
    location_id
locations
    id
    name
Fig. 125 Primary wrapper views
As stated in the thesis conclusions, the wrapper could be extended with a fully
automated view generator, so that virtual pointers are created without any administrator
intervention.
C.2.2 Relational Schema Models
The prototype uses different relational schema models depending on its module (level in
the architecture). Their structures depend on a particular application.
C.2.2.1 Internal Wrapper Server Model
A relational schema is stored at the server side as an XML file. When the server is
started, it loads a schema model according to a database name specified as a default one
in a properties file or given as a start-up parameter (described below). This model
description is a base for all the other models.
A schema description file is an XML document based on a DTD listed below (a
modified and extended version of the original Torque 3.2 DTD [217]). The DTD
instance is available at http://jacenty.kis.p.lodz.pl/relational-schema.dtd for validation
purposes.
Listing 9 Contents of relational-schema.dtd
<!ELEMENT database (table+)>
<!ATTLIST database
name CDATA #REQUIRED
>
<!ELEMENT table (column+, best-row-id?, foreign-key*, index*)>
<!ATTLIST table
name CDATA #REQUIRED
>
<!ELEMENT column EMPTY >
<!ATTLIST column
name CDATA #REQUIRED
nullable (true | false) "false"
type (BIT | TINYINT | SMALLINT | INTEGER | BIGINT | FLOAT | REAL | NUMERIC |
DECIMAL | CHAR | VARCHAR | LONGVARCHAR | DATE | TIME | TIMESTAMP | BINARY |
VARBINARY | LONGVARBINARY | NULL | OTHER | JAVA_OBJECT | DISTINCT | STRUCT |
ARRAY | BLOB | CLOB | REF | BOOLEANINT | BOOLEANCHAR | DOUBLE) #IMPLIED
size CDATA #IMPLIED
scale CDATA #IMPLIED
default CDATA #IMPLIED
description CDATA #IMPLIED
>
<!ELEMENT best-row-id (best-row-id-column+)>
<!ELEMENT best-row-id-column EMPTY>
<!ATTLIST best-row-id-column
name CDATA #REQUIRED
>
<!ELEMENT foreign-key (reference+)>
<!ATTLIST foreign-key
foreign-table CDATA #REQUIRED
name CDATA #IMPLIED
>
<!ELEMENT reference EMPTY>
<!ATTLIST reference
local CDATA #REQUIRED
foreign CDATA #REQUIRED
>
<!ELEMENT index (index-column+)>
<!ATTLIST index
unique (true | false) #REQUIRED
type (1 | 2 | 3 | 4) #IMPLIED
pages CDATA #IMPLIED
cardinality CDATA #IMPLIED
filter-condition CDATA #IMPLIED
>
<!ELEMENT index-column EMPTY>
<!ATTLIST index-column
name CDATA #REQUIRED
>
The schema generation is actually based on Torque 3.2 [218]. Prior to Torque
3.3, no information on relational schema indices was gathered or processed (in
the current RC1 this feature is still missing); therefore the specialised
SchemaGenerator class (package odra.wrapper.generator) was introduced as
an extension to the standard TorqueJDBCTransformTask.
Using a Torque-based generator ensures access to the most popular
RDBMSs via standard JDBC drivers: Axion, Cloudscape, DB2, DB2/AS400, Derby,
Firebird, Hypersonic, Informix, InstantDB, Interbase, MS Access, MS SQL, MySQL,
Oracle, Postgres, SapDB, Sybase, Weblogic.
An appropriate driver class should be available in the classpath prior to schema
generation. Currently only three drivers are provided in the project, for
PostgreSQL 8.x [219] (postgresql-8.1-405.jdbc3.jar), Firebird 2.x [220]
(jaybird-full-2.1.1.jar) and MS SQL Server 2005 [221] (jtds-1.2.jar), as these
RDBMSs were used for tests. A sample schema description (corresponding to the
relational schema shown in Fig. 123) is provided in Listing 10.
Listing 10 Sample schema description
<?xml version="1.0"?>
<!DOCTYPE database SYSTEM "http://jacenty.kis.p.lodz.pl/relational-schema.dtd">
<!-- author: Jacek Wislicki, [email protected] -->
<database name="wrapper">
  <table name="departments">
    <column name="id" nullable="false" type="INTEGER"/>
    <column name="name" nullable="false" size="64" type="VARCHAR"/>
    <column name="location_id" nullable="false" type="INTEGER"/>
    <best-row-id>
      <best-row-id-column name="id"/>
    </best-row-id>
    <foreign-key foreign-table="locations">
      <reference foreign="id" local="location_id"/>
    </foreign-key>
    <index cardinality="2" name="departments_pkey" pages="8"
        type="3" unique="true">
      <index-column name="id"/>
    </index>
  </table>
  <table name="employees">
    <column name="id" nullable="false" type="INTEGER"/>
    <column name="name" nullable="false" size="64" type="VARCHAR"/>
    <column name="surname" nullable="false" size="64" type="VARCHAR"/>
    <column name="sex" nullable="false" size="1" type="CHAR"/>
    <column name="salary" nullable="false" scale="65531"
        size="65535" type="NUMERIC"/>
    <column name="info" nullable="true" size="10240" type="VARCHAR"/>
    <column name="birth_date" nullable="false" type="DATE"/>
    <column name="department_id" nullable="false" type="INTEGER"/>
    <best-row-id>
      <best-row-id-column name="id"/>
    </best-row-id>
    <foreign-key foreign-table="departments">
      <reference foreign="id" local="department_id"/>
    </foreign-key>
    <index cardinality="12" name="employee_sex_ix" pages="10"
        type="3" unique="false">
      <index-column name="sex"/>
    </index>
    <index cardinality="10" name="employee_name_ix" pages="10"
        type="3" unique="false">
      <index-column name="surname"/>
    </index>
    <index cardinality="11" name="employee_salary_ix" pages="10"
        type="3" unique="false">
      <index-column name="salary"/>
    </index>
    <index cardinality="7" name="employees_pkey" pages="10" type="3"
        unique="true">
      <index-column name="id"/>
    </index>
  </table>
  <table name="locations">
    <column name="id" nullable="false" type="INTEGER"/>
    <column name="name" nullable="false" size="64" type="VARCHAR"/>
    <best-row-id>
      <best-row-id-column name="id"/>
    </best-row-id>
    <index cardinality="2" name="locations_pkey" pages="7" type="3"
        unique="true">
      <index-column name="id"/>
    </index>
    <index cardinality="2" name="locations_name_key" pages="7"
        type="3" unique="true">
      <index-column name="name"/>
    </index>
  </table>
  <table name="pg_logdir_ls">
    <column name="filetime" nullable="true" type="TIMESTAMP"/>
    <column name="filename" nullable="true" type="VARCHAR"/>
  </table>
</database>
A schema description XML file can also be created or edited manually if
necessary, e.g., if an RDBMS does not offer all the required information via JDBC or
only selected tables/views are to be exposed to the wrapper. In the example in Listing
10, an unnecessary element to be removed manually is the pg_logdir_ls table – a
system object automatically read by the JDBC connection.
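Such manual pruning could also be scripted. The following is a minimal sketch using only the standard Java XML APIs; the class SchemaPruneSketch and its method are hypothetical and are not part of the prototype. It assumes a schema document without a DOCTYPE declaration, so that the parser does not try to fetch the remote DTD.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not ODRA code: drops one <table> from a schema description.
final class SchemaPruneSketch {
    static String removeTable(String schemaXml, String tableName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(schemaXml)));
            NodeList tables = doc.getElementsByTagName("table");
            // Iterate backwards: the NodeList is live and shrinks on removal.
            for (int i = tables.getLength() - 1; i >= 0; i--) {
                Element table = (Element) tables.item(i);
                if (tableName.equals(table.getAttribute("name"))) {
                    table.getParentNode().removeChild(table);
                }
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Checked XML exceptions are wrapped to keep the sketch easy to call.
            throw new IllegalStateException("Cannot prune schema description", e);
        }
    }
}
```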
C.2.2.2 Programmatic Model
A programmatic model is built according to an XML schema description by classes in
the odra.wrapper.model package. The model structure is similar to the one used by
Torque, but it is written from scratch in order to correctly represent all relational
database structures, including indices and primary-foreign key dependencies. This model
offers quick access to the structure of a relational database and all its features.
The programmatic model is sent to the client on its request, i.e. on wrapper
initialisation. It is used at the client side for SBQL query analysis and rewriting.
C.2.2.3 Metabase
A metabase is a regular ODRA metabase created on the basis of an XSD import reflecting
the relational schema. The schema file is generated on the fly and sent to a client by the
server on its request, i.e. on wrapper initialisation (just after the programmatic model
retrieval). Metabase creation is based on the programmatic model stored at the server
side. A sample XSD schema is shown in Listing 11.
Listing 11 Sample XSD for metabase
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:sql="http://jacenty.kis.p.lodz.pl"
    targetNamespace="http://jacenty.kis.p.lodz.pl" elementFormDefault="qualified">
  <xsd:element name="wrapper">
    <xsd:complexType>
      <xsd:all minOccurs="0">
        <xsd:element name="locations" minOccurs="0">
          <xsd:complexType>
            <xsd:all minOccurs="0">
              <xsd:element name="name" type="xsd:string" minOccurs="0"
                  maxOccurs="1"/>
              <xsd:element name="id" type="xsd:integer" minOccurs="0"
                  maxOccurs="1"/>
            </xsd:all>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="pg_logdir_ls" minOccurs="0">
          <xsd:complexType>
            <xsd:all minOccurs="0">
              <xsd:element name="filename" type="xsd:string" minOccurs="0"
                  maxOccurs="1"/>
              <xsd:element name="filetime" type="xsd:date" minOccurs="0"
                  maxOccurs="1"/>
            </xsd:all>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="employees" minOccurs="0">
          <xsd:complexType>
            <xsd:all minOccurs="0">
              <xsd:element name="id" type="xsd:integer" minOccurs="0"
                  maxOccurs="1"/>
              <xsd:element name="birth_date" type="xsd:date" minOccurs="0"
                  maxOccurs="1"/>
              <xsd:element name="salary" type="xsd:double" minOccurs="0"
                  maxOccurs="1"/>
              <xsd:element name="sex" type="xsd:string" minOccurs="0"
                  maxOccurs="1"/>
              <xsd:element name="surname" type="xsd:string" minOccurs="0"
                  maxOccurs="1"/>
              <xsd:element name="department_id" type="xsd:integer"
                  minOccurs="0" maxOccurs="1"/>
              <xsd:element name="info" type="xsd:string" minOccurs="0"
                  maxOccurs="1"/>
              <xsd:element name="name" type="xsd:string" minOccurs="0"
                  maxOccurs="1"/>
            </xsd:all>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="departments" minOccurs="0">
          <xsd:complexType>
            <xsd:all minOccurs="0">
              <xsd:element name="id" type="xsd:integer" minOccurs="0"
                  maxOccurs="1"/>
              <xsd:element name="location_id" type="xsd:integer"
                  minOccurs="0" maxOccurs="1"/>
              <xsd:element name="name" type="xsd:string" minOccurs="0"
                  maxOccurs="1"/>
            </xsd:all>
          </xsd:complexType>
        </xsd:element>
      </xsd:all>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
When a metabase is created from XSD, the native ODRA mechanism is used
with a slight modification – the root element is omitted (it is required for a well-formed
XML document, but does not reflect the actual relational schema).
C.2.2.4 Type Mapping
When wrapping relational resources, it is essential to perform reliable type mapping
between relational and object-oriented systems. In the prototype implementation,
there is also an intermediate stage corresponding to passing the schema description
via XSD documents. All these levels (relational, XSD, object-oriented) use their
specific primitive data types, which are managed by the wrapper.
The type mapping procedures are implemented programmatically and stored in
Java classes (the map definitions could possibly be stored in files, but this is not an
issue here). The table shown below contains the corresponding SQL, XSD and ODRA data
types. The default type applied for an undefined relational data type is string (due to
the enormous heterogeneity of RDBMSs, there may still be some types not covered by
the prototype definitions). The string type is also assumed for relational data
types currently not implemented in ODRA (including binary data types like BLOB).
Table 5 Type mapping between SQL, XSD and SBQL
SQL                                                             XSD      SBQL
varchar, varchar2, char, text, memo, clob                       string   string
integer, int, int2, int4, int8, serial, smallint, bigint, byte  integer  integer
number, float, real, numeric, decimal                           double   real
bool, boolean, bit                                              boolean  boolean
date, timestamp                                                 date     date
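The SQL-to-SBQL column of Table 5, together with the string default for unknown types, can be expressed as a simple lookup. This is a sketch only: the class TypeMappingSketch and its method are hypothetical, and the actual mapping classes of the prototype are not reproduced here.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the SQL -> SBQL mapping from Table 5; not ODRA code.
final class TypeMappingSketch {
    private static final String DEFAULT_TYPE = "string";
    private static final Map<String, String> SQL_TO_SBQL = new HashMap<>();

    private static void map(String sbqlType, String... sqlTypes) {
        for (String sqlType : sqlTypes) {
            SQL_TO_SBQL.put(sqlType, sbqlType);
        }
    }

    static {
        map("string", "varchar", "varchar2", "char", "text", "memo", "clob");
        map("integer", "integer", "int", "int2", "int4", "int8", "serial",
                "smallint", "bigint", "byte");
        map("real", "number", "float", "real", "numeric", "decimal");
        map("boolean", "bool", "boolean", "bit");
        map("date", "date", "timestamp");
    }

    /** Returns the SBQL type name; unknown or unsupported SQL types fall back to string. */
    static String sbqlTypeFor(String sqlType) {
        if (sqlType == null) {
            return DEFAULT_TYPE;
        }
        return SQL_TO_SBQL.getOrDefault(sqlType.toLowerCase(Locale.ROOT), DEFAULT_TYPE);
    }
}
```

The case-insensitive lookup is an assumption made for convenience, since type names are written in upper case in the schema description and in lower case in Table 5.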
C.2.3 Result Retrieval and Reconstruction
A wrapper server transforms a SQL query result into an XML string (compliant with the
imported module metabase) and sends it in serialised form to a client. A deserialised
object received by the client is returned to the requesting wrapper instance. The only
exception to this rule (no XML used) applies when a result is a primitive value
(integer, double or boolean) from an aggregate function or some imperative statement
(the result is then the number of rows affected in the relational database, subchapters
6.1.2 and 6.1.3).
Regardless of the relational result structure (the order of retrieved columns within
a result row), before an XML document is formed, the result is analysed and grouped into
tuple sub-results, where column results are grouped by their tables. Therefore ODRA
results can be easily created with structures corresponding to the metabase. A sample
XML document with the query result (cut to two result tuples) is presented in Listing 12.
The corresponding query is:
(Employee as e join e.worksIn.Department as d join d.isLocatedIn.Location as
l).(e.surname, e.name, d.name, l.name);
Listing 12 Sample result XML document
<?xml version="1.0" encoding="UTF-8"?>
<sql:wrapper xmlns:sql="http://jacenty.kis.p.lodz.pl">
<sql:tuple>
<sql:employees>
<sql:surname>Pietrzak</sql:surname>
<sql:name>Piotr</sql:name>
</sql:employees>
<sql:departments>
<sql:name>Security</sql:name>
</sql:departments>
<sql:locations>
<sql:name>Warszawa</sql:name>
</sql:locations>
</sql:tuple>
<sql:tuple>
<sql:employees>
<sql:surname>Wojciechowska</sql:surname>
<sql:name>Agnieszka</sql:name>
</sql:employees>
<sql:departments>
<sql:name>Retail</sql:name>
</sql:departments>
<sql:locations>
<sql:name>Warszawa</sql:name>
</sql:locations>
</sql:tuple>
</sql:wrapper>
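The grouping of column results by their tables within one tuple can be sketched as follows. This is an illustration under assumptions, not the server code: the class TupleGroupingSketch is hypothetical, each result cell is represented as a {table, column, value} array, and tables are emitted in the order of their first appearance.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not ODRA code: builds one <sql:tuple> from a result row.
final class TupleGroupingSketch {
    /** Each cell is {table, column, value}; the column order in the row is arbitrary. */
    static String tupleXml(String[][] cells) {
        // LinkedHashMap keeps tables in the order of their first appearance.
        Map<String, List<String[]>> byTable = new LinkedHashMap<>();
        for (String[] cell : cells) {
            byTable.computeIfAbsent(cell[0], k -> new ArrayList<>()).add(cell);
        }
        StringBuilder xml = new StringBuilder("<sql:tuple>");
        for (Map.Entry<String, List<String[]>> entry : byTable.entrySet()) {
            xml.append("<sql:").append(entry.getKey()).append('>');
            for (String[] cell : entry.getValue()) {
                xml.append("<sql:").append(cell[1]).append('>')
                   .append(escape(cell[2]))
                   .append("</sql:").append(cell[1]).append('>');
            }
            xml.append("</sql:").append(entry.getKey()).append('>');
        }
        return xml.append("</sql:tuple>").toString();
    }

    private static String escape(String value) {
        return value.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```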
A result pattern is responsible for carrying type-checker signature information to
the runtime environment (subchapter 6.1.5). In the implementation, the pattern is
a regular string (provided as the second parameter of an execsql expression) that can be
easily parsed and transformed into a corresponding class instance. The
ResultPattern class provides methods for creating SBQL results from objects
created from XML documents. A sample result pattern string (resulting from the
signature of the query used above) is presented below in Listing 13.
Listing 13 Sample result pattern string
<0 | | | none | struct
<1 $employees | $surname | _surname | none | binder 1>
<1 $employees | $name | _name | none | binder 1>
<1 $departments | $name | _name | none | binder 1>
<1 $locations | $name | _name | none | binder 1>
0>
A result pattern description is stored between <i and i> markups, where i stands
for the pattern nesting level. The top pattern level is 0, the first nested patterns are
given level 1, etc. The mechanism allows any result pattern nesting level; however, flat
relational results use only a two-level structure (some more complex cases can occur for
named results – binders).
The example pattern shown above (Listing 13) is a simple structure consisting of
four fields being binders corresponding to view seeds’ names (prefixed with “_”,
Listing 2) built over the bottom-level relational names (prefixed with “$”). However,
a pattern can be arbitrarily complex. The first two strings in the result pattern denote the
table name and the column name, where applicable (the external structure in the
example does not require these values). The pattern description can contain the following
additional information:
• An alias name for a result (binder),
• A dereference mode (none stands for no dereference, other modes are string,
  boolean, integer, real and date),
• A result type (possible values are ref, struct, binder and value).
On the basis of a result pattern, an SBQL result is reconstructed from an XML document
containing SQL results.
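To make the pattern layout concrete, a single flat (level-1) pattern line from Listing 13 can be split into its five fields as sketched below. The class ResultPatternSketch is hypothetical and much simpler than the ResultPattern class mentioned above: it does not handle nested patterns, and the field order (table, column, alias, dereference mode, result type) is read off the listing.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the ODRA ResultPattern class: parses one flat pattern.
final class ResultPatternSketch {
    final int level;
    final String table;
    final String column;
    final String alias;
    final String dereference;
    final String type;

    // "<i ... i>" envelope; the body holds five fields separated by '|'.
    private static final Pattern ENVELOPE = Pattern.compile("^<(\\d+)(.*)\\s(\\d+)>$");

    private ResultPatternSketch(int level, String[] fields) {
        this.level = level;
        this.table = fields[0].trim();
        this.column = fields[1].trim();
        this.alias = fields[2].trim();
        this.dereference = fields[3].trim();
        this.type = fields[4].trim();
    }

    static ResultPatternSketch parseFlat(String text) {
        Matcher m = ENVELOPE.matcher(text.trim());
        if (!m.matches() || !m.group(1).equals(m.group(3))) {
            throw new IllegalArgumentException("Malformed pattern: " + text);
        }
        // The -1 limit keeps empty fields, e.g. a missing table or column name.
        String[] fields = m.group(2).split("\\|", -1);
        if (fields.length != 5) {
            throw new IllegalArgumentException("Expected 5 fields: " + text);
        }
        return new ResultPatternSketch(Integer.parseInt(m.group(1)), fields);
    }
}
```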
As stated above, the wrapper uses a local auto-expandable store for temporary
object creation; therefore the references returned by queries are volatile (valid only
within the current session). Moreover, the results are deleted before the next query
execution. Object keeping, locking and deleting should be controlled by some
transaction mechanism; unfortunately, none is currently available.
For keeping references to actual relational data, the wrapper can expand each
SQL query with best row identifiers (i.e. primary keys or unique indices) and foreign key
column values, regardless of the actual query semantics (of course, the final result does
not contain these data). Because this information is available in the materialised result,
the result of a query can be further processed by an SBQL program. Currently
this functionality is turned off, but it can be freely (de)activated by toggling the
WRAPPER_EXPAND_WITH_IDS and WRAPPER_EXPAND_WITH_REFS flags in
odra.wrapper.net.Server.java.
C.3 Installation and Launching
The following subsections contain the description of the wrapper configuration and
launching procedures. Prior to these activities, the Java Virtual Machine 1.6.x (standard
edition) or newer must be installed in the system – it can be downloaded from
http://java.sun.com/javase/downloads/index.jsp.
C.3.1 CD Contents
The CD included contains the ODRA project snapshot (revision 2157 with stripped
local SVN information) and the preconfigured MS Windows version of Java Service
Wrapper (described in subchapter Service Launch). The current version of ODRA is
available via SVN from svn://odra.pjwstk.edu.pl:2401/egovbus. The file structure of the
CD is as follows:
• odra.zip – the ODRA project source code and resources (running described in
  subsection C.4 Prototype Testing),
• jsw.win.zip – the preconfigured Java Service Wrapper (described in subsection
  C.3.6.2 Service Launch),
• ph.d.thesis.jacek_wislicki.pdf – this thesis text in PDF format.
The ODRA project contained in odra.zip needs to be unzipped to the local file
system. The project can be edited under Eclipse 3.2 (verified); other Java IDEs might
require some modifications. The organisation of the project file structure is as follows
(internal Eclipse project configuration entries skipped):
EGB
|_build      – compiled classes
|_conf       – runtime configuration files
|_dist       – ODRA precompiled distribution
|_lib        – project libraries
|_res        – resources, including sample batch files
|_src        – Java source code
|_tools      – parser libraries used on build
|_xml2xml    – XML2XML mapper distribution (irrelevant)
|_build.xml  – Ant build file
The project can be run directly from the provided distribution files (as shown in
subsection C.4 Prototype Testing) or built from source with Apache Ant [222]
using the build.xml configuration.
C.3.2 Test Schemata Generation
Any relational schema can be used with the prototype; still, the test schemata
(described in subchapter 6.2.1) can be created for prototype testing. The
resources are available in res/wrapper of the ODRA project included. First, a database
schema should be created manually according to schema.sql. The SQL used in the file is
rather universal, but there might be problems on some RDBMSs (e.g.,
unrecognised data types). Unfortunately, the author was not able to provide an
appropriate automated application (e.g., Torque still does not serve indices).
C.3.3 Connection Configuration
The connection configuration file is connection.properties (a standard Torque
configuration file), a sample of which can be found in the project /conf directory
(Listing 14).
Listing 14 Sample contents of connection.properties
torque.database.default = postgres_employees
#configuration for the postgres database (employees)
torque.database.postgres_employees.adapter = postgresql
torque.dsfactory.postgres_employees.factory = org.apache.torque.dsfactory.SharedPoolDataSourceFactory
torque.dsfactory.postgres_employees.connection.driver = org.postgresql.Driver
torque.dsfactory.postgres_employees.connection.url = jdbc:postgresql://localhost:5432/wrapper
torque.dsfactory.postgres_employees.connection.user = wrapper
torque.dsfactory.postgres_employees.connection.password = wrapper
#configuration for the firebird database (employees)
torque.database.firebird_employees.adapter = firebird
torque.dsfactory.firebird_employees.factory = org.apache.torque.dsfactory.SharedPoolDataSourceFactory
torque.dsfactory.firebird_employees.connection.driver = org.firebirdsql.jdbc.FBDriver
torque.dsfactory.firebird_employees.connection.url = jdbc:firebirdsql:localhost/3050:c:/tmp/wrapper.gdb
torque.dsfactory.firebird_employees.connection.user = wrapper
torque.dsfactory.firebird_employees.connection.password = wrapper
#configuration for the postgres database (cars)
torque.database.postgres_cars.adapter = postgresql
torque.dsfactory.postgres_cars.factory = org.apache.torque.dsfactory.SharedPoolDataSourceFactory
torque.dsfactory.postgres_cars.connection.driver = org.postgresql.Driver
torque.dsfactory.postgres_cars.connection.url = jdbc:postgresql://localhost:5432/wrapper2
torque.dsfactory.postgres_cars.connection.user = wrapper
torque.dsfactory.postgres_cars.connection.password = wrapper
#configuration for the ms sql database (SD-SQL)
torque.database.sdsql.adapter = mssql
torque.dsfactory.sdsql.factory = org.apache.torque.dsfactory.SharedPoolDataSourceFactory
torque.dsfactory.sdsql.connection.driver = net.sourceforge.jtds.jdbc.Driver
torque.dsfactory.sdsql.connection.url = jdbc:jtds:sqlserver://212.191.89.51:1433/SkyServer
torque.dsfactory.sdsql.connection.user = sa
torque.dsfactory.sdsql.connection.password =
The sample file contains four predefined data sources (named
postgres_employees, firebird_employees, postgres_cars and sdsql) for different
RDBMSs and schemata. The same configuration file can be used for different wrapped
databases. However, a separate server must be started for each resource.
The torque.database.default property defines the default database used if none is
specified as an input of an application (e.g., the wrapper server). The other properties
have the following meaning (xxx should be substituted with a unique data source name
that is further used for pointing at the resource):
• torque.database.xxx.adapter – a JDBC adapter/driver name,
• torque.dsfactory.xxx.factory – a data source factory class,
• torque.dsfactory.xxx.connection.driver – a JDBC driver class,
• torque.dsfactory.xxx.connection.url – a JDBC resource-dependent connection
  URL,
• torque.dsfactory.xxx.connection.user – a database user name,
• torque.dsfactory.xxx.connection.password – a database user password.
The correct configuration entered to connection.properties is required for
the next wrapper launching steps.
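How the default data source name selects the remaining keys can be illustrated with the standard java.util.Properties class. This is a sketch under assumptions: the class ConnectionConfigSketch and its methods are hypothetical, and the prototype itself reads this file through Torque rather than through the code below.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical sketch, not ODRA code: resolves a data source from the properties text.
final class ConnectionConfigSketch {
    static Properties parse(String text) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException("Cannot read configuration", e);
        }
        return properties;
    }

    /** Uses the requested name, or torque.database.default when none is given. */
    static String dataSourceName(Properties properties, String requested) {
        String name = (requested != null) ? requested
                : properties.getProperty("torque.database.default");
        if (name == null) {
            throw new IllegalStateException("No database name given and no default defined");
        }
        return name;
    }

    static String jdbcUrl(Properties properties, String requested) {
        String name = dataSourceName(properties, requested);
        String url = properties.getProperty("torque.dsfactory." + name + ".connection.url");
        if (url == null) {
            throw new IllegalStateException("No connection URL for data source " + name);
        }
        return url;
    }
}
```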
C.3.4 Test Data Population
Once the relational schemata are created and their configuration entered into the
connection.properties file, they can be populated with sample data (subchapter 7.1
Relational Test Data). In order to load the data into the database, run
odra.wrapper.misc.testschema.Inserter. This application inserts data
according to the schema integrity constraints and the assumed distributions.
The inserter takes two parameters: mode – the target schema (“employees” or
“cars” values allowed), and number_of_employees – a number of employee records (or
corresponding cars). The sample startup command can be:
java odra.wrapper.misc.testschema.Inserter employees 100
A sample output from the inserter is shown in Listing 15 (previous data, if any, are
deleted first):
Listing 15 Sample inserter output
Wrapper test data population started...
connected
100 records deleted from employees
8 records deleted from departments
7 records deleted from locations
7 locations created
8 departments created
100 employees created
Wrapper test data population finished in 1594 ms...
The inserter application connects automatically to the resource defined in
connection.properties as the default one (torque.database.default value).
C.3.5 Schema Description Generation
This step is required for the wrapper to operate, as it provides the wrapper server with
the description of the wrapped relational database (details in subchapter C.2.2.1 Internal
Wrapper Server Model).
Once connection.properties contains the record defined for a wrapped relational
schema, the schema generator can be launched by running
odra.wrapper.generator.SchemaGeneratorApp. The application can run
without parameters (the connection.properties file is then searched for in the
application /conf directory and the default database name, torque.database.default, is
used). One can also specify an optional parameter with a configuration file path. If it is
specified, a database name can also be provided as the second parameter. The sample
startup command can be:
java odra.wrapper.generator.SchemaGeneratorApp conf/connection.properties postgres_employees
The schema generator application standard output is as below (Listing 16):
Listing 16 Sample schema generator output
Schema generation started...
Schema generation finished in 5875 ms...
As a result, a schema description file is created in the application’s /conf
directory. The file name is created according to the pattern
<dbname>-schema.generated.xml, where <dbname> is a database name specified as an
application startup parameter or the default one in the properties file (e.g.
postgres_employees-schema.generated.xml).
C.3.6 Server
The server (odra.wrapper.net.Server) is a multithreaded application (a separate
parallel thread is invoked for each client request). It can be launched as a standalone
application or as a system service.
C.3.6.1 Standalone Launch
The standalone launch should not be used in a production environment. In order to start
the server as a system service, read the instructions in the next section.
If the server is launched without startup parameters, it searches for the
connection.properties file and schema description XML documents in the application
/conf directory and uses the default database name declared in this file. Other default
values are the port number to listen on (specified as 2000 with
wrapper.net.Server.WRAPPER_SERVER_PORT) and the verbose mode (specified as
true with wrapper.net.Server.WRAPPER_SERVER_VERBOSE). To override these values,
use the syntax shown in the sample below:
odra.wrapper.net.Server -Ddbname -Vfalse -P2001 -C/path/to/config/
All the parameters are optional and their order is arbitrary:
• -D prefixes a database name (to override a default one in a properties file),
• -V toggles a verbose mode (true/false),
• -P specifies a listener port,
• -C specifies a path to server configuration files.
A path denoted with a -C parameter must be a valid directory where all the
configuration files are stored, including connection.properties and schema description
XML document(s).
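The handling of these order-independent, single-letter parameters can be sketched as below. The class ServerArgsSketch is hypothetical and does not reproduce the argument handling of odra.wrapper.net.Server; only the defaults (port 2000, verbose mode on) are taken from the description above.

```java
// Hypothetical sketch, not ODRA code: parses -D, -V, -P and -C in any order.
final class ServerArgsSketch {
    String dbName = null;      // null means: use torque.database.default
    boolean verbose = true;    // default verbose mode
    int port = 2000;           // default listener port
    String configPath = null;  // null means: use the application /conf directory

    static ServerArgsSketch parse(String... args) {
        ServerArgsSketch result = new ServerArgsSketch();
        for (String arg : args) {
            if (arg.length() < 2 || arg.charAt(0) != '-') {
                throw new IllegalArgumentException("Unrecognised parameter: " + arg);
            }
            // The value follows the option letter directly, e.g. -P2001.
            String value = arg.substring(2);
            switch (arg.charAt(1)) {
                case 'D': result.dbName = value; break;
                case 'V': result.verbose = Boolean.parseBoolean(value); break;
                case 'P': result.port = Integer.parseInt(value); break;
                case 'C': result.configPath = value; break;
                default:
                    throw new IllegalArgumentException("Unrecognised parameter: " + arg);
            }
        }
        return result;
    }
}
```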
The server output at a successful startup is shown in Listing 17 below:
Listing 17 Wrapper server startup output
Database model successfully build from schema in
'F:/eclipse.projects/EGB/conf/postgres_employees-schema.generated.xml'
SBQL wrapper listener started in JDBC mode on port 2000...
SBQL wrapper listener is running under Java Service Wrapper
Big thanks to Tanuki Software <http://wrapper.tanukisoftware.org>
C.3.6.2 Service Launch
Running the server as a system service is realised with the Java Service Wrapper (JSW)
[223]. The JSW can be downloaded as binaries or as source code. It can be run on
different platforms (e.g., MS Windows, Linux, Solaris, MacOS X) and the appropriate
version must be installed in the system (a binary download should be enough).
The following instructions refer to MS Windows environment (they are the same
on Linux). Detailed descriptions and examples of installation and configuration
procedures on various platforms are available at the JSW web site.
The main JSW configuration is defined in $JSW_HOME/conf/wrapper.conf
($JSW_HOME denotes a home directory of the JSW installation). The file example is
listed below in Listing 18.
Listing 18 Sample contents of wrapper.conf
#********************************************************************
# TestWrapper Properties
#
# NOTE - Please use src/conf/wrapper.conf.in as a template for your
#        own application rather than the values used for the
#        TestWrapper sample.
#********************************************************************
# Java Application
wrapper.java.command=java
# Java Main class. This class must implement the WrapperListener interface
# or guarantee that the WrapperManager class is initialized. Helper
# classes are provided to do this for you. See the Integration section
# of the documentation for details.
wrapper.java.mainclass=org.tanukisoftware.wrapper.WrapperSimpleApp
# Java Classpath (include wrapper.jar) Add class path elements as
# needed starting from 1
wrapper.java.classpath.1=../lib/wrapper.jar
wrapper.java.classpath.2=F:/eclipse.projects/EGB/dist/lib/odra-wrapper-1.0dev.jar
wrapper.java.classpath.3=F:/eclipse.projects/EGB/dist/lib/odra-commons-1.0dev.jar
wrapper.java.classpath.4=F:/eclipse.projects/EGB/lib/postgresql-8.1-405.jdbc3.jar
wrapper.java.classpath.5=F:/eclipse.projects/EGB/lib/jaybird-full-2.1.1.jar
wrapper.java.classpath.6=F:/eclipse.projects/EGB/lib/jtds-1.2.jar
wrapper.java.classpath.7=F:/eclipse.projects/EGB/lib/jdom.jar
wrapper.java.classpath.8=F:/eclipse.projects/EGB/lib/zql.jar
wrapper.java.classpath.9=F:/eclipse.projects/EGB/lib/commons-configuration-1.1.jar
wrapper.java.classpath.10=F:/eclipse.projects/EGB/lib/commons-collections-3.1.jar
wrapper.java.classpath.11=F:/eclipse.projects/EGB/lib/commons-lang-2.1.jar
wrapper.java.classpath.12=F:/eclipse.projects/EGB/lib/commons-logging-1.0.4.jar
# Java Library Path (location of Wrapper.DLL or libwrapper.so)
wrapper.java.library.path.1=../lib
# Java Additional Parameters
wrapper.java.additional.1=-ea
# Initial Java Heap Size (in MB)
#wrapper.java.initmemory=3
# Maximum Java Heap Size (in MB)
wrapper.java.maxmemory=512
# Application parameters. Add parameters as needed starting from 1
wrapper.app.parameter.1=odra.wrapper.net.Server
wrapper.app.parameter.2=-C"F:/eclipse.projects/EGB/conf/"
wrapper.app.parameter.2.stripquotes=TRUE
#wrapper.app.parameter.3=-Dpostgres_employees
#wrapper.app.parameter.4=-P2000
#wrapper.app.parameter.5=-Vtrue
#********************************************************************
# Wrapper Logging Properties
#********************************************************************
# Format of output for the console. (See docs for formats)
wrapper.console.format=PM
# Log Level for console output. (See docs for log levels)
wrapper.console.loglevel=INFO
# Log file to use for wrapper output logging.
wrapper.logfile=../logs/wrapper.log
# Format of output for the log file. (See docs for formats)
wrapper.logfile.format=LPTM
# Log Level for log file output. (See docs for log levels)
wrapper.logfile.loglevel=INFO
# Maximum size that the log file will be allowed to grow to before
# the log is rolled. Size is specified in bytes. The default value
# of 0, disables log rolling. May abbreviate with the 'k' (kb) or
# 'm' (mb) suffix. For example: 10m = 10 megabytes.
wrapper.logfile.maxsize=1m
# Maximum number of rolled log files which will be allowed before old
# files are deleted. The default value of 0 implies no limit.
wrapper.logfile.maxfiles=10
# Log Level for sys/event log output.
wrapper.syslog.loglevel=NONE
#********************************************************************
# Wrapper Windows Properties
#********************************************************************
# Title to use when running as a console
wrapper.console.title=ODRA wrapper server
#********************************************************************
# Wrapper Windows NT/2000/XP Service Properties
#********************************************************************
# WARNING - Do not modify any of these properties when an application
# using this configuration file has been installed as a service.
# Please uninstall the service before modifying this section. The
# service can then be reinstalled.
# Name of the service
wrapper.ntservice.name=ODRAwrapper 1
# Display name of the service
wrapper.ntservice.displayname=ODRA wrapper server 1
# Description of the service
wrapper.ntservice.description=ODRA relational database wrapper server 1
# Service dependencies. Add dependencies as needed starting from 1
wrapper.ntservice.dependency.1=
# Mode in which the service is installed. AUTO_START or DEMAND_START
wrapper.ntservice.starttype=AUTO_START
# Allow the service to interact with the desktop.
wrapper.ntservice.interactive=false
The most important properties in wrapper.conf are:
• wrapper.java.command – which JVM to use (depending on the system configuration,
one might need to specify a full path to the java program),
• wrapper.java.mainclass – a JSW integration method (with the value specified
in the above listing it does not require a JSW implementation; do not modify this
one),
• wrapper.java.classpath.N – Java classpath elements (do not modify the first
classpath element, as it denotes the JSW JAR location; the other elements refer to
libraries used by the ODRA wrapper server, including JDBC drivers),
• wrapper.java.additional.N – JVM startup parameters (in the example only -ea
is used, to enable assertions),
• wrapper.java.maxmemory – the maximum JVM heap size; real-life databases would
probably require more than the default 64 MB,
• wrapper.app.parameter.1 – the ODRA wrapper server main class (do not modify
this one),
• wrapper.app.parameter.2 – a path to the ODRA wrapper server configuration files
directory (i.e. connection.properties and <dbname>-schema.generated.xml)
passed as a server startup parameter,
• wrapper.app.parameter.2.stripquotes – important when a parameter
contains extra quotes,
• wrapper.app.parameter.3 – the database name passed as a server startup parameter,
• wrapper.app.parameter.4 – the server listener port passed as a server startup
parameter,
• wrapper.app.parameter.5 – the server verbose mode passed as a server startup
parameter,
• wrapper.logfile.maxsize – the maximum size of a single log file before it is rolled,
• wrapper.logfile.maxfiles – the maximum number of log files kept before the old ones
are deleted.
Notice that wrapper.app.parameter.[2...5] correspond to the server application startup
parameters described above. They are optional and their order is arbitrary. Descriptions
of the other configuration properties are available at the JSW web site.
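The way the numbered wrapper.app.parameter.N entries make up the server startup arguments can be illustrated with a short script. It is only a sketch and not part of the ODRA prototype or the JSW: the function names parse_properties and app_parameters are hypothetical, and the handling of stripquotes (simply removing double quotes) is an assumed simplification of the actual JSW behaviour.

```python
import re

def parse_properties(text):
    """Parse key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props

def app_parameters(props):
    """Return the wrapper.app.parameter.N values ordered by N.

    Assumption: when <key>.stripquotes is TRUE, double quotes are simply
    removed from the value (a simplification of the JSW quote handling).
    """
    pattern = re.compile(r"^wrapper\.app\.parameter\.(\d+)$")
    numbered = []
    for key, value in props.items():
        match = pattern.match(key)
        if match is None:
            continue  # e.g. the ".stripquotes" companion keys
        strip = props.get(key + ".stripquotes", "FALSE").upper() == "TRUE"
        numbered.append((int(match.group(1)), value.replace('"', "") if strip else value))
    return [value for _, value in sorted(numbered)]

# Fragment modelled on the listing above, with parameter 3 uncommented.
SAMPLE = """
wrapper.app.parameter.1=odra.wrapper.net.Server
wrapper.app.parameter.2=-C"F:/eclipse.projects/EGB/conf/"
wrapper.app.parameter.2.stripquotes=TRUE
wrapper.app.parameter.3=-Dpostgres_employees
#wrapper.app.parameter.4=-P2000
"""

if __name__ == "__main__":
    for argument in app_parameters(parse_properties(SAMPLE)):
        print(argument)
```

For the sample fragment the script lists the main class followed by the -C and -D arguments; the commented-out port parameter is skipped.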
In order to test a configuration, one can run $JSW_HOME/bin/test.bat. The JSW
is launched as a standalone application and runs the ODRA wrapper server (any
misconfiguration can be easily detected). If the test succeeds, the JSW is ready to be
installed as a system service. The service is installed with install.bat and uninstalled
with uninstall.bat. A sample preconfigured JSW installation for MS Windows can be
downloaded from http://jacenty.kis.p.lodz.pl/jsw.win.zip – only some paths need to be
adjusted. A sample JSW installation for MS Windows is also stored on the included CD.
C.3.7 Client
The client (odra.wrapper.net.Client) cannot be launched directly – its instance is
a component of odra.wrapper.Wrapper and is used in the background. If one needs
to use a verbose client (with a console output), set the wrapper.verbose property to true
in /conf/odra-server.properties. Then the wrapper client output is displayed in the
standard ODRA CLI console.
C.4 Prototype Testing
The ODRA server and the client (CLI) can be started up by /dist/easystart.bat(sh). The
default database is created, the server started and the CLI console opened and ready for
input. If the wrapper server is not running yet, it can be started as an application with
/dist/wrapper-server.bat(sh) (the default parameters assume the postgres_employees
schema and listening on port 2000), provided the schema description file is available in
/dist/conf.
A wrapper module is added with the following command:
add module <modulename> as wrapper on <host>:<port>
where the module name and the wrapper server listener socket (host and port) have to
be specified. A sample command can be:
add module test as wrapper on localhost:2000
The module is added (provided the wrapper server is available on the socket) and the
wrapper instantiated (procedures concerning the schema import and the metabase
creation are executed in background).
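Since adding the module succeeds only if the wrapper server is available on the given socket, it may be convenient to check the listener first. The sketch below is not part of the prototype; the helper name listener_available is hypothetical and it only tests whether a TCP connection can be opened, without speaking the wrapper protocol.

```python
import socket

def listener_available(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers a refused connection, a timeout and name resolution errors.
        return False

if __name__ == "__main__":
    host, port = "localhost", 2000  # values from the sample command above
    if listener_available(host, port):
        print(f"add module test as wrapper on {host}:{port}")
    else:
        print(f"no listener on {host}:{port}, start the wrapper server first")
```

The script prints the CLI command to issue when the listener responds and a hint otherwise.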
Now switch to the wrapper module by executing cm test. The test module
contains only primary wrapper views available for querying. The module contents can
be retrieved with ls.
A sample CLI session is presented in Listing 19. The list of available CLI
commands is displayed by the help command.
Listing 19 Sample CLI session
Welcome to ODRA (J2)!
admin> add module test as wrapper on localhost:2000
admin> cm test
admin.test> compile .
admin.test> ls
D   $employees
D   $departments
D   $locations
V   employeesDef
VP  employees
V   locationsDef
VP  locations
V   departmentsDef
VP  departments
admin.test> deref((employees where salary = 500).(surname, name)) as emp;
<?xml version="1.0" encoding="windows-1250"?>
<RESULT>
<emp>Pawłowski
Stanisław</emp>
<emp>Kaczmarek
Anna</emp>
<emp>Zając
Zofia</emp>
</RESULT>
admin.test>
C.4.1 Optimisation Testing
For rewriting/optimisation testing use:
explain optimization viewrewrite | wrapperrewrite : <query>;
or
explain optimization viewrewrite | wrapperoptimize : <query>;
respectively. Other optimisation types can be switched off; only viewrewrite is
important if a query is based on wrapper views.
For evaluation tests, it is necessary to set the current optimisation sequence for
a session. Sample syntax is shown below:
set optimization none | viewrewrite | wrapperoptimize
The none option here is necessary to reset (clear) the previous sequence (by default it is
empty, as the ODRA optimisers are still under construction). In order to check the current
optimisation sequence, use:
show optimization
Another thing to configure is the test mode. There are four modes available:
• off – no tests are performed (default),
• plain – a query is optimised with the current optimisation sequence and the
optimisation results (measured times) are prepended to its actual result,
• compare – the actual query result is not retrieved, only a comparison of unoptimised
and optimised executions (“unoptimised” means that simple rewriting is applied,
as otherwise a query would not be evaluated via a wrapper),
• comparesimple – the same as compare, but dereferenced results are compared.
A full comparison would report errors for wrapper queries, as their results consist of
different references – each query creates its own set of temporary objects.
A test mode is set as an ordinary CLI variable, e.g.:
set test off
Similarly, it can be displayed with:
show test
Benchmarking optimisation results is available with the following syntax:
benchmark <n> <query>;
where <n> stands for a number of repeats. The results are written to CSV files in
the current CLI directory.
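The commands of this subchapter can be combined into a short test session. The sketch below only assembles the command lines from the syntax given above; it does not talk to ODRA. The function names are hypothetical, and it is assumed that the test modes are set in the same way as the documented set test off example.

```python
TEST_MODES = ("off", "plain", "compare", "comparesimple")

def optimisation_session(query, optimisers, test_mode="plain"):
    """Build CLI lines that reset the optimisation sequence, set it,
    select a test mode and finally run the query."""
    if test_mode not in TEST_MODES:
        raise ValueError(f"unknown test mode: {test_mode}")
    # "none" goes first in order to clear the previously set sequence.
    sequence = " | ".join(["none", *optimisers])
    return [
        f"set optimization {sequence}",
        f"set test {test_mode}",
        f"{query};",
    ]

def benchmark_command(query, repeats):
    """Build a benchmark command following 'benchmark <n> <query>;'."""
    if repeats < 1:
        raise ValueError("the number of repeats must be positive")
    return f"benchmark {repeats} {query};"

if __name__ == "__main__":
    # Query taken from the sample CLI session in Listing 19.
    query = "deref((employees where salary = 500).(surname, name)) as emp"
    for line in optimisation_session(query, ["viewrewrite", "wrapperoptimize"], "compare"):
        print(line)
    print(benchmark_command(query, 10))
```

The printed lines can be typed into the CLI one by one.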
C.4.2 Sample batch files
There are sample batch files provided for wrapper testing in the project's
/res/wrapper/batch directory. The files require two wrapper servers running for
the employees and cars test schemata (subchapter 6.2.1) on ports 2000 and 2001 (the
port numbers can be changed).
• init.cli – creates two wrapper modules named wrapper1 and wrapper2 (one for each
schema), creates a new module test, imports the wrapper modules and creates
the views presented in Listing 2,
• explain-rewrite.cli – shows rewritten sample queries,
• explain-optimize.cli – shows optimised sample queries,
• execute-rewrite.cli – executes sample queries after simple rewriting,
• execute-optimize.cli – executes sample queries after optimisation,
• compare.cli – compares results of execution of rewritten and optimised sample
queries,
• benchmark10times.cli – performs a 10-repeat benchmark of the sample queries,
• benchmark10times.independent.cli – performs a 10-repeat benchmark of the
sample multi-wrapper queries with SBQL optimisers,
• mixed.cli – executes test mixed queries,
• update.cli – executes test updates.
Additionally, views presented in subchapter 6.3 Sample Use Cases are available
from the batch files in /res/wrapper/batch/demo (they require views created from
init.cli).
The syntax for running batch files is as follows:
batch <path/to/batch>
The related commands are cd and pwd, corresponding to the same commands of
operating systems. The batch and cd commands interpret both absolute and relative
paths.
Index of Figures
Fig. 1 Mediation system architecture
Fig. 2 Distributed mediation in Amos II [26]
Fig. 3 eGov-Bus virtual repository architecture [205]
Fig. 4 Query flow through a RDBMS [124]
Fig. 5 Abstract relational query optimiser architecture [124]
Fig. 6 Possible syntax trees for a sample query for selections and projections
Fig. 7 Possible syntax trees for a sample query for cross products
Fig. 8 Possible syntax trees for a sample query for tree shapes
Fig. 9 Sample SBQL query syntax tree for example 1
Fig. 10 Sample SBQL query syntax tree for example 2
Fig. 16 Architecture of query processing in SBQL [179]
Fig. 17 Virtual repository general architecture
Fig. 18 Schema integration in the virtual repository
Fig. 19 Query processing schema
Fig. 20 Relational schema for the conceptual example
Fig. 21 Object-oriented view-based schema for the conceptual example
Fig. 22 Conceptual example input query syntax tree
Fig. 23 Conceptual example query syntax tree with dereferences
Fig. 24 Conceptual example query syntax tree after removing auxiliary names
Fig. 25 Conceptual example query syntax tree after SBQL optimisation
Fig. 26 Conceptual example query syntax tree after wrapper optimisation
Fig. 27 Query analysis algorithm
Fig. 28 Selecting query processing algorithm
Fig. 29 Deleting query processing algorithm
Fig. 30 Updating query processing algorithm
Fig. 31 SQL generation and processing algorithm for mixed queries
Fig. 32 The "employees" test relational schema
Fig. 33 The "cars" test relational schema
Fig. 34 The resulting object-oriented schema
Fig. 35 Raw (parsed) query syntax tree for example 1
Fig. 36 Typechecked query syntax tree for example 1
Fig. 37 View-rewritten query syntax tree for example 1
Fig. 38 Optimised query syntax tree for example 1
Fig. 39 Simply-rewritten query syntax tree for example 1
Fig. 40 Raw (parsed) query syntax tree for example 2
Fig. 41 Typechecked query syntax tree for example 2
Fig. 42 View-rewritten query syntax tree for example 2
Fig. 43 Optimised query syntax tree for example 2
Fig. 44 Simply-rewritten query syntax tree for example 2
Fig. 45 Raw (parsed) query syntax tree for example 1 (imperative query)
Fig. 46 Typechecked query syntax tree for example 1 (imperative query)
Fig. 47 View-rewritten query syntax tree for example 1 (imperative query)
Fig. 48 Optimised query syntax tree for example 1 (imperative query)
Fig. 50 Typechecked query syntax tree for example 2 (imperative query)
Fig. 54 Typechecked query syntax tree for example 3 (imperative query)
Fig. 57 Raw (parsed) query syntax tree for example 1 (mixed query)
Fig. 58 Typechecked query syntax tree for example 1 (mixed query)
Fig. 59 View-rewritten query syntax tree for example 1 (mixed query)
Fig. 60 Optimised query syntax tree for example 1 (mixed query)
Fig. 61 Raw (parsed) query syntax tree for example 3 (mixed query)
Fig. 62 Typechecked (parsed) query syntax tree for example 3 (mixed query)
Fig. 63 View-rewritten query syntax tree for example 3 (mixed query)
Fig. 64 Optimised query syntax tree for example 3 (mixed query)
Fig. 65 Raw query syntax tree for example 1 (multi-wrapper query)
Fig. 66 Typechecked query syntax tree for example 1 (multi-wrapper query)
Fig. 67 SBQL-optimised query syntax tree for example 1 (multi-wrapper query)
Fig. 68 Raw query syntax tree for example 2 (multi-wrapper query)
Fig. 69 Typechecked query syntax tree for example 2 (multi-wrapper query)
Fig. 70 SBQL-optimised query syntax tree for example 2 (multi-wrapper query)
Fig. 71 Department’s location distribution
Fig. 72 Employee’s department distribution
Fig. 73 Employee’s salary distribution
Fig. 74 Employee’s info’s length distribution
Fig. 75 Female employee’s first name distribution
Fig. 76 Female employee’s surname distribution
Fig. 77 Male employee’s first name distribution
Fig. 78 Male employee’s surname distribution
Fig. 79 Evaluation times and optimisation gain for query 1
Fig. 88 Evaluation times and optimisation gain for query 10
Fig. 100 Evaluation times and optimisation gain for query 22
Fig. 120 Evaluation times and optimisation gain for query 1 (SBQL optimisation)
Fig. 121 Evaluation times and optimisation gain for query 2 (SBQL optimisation)
Fig. 122 Wrapper architecture
Fig. 123 Wrapped legacy relational schema
Fig. 124 Lowest-level object-oriented wrapper schema
Fig. 125 Primary wrapper views
Index of Listings
Listing 1 Simplified updateable views for the conceptual example
Listing 2 Code of views for the test schemata
Listing 3 SBQL view code for retrieving “rich employees”
Listing 4 SBQL view code for retrieving employees with their departments
Listing 5 SBQL view code for retrieving employees with their cars
Listing 6 SBQL view code for retrieving rich employees with white cars
Listing 7 Client-server communication example (server side)
Listing 8 Client-server communication example (client side)
Listing 9 Contents of relational-schema.dtd
Listing 10 Sample schema description
Listing 11 Sample XSD for metabase
Listing 12 Sample result XML document
Listing 13 Sample result pattern string
Listing 14 Sample contents of connection properties
Listing 15 Sample inserter output
Listing 16 Sample schema generator output
Listing 17 Wrapper server startup output
Listing 18 Sample contents of wrapper.conf
Listing 19 Sample CLI session
Index of Tables
Table 1 Optimisation testbench configuration
Table 2 Test data for cars
Table 3 Wrapper protocol server commands and messages
Table 4 Wrapper protocol client commands and messages
Table 5 Type mapping between SQL, XSD and SBQL
Bibliography
1 Kuliberda K., Wiślicki J., Adamus R., Subieta K.: Object-Oriented Wrapper for
Relational Databases in the Data Grid Architecture, On the Move to Meaningful
Internet Systems 2005 Proceedings. LNCS 3762, Springer 2005, pp. 367-376
2 Wiślicki J., Kuliberda K., Adamus R., Subieta K.: Relational to Object-Oriented
Database Wrapper Solution in the Data Grid Architecture with Query Optimization
Issues, IBM Research Report RC23820 (W0512-007), Proceedings SOBPI'05
(ICSOC'05), Amsterdam, Holland, 2005, pp. 30-43
3 Adamus R., Kuliberda K., Wiślicki J., Subieta K.: Wrapping Relational Data
Model to Object-Oriented Database in the Data Grid Architecture, SOFSEM SRF
2006 Proceedings, Merin, Czech Republic, 2006, pp. 54-63
4 Wiślicki J., Kuliberda K., Kowalski T., Adamus R.: Integration of Relational
Resources in an Object-Oriented Data Grid, SiS 2006 Proceedings, Łódź, Poland,
2006, pp. 277-280
5 Wiślicki J., Kuliberda K., Kowalski T., Adamus R.: Implementation of a
Relational-to-Object Data Wrapper Back-end for a Data Grid, SiS 2006
Proceedings, Łódź, Poland, 2006, pp. 285-288
6 Wiślicki J., Kuliberda K., Kowalski T., Adamus R.: Integration of relational
resources in an object-oriented data grid with an example, Journal of Applied
Computer Science (2006), Vol. 14 No. 2, Łódź, Poland, 2006, pp. 91-108
7 Wislicki J., Kuliberda K., Adamus R., Subieta K.: Relational to object-oriented
database wrapper solution in the data grid architecture with query optimization
issues, International Journal of Business Process Integration and Management
(IJBPIM), 2007/2008 (to appear)
8 Atkinson M., Bancilhon F., DeWitt D., Dittrich K., Maier D., Zdonik S.: The
Object-Oriented Database System Manifesto, Proc. of 1st Intl. Conf. on Deductive
and Object Oriented Databases 89, Kyoto, Japan, 1989, pp. 40-57
9 Wiederhold G.: Mediators in the Architecture of Future Information Systems, IEEE
Computer, 25(3), 1992, pp. 38-49
10 Bergamaschi S., Garuti A., Sartori C., Venuta A.: Object Wrapper: An
Object-Oriented Interface for Relational Databases, EUROMICRO 1997, pp. 41-46
11 Subieta K.: Obiektowość w bazach danych: koncepcje, nadzieje i fakty. Część 3.
Obiektowość kontra model relacyjny, Informatyka, Marzec 1998, pp. 26-33
12 Object-Relational Impedance Mismatch,
http://www.agiledata.org/essays/impedanceMismatch.html
13 Neward T.: The Vietnam of Computer Science,
http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
14 ADO.NET, http://msdn2.microsoft.com/en-us/data/aa937699.aspx
15 Stonebraker M.: Future Trends in Database Systems, IEEE Data Engineering
Conf., Los Angeles, 1988
16 Ahmed R., Albert J., Du W., Kent W., Litwin W., Shan M-C.: An overview of
Pegasus, In: Proceedings of the Workshop on Interoperability in Multidatabase
Systems, RIDE-IMS’93, Vienna, Austria, 1993
17 Albert J., Ahmed R., Ketabchi M., Kent W., Shan M-C.: Automatic importation of
relational schemas in Pegasus, In: Proceedings of the Workshop on Interoperability
in Multidatabase Systems, RIDE-IMS’93, Vienna, Austria, 1993
18 Fishman D.H. et al: Overview of the Iris DBMS, Object-Oriented Concepts,
Databases, and Applications, Kim and Lochovsky, editors, Addison-Wesley, 1989
19 Important Features of Iris OSQL, Computer Standards & Interfaces 13 (1991)
(OODB Standardization Workshop, Atlantic City, May 1990).
20 Ahmed R., DeSmedt P., Du W., Kent W., Ketabchi M., Litwin W., Rafii A., Shan
M-C.: Using an Object Model in Pegasus to Integrate Heterogeneous Data, April
1991
21 Fahl G., Risch T.: Query processing over object views of relational data, The
VLDB Journal (1997) 6: 261–281
22 Saltor F., Castellanos M., Garcia-Solaco M.: Suitability of data models as canonical
models for federated databases, SIGMOD RECORD 20:4, 1991
23 Fahl G., Risch T., Sköld M.: AMOS – An architecture for active mediators, In:
Proc. Int. Workshop on Next Generation Information Technologies and Systems,
NGITS ’93, Haifa, Israel, 1993
24 Jarke M., Koch J.: Query optimization in database systems, ComputSurv 16:2, 1984
25 Amos II, http://user.it.uu.se/~udbl/amos/
26 Risch T., Josifovski V., Katchaounov T.: Functional data integration in a
distributed mediator system, In Functional Approach to Computing with Data,
P.Gray, L.Kerschberg, P.King, and A.Poulovassilis, Eds. Springer, 2003
27 Josifovski V., Risch T.: Query Decomposition for a Distributed Object-Oriented
Mediator System, Distributed and Parallel Databases J., 11(3), pp 307-336, Kluwer,
May 2002
28 Litwin W., Risch T.: Main Memory Oriented Optimization of OO Queries using
Typed Datalog with Foreign Predicates, IEEE Transactions on Knowledge and
Data Engineering, 4(6), 517-528, 1992
29 Datalog and Logic-Based Databases, http://cs.wwc.edu/~aabyan/415/Datalog.html
30 Datalog, http://en.wikipedia.org/wiki/Datalog
31 Tomasic A., Amouroux R., Bonnet P.: The Distributed Information Search
Component Disco and the World Wide Web, SIGMOD Conference 1997,
pp. 546-548, 1997
32 Tomasic A., Raschid L., Valduriez P.: Scaling Access to Heterogeneous Data
Sources with DISCO, IEEE Transactions on Knowledge and Data Engineering,
Volume 10, pp 808-823, 1998
33 Czejdo B., Eder J., Morzy T., Wrembel R.: Designing and Implementing an
Object–Relational Data Warehousing System, DAIS´01 Proceedings, Volume 198,
2001, pp. 311-316
34 Fussell M. L.: Foundations of Object-Relational Mapping,
http://www.chimu.com/publications/objectRelational/
35 ORM, http://en.wikipedia.org/wiki/Object-relational_mapping
36 Mapping Objects to Relational Databases: O/R Mapping In Detail,
http://www.agiledata.org/essays/mappingObjects.html
37 Object-relational mapping articles,
http://www.service-architecture.com/object-relational-mapping/articles/
38 DAO, http://en.wikipedia.org/wiki/Data_Access_Object
39 DAO,
http://java.sun.com/blueprints/corej2eepatterns/Patterns/DataAccessObject.html
40 Bergamaschi S., Garuti A., Sartori C., Venuta A.: Object Wrapper: An
Object-Oriented Interface for Relational Databases, 23rd EUROMICRO Conference '97
New Frontiers of Information Technology, 1997, pp. 41-46
41 Keller W.: Object/Relational Access Layers, A Roadmap, Missing Links and More
Patterns, EuroPLoP 1998
42 Keller W.: Mapping Objects to Tables: A Pattern Language, in Proceedings of the
1997 European Pattern Languages of Programming Conference, Irrsee, Germany,
Siemens Technical Report 120/SW1/FB 1997
43 Keller W., Coldewey J.: Relational Database Access Layers: A Pattern Language,
in Collected Papers from the PLoP’96 and EuroPLoP’96 Conferences, Washington
University, Department of Computer Science, Technical Report WUCS 97-07,
February 1997
44 Grove A.: Data Access Object (DAO) versus Object Relational Mapping (ORM),
http://www.codefutures.com/weblog/andygrove/archives/2005/02/data_access_obj.html
45 ORM software list,
http://en.wikipedia.org/wiki/List_of_object-relational_mapping_software
46 Matthes F., Rudloff A., Schmidt J.W., Subieta K.: A Gateway from DBPL to
Ingres, Proc. of Intl. Conf. on Applications of Databases, Vadstena, Sweden,
Springer LNCS 819, pp. 365-380, 1994
47 Modula-2, http://www.modula2.org/
48 Ingres, http://www.ingres.com/
49 Oracle, http://www.oracle.com/
50 EOF, http://en.wikipedia.org/wiki/Enterprise_Objects_Framework
51 OpenStep, http://en.wikipedia.org/wiki/OpenStep
52 The Objective-C Programming Language,
http://developer.apple.com/documentation/Cocoa/Conceptual/ObjectiveC/ObjC.pdf
53 Apache Cayenne, http://cayenne.apache.org/
54 Ajax, http://www.ajaxgoals.com/
55 Velocity, http://velocity.apache.org/
56 IBM JDBC wrapper,
http://www-128.ibm.com/developerworks/java/library/j-jdbcwrap/
57 JDO, http://java.sun.com/products/jdo/
58 JDO 2.0 specification, http://www.jcp.org/en/jsr/detail?id=243
59 JDO, http://en.wikipedia.org/wiki/Java_Data_Objects
60 Apache JDO, http://db.apache.org/jdo/index.html
61 Apache OJB, http://db.apache.org/ojb/
62 XORM, http://www.xorm.org
63 Speedo, http://speedo.objectweb.org/
64 JDO implementations, http://db.apache.org/jdo/impls.html
65 EJB, http://java.sun.com/products/ejb/
66 EJB JSR 220, http://jcp.org/en/jsr/detail?id=220
67 EJB, http://en.wikipedia.org/wiki/Enterprise_Java_Beans
68 JPA, http://java.sun.com/javaee/technologies/persistence.jsp
69 The Java Persistence API - A Simpler Programming Model for Entity Persistence,
http://java.sun.com/developer/technicalArticles/J2EE/jpa/
70 Hibernate, http://www.hibernate.org/
71 .NET Framework, http://msdn2.microsoft.com/en-us/netframework/default.aspx
72 Apache Torque, http://db.apache.org/torque/
73 CORBA, http://www.omg.org/gettingstarted/corbafaq.htm
74 CORBA, http://en.wikipedia.org/wiki/CORBA
75 OMG, http://www.omg.org/
76 ORB/ODBMS Integration, http://www.ime.usp.br/~reverbel/orb_odbms.html
77 Liang K-C., Chyan D., Chang Y-S., Lo W., Yuan S-M.: Integration of CORBA and
Object Relational Databases, Computer Standards and Interfaces, Vol. 25, No. 4,
Sept. 2003, pp. 373-389
78 Sandholm T.: Object Caching in a Transactional, Object-Relational CORBA
Environment, Master's Thesis, Stockholm University, October 1998,
http://cis.cs.tu-berlin.de/Dokumente/Diplomarbeiten/1998/sandholm.ps.gz
79 XML, http://www.w3.org/XML/
80 XQuery, http://www.w3.org/XML/Query/
81 XQuery 1.0: An XML Query Language, http://www.w3.org/TR/xquery/
82 Carey M., Kiernan J., Shanmugasundaram J., Shekita E., Subramanian S.:
XPERANTO: A Middleware for Publishing Object-Relational Data as XML
Documents
83 Carey M., Florescu D., Ives Z., Lu Y., Shanmugasundaram J., Shekita E.,
Subramanian S.: XPERANTO: Publishing Object-Relational Data as XML
84 Shanmugasundaram J., Kiernan J., Shekita E., Fan C., Funderburk J.: Querying
XML Views of Relational Data
85 Funderburk J. E., Kiernan G., Shanmugasundaram J., Shekita E., Wei C.:
XTABLES: Bridging relational technology and XML, IBM Systems Journal. 41, No.
4, 2002
86 Braganholo V. P., Davidson S. B., Heuser C. A.: From XML view updates to
relational view updates: old solutions to a new problem
87 Braganholo V. P., Davidson S. B., Heuser C. A.: UXQuery: Building Updatable
XML Views over Relational Databases
88 Valikov A., Kazakos W., Schmidt A.: Building updateable XML views on top of
relational databases
89 CoastBase, http://www.netcoast.nl/tools/rikz/COASTBASE.htm
90 Kazakos W., Kramer R., Schmidt A.: Coastbase – The Virtual European Coastal
and Marine Data Warehouse; Computer Science for Environmental Protection
2000, Vol 2, (ed. A. Cremers, K. Greve) Metropolis-Verlag, 2000, pp. 646-654
91 Shao F., Novak A., Shanmugasundaram J.: Triggers over XML Views of Relational
Data
92 DB2, http://www-306.ibm.com/software/data/db2/
93 RDF, http://www.w3.org/RDF/
94 SWARD, http://user.it.uu.se/~udbl/sward.html
95 Petrini J., Risch T.: SWARD: Semantic Web Abridged Relational Databases,
http://user.it.uu.se/~udbl/sward/SWARD.pdf
96 RDQL, http://www.w3.org/Submission/RDQL/
97 SPARQL query language, http://www.w3.org/TR/rdf-sparql-query/
98 SPARQL protocol, http://www.w3.org/TR/rdf-sparql-protocol/
99 SPARQL XML result format, http://www.w3.org/TR/rdf-sparql-XMLres/
100 Cyganiak R.: A relational algebra for SPARQL, HP Labs, Bristol, UK
101 Dokulil J.: Evaluation of SPARQL queries using relational databases
102 Harris S.: SPARQL query processing with conventional relational database systems
103 Perez de Laborda C., Conrad S.: Bringing Relational Data into the Semantic Web
using SPARQL and Relational.OWL, Data Engineering Workshops, 2006.
Proceedings. 2006, pp. 55-55
104 Newman A.: Querying the Semantic Web using a Relational Based SPARQL,
http://jrdf.sourceforge.net/RelationalBasedSPARQL.pdf
105 ICONS, http://www.icons.rodan.pl/
106 Staniszkis E., Nowicki B.: ICONS based Knowledge Management in the Process of
Structural Funds Projects Preparation,
http://www.rodan.pl/badania/publikacje/publications/%5BStaniszkis2004a%5D.pdf
107 Staniszkis W., Staniszkis E.: Intelligent Agent-based Expert Interactions in a
Knowledge Management Portal, http://www.icons.rodan.pl/presentations/S03.ppt
108 OMG UML, http://www.uml.org/
109 WfMC, http://www.wfmc.org/
110 Staniszkis W., Nowicki B.: Intelligent CONtent management System. Presentation
of the IST ICONS project, 4-th International Workshop on Distributed Data and
Structures WDAS 2002, Paris, March 2002.
111 Staniszkis W., Nowicki B.: Intelligent CONtent management System. Presentation
of the IST ICONS Project, Conference TELBAT Teleworking for Business,
Education, Research and e-Commerce, October 2002, Vilnius, Lithuania
112 Staniszkis W.: ICONS Knowledge Management for Structural Fund Projects. A
Case Study, DIESIS – Driving Innovative Exploits for Sardinian Information
Society Knowledge Management Case Study, Cagliari, Sardinia 11-12 September
2003
113 Beatty J., Brodsky S., Nally M., Patel R.: Next-Generation Data Programming:
Service Data Objects, A Joint Whitepaper with IBM and BEA, 2003,
http://ftpna2.bea.com/pub/downloads/commonj/Next-Gen-Data-Programming-Whitepaper.pdf
114 Beatty J., Brodsky S., Ellersick R., Nally M., Patel R.: Service Data Objects,
http://ftpna2.bea.com/pub/downloads/commonj/Commonj-SDO-Specification-v1.0.pdf
115 Portier B., Budinsky F.: Introduction to Service Data Objects,
http://www.ibm.com/developerworks/java/library/j-sdo/
116 EMF, http://www.eclipse.org/emf/
117 SDO specification,
http://www-128.ibm.com/developerworks/library/specification/ws-sdo/
118 Codd E. F.: A Relational Model of Data for Large Shared Data Banks,
Communications of the ACM, Vol. 13, No. 6, June 1970, pp. 377-387
119 Selinger P. G., Astrahan M. M., Chamberlin D. D., Lorie R. A., Price T. G.: Access
path selection in a relational database management system, SIGMOD Conference
1979, pp. 23-34
120 Astrahan M. M.: System R: A relational approach to data management, ACM
Transactions on Database Systems, 1(2), pp. 97-137, June 1976
121 Chamberlin D.: Bibliography of the System R Project,
http://www.mcjones.org/System_R/bib.html
122 Jarke M., Koch J.: Query Optimization in Database Systems, ACM Computing
Surveys 16(2), 1984, pp. 111-152
123 Chaudhuri S.: An Overview of Query Optimization in Relational Systems,
Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on
Principles of database systems, Seattle, Washington, United States, pp. 34-43,
1998
124 Ioannidis Y. E.: Query Optimization, ACM Computing Surveys, symposium issue
on the 50th Anniversary of ACM, Vol. 28, No. 1, March 1996, pp. 121-123
125 Andler S., Ding I., Eswaran K., Hauser C., Kim W., Mehl J., Williams R.: System
D: A distributed system for availability, In Proceedings of the 8th International
Conference on Very Large Data Bases (Mexico City). VLDB Endowment,
Saratoga, 1982, pp. 33-44
126 Apers P. M. G., Hevner A. R., Yao S. B.: Optimization algorithms for distributed
queries, IEEE Trans. Softw. Eng. SE-9, 1, 1983, pp. 57-68
127 Bernstein P. A., Goodman N.: Concurrency control in distributed database
systems, ACM Comput. Surv. 13, 2 (June), 1981, pp. 185-221
128 Bernstein P. A., Goodman N., Wong E., Reeve C. L., Rothnie J. B., Jr.: Query
processing in a system for distributed databases (SDD-1), ACM Trans. Database
Syst. 6, 4 (Dec.), 1981, pp. 602-625
129 Ceri S., Pelagatti G.: Allocation of operations in distributed database access,
IEEE Trans. Comput. C-31, 2, 1982, pp. 119-128.
130 Chang J.-M.: A heuristic approach to distributed query processing, In Proceedings
of the 8th International Conference on Very Large Data Bases (Mexico City).
VLDB Endowment, Saratoga, Calif., 1982, pp. 54-61
131 Cheung T.-Y.: A method for equijoin queries in distributed relational databases,
IEEE Trans. Comput. C-31,8, 1982, pp. 746-751
132 Chiu D. M., Bernstein P. A., Ho Y. C.: Optimizing chain queries in a distributed
database system, Tech. Rep. TR-01-81, Computer Science Dept., Harvard
University, Cambridge, Mass, 1981
133 Chu W. W., Hurley P.: Optimal query processing for distributed database systems,
IEEE Trans. Comput. C-31,9, 1982, pp. 835-850
134 Epstein R., Stonebraker M.: Analysis of distributed data base processing strategies,
In Proceedings of the 6th International Conference on Very Large Data Bases
(Montreal, Oct. 1-3). IEEE, New York, 1980, pp. 92-101
135 Epstein R., Stonebraker M., Wong E.: Distributed query processing in a relational
data base system, In Proceedings of the ACM-SIGMOD International Conference
on Management of Data (Austin, Tex., May 31-June 2). ACM, New York, 1978, pp.
169-180
136 Forker H. J.: Algebraical and operational methods for the optimization of query
processing in distributed relational database management systems, In Proceedings
of the 2nd International Symposium on Distributed Databases (Berlin). Elsevier
North-Holland, 1982, pp. 39-59
137 Gavish B., Segev A.: Query optimization in distributed computer systems, In
Management of Distributed Data Processing, J. Akoka, Ed. Elsevier North-Holland,
New York, 1982, pp. 233-252
138 Hevner A. R.: The optimization of query processing on distributed database
systems. Ph.D. dissertation. Computer Science Dept., Purdue University, West
Lafayette, 1979
139 Kambayashi Y., Yoshikawa M., Yajima S.: Query processing for distributed
databases using generalized semi-joins, In Proceedings of the ACM-SIGMOD
International Conference on Management of Data (Orlando, Fla., June 2-4). ACM,
New York, 1982, pp. 151-160
140 Sacco G. M., Yao S. B.: Query optimization in distributed database systems, In
Advances in Computers, vol. 21. Academic Press, New York, 1982, pp. 225-273
141 Wong E.: Dynamic rematerialization: Processing distributed queries using
redundant data, IEEE Trans. Softw. Eng. SE-9, 3, 1983, pp. 228-232.
142 Yu C. T., Chang C. C.: On the design of a query processing strategy in a
distributed database environment, In SIGMOD 83, Proceedings of the Annual
Meeting (San Jose, California, May 23-25), ACM, New York, 1983, pp. 30-39
143 Codd E. F.: A database sublanguage founded on the relational calculus, In
Proceedings of the ACM-SIGFIDET Workshop, Data Description, Access, and
Control (San Diego, Calif., Nov. 11-12). ACM, New York, 1971, pp. 35-68.
144 Codd E. F.: Relational completeness of data base sublanguages, In Courant
Computer Science Symposia No. 6: Data Base Systems. Prentice-Hall, New York,
1972, pp. 67-101
145 Lacroix M., Pirotte A.: Domain-Oriented Relational Languages, VLDB 1977, pp.
370-378
146 Codd E. F.: Relational Completeness of Data Base Sub-languages, In R. Rustin,
editor, Data Base Systems. Prentice Hall, 1972
147 Ono K., Lohman G.: Measuring the complexity of join enumeration in query
optimization, In Proceedings of the 16th Int. VLDB Conference, Brisbane,
Australia, August 1990, pp. 314-325
148 Nahar S., Sahni S., Shragowitz E.: Simulated annealing and combinatorial
optimization, In Proc. 23rd Design Automation Conference, 1986, pp. 293-299
149 Swami A., Gupta A.: Optimization of large join queries, In Proc. ACM-SIGMOD
Conference on the Management of Data, Chicago, 1988, pp. 8-17
150 Swami A.: Optimization of large join queries: Combining heuristics and
combinatorial techniques, In Proc. ACM-SIGMOD Conference on the
Management of Data, Portland, 1989, pp. 367-376
151 Kirkpatrick S., Gelatt C. D., Jr., Vecchi M. P.: Optimization by simulated
annealing, Science, 220(4598), 1983, pp. 671-680
152 Ioannidis Y., Wong E.: Query optimization by simulated annealing, In Proc.
ACM-SIGMOD Conference on the Management of Data, San Francisco, 1987, pp. 9-22
153 Ioannidis Y., Kang Y.: Randomized algorithms for optimizing large join
queries, In Proc. ACM-SIGMOD Conference on the Management of Data, Atlantic
City, 1990, pp. 312-321
154 Mannino M. V., Chu P., Sager T.: Statistical profile estimation in database systems,
ACM Computing Surveys, 20(3), 1988, pp. 192-221
155 Christodoulakis S.: On the estimation and use of selectivities in database
performance evaluation, Research Report CS-89-24, Dept. of Computer Science,
University of Waterloo, 1989
156 Olken F., Rotem D.: Simple random sampling from relational databases, In Proc.
12th Int. VLDB Conference, Kyoto, 1986, pp. 160-169
157 Lipton R. J., Naughton J. F., Schneider D. A.: Practical selectivity estimation
through adaptive sampling, In Proc. of the 1990 ACM-SIGMOD Conference on the
Management of Data, Atlantic City, 1990, pp. 1-11
158 Haas P., Swami A.: Sequential sampling procedures for query size estimation, In
Proc. of the 1992 ACM-SIGMOD Conference on the Management of Data, San
Diego, 1992, pp. 341-350
159 Haas P., Swami A.: Sampling-based selectivity estimation for joins using
augmented frequent value statistics, In Proc. of the 1995 IEEE Conference on Data
Engineering, Taipei, 1995
160 Christodoulakis S.: Implications of certain assumptions in database performance
evaluation. ACM TODS, 9(2), 1984, pp. 163-186
161 Ioannidis Y., Christodoulakis S.: On the propagation of errors in the size of join
results, In Proc. of the 1991 ACM-SIGMOD Conference on the Management of
Data, Denver, 1991, pp. 268-277
162 Kooi R. P.: The Optimization of Queries in Relational Databases. PhD thesis, Case
Western Reserve University, 1980
163 Piatetsky-Shapiro G., Connell C.: Accurate estimation of the number of tuples
satisfying a condition, In Proc. 1984 ACM-SIGMOD Conference on the
Management of Data, Boston, 1984, pp. 256-276
164 Muralikrishna M., DeWitt D. J.: Equi-depth histograms for estimating selectivity
factors for multi-dimensional queries, In Proc. of the 1988 ACM-SIGMOD
Conference on the Management of Data, Chicago, 1988, pp. 28-36
165 Ioannidis Y., Christodoulakis S.: Optimal histograms for limiting worst-case error
propagation in the size of join results, ACM TODS, 18(4), 1993, pp. 709-748
166 Ioannidis Y.: Universality of serial histograms, In Proc. 19th Int. VLDB
Conference, Dublin, 1993, pp. 256-267
167 Ioannidis Y., Poosala V.: Balancing histogram optimality and practicality for query
result size estimation, In Proc. of the 1995 ACM-SIGMOD Conference on the
Management of Data, San Jose, 1995, pp. 233-244
168 Haas L., Freytag J.C., Lohman G.M., Pirahesh H.: Extensible Query Processing in
Starburst, In Proc. of ACM SIGMOD, Portland, 1989
169 Starburst, http://www.almaden.ibm.com/cs/starwinds/starburst.html
170 Pirahesh H., Hellerstein J.M., Hasan W.: Extensible/Rule Based Query Rewrite
Optimization in Starburst, In Proc. of ACM SIGMOD, 1992
171 Lohman G.M.: Grammar-like Functional Rules for Representing Query
Optimization Alternatives, In Proc. of ACM SIGMOD, 1988
172 Graefe G., McKenna W.J.: The Volcano Optimizer Generator: Extensibility and
Efficient Search, In Proc. of the IEEE Conference on Data Engineering, Vienna,
1993
173 Graefe G.: The Cascades Framework for Query Optimization, In Data Engineering
Bulletin. 1995
174 Graefe G., DeWitt D.J.: The Exodus Optimizer Generator, In Proc. of ACM
SIGMOD, San Francisco, 1987
175 Kozankiewicz H., Leszczyłowski J., Subieta K.: Implementing Mediators through
Virtual Updateable Views, Engineering Federated Information Systems,
Proceedings of the 5th Workshop EFIS 2003, July 17-18 2003, Coventry, UK,
pp. 52-62
176 Kozankiewicz H., Leszczyłowski J., Subieta K.: Updateable Views for an XML
Query Language, CAiSE FORUM 2003, Klagenfurt/Velden, Austria
177 Kozankiewicz H., Leszczyłowski J., Subieta K.: Updateable XML Views,
ADBIS'03, Dresden, Germany, 2003
178 Tsichritzis D. C., Klug A. (eds.): The ANSI/X3/SPARC DBMS Framework: Report
of the Study Group on Data Base Management Systems, Information Systems 3,
1978.
179 Płodzień J.: Optimization Methods in Object Query Languages, PhD Thesis.
IPIPAN, Warszawa 2000
180 Płodzień J., Kraken A.: Object Query Optimization in the Stack-Based Approach.
Proc. ADBIS Conf., Springer LNCS 1691, 1999, pp. 303-316
181 Płodzień J., Subieta K.: Optimization of Object-Oriented Queries by Factoring Out
Independent Subqueries, Institute of Computer Science Polish Academy of
Sciences, Report 889, 1999
182 Płodzień J., Kraken A.: Object Query Optimization through Detecting Independent
Subqueries, Information Systems, Pergamon Press, 2000
183 Płodzień J., Subieta K.: Applying Low-Level Query Optimization Techniques by
Rewriting, Proc. DEXA Conf., Springer LNCS 2113, 2001, pp. 867-876
184 Płodzień J., Subieta K.: Query Optimization through Removing Dead Subqueries,
Proc. ADBIS Conf., Springer LNCS 2151, 2001, pp. 27-40
185 Płodzień J., Subieta K.: Static Analysis of Queries as a Tool for Static Optimization,
Proc. IDEAS Conf., IEEE Computer Society, 2001, pp. 117-122
186 Płodzień J., Subieta K.: Query Processing in an Object Data Model with Dynamic
Roles, Proc. WSEAS Intl. Conf. on Automation and Information (ICAI), Puerto de
la Cruz, Spain, CD-ROM, ISBN: 960-8052-89-0, 2002
187 Shaw G. M., Zdonik S. B.: An object-oriented query algebra, Proceedings of DBPL
Workshop, 1989, pp. 103-112
188 Beeri C., Kornatzky Y.: Algebraic optimization of object-oriented query languages,
ICDT, 1990, pp. 72-88
189 Mitchell G., Zdonik S.B., Dayal U.: Object-oriented query optimization: what's the
problem?, Department of Computer Science, Brown University, USA, Technical
Report No. CS-91-41, 1991
190 Rich C., Scholl M. H.: Query optimization in an OODBMS, BTW, Informatik
Aktuell, Springer, Heidelberg, 1993
191 Cluet S., Delobel C.: Towards a unification of rewrite-based optimization
techniques for object-oriented queries, Query Processing for Advanced Database
Systems, Morgan Kaufmann, 1994, pp. 245-272
192 Kemper A., Moerkotte G.: Query optimization in object bases: exploiting relational
techniques, (in) Query Processing for Advanced Database Systems, Morgan
Kaufmann, 1994, pp. 101-137
193 Leu T. W.: Compiling object-oriented queries, Department of Computer Science,
Brown University, USA, Technical Report No. CS-94-05, 1994
194 Cherniack M., Zdonik S. B., Nodine M. H.: To form a more perfect union
(intersection, difference), International Workshop on Database Programming
Languages, Gubbio, Italy, 1995
195 Cherniack M., Zdonik S. B.: Rule languages and internal algebras for rule-based
optimizers, Proceedings of SIGMOD, 1996, pp. 401-412
196 Heuer A., Kröger J.: Query optimization in CROQUE project, Proceedings of
DEXA, Springer LNCS 1134, 1996, pp. 489-499
197 Abbas I., Boucelma O.: A framework for algebraic optimization of object-oriented
query languages, Proceedings of DEXA, Springer LNCS 1308, 1997, pp. 478-487
198 Grust T., Kröger J., Gluche D., Heuer A., Scholl M. H.: Query evaluation in
CROQUE – calculus and algebra coincide, Proceedings of 14th British National
Conference on Databases, Springer LNCS 1271, 1997, pp. 84-100
199 Cherniack M., Zdonik S. B.: Changing the rules: transformations for rule-based
optimizers, Proceedings of SIGMOD, 1998, pp. 61-72
200 Kröger J., Illner R., Rost S., Heuer A.: Query rewriting and search in CROQUE,
Proceedings of ADBIS, Springer LNCS 1691, 1999, pp. 288-302
201 Litwin W.: Linear Hashing: a new tool for file and table addressing. Reprinted
from VLDB-80 in READINGS IN DATABASES. 2nd ed. Morgan Kaufmann
Publishers, Inc., Stonebraker M.(Ed.), 1994
202 Litwin W., Neimat M.-A., Schneider D. A.: LH*: A Scalable, Distributed Data
Structure, ACM Trans. Database Syst., 21(4), 1996, pp. 480-525.
203 Zql, Pierre-Yves Gibello, http://www.experlog.com/gibello/zql/
204 Sahri S., Litwin W., Schwartz T.: SD-SQL Server: a Scalable Distributed Database
System, CERIA Research Report 2005-12-13, December 2005
205 eGov-Bus, http://www.egov-bus.org/web/guest/home
206 ODRA White Paper, http://iolab.pjwstk.edu.pl:8081/forum/image.aspx?a=95
207 R.G.G.Cattell, D.K.Barry (Eds.): The Object Data Standard: ODMG 3.0. Morgan
Kaufmann 2000
208 Cook W.R., Rosenberger C.: Native Queries for Persistent Objects: A Design White
Paper, http://www.db4o.com/about/productinformation/whitepapers/
Native%20Queries%20Whitepaper.pdf, 2006
209 Hibernate - Relational Persistence for Java and .NET, http://www.hibernate.org/,
2006
210 Subieta K.: Theory and Construction of Object-Oriented Query Languages. PJIIT Publishing House, ISBN 83-89244-28-4, 2004, 522 pages (in Polish)
211 Subieta K.: Stack-Based Approach (SBA) and Stack-Based Query Language
(SBQL). http://www.sbql.pl, 2006
212 Albano A., Bergamini R., Ghelli G., Orsini R.: An Object Data Model with Roles,
Proc. VLDB Conf., 39-51, 1993
213 Jodłowski A., Habela P., Płodzień J., Subieta K.: Objects and Roles in the
Stack-Based Approach, Proc. DEXA Conf., Springer LNCS 2453, 2002.
214 Kozankiewicz H.: Updateable Object Views. PhD Thesis, 2005,
http://www.ipipan.waw.pl/~subieta/ -> Finished PhD-s -> Hanna Kozankiewicz
215 Kozankiewicz H., Leszczyłowski J., Subieta K.: Updateable XML Views. Proc. of
ADBIS’03, Springer LNCS 2798, 2003, 385-399
216 Kozankiewicz H., Stencel K., Subieta K.: Integration of Heterogeneous Resources
through Updatable Views, ETNGRID-2004, Proc. published by IEEE
217 Torque DTD, http://db.apache.org/torque/releases/torque3.2/generator/database.dtd.txt
218 Apache Torque, http://db.apache.org/torque/
219 PostgreSQL, http://www.postgresql.org/
220 Firebird, http://www.firebirdsql.org/
221 MS SQL Server 2005, http://www.microsoft.com/sql/default.mspx
222 Apache Ant, http://ant.apache.org/
223 Java Service Wrapper,
http://wrapper.tanukisoftware.org/doc/english/introduction.html