Heterogeneous Distributed Database Management: The HD-DBMS

ALFONSO F. CARDENAS
Invited Paper
The proliferation of different DBMS and advances in computer networking and communications have led to increasing heterogeneous distributed DBMS network scenarios. Major heterogeneity problems and challenges include: different database models, syntactically and semantically different DBMS, different types of controls (recovery, etc.), etc. We address herein the long-range goal for a heterogeneous distributed DBMS (HD-DBMS) to be able to support a network in which any user in any node can be given an integrated and tailored view or schema, while in reality the data may reside in one single database or in physically separated databases, managed individually by the same type of DBMS (by the only one the user understands) or by different DBMS.
We cite the major approaches to data sharing and accessing: from the primitive commercial file and database unload/load and PC download, to common interfaces on top of existing DBMS, to the R&D and prototype efforts toward the long-range desires. Commercial availability of the more encompassing thrusts may become a reality with the mounting problems, opportunity costs, and demand for data sharing in the heterogeneous world. Major research and development projects in this arena are leading toward some partial attainment of the long-range objective.
The UCLA HD-DBMS project is highlighted herein, with a presentation of its status, progress, and plans. It is a longer range project, with the unique feature of allowing any user in the network to use a preferred database model and DML to access or update any data in the heterogeneous network. HD-DBMS is to provide a multilingual interface to heterogeneous distributed databases.
I. INTRODUCTION
The use of different generalized database management systems (DBMS) has proliferated in recent years. As a result, the heterogeneous distributed database management system scenario has emerged. An example is shown in Fig. 1. A variety of large and small computers and even personal computers, most of them with their own and incompatible DBMS, may be tied together in a network as shown. Satellite communication may be involved between distant nodes. Local networks of computers might be involved, such as at location X in the figure. Database machines may be involved in managing the database(s) at a node, or in a local network.
Manuscript received October 22, 1985; revised November 26, 1986.
The author is with the Computer Science Department, University of California, Los Angeles, CA 90024, USA, and with Computomata International Corp., Los Angeles, CA 90025, USA.
IEEE Log Number 8714292.
The heterogeneous database environment has emerged in many organizations, governmental environments, and computer networks due to
a) the proliferation of databases
b) the proliferation of different DBMS
c) the proliferation of a variety of minicomputers and personal computers
d) the emergence of networks tying together heterogeneous hardware and software
e) advances in data communications
f) distributed databases
g) lack of overall (not just local) database planning and control.
This environment adds to all the challenges and problems for the homogeneous distributed environment the problems of heterogeneity of DBMS: different data models (network, hierarchical, relational, etc.), syntactically and semantically different DBMS (e.g., even within the relational model family there are significant differences between SQL and QBE), different types of controls in each GDBMS (e.g., backup and recovery, locking and synchronization, etc.). It is desired that a future heterogeneous distributed DBMS (HD-DBMS) provide not only distribution transparency but also heterogeneity transparency.
The example, in Fig. 1, shows four databases involved: at location X there is a database managed by a relational DBMS and another managed by a network DBMS (e.g., a CODASYL system) on another local computer, and at two other remote locations there are two separate databases, each managed by a hierarchical DBMS such as IMS. With current technologies every user accessing any database is expected to use the facilities and abide by the syntactic and semantic regulations of the DBMS which created each database, unless some interface software is developed by the installation. Although some such interface software is, of necessity, being developed frequently by user installations, thus far it allows only cosmetic variations from the syntax and semantics of the DBMS managing the particular database.
PROCEEDINGS OF THE IEEE, VOL. 75, NO. 5, MAY 1987
Fig. 1. Heterogeneous database management system scenario: an example.
What would be greatly desired to enhance the attractiveness and usefulness of sharing data resources in a heterogeneous network, as shown in Fig. 1, is the ability for a user to access any database as if it were managed under any one of the DBMS at one central location. Thus a user could have access to any database through a relational view at one of the minicomputers in the local network at location X, while another set of users, at nodes where IMS databases reside, could have access to any database as if it were managed by IMS. Ideally, a user anywhere could look at any database through his favorite DBMS, whether or not it was the preferred one at his site.
There will be, of course, many users who will confine their database accesses to a local database managed by the local DBMS. In fact, they will undoubtedly constitute the majority of the bulk applications. However, there is a growing population of users across the heterogeneous scenario whose needs we address herein.
In a nutshell, the ideal long-range goals would be for an HD-DBMS to be able to support a network in which any user in any node can be given an integrated and tailored view or schema, while in reality the data may reside in one single database or in physically separated databases, managed individually by the same type of DBMS (by the only one the user understands) or by a different DBMS. No HD-DBMS with such full capabilities is available today. There are many unsolved problems, and others remain to be uncovered. However, major research and development projects in this arena are leading toward some partial attainment of the previous long-range objectives.
Section II outlines the range of approaches to the heterogeneous challenge, from the extreme of database unload/load, to a common interface for DBMS, to the top of the line and long-range R&D and prototype efforts. Section III outlines the UCLA HD-DBMS project and progress striving for the longer range goals.
II. APPROACHES TO COMMUNICATION IN A HETEROGENEOUS ENVIRONMENT
A. File and Database Unload/Load
One extreme and simplistic approach to accessing data in a heterogeneous environment is to physically
unload the data from the source hardware/software environment, then
store them in a common format understood and handled by both source and target environments, and
load them into the target environment.
CARDENAS: HETEROGENEOUS DISTRIBUTED DATABASE MANAGEMENT
This approach in fact has been used to unload/load data files across heterogeneous environments for several years. The common format has usually been ASCII. In a number of cases, specialized types of data are unloaded/loaded via common formats specially designed and tailored to carry data descriptions and other semantic information from source to target. Examples are satellite telemetry data, geographical data types, etc.
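The unload/store/load route can be illustrated with a minimal sketch. It assumes CSV text as the common ASCII format and uses hypothetical field names (`part_no`, `qty_on_hand`); neither is taken from the paper. It also shows why a bare ASCII format is limited: the numeric type of each value is lost on the way, which is the kind of semantic information the specially designed formats are meant to carry.

```python
import csv
import io


def unload(rows, fieldnames):
    """Unload records from the source into a common ASCII (CSV) text format."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def load(text):
    """Load records from the common format into the target environment."""
    return list(csv.DictReader(io.StringIO(text)))


# Hypothetical source records: the values are integers in the source.
source_rows = [{"part_no": 112, "qty_on_hand": 40}]

common = unload(source_rows, ["part_no", "qty_on_hand"])
target_rows = load(common)

# After the round trip every value is a plain string: the data survived,
# but its type description did not.
print(target_rows)
```

A format that also carried a data description (field types, keys) would let the loader restore what this sketch loses.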
With the emergence and proliferation of 1) personal computers and the many different types (IBM PC, Apple’s Macintosh, etc.), 2) Local Area Networks (LANs), and 3) incompatible software packages (spread sheets, word processors, file and database managers, etc.), the need for unload/load has increased. There is an increasing number of commercial “file transfer” programs whose task is to help transfer files from one machine to another, providing increasing levels of help and transparency over the many details that heterogeneity springs onto the unload/load process. An examination of the generalized file transfer technology commercially available shows that the data that can be easily transferred are essentially sequential files. Even random-access files are not easily transferable. The transfer of more sophisticated files such as indexed sequential files, e.g., VSAM in large IBM operating systems, is not transparent and usually is not automated (try, for example, to transfer such files between Honeywell and IBM environments). A usual approach is to unload such indexed files to sequential files, stripping all indexing and other vendor-specific control information, use the sequential ASCII file transfer route, and load into the equivalent version (including indexing) on the target environment. Conversion software houses and specialists are usually necessary to do this.
Simplistic unload/load or file transfer programs are of little help in a database environment. All the crucial relatability know-how, indexing, and/or hashing, would be lost in converting the database into a number of individual sequential ASCII files. The process of loading the database into the target environment would involve new database definitions, new indexing definitions, invocation of loading utilities that might require special formatting over the files being transferred, etc. In all, it is a practically most difficult process. Try converting a CODASYL database schema and contents from any vendor you select to IMS or vice versa.
Specialized Database Load/Unload: A number of pioneering efforts on the subject of “file description and translation” in a true database environment were started in the 1970s, [38] and others. Other more recent efforts include IBM’s Express [42]. Due to commercial interests, a number of specialized database unload/load packages have been developed by a number of vendors. The predominant ones are
1) those that unload from a nonrelational database system and load to a relational database system;
2) those that “download” a database or portions of a database from a mainframe computer to a smaller computer or PC.
The subtle difference between these two types of packages is that the latter are more numerous, usually less sophisticated, and generally download into sequential ASCII files for input to simple file handlers such as spread sheets, graphics packages, etc.
Among the most frequently cited mainframe database load/unload software bridges are IMS’s Extract to unload portions from an IMS database and load it as an equivalent SQL/DS or DB2 database [22], and Honeywell’s PDQ facility to unload portions from an IDS/DM IV database to a relational IQ database (essentially SQL) [19].
There is a growing number of mainframe-PC data downloading packages. In fact, a growing number of DBMS vendors now offer such capability from their DBMS to sequential files for use at the PC level. In a number of cases the download may be invoked from the PC, and data are then downloaded from the database into the following:
a) ASCII or DIF format for use with popular spreadsheets, word processors, and even the dBase relational micro DBMS; an example is Informatics’ Answer/DB for downloading IMS data [23]-[25], and its 123/Answer and dBase/Answer packages that translate the ASCII files retrieved by Answer/DB into the proper internal format of these packages.
b) Special vendor format for use with the vendor’s own PC software packages; an example is Cullinet’s facility to download data from its IDMS/R database into its PC Goldengate software packages (including graphics, spreadsheets, etc.) [12].
A fundamental problem or challenge with downloaded data is, since it is a redundant copy of mainframe data, the maintenance of consistency or synchronization in an updating environment. The usual current commercial approach is to download and propagate pertinent updates from the main database periodically, and either a) not permit updating from the PC level or b) permit updating on the PC level and not reflect the updates “upstream.”
B. Relational Interfaces to DBMS
One of the hopes of the advocates of relational database management is that it will be widely adopted. The success and attractiveness of relational data management has led to many commercial relational DBMS. Relational micro DBMS now predominate at the PC level. Furthermore, there is a tendency for various vendors to provide a relational interface or view on top of their existing nonrelational DBMS. Examples of this are Cullinet’s IDMS/R providing a relational interface to the internal CODASYL IDMS database [11], and Honeywell’s PDQ permitting a relational interface to operate directly on the native CODASYL IDS/DM IV database (or, as another option, on a copy of the IDS/DM IV database) [19]. A few vendors have first provided a relational interface on a native nonrelational system and then have redone it into a more native relational system.
Unfortunately, there is no relational standard. Although the relational structure and relational calculus and algebra are the common thread, and IBM’s SQL and QBE may be largely taken as de facto standards, the fact is that there are many variations of SQL and QBE. Except for the simpler read-only command SELECT . . . FROM . . . WHERE . . . , there are noticeable variations in other areas, such as on updating commands whose semantics and integrity controls vary greatly among implementations.
Thus a standard relational interface among DBMS vendors has not evolved, and it appears that it will not evolve. Nevertheless, there will be a common general way of structuring databases and supporting operations, specifically project and join. Furthermore, relational interfaces for DBMS and mainframe to PC bridges may be exercised jointly in some cases. For example, download data from the mainframe nonrelational database via its relational interface into the PC environment, where the copy might be manipulated via another relational DBMS (like the dBase family).
In spite of the relational DBMS differences, the interconnection of DBMS with relational interfaces may be augmented further by a network-wide “generalized relational” interface that may provide a user at any node in Fig. 1 transparency over the relational DBMS differences. Such “generalized relational” interface will not have the challenge of mapping schemas between different models, and translating between widely different database access languages; it only has to be concerned with relatively simpler differences between the various relational interfaces of each DBMS in the network. Such generalized relational interface or front-end to a distributed relational DBMS network is exemplified by the SDC project outlined in the article by Templeton et al. in this issue.
C. Research and Prototype Projects
A number of longer range R&D and prototype projects are aimed at achieving the goals cited in the Introduction. They do not entail data unload/load or download, nor the existence of relational DBMS or relational interfaces to every DBMS in the network. Such long-range projects address and perform in various ways the mapping or translation of database structures and corresponding data-accessing language commands illustrated in Fig. 2. Most projects approach this by introducing intermediate database model and database access language levels. Both the types of intermediate models and languages and the number of levels vary, with the number of levels usually ranging from three to five depending on the project. Major efforts include UCLA’s HD-DBMS project [4], the main focus of Section III;
Computer Corporation of America’s MULTIBASE [13], [14], [29], [31], [MI; INRIA’s heterogeneous SIRIUS-DELTA [Iq; and Informatics’ MARK V DAG [24]. In addition to these projects, a number of authors have also addressed the challenge [1], [18], [26]-[28], [30], [33], WI, [45], [46].
Fig. 2. Relationship between schema translation and DML translation.
The majority of the current research and development efforts and initial commercial support expected simplify the task by requiring every user to communicate using a common language and data model [MULTIBASE, DAG, SIRIUS-DELTA]. A frequent choice is a relational model [SIRIUS-DELTA]. MULTIBASE further simplifies the task for a more near-term achievable system by handling only read type of global database requests; all updates are managed locally by individual sites. The complexity and restrictions of updating through user views in relational DBMS is acknowledged. The initial commercial version of MULTIBASE may be available in the near future. It will provide distribution transparency and heterogeneity transparency for read-only global queries using DAPLEX as a common language and data model [43]. (See the article by Chan et al. in this issue which includes a synopsis of DAPLEX.) In contrast, HD-DBMS provides a multilingual interface to heterogeneous distributed databases, while these other systems provide only a monolingual interface to heterogeneous distributed databases.
DAG (Distributed Application Generator) [24] intends to be a generator of applications and also of the necessary DBMS commands embedded in the application program to access databases managed by IMS and/or SQL/DS. The database view to the application is a logically integrated hierarchical IBM database, although it may be composed of portions residing in several separate IMS and/or SQL/DS databases at different sites and under different IBM operating systems and data communications software (CICS, IMS/DC).
III. THE UCLA HD-DBMS PROJECT
A. Overall Architecture
The UCLA HD-DBMS project is a multi-year, long-range project started in the late 1970s. Since 1983 part of the project has involved collaboration and support from Informatics General Corp. (now Sterling Software). Several publications on it have appeared in the open literature [4]-[6], [20], [35]. This section provides a status of the project, progress, and near-term plans.
The HD-DBMS strives to achieve the major long-range goals cited in Section I, not constraining the user to a common arbitrary language nor to read-only queries; however, it is a very-long-range possibility, beyond the more achievable MULTIBASE and SIRIUS-DELTA tasks. Its primary focus is on the heterogeneity challenge, not on the database physical distribution challenge taken up by other efforts assuming a homogeneous or common DBMS environment.
The HD-DBMS approach entails a global (network-wide) conceptual model of data and a global internal model of data. The global conceptual model is a highly logical model of the information content of the integrated system. It is used as a vehicle in the process of understanding user queries and decomposing them to extract information from individual databases. The global internal model is the access-path oriented model of the structure of the integrated system showing precisely the data structures and access paths actually available (e.g., network-wide access routes, local database relationships, inter-database relationships, etc.), but independent of a specific implementation. The global internal model is the union of the internal models of each participating database. It is used as a vehicle in the process of identifying the specific access paths through the different databases that should be followed to answer user queries, while shielding the user from the need to know the intricacies of the access path implementation and physical storage of data. The global internal model identifies major elements outside the realm or interests of each local DBMS: relationships between entities in different DBMS, logical replication, and perhaps physical replication of entities and relationships in heterogeneous databases.
An extension of the ER model proposed by Chen [q is fundamentally used for the conceptual level, rather than other models [1], [2]. Our model for the internal level [20] is an evolution of our earlier proposal [35]; it was inspired by and includes ingredients from DIAM (Data Independent Accessing Model) [32], [39], and [40].
Other significant efforts toward heterogeneous DBMS networks propose providing users with either a new model view, typically a relational view (MARK V DAG a hierarchical view), of every database, and one query language to be eventually translated into search programs to access the actual databases. A crucial difference between our project and others is that we wish to permit each user or program at a node to view and access data in the database model and language desired rather than force learning another language or reprogramming for another model and language. The desired languages would be constrained to a few, of course, but not to only one in a given database model.
Fig. 3 shows the proposed system architecture. The global query translator processes the query initially submitted by
a user and, with the knowledge of the virtual database model associated to that query, translates it to the form acceptable by the global conceptual model (an ER model) and global internal model. The query is then decomposed by a query decomposer and access path selector, a translator, into the appropriate subquery(ies). The subquery(ies) will then have to be translated into the query language or data manipulation language of a specific DBMS, so as to then be processed by the corresponding node(s) to extract the information from the specific physical database(s) involved. The answers to the subquery(ies) are then joined together and reformatted by the query composer, a translator, according to the virtual database model. The result is the answer to the original query based on the user's virtual model.
Fig. 3. System architecture and building blocks to support communication in a heterogeneous database environment.
There will be, of course, many users who will confine their queries locally to a given physical database managed by a given DBMS. They will undoubtedly constitute the majority of the bulk volume applications. In this case, the local DBMS will process their queries directly and completely. The global query translator, the query decomposer and access path selector, and the query composer will not be needed for such cases.
HD-DBMS Layered Architecture: A number of important catalogs or directories and mapping or translation procedures for data structures and data access commands are necessary. Fig. 4 shows the five different layers of our architecture and their associated models. The local layer contains the physical databases actually stored. The outermost layer is the collection of virtual databases as seen by the users of the heterogeneous database network. The outermost layer is the database network. The user deals with the outermost level, called the virtual model (VM), and the system should handle all the necessary mapping to extract information from the local physical databases.
Fig. 4. Layered architecture for the HD-DBMS.
Fig. 5. DB1 definition: An SQL relational database.
Following Fig. 4:
1) An application program database view is defined using the data definition language of a host DBMS. This view is defined to the HD-DBMS at the virtual layer (VL).
2) An application program query (DML or query command) enters the virtual layer and is transformed by the HD-DBMS into a unified virtual layer (UVL) query. This layer is an ER representation of the application program's virtual layer view.
3) The UVL query is then mapped into a unified global layer (UGL) query. The UGL is an ER conceptual representation of the entire heterogeneous database. It represents the union of individual unified local layer (ULL) database views.
4) The UGL query is transformed into a set of one or more ULL queries and an access plan. A ULL definition exists for each physical database. Externally, a ULL definition of a physical database is an ER view of that database. Internally, ULL access path specifications exist for data within a single physical database and for each interdatabase relationship between two or more physical databases.
5) A ULL query is transformed into a local layer (LL) DBMS dependent query, and then sent to the local DBMS. The ULL queries are performed according to the precedence established by the access plan.
Once the results of the original query are obtained, the data are translated back through the layered architecture into the form expected by the application program. This involves both structural and data translation.
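The five steps above can be pictured as a chain of translators. The following is only a schematic sketch under stated assumptions: the `Query` class, the function names, and the stand-in local DBMS callables are hypothetical, the query text is passed through unchanged where a real system would rewrite it, and the order of the ULL queries stands in for the access plan. It is not the HD-DBMS implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Query:
    layer: str          # "VL", "UVL", "UGL", "ULL", or "LL"
    text: str           # query text; real translators would rewrite it
    database: str = ""  # set once the query is bound to one physical database


def to_uvl(q: Query) -> Query:
    """Step 2: virtual layer query -> unified virtual layer (ER) query."""
    return Query("UVL", q.text)


def to_ugl(q: Query) -> Query:
    """Step 3: UVL query -> unified global layer query."""
    return Query("UGL", q.text)


def to_ull(q: Query, databases: List[str]) -> List[Query]:
    """Step 4: UGL query -> one ULL query per physical database.
    The list order is a stand-in for the access plan."""
    return [Query("ULL", q.text, db) for db in databases]


def to_ll(q: Query) -> Query:
    """Step 5: ULL query -> local, DBMS-dependent query."""
    return Query("LL", q.text, q.database)


def run(vl_query: Query,
        local_dbms: Dict[str, Callable[[Query], List[dict]]]) -> List[dict]:
    ugl = to_ugl(to_uvl(vl_query))
    answers: List[dict] = []
    for ull in to_ull(ugl, sorted(local_dbms)):
        answers.extend(local_dbms[ull.database](to_ll(ull)))
    return answers  # translation back to the virtual layer form is omitted


# Stand-in local DBMS: each one reports what it was asked and at which layer.
local = {
    "DB1": lambda q: [{"source": q.database, "layer": q.layer}],
    "DB2": lambda q: [{"source": q.database, "layer": q.layer}],
}
result = run(Query("VL", "stock status of part 112"), local)
print(result)
```

Each local DBMS only ever sees an LL query; everything above that layer is the concern of the translators.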
B. Example Heterogeneous Database Network
The following is an example of a close-to-reality heterogeneous database network. It will be used in subsequent sections.
The scenario consists of four databases under different DBMS: SQL (two databases), CODASYL, and IMS. Each of the databases is defined in Figs. 5-8. Fig. 9 presents the unified global conceptual ER model (UGCM) that covers the four databases; note that the partitioned global conceptual model shows the contribution of each of the four databases to the UGCM.

SCHEMA NAME IS DB2.
AREA NAME IS DB-AREA
RECORD NAME IS PART.
    LOCATION MODE IS CALC HASH-P#
        USING P# IN PART.
    DUPLICATES NOT ALLOWED.
    WITHIN DB-AREA.
    02 P# TYPE IS CHAR 5.
    02 PD TYPE IS CHAR 25.
    02 CL TYPE IS CHAR 2.
RECORD NAME IS WH.
    WITHIN DB-AREA
    02 W# TYPE IS CHAR 5.
    02 WD TYPE IS CHAR 25.
SET NAME IS INVENTORY
    OWNER IS PART.
    MEMBER IS WH.
    MANDATORY AUTOMATIC.
    ASCENDING KEY IS W# IN WH.
    DUPLICATES ARE NOT ALLOWED.
    SET OCCURRENCE SELECTION IS THRU
        LOCATION MODE OF OWNER.
Fig. 6. DB2 definition: A CODASYL network database.

DBD      NAME = DB3, ACCESS = HISAM
DATASET  DD1 = DEPTDD1, DEVICE = 3380, OVFLW = DEPTOVF
SEGM     NAME = PART, BYTES = 32
LCHILD   NAME = (COMP-ASSEMB, DB3), PAIR = ASSEMB-COMP
FIELD    NAME = (P#, SEQ), BYTES = 5, START = 1
FIELD    NAME = PD, BYTES = 25, START = 6
FIELD    NAME = CL, BYTES = 2, START = 31
SEGM     NAME = ASSEMB-COMP, BYTES = 10,
         POINTER = (LPART, TWIN, LTWIN),
         PARENT = ((PART), (PART, PHYSICAL, DB3))
FIELD    NAME = (P#, SEQ), BYTES = 5, START = 1
FIELD    NAME = QTY, BYTES = 5, START = 6
SEGM     NAME = COMP-ASSEMB, BYTES = 10, POINTER = PAIRED,
         PARENT = PART, SOURCE = (ASSEMB-COMP, DB3)
FIELD    NAME = (P#, SEQ), BYTES = 5, START = 1
FIELD    NAME = QTY, BYTES = 5, START = 6
Fig. 7. DB3 definition: An IMS/DB hierarchical database.

SCHEMA NAME IS PART-WAREHOUSE
AREA NAME IS DATA-AREA
RECORD NAME IS PART
    LOCATION MODE IS CALC HASH-P# USING
        P# IN PART DUPLICATES ARE NOT ALLOWED
    WITHIN DATA-AREA
    02 P#    : TYPE IS CHAR 16
    02 PD    : TYPE IS CHAR%
    02 CLASS : TYPE IS CHAR 1
RECORD NAME IS WH
    02 W#    : TYPE IS CHAR 5
    02 WD    : TYPE IS CHAR 16
    02 QTY   : TYPE IS DEC 6
SET NAME IS INVENTORY
    OWNER IS PART
    MEMBER IS WH
    MANDATORY AUTOMATIC
    ASCENDING KEY IS W#
    DUPLICATES ARE ALLOWED
    SET OCCURRENCE SELECTION IS LOCATION
        MODE OF OWNER
Fig. 8. DB4 definition: A CODASYL network database.

Fig. 9. Global conceptual model in the HD-DBMS. Panels: the partitioned global conceptual model, and the unified global conceptual model (UGCM) formed by joining the ULCMs of the databases.
A sample of queries issued at the virtual conceptual model is shown in Fig. 10, with a trace of the data accessed through the various heterogeneous databases.
Fig. 10. Sample queries. Query 1: find the stock status of the part with P# = 112. Note: stock status may be found in DB(1), DB(2), and DB(4), each under a different GDBMS.
C. Database Mapping/Translation
The UGCM is the conceptual model of the integrated database. It is formed by the union of the ULCMs of the participating databases, and any inter-database relationships. A VM can be derived from the UGCM so that a VM is independent of the organization or physical disposition of the underlying database(s). Thus a number of crucial database mappings or translation procedures are needed. These translations in a few cases may be more like reformatting. The mappings should be kept at least in the network data dictionary/catalog, Fig. 3.
The data model (schema or subschema) mappings or translations have been identified or developed thus far, from the user view through the various data model layers, to the individual DBMS and back to the user. We assessed our work and work by others in the field and opted for using algorithms for the following specific translations proposed by Dumpala and Arora [15]:
Mapping Relational Schema into ER Schema
Mapping Network Schema into ER Schema
Mapping Hierarchical Schema into ER Schema
Mapping ER Schema into Relational Schema
Mapping ER Schema into Network Schema
Mapping ER Schema into Hierarchical Schema
These algorithms are ready for implementation.
The following is just an example of the mapping between relational and ER schemas. A relation in a relational schema will correspond to one of the following ER constructs:
an entity
a k-ary relationship
a binary relationship with attributes (1:N or N:1)
an M:N binary relationship set without attributes
an entity, plus key attributes of some other entities.
Thus a relational query targeted at a relation will be translated into different query commands at the ER level, depending on which of the above data constructs are involved. More on this in Section III-E.
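To make the five cases concrete, the sketch below guesses the ER construct behind a relation from its primary key and foreign keys. This is a simplified heuristic written for illustration, not the Dumpala and Arora algorithm; the `Relation` class, the `classify` function, and the decision rules are all assumptions, and the example relations merely borrow the PART/WH/INVENTORY names of the running example.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet


@dataclass
class Relation:
    name: str
    key: FrozenSet[str]                 # primary-key attributes (non-empty)
    attrs: FrozenSet[str]               # all attributes, key included
    foreign: Dict[str, str] = field(default_factory=dict)  # attr -> referenced relation


def classify(rel: Relation) -> str:
    """Heuristic guess at the ER construct a relation corresponds to."""
    nonkey = rel.attrs - rel.key
    key_all_foreign = all(a in rel.foreign for a in rel.key)
    key_refs = {rel.foreign[a] for a in rel.key if a in rel.foreign}
    nonkey_refs = {rel.foreign[a] for a in nonkey if a in rel.foreign}

    if key_all_foreign and len(key_refs) >= 2:
        # The key is built only from keys of other relations.
        if len(key_refs) == 2 and not nonkey:
            return "M:N binary relationship set without attributes"
        return "k-ary relationship"
    if key_all_foreign and nonkey_refs:
        # Keyed on one participant, referencing the other: a 1:N link.
        return "binary relationship with attributes (1:N or N:1)"
    if nonkey_refs:
        return "entity, plus key attributes of some other entities"
    return "entity"


part = Relation("PART", frozenset({"P#"}), frozenset({"P#", "PD", "CL"}))
inventory = Relation(
    "INVENTORY",
    frozenset({"P#", "W#"}),
    frozenset({"P#", "W#"}),
    {"P#": "PART", "W#": "WH"},
)
for rel in (part, inventory):
    print(rel.name, "->", classify(rel))
```

Because the same relation name can map to different constructs, a query translator would consult a classification of this kind (stored with the schema mappings) before choosing the ER-level commands.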
D. Query/DML Translation
The terms "data manipulation language (DML)" or "query language" shall be used synonymously to refer to any of the data access languages of the major types of DBMS: CODASYL DML, relational SQL, or IMS DL/I. The terms "database request" and "query" will also be used synonymously.
As per Fig. 4, the queries made by a user on a VM should be translated to the equivalent queries at the UGCM level, then at the UGIM level, then at the ULIM level, and finally at the LLM level for processing by the particular DBMS involved; the answer is then composed or reformatted to adhere to the original VM level. Thus a number of crucial mappings or translation procedures are needed. Fig. 2 shows the relationship between schema translation and DML translation. We provide our progress in the following sections.
1) The ER DML Global Conceptual Language: The HD-DBMS architecture uses an ER DML as the global conceptual language (GCL), at the unified global conceptual level. This is the DML into which all virtual layer DMLs are translated. This is also the DML whose queries are decomposed and distributed to various local physical databases. Two of the most important justifications for a GCL are the following. First, a GCL reduces the number of translations (both schema translation and DML translations) necessary within a distributed database system. It is easy to understand that, without a GCL, m × n translators would be needed in an HD-DBMS that has n physical databases and supports m virtual model databases, while with a GCL, only m + n translations would be needed. Secondly, a GCL allows for a single, conceptual view of the whole database, which, in reality, consists of a group of heterogeneous physical databases.
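The translator count can be checked by enumeration. In this small sketch the three virtual models and four physical databases are illustrative values chosen here (m = 3, n = 4), not figures from the paper: without a GCL every virtual model needs its own translator to every physical database, while with a GCL each side needs only one translator to or from the GCL.

```python
from itertools import product

virtual_models = ["relational", "network", "hierarchical"]  # m = 3
physical_dbs = ["DB1", "DB2", "DB3", "DB4"]                  # n = 4

# Without a GCL: one translator per (virtual model, physical database) pair.
direct = list(product(virtual_models, physical_dbs))

# With a GCL: one translator from each virtual model into the GCL,
# plus one from the GCL to each physical database.
via_gcl = [(vm, "GCL") for vm in virtual_models] + [("GCL", db) for db in physical_dbs]

print(len(direct), len(via_gcl))
```

The gap widens as either side grows, since the direct count is a product and the GCL count is a sum.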
Functional Requirements of a GCL: The single most important functional requirement of a GCL is that it be semantically "rich" enough to express queries from all the virtual level DMLs. This means that, for any existing virtual level DML, any DML statement may find its equivalent in the GCL. It is not necessary to have a one-to-one correspondence between the GCL and other virtual level DMLs so long as the GCL is able to express any statement expressed by a virtual level DML. How do we know if a GCL meets this requirement? There has not been a satisfactory answer to this question despite various attempts that have been made. One of them is the introduction of the term "completeness" [3], [36]. Informally, a DML is complete if, for a database, any piece of information stored in the database can be retrieved using that DML. A GCL that is complete should meet this requirement. Unfortunately, there is no consensus on the definition of completeness for an ER based DML.
In addition to the above requirement, it is desirable for a GCL to be as independent of the physical aspects of the database as possible. The reason for this is that a GCL is a DML against a conceptual database only. This requirement alone excludes the possibility of using a procedural type DML (record-at-a-time DML) as the GCL, since a procedural DML ties itself too closely to the physical aspects of a database.
There are thus two choices for a GCL: 1) an algebraic type of DML, and 2) a calculus type of DML. There have been several proposals for "ER algebra" in the literature [8], [36]. All those proposals are clearly inspired by the relational algebra proposed by Codd [9], [10]. However, the situation is different in the ER model as opposed to the relational model. In the relational model, the only data entity is a relation. All the operations in the relational algebra apply to relations only. The result of any relational algebraic operation is also a relation. In contrast, there are two basic data entities in the ER data model: entity and relationship.
Semantically, they are different. An algebra that applies to two data entities is considerably more difficult to define than one that applies to a single data entity, since as the number of data entities increases, the types of the output data entity and their semantic meanings seem to grow rapidly. The area of ER algebra is still in its infancy. More research is needed to find a good definition of ER algebra. Consequently, we have not adopted any existing ER algebra for the GCL.
PROCEEDINGS OF THE IEEE, VOL. 75, NO. 5, MAY 1987
The ER DML: Our choice for the GCL is a calculus type language. Fig. 11 shows a summary of the GCL. We call it calculus type because there is a natural correspondence between this type of DML and the relational calculus. A fundamental aspect of a calculus-based DML is the notion of the tuple variable: in relational calculus, a tuple variable is a variable that ranges over some named relation. In a calculus type ER DML (i.e., the proposed GCL), the 'a-list' in the GET statement plays a similar role. An 'a-list' is a variable that ranges over a specified set of 'paths,' where a path is a traversal of an ER diagram. The results from our research have demonstrated that, with a few modifications, most DMLs (DL/I, CODASYL DML, SQL, and relational algebra) against the corresponding data model (hierarchical, network, relational) find their equivalent in this GCL. Therefore, this GCL satisfies the first requirement posed earlier. This GCL has little, if anything at all, to do with the physical aspects of the database, which is the second requirement.
Fig. 11. The ER DML global conceptual language.
In arriving at our required ER DML, we also analyzed four earlier relational-type languages proposed by other authors: EAS-E [34], GORDAS [16], ERL [21], and DAPLEX [29], [43]. EAS-E is very English-like, but seems better suited as an interactive query language than as an intermediary language. DAPLEX is the query language based on the CCA functional data model. GORDAS is a read-only query language. However, the GET command it uses seems very powerful, and so our language patterns the GET command after GORDAS. Our language is very similar to ERL. ERL claims to be a complete query language (READ, INSERT, MODIFY, DELETE), but there are a few features we dropped. The language presented will be seen to approach a relational language with the major addition of commands using interentity relations.
2) Query/DML Translation Algorithms: We have identified characteristics and requirements of algorithms to translate between the various model and language layers of Fig. 4. Our major approach/objective is to develop a "DDL/DML compiler-compiler work bench" from which we can more easily develop the desired translations. Thus we have completed translation algorithms for:
Hierarchical IMS DL/I (except logical database and Fast Path commands) into the ER DML
CODASYL DML into the ER DML
Relational Algebra into the ER DML
Relational SQL into the ER DML.
Some of this work, namely that focusing on the translation from SQL to ER DML, is presented in [6]; examples of it are provided in the next section.
Translation algorithms for the following are now being developed:
ER DML into relational SQL
ER DML into hierarchical DL/I
ER DML into CODASYL DML.
Our next task is to start prototype implementation of a subset of the following algorithms for proof of concept:
CODASYL DML into ER DML into SQL
SQL into ER DML into CODASYL DML.
Small programs in languages such as COBOL and C with CODASYL DML and SQL embedded in them would be used to test the translation paths.
E. SQL to ER DML Translation and Examples
Herein we provide some insight into the translations involved, by outlining SQL/DS to ER DML translation. The translation environment and scheme from SQL/DS to ER DML has the following characteristics:
It is composed of a set of 10 basic rules.
Each SQL statement is one of six types of commands.
Each SQL statement applies to one of five types of relations.
A rule may, in turn, cause other rules to be invoked.
Fig. 12 outlines the translation matrix. It portrays the ten rules that compose the overall SQL to ER DML translation algorithm. Our translation covers all SQL DML commands except "group by" and "aggregate" functions, which we may add later.
Fig. 12. SQL to ER DML translation scheme matrix.
Let us look at three example translations. Fig. 13 provides two sample ER schemas and corresponding relational schemas. Figs. 14-16 provide three DML translation examples. The translation scheme for SQL read-type commands follows, explaining in detail the examples in Figs. 14 and 15, and much of Fig. 12. The translation detail for all SQL commands appears in [6].
Fig. 13. Sample schema and corresponding relational schema.
Example 1:
Single mapping with more than one relation
Accessing sample schema 1
Using rule 3
SQL statement
SELECT P#, W#, WD
FROM R1, R2, R3
WHERE R1.P# = '100'
AND R1.P# = R3.P#
AND R3.W# = R2.W#
ER statement
GET (P#, W#, WD) WHERE (E1 R2 E2 & E1.P# = '100')
Fig. 14. Example of SQL to ER DML translation.
In our data model translation strategy, adapted from [15], a relation in a relational schema corresponds to one of the five following ER constructs:
1) An entity
2) A k-ary relationship
3) A binary relationship with attributes (1:N or N:1)
4) An N:M binary relationship set without attributes
5) An entity, plus key attributes of some other relations.
We shall call the respective relations type 1, type 2, ..., and type 5 (see Fig. 13). We now discuss, for each type of relation, how a single mapping involving such a relation can be mapped into the ER DML.
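As a compact summary of where the discussion of the five types leads, the choice of rule for a single mapping over one relation can be sketched as a small lookup. This is an illustrative sketch only: the names `RelationType` and `single_mapping_rule` are hypothetical, and the type 5 test simply checks whether the key attribute of the source entity is among the attributes referenced by the mapping's clauses.

```python
from enum import IntEnum
from typing import Iterable, Optional


class RelationType(IntEnum):
    """The five kinds of relation, numbered as in the text (names are hypothetical)."""
    ENTITY = 1                  # an entity
    KARY_RELATIONSHIP = 2       # a k-ary relationship
    BINARY_WITH_ATTRIBUTES = 3  # a 1:N or N:1 binary relationship with attributes
    NM_BINARY = 4               # an N:M binary relationship
    ENTITY_PLUS_KEYS = 5        # an entity plus key attributes of other relations


def single_mapping_rule(rel_type: RelationType,
                        referenced_attrs: Iterable[str] = (),
                        source_key: Optional[str] = None) -> int:
    """Return the number of the rule (1 or 2) used for a single-relation mapping."""
    if rel_type == RelationType.ENTITY:
        return 1
    if rel_type in (RelationType.KARY_RELATIONSHIP,
                    RelationType.BINARY_WITH_ATTRIBUTES,
                    RelationType.NM_BINARY):
        return 2  # types 3 and 4 are special cases of the k-ary relationship
    # Type 5: rule 2 only if the source entity's key appears in some clause.
    return 2 if source_key in set(referenced_attrs) else 1


print(single_mapping_rule(RelationType.ENTITY_PLUS_KEYS, ["EMPNO", "DNO"], "DNO"))
```

Mappings that involve more than one relation, or that nest one mapping inside another, are outside this lookup; they are handled by Rules 3 and 4.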
Example 2:
SQL statement
SELECT *
FROM R1
WHERE P# IN
SELECT P#
FROM R3
WHERE W# = 'W123'
ER statement
This innermost statement is generated first:
GET(P#) WHERE (R2 & R2.W# = 'W123')
This is the final statement (containing the previous statement nested in the WHERE clause):
GET(P#,PD,CL) WHERE (E1 R2 & E1.P# = GET(P#) WHERE (R2 & R2.W# = 'W123'))
Fig. 15. Example of SQL to ER DML translation.
Example 3:
Update
Type 5 relation
Accessing sample schema 2
Using rules 9, 10
SQL statement: an UPDATE of relation R with a SET clause and a WHERE clause selecting employee 'E10493'.
ER statement: a DISCONNECT of the E2 occurrence from E1 in R12, a MODIFY of E2, and a CONNECT of the E2 occurrence to an E1 occurrence in R12.
Fig. 16. Example of SQL to ER DML translation.
Type 1 Relation: In this case, a relation, R, with attributes A1, A2, ..., An, corresponds to exactly an entity, E, with attributes A1, A2, ..., An. For example, for the following relation in a relational schema:
Relation: EMP(EMPNO,NAME,DNO,SAL)
there exists an entity in the ER schema having the following format:
Entity EMP(EMPNO,NAME,DNO,SAL).
The attribute names need not be exactly the same so long as their semantics remain the same, for example, SAL in the relation versus SALARY in the entity.
The translation of an SQL query involving this type of relation into the ER DML is straightforward since in both data models only one data entity is involved (a relation in the relational model and an entity in the ER model). The following rule is designed to guide such translation:
Rule 1: For a single mapping involving a type 1 relation, generate a GET statement in the ER DML. The a-list in the GET statement takes the form of the select-clause in the single mapping. The WHERE clause in the GET statement includes two parts. The first is the name of the entity involved. The second takes the form of the WHERE-clause in the single mapping.
Type 2 Relation: A type 2 relation in the relational schema corresponds to a k-ary relationship in the ER schema. An attribute of a type 2 relation is either one of the attributes of that k-ary relationship or one of the key attributes of the entities connected by the k-ary relationship. The rule for translating a single mapping involving a type 2 relation into the ER DML is as follows:
Rule 2: For a single mapping involving a type 2 relation, generate a GET statement in the ER DML. The a-list in the GET statement takes the form of the select-clause in the single mapping. The WHERE clause in the GET statement includes two parts. The first consists of the corresponding relationship name and the names of the k entities connected by this relationship. The second part takes the form of the WHERE-clause in the single mapping.
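Rules 1 and 2 differ only in the first part of the generated WHERE clause, which the following sketch tries to make visible. It is a minimal illustration under stated assumptions, not the translator described in [6]: the type `SingleMapping`, the helper names, the concrete spelling `GET (...) WHERE (... & ...)`, the order "relationship name, then entity names", and the sample attribute values are all assumptions introduced here. Renaming of attributes (SAL versus SALARY) is not handled.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SingleMapping:
    """One SQL SELECT-FROM-WHERE block, already split into clauses (hypothetical type)."""
    select_list: Sequence[str]  # attribute names taken from the select-clause
    where: str = ""             # text of the WHERE-clause, empty if absent


def build_get(mapping: SingleMapping, first_part: str) -> str:
    """Assemble a GET statement; the concrete syntax is an assumption."""
    a_list = ", ".join(mapping.select_list)
    qualifier = f"{first_part} & {mapping.where}" if mapping.where else first_part
    return f"GET ({a_list}) WHERE ({qualifier})"


def rule1(mapping: SingleMapping, entity: str) -> str:
    """Rule 1: the first part of the WHERE clause is the entity name."""
    return build_get(mapping, entity)


def rule2(mapping: SingleMapping, relationship: str, entities: Sequence[str]) -> str:
    """Rule 2: the first part names the relationship and its k connected entities."""
    return build_get(mapping, " ".join([relationship, *entities]))


# Type 1 relation EMP(EMPNO,NAME,DNO,SAL); the value 'D23' is made up.
print(rule1(SingleMapping(["NAME", "SAL"], "DNO = 'D23'"), "EMP"))
# Type 2 relation for a hypothetical binary relationship R between E1 and E2.
print(rule2(SingleMapping(["A1"], "A1 = 'x'"), "R", ["E1", "E2"]))
```

Keeping the shared assembly in `build_get` mirrors the wording of the two rules, which are identical except for what goes in front of the qualifier.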
Type 3 Relation: A type 3 relation in the relational schema comes from (is mapped from) a binary relationship with attributes in the ER schema. The binary relationship is of either type 1:N or type N:1, but not of type N:M, which is mapped into a type 4 relation. An attribute of a type 3 relation is either one of the attributes of the binary relationship or one of the key attributes of the two entities this binary relationship connects. Clearly, the relationship in this case (binary) is a special case of that in the previous case (k-ary). Therefore, the translation of a single mapping involving a type 3 relation can be done by using rule 2.
Type 4 Relation: A type 4 relation in the relational schema corresponds to an N:M binary relationship in the ER schema. Again, this is a special case of a k-ary relationship. The translation of a single mapping involving a type 4 relation can, therefore, also be done by using rule 2.
Type 5 Relation: In order to understand the formation of a type 5 relation, the concepts of source and target entities need to be introduced. Let E1 and E2 be the entity sets involved in relationship set R, of type 1:N. Then E1 is referred to as the source entity set and E2, the target entity set. When an ER schema is mapped into a relational schema, for each type 1:N relationship set without attributes, a type 5 relation is created in the relational schema. The attributes of the type 5 relation consist of all the attributes of the target entity plus the key attribute of the source entity. To translate a single mapping involving a type 5 relation into the ER DML, the three clauses (SELECT, FROM, and WHERE) of the single mapping are examined first; if the key attribute of the source entity appears in one or more of the clauses, then rule 2 is used to guide the translation, otherwise rule 1 is used.
Single Mapping Involving More than One Relation: Using the rules developed so far, we are able to translate a single mapping involving a single relation into the ER DML. These rules alone have limited use since most queries, when expressed in terms of SQL, involve more than one relation. Let us discuss how this kind of multi-relation mapping can be mapped into the ER DML.
To start, we note that at the global conceptual model level we have an ER schema which is a connected ER diagram. By connected we mean that any two entity sets in the diagram are connected via some relationship sets and some entity sets. This is important to our developing the translation rules since it guarantees that there is at least some direct (through a single relationship set) or indirect (through more than one relationship set and some entity sets) relationship between any two relations in the relational schema. This suggests that we should try to find such a relationship when we have a single mapping that involves more than one relation.
As we have indicated earlier, a relation in the relational schema corresponds to one of the five ER segments (part of an ER diagram) in the ER schema. For a single mapping involving more than one relation, we first find all ER segments in the ER schema corresponding to those relations in the single mapping. Once we have all the ER segments, we find a traversal of the ER diagram that includes all the ER segments. This traversal will then contain the relationship between the relations in the single mapping. The next thing to do is to convert the qualifier (WHERE clause) in the single mapping into the qualifier on the traversal (part of the ER diagram that encompasses all the ER segments). The following rule summarizes the above and can be used to guide the translation of a single mapping involving more than one relation into the ER DML.
Rule 3: For a single mapping involving more than one relation, each of which is of one of the five types, generate a GET statement in the ER DML. The a-list of the GET statement takes the form of the select-clause in the single mapping. The WHERE clause of the GET statement includes two parts. The first part contains the traversal of the ER schema. The second part takes the form of the WHERE-clause in the single mapping. The traversal of the ER schema is generated by first finding the corresponding ER segments for the relations in the single mapping and then taking part of the ER diagram that includes all the ER segments.
Example: See the example in Fig. 15.
Nested Mapping: With SQL it is possible to use the result of a mapping in the WHERE clause of another mapping. This operation is called nested mapping. Nested mappings are not restricted to only two levels. When processing a nested mapping, the innermost mapping is executed as though it were a single mapping; the result of the mapping is passed to the outer mapping and the outer mapping then proceeds as though it were given a set of constants in place of the inner mapping. This continues from the innermost mapping out
until it reaches the outermost mapping. Similarly, the ER DML (the Global Conceptual Language) allows for the embedment of a GET statement in the WHERE clause of another GET statement. This nested GET statement feature makes it possible to map a nested mapping in SQL into the ER DML. The following rule guides such a translation.
Rule 4: For each nested mapping, generate a nested GET statement in the following manner. Working from the innermost mapping out, for each mapping seen, which is a single mapping, generate a GET statement using the rules described earlier for single mappings. If the current single mapping has a single mapping in its WHERE clause, which should have been mapped into a GET statement due to the fact that we work from inside out, then the WHERE clause of the current GET statement is combined with the inner GET statement to form the new WHERE clause. This process continues until the outermost mapping is mapped into the ER DML.
Example: See the example in Fig. 16.
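The inside-out procedure of Rule 4 maps naturally onto recursion: the recursive call returns the inner GET statement before the outer one is assembled. The sketch below is illustrative only. The `Mapping` type, the `{inner}` placeholder convention, the GET spelling, and the sample names (loosely modeled on the nested query of Fig. 15) are assumptions, and each level is taken to have already had its path chosen by the single-mapping rules.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Mapping:
    """One level of a (possibly nested) mapping; a hypothetical representation."""
    select_list: Sequence[str]
    path: str                          # first part of the WHERE clause (entity or traversal)
    condition: str                     # qualifier; "{inner}" marks where a nested mapping sits
    inner: Optional["Mapping"] = None  # the mapping nested in the WHERE clause, if any


def to_get(mapping: Mapping) -> str:
    """Translate innermost-first: the inner GET is built, then embedded in the outer one."""
    inner_get = to_get(mapping.inner) if mapping.inner is not None else ""
    condition = mapping.condition.replace("{inner}", inner_get)
    a_list = ", ".join(mapping.select_list)
    return f"GET({a_list}) WHERE ({mapping.path} & {condition})"


inner = Mapping(["P#"], "R2", "W# = 'W123'")
outer = Mapping(["P#", "PD"], "E1", "P# = {inner}", inner)
print(to_get(outer))
```

Because `to_get` recurses on `inner`, the same code handles any nesting depth, which matches the remark that nested mappings are not restricted to two levels.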
We stress that the overall translation approach in our HD-DBMS effort will hold even if the source relational language and the target ER DML were to vary. This has been one of our requirements. Thus the translation would be extended to other relational materializations. The same holds for the other types of DML and corresponding translation schemes within our scope.
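As a closing illustration for this section, the traversal step of Rule 3 can be viewed as a graph search over the connected ER diagram. The text does not say how the traversal is found, so the breadth-first search below is an assumed technique, and the diagram, the node names, and the function `find_traversal` are hypothetical. The sketch connects only two ER segments, each reduced to a single node; covering more segments would need repeated searches or a different algorithm.

```python
from collections import deque
from typing import Dict, List, Optional


def find_traversal(er_diagram: Dict[str, List[str]],
                   start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search for a shortest node sequence from start to goal."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk the predecessor chain back to start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in er_diagram.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None  # cannot happen for two nodes of a connected diagram


# Hypothetical diagram: entity sets E1..E3 linked by relationship sets R1, R2.
er_diagram = {
    "E1": ["R1"],
    "R1": ["E1", "E2"],
    "E2": ["R1", "R2"],
    "R2": ["E2", "E3"],
    "E3": ["R2"],
}
print(" ".join(find_traversal(er_diagram, "E1", "E3")))
```

The connectedness of the global ER schema is what guarantees that such a search succeeds for any pair of segments; the resulting node sequence would serve as the first part of the WHERE clause.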
F. View Update
While we have stated the ideal long-range goals, we have identified problems that may impose limits on the types of user views of the databases and particularly on the types of data accessing commands that may be issued from the VM user level. We have sorted out the various problems, assessed the possibility and cost of solution, identified the limitations on types of commands and data model mapping if such problems are not solved, and outlined possible solution approaches. As an example, the magnitude of foreseen and unsolved problems appears to have led most efforts not to permit updating a database, even while forcing each user wishing access to a heterogeneous database to abide by a new or common model and query language.
The "view update" problem in relational systems is one major problem in the distributed heterogeneous case even if relational systems are not involved; constraining the differences permitted between user views and local logical models alleviates the problem and makes it more solvable. We have now formally identified the rules of the game to permit 1) updating commands to various degrees and 2) differences in mapping between the user view and the underlying participating database schemas, while preserving integrity constraints. We first assessed actual view updating in IMS, SQL/DS, DB2, Oracle, Ingres, and QBE. We also analyzed paper approaches proposed by various authors.
We are now designing the mechanisms for DBAs or users for logically and easily expressing various limitations or controls on the types of user views, data accessing commands, and updates so as to preserve stated integrity controls and various degrees of transparency of distribution and heterogeneity. The role of the Prolog language or of some of its mechanisms as an internal mechanism to formally express such controls is being considered. We are now identifying the translation of such controls to corresponding controls (DDL and/or application programs) on specific DBMS.
We have identified the major issues referred to as the "view update problem" and also most of the required integrity controls or database update decisions that DBAs or users must make to solve most, if not all, realistic view update problems.
G. Further Features
A very brief synopsis of work we have done in two major areas follows.
Protocols: We have identified the protocol information needed to implement the HD-DBMS. In developing these protocols, the logical components within the HD-DBMS to implement these protocols were also defined. The protocols defined describe the information exchange needed to enable the various logical components of the HD-DBMS to handshake or communicate so as to maintain data integrity in the system and also to handle the translations. The protocols allow the components to implement: queries and/or updates on data within the system; aborts on queried updates; delayed updates; broadcasting and handling system status (as in up/down/recovering).
In addition to defining the protocols, the format by which the protocols travel between the logical components was also defined. Ample example scenarios of events within the HD-DBMS have been created. Each scenario contains a detailed illustration of the protocols needed to handle the event and the sequence in which they are used.
Internal Model: A major model of the HD-DBMS is the internal model, both at the global and the local levels. A generalized database access path model has been defined for the purpose of representing relationships between data entities in the HD-DBMS [20]. This data model, termed the Generalized Data Access Graph (GDAG), is a major architectural component. The GDAG is maintained by the HD-DBMS as part of the network data dictionary (catalog). It encompasses the capability of modeling the access paths of the three major data models, via a common data independent notation. A salient capability is the modeling of inter-database relationships using an equivalent notation.
IV. CONCLUDING REMARKS
We have outlined the language desiderata for data sharing and accessing in the increasing scenarios of heterogeneous databases. We have cited the major approaches to data sharing and accessing: from the primitive commercial file and database unload/load and PC download, to common interfaces on top of existing DBMS, to the R&D and prototype efforts toward the long-range goals. Commercial availability of the more encompassing thrusts may become a reality with the mounting problems, opportunity costs, and demand for data sharing in the heterogeneous world.
The HD-DBMS project is highlighted herein, with a presentation of its status, progress, and plans. It is a longer range project, with the unique feature of allowing any user in the network to use his preferred database model and DML to access any data in the heterogeneous network; another distinguishing feature, thus far, is its support for updating, not only for read-type accessing.
Prototype implementation of the HD-DBMS for proof of concept will follow. The first thread probably will be to translate:
from a CODASYL DML at the virtual level into ER DML into SQL
from SQL at the virtual level into ER DML into CODASYL DML.
Prototyping will first face read-only commands and immediately thereafter updating commands. A robust data dictionary will be used, undoubtedly extending its model, to implement the crucial network-wide dictionary/catalog.
We intend to use graphical mouse-oriented tools to paint ER database models. ER data definitions and graphical ER diagrams should eventually be generated automatically from existing DDLs, and DDLs should be generated automatically also from ER data definitions and graphical ER diagrams. Schema integration into the global conceptual model should be semi-automated; the reverse process should also be automated.
Although the flavor of presentation is "bottom-up," that is, starting with existing individually designed heterogeneous databases, the system is also targeted for new databases being designed globally from the start, and then being distributed in the heterogeneous environment. The latter will be a growing case as the flexibility of heterogeneous distributed systems becomes available.
ACKNOWLEDGMENT
The author wishes to acknowledge the contribution of the following past and current members of the HD-DBMS project: E. Nahouraii and M. H. Pirahesh (IBM Corp.), J. Ben-Zvi and J. Horowitz (Informatics), G. Chen (Hughes Aircraft), W. Johnson (Lockheed), A. Chen, and G. Wang. The collaboration and support of Informatics General Corporation is appreciated. Finally, he wishes to thank the two anonymous reviewers for their comments.
REFERENCES
[1] M. Adiba and D. Portal, "A cooperation system for heterogeneous data base management systems," Informat. Syst., vol. 3, no. 3, pp. 209-215, 1978.
[2] J. R. Abrial, "Data semantics," in Proc. IFIP-TC2 Working Conf. on Data Base Management (Cargese, Corsica, Apr. 1974), J. W. Klimbie and L. Koffeman, Eds. Amsterdam, The Netherlands: North-Holland, 1974.
[3] P. Atzeni and P. P. Chen, "Completeness of query languages for the entity-relationship model," in Proc. 2nd Int. Conf. on Entity-Relationship Approach, P. P. Chen, Ed., ER Institute, 1981.
[4] A. F. Cardenas and M. H. Pirahesh, "Database communication in a heterogeneous database management system network," Informat. Syst., vol. 5, no. 1, pp. 55-79, 1980.
[5] -, "The E-R model in a heterogeneous data base management system network architecture," in Proc. Int. Conf. on Entity-Relationship Approach to System Analysis and Design, P. Chen, Ed. Amsterdam, The Netherlands: North-Holland, 1980, pp. 577-583.
[6] A. F. Cardenas and G. Wang, "Translation of SQL/DS data access/update into entity/relationship data access/update," in Proc. 4th Int. Conf. on the E-R Approach (Chicago, IL, Oct. 28-30, 1985).
[7] P. P. Chen, "The entity-relationship model-Toward a unified view of data," ACM Trans. Database Syst., vol. 1, no. 1, Mar. 1976.
[8] -, "An algebra for a directional binary entity-relationship model," in Proc. 1st IEEE COMPDEC (Los Angeles, CA, Apr. 1984), pp. 37-40.
[9] E. F. Codd, "A relational model of data for large shared data banks," Commun. ACM, vol. 13, no. 6, 1970.
[10] -, "Relational completeness of data base sublanguages," in Data Base Systems, R. Rustin, Ed. Englewood Cliffs, NJ: Prentice-Hall, 1972.
[11] Cullinet Software Inc., "IDMS/R, summary description," Westwood, MA.
[12] Cullinet Software Inc., "Goldengate, summary description," Westwood, MA.
[13] U. Dayal and H. Y. Hwang, "View definition and generalization for database integration in a multidatabase system," IEEE Trans. Software Eng., vol. SE-10, no. 6, pp. 628-645, Nov. 1984.
[14] U. Dayal, "Query processing in a multidatabase system," in Query Processing in Data Systems, W. Kim, D. Reiner, and D. Batory, Eds. New York, NY: Springer-Verlag, 1985.
[15] S. R. Dumpala and S. K. Arora, "Schema translation using the entity-relationship approach," in Proc. 2nd Int. Conf. on Entity-Relationship Approach, P. P. Chen, Ed., ER Institute, 1981.
[16] R. Elmasri and G. Wiederhold, "GORDAS: A formal high-level query language for the entity-relationship model," in Proc. 2nd Int. Conf. on Entity-Relationship Approach (Washington, DC, 1981).
[17] A. Ferrier and C. Stangret, "Heterogeneity in the distributed database management systems SIRIUS-DELTA," in Proc. 8th Int. Conf. on Very Large Data Bases (Mexico City, Mexico, Sept. 8-10, 1982), pp. 45-53.
[18] V. D. Gligor and G. L. Luckenbaugh, "Interconnecting heterogeneous data base management system," IEEE Computer, vol. 22, pp. 33-43, Jan. 1984.
[19] Honeywell Information Systems, "Relational query/interactive query reference manual," Manual #DR52.
[20] J. Horowitz and A. F. Cardenas, "Relationships in a heterogeneous distributed database environment," submitted for publication to Informat. Syst.
[21] H. Y. Hwang and U. Dayal, "Using the entity-relationship model for implementing multiple model database system," in Proc. 2nd Int. Conf. on Entity-Relationship Approach, P. P. Chen, Ed., 1981.
[22] IBM Corp., "SQL/DS, concepts and facilities," Reference Manual GH24-5013.
[23] Informatics General Corp., "Answer/DB reference manual," Canoga Park, CA.
[24] Informatics General Corp., "Distributed application generator, technical system description," Canoga Park, CA.
[25] Informatics General Corp., "Lotus/Answer," "Visi/Answer," and "dBase II/Answer," Reference Manuals, Canoga Park, CA.
[26] J. Iossiphidis, "A translation to convert the DDL of ERM to the DDL of System 2000," in Proc. Int. Conf. on Entity-Relationship Approach to System Analysis and Design, P. P. Chen, Ed. (Los Angeles, CA, 1979).
[27] B. E. Jacobs, "On database logic," J. ACM, vol. 29, no. 2, pp. 310-332, Apr. 1982.
[28] R. H. Katz, "Database design and translation for multiple data models," Ph.D. dissertation, UC Berkeley, 1980.
[29] R. Katz and N. Goodman, "View processing in multibase-A heterogeneous database system," in Entity-Relationship Approach to Information Modeling and Analysis, P. P. Chen, Ed., ER Institute, 1981.
[30] R. H. Katz and E. Wong, "Decompiling CODASYL DML into relational queries," ACM Trans. Database Syst., vol. 7, no. 1, pp. 1-23, 1982.
[31] T. A. Landers and R. L. Rosenberg, "An overview of multibase," in Distributed Databases, H. J. Schneider, Ed. Amsterdam, The Netherlands: North-Holland, 1982.
[32] M. Levin, "The DIAM theory of algebraic access graphics," Sterling Systems, Inc., Denver, CO, 1980.
[33] Y. D. Lien, "Hierarchical schemata for relational databases," ACM Trans. Database Syst., vol. 6, no. 1, pp. 48-69, Mar. 1981.
[34] H. M. Markowitz, A. Malhotra, and D. P. Pazel, "The ER and EAS formalisms for system modeling, and the EAS-E language," in Proc. 2nd Int. Conf. on Entity-Relationship Approach (Washington, DC, 1981).
[35] E. Z. Nahouraii, L. O. Brooks, and A. F. Cardenas, "An approach to data communication between different GDBMS," in Proc. 2nd Int. Conf. on Very Large Data Bases (Brussels, Belgium, Sept. 1976).
[36] C. Parent and S. Spaccapietra, "An entity-relationship algebra," in Proc. 1st IEEE Conf. on Data Engineering (Los Angeles, CA, Apr. 24-27, 1984), pp. 500-507.
[37] L. S. Schneider, "A relational query compiler for distributed heterogeneous databases," IFIP TC 2.6, NASWG, Jan. 1977.
[38] SDDTG of CODASYL Systems Committee, "A stored data definition language for the translation of data," Informat. Syst., vol. 2, no. 3, 1977.
[39] M. E. Senko, E. B. Altman, M. M. Astrahan, and P. L. Fehder, "Data structures and accessing in database systems," IBM Syst. J., vol. 12, no. 1, 1973.
[40] M. E. Senko, "DIAM as a detailed example of the ANSI/SPARC architecture," in Proc. IFIP-TC2 Working Conf. Modeling in Data Base Management Systems (Freudenstadt, Germany, Jan. 1976), G. M. Nijssen, Ed. Amsterdam, The Netherlands: North-Holland, 1976.
[41] N. Shu, B. Housel, and V. Lum, "CONVERT: A high level translation definition language for data conversion," IBM Corp. Res. Rep. RJ 1500, San Jose, CA, Jan. 1975.
[42] N. Shu et al., "EXPRESS: A data extraction, processing and restructuring system," ACM Trans. Database Syst., vol. 2, no. 2, June 1977.
[43] D. W. Shipman, "The functional data model and the language DAPLEX," ACM Trans. Database Syst., vol. 6, no. 1, pp. 140-173, Mar. 1981.
[44] J. M. Smith et al., "MULTIBASE-Integrating heterogeneous distributed database systems," in Proc. 1981 Nat. Computer Conf. Reston, VA: AFIPS Press, pp. 487-499.
[45] G. Sockut, "A framework for logical-level changes within data base systems," IEEE Computer, vol. 23, pp. 9-27, May 1985.
[46] E. Wong and R. H. Katz, "Logical design and schema conversion for relational and DBTG databases," in Proc. Int. Conf. on Entity-Relationship Approach to System Analysis and Design, P. P. Chen, Ed., Los Angeles, CA, 1979.