Heterogeneous Distributed Database Management: The HD-DBMS
ALFONSO F. CARDENAS

Invited Paper

The proliferation of different DBMS and advances in computer networking and communications have led to increasing heterogeneous distributed DBMS network scenarios. Major heterogeneity problems and challenges include different database models, syntactically and semantically different DBMS, and different types of controls (recovery, etc.). We address herein the long-range goal for a heterogeneous distributed DBMS (HD-DBMS) to be able to support a network in which any user in any node can be given an integrated and tailored view or schema, while in reality the data may reside in one single database or in physically separated databases, managed individually by the same type of DBMS (the only one the user understands) or by different DBMS. We cite the major approaches to data sharing and accessing: from the primitive commercial file and database unload/load and PC download, to common interfaces on top of existing DBMS, to the R&D and prototype efforts toward the long-range goals. Commercial availability of the more encompassing approaches may become a reality with the mounting problems, opportunity costs, and demand for data sharing in the heterogeneous world. Major research and development projects in this arena are leading toward some partial attainment of the long-range objective. The UCLA HD-DBMS project is highlighted herein, with a presentation of its status, progress, and plans. It is a longer range project, with the unique feature of allowing any user in the network to use a preferred database model and DML to access or update any data in the heterogeneous network. The HD-DBMS is to provide a multilingual interface to heterogeneous distributed databases.

I. INTRODUCTION

The use of different generalized database management systems (DBMS) has proliferated in recent years. As a result, the heterogeneous distributed database management system scenario has emerged.
An example is shown in Fig. 1. A variety of large and small computers and even personal computers, most of them with their own incompatible DBMS, may be tied together in a network as shown. Satellite communication may be involved between distant nodes. Local networks of computers might be involved, such as at location X in the figure. Database machines may be involved in managing the database at a node, or in a local network.

Manuscript received October 22, 1985; revised November 26, 1986. The author is with the Computer Science Department, University of California, Los Angeles, CA 90024, USA, and with Computomata International Corp., Los Angeles, CA 90025, USA. IEEE Log Number 8714292.

The heterogeneous database environment has emerged in many organizations, governmental environments, and computer networks due to
a) the proliferation of databases
b) the proliferation of different DBMS
c) the proliferation of a variety of minicomputers and personal computers
d) the emergence of networks tying together heterogeneous hardware and software
e) advances in data communications
f) distributed databases
g) lack of overall (not just local) database planning and control.

To all the challenges and problems of the homogeneous distributed environment, this environment adds the problems of DBMS heterogeneity: different data models (network, hierarchical, relational, etc.), syntactically and semantically different DBMS (e.g., even within the relational model family there are significant differences between SQL and QBE), and different types of controls in each GDBMS (e.g., backup and recovery, locking and synchronization, etc.). It is desired that a future heterogeneous distributed DBMS (HD-DBMS) provide not only distribution transparency but also heterogeneity transparency. The example, in Fig.
1, shows four databases involved: at location X there is a database managed by a relational DBMS and another managed by a network DBMS (e.g., a CODASYL system) on another local computer, and at two other remote locations there are two separate databases, each managed by a hierarchical DBMS such as IMS. With current technologies every user accessing any database is expected to use the facilities and abide by the syntactic and semantic regulations of the DBMS which created each database, unless some interface software is developed by the installation. Although some such interface software is, of necessity, frequently being developed by user installations, thus far it allows only cosmetic variations from the syntax and semantics of the DBMS managing the particular database.

PROCEEDINGS OF THE IEEE, VOL. 75, NO. 5, MAY 1987

Fig. 1. Heterogeneous database management system scenario: an example.

What would greatly enhance the attractiveness and usefulness of sharing data resources in a heterogeneous network, as shown in Fig. 1, is the ability for a user to access any database as if it were managed under any one of the DBMS at one central location. Thus a user could have access to any database through a relational view at one of the minicomputers in the local network at location X, while another set of users, at nodes where IMS databases reside, could have access to any database as if it were managed by IMS. Ideally, a user anywhere could look at any database through his favorite DBMS, whether or not it was the preferred one at his site.

There will be, of course, many users who will confine their database accesses to a local database managed by the local DBMS. In fact, they will undoubtedly constitute the majority of the bulk applications. However, there is a growing population of users across the heterogeneous scenario whose needs we address herein.
In a nutshell, the ideal long-range goal would be for an HD-DBMS to be able to support a network in which any user in any node can be given an integrated and tailored view or schema, while in reality the data may reside in one single database or in physically separated databases, managed individually by the same type of DBMS (the only one the user understands) or by a different DBMS. No HD-DBMS with such full capabilities is available today. There are many unsolved problems, and others remain to be uncovered. However, major research and development projects in this arena are leading toward some partial attainment of these long-range objectives.

Section II outlines the range of approaches to the heterogeneous challenge, from the extreme of database unload/load, to a common interface for DBMS, to the top-of-the-line, long-range R&D and prototype efforts. Section III outlines the UCLA HD-DBMS project and its progress toward the longer range goals.

CARDENAS: HETEROGENEOUS DISTRIBUTED DATABASE MANAGEMENT

II. APPROACHES TO COMMUNICATION IN A HETEROGENEOUS ENVIRONMENT

A. File and Database Unload/Load

One extreme and simplistic approach to accessing data in a heterogeneous environment is to physically unload the data from the source hardware/software environment, store them in a common format understood and handled by both source and target environments, and load them into the target environment. This approach has in fact been used to unload/load data files across heterogeneous environments for several years. The common format has usually been ASCII. In a number of cases, specialized types of data are unloaded/loaded via common formats specially designed and tailored to carry data descriptions and other semantic information from source to target. Examples are satellite telemetry data, geographical data types, etc.
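The unload/load route can be sketched in a few lines. This is a minimal illustration, not any vendor's utility: it assumes a keyed file held as a Python dict (standing in for an indexed file), a hypothetical three-field record layout (`pno`, `descr`, `cls`), and comma-delimited ASCII as the common format. The index is stripped on unload and rebuilt on load; anything the flat format cannot carry (for example, inter-record relationships) is simply lost, which is why the approach breaks down for full databases.

```python
import csv
import io

# Hypothetical record layout; field names are illustrative assumptions.
FIELDS = ["pno", "descr", "cls"]

# A keyed source file: the dict stands in for the index of an indexed file.
source_file = {
    "P1": {"pno": "P1", "descr": "BOLT", "cls": "A"},
    "P2": {"pno": "P2", "descr": "NUT", "cls": "B"},
}


def unload(indexed):
    """Strip the index: write records in key order as sequential comma-delimited text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for key in sorted(indexed):
        writer.writerow([indexed[key][f] for f in FIELDS])
    return buf.getvalue()


def load(text, key_field="pno"):
    """Read the sequential text and rebuild an index on the target side."""
    target = {}
    for row in csv.reader(io.StringIO(text)):
        record = dict(zip(FIELDS, row))
        target[record[key_field]] = record
    return target


flat = unload(source_file)
assert flat.isascii()  # the common format must be plain ASCII
print(flat, end="")
assert load(flat) == source_file  # contents survive; the index is rebuilt, not transferred
```

Running it prints the two records one per line (`P1,BOLT,A` and `P2,NUT,B`), and the reloaded index equals the original. Only the record contents cross the boundary; the target must re-declare its own indexing.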
With the emergence and proliferation of 1) personal computers and the many different types (IBM PC, Apple's Macintosh, etc.), 2) local area networks (LANs), and 3) incompatible software packages (spreadsheets, word processors, file and database managers, etc.), the need for unload/load has increased. There is an increasing number of commercial "file transfer" programs whose task is to help transfer files from one machine to another, providing increasing levels of help and transparency over the many details that heterogeneity springs onto the unload/load process.

An examination of the generalized file transfer technology commercially available shows that the data that can be easily transferred are essentially sequential files. Even random-access files are not easily transferable. The transfer of more sophisticated files such as indexed sequential files, e.g., VSAM in large IBM operating systems, is not transparent and usually is not automated (try, for example, to transfer such files between Honeywell and IBM environments). A usual approach is to unload such indexed files to sequential files, stripping all indexing and other vendor-specific control information, use the sequential ASCII file transfer route, and load into the equivalent version (including indexing) on the target environment. Conversion software houses and specialists are usually necessary to do this.

Simplistic unload/load or file transfer programs are of little help in a database environment. All the crucial relatability know-how, indexing, and/or hashing would be lost in converting the database into a number of individual sequential ASCII files. The process of loading the database into the target environment would involve new database definitions, new indexing definitions, invocation of loading utilities that might require special formatting of the files being transferred, etc. In all, it is practically a most difficult process. Try converting a CODASYL database schema and contents from any vendor you select to IMS, or vice versa.

Specialized Database Load/Unload: A number of pioneering efforts on the subject of "file description and translation" in a true database environment were started in the 1970s ([38] and others). Other more recent efforts include IBM's Express [42]. Due to commercial interests, a number of specialized database unload/load packages have been developed by a number of vendors. The predominant ones are
1) those that unload from a nonrelational database system and load to a relational database system;
2) those that "download" a database or portions of a database from a mainframe computer to a smaller computer or PC.

The subtle difference between these two types of packages is that the latter are more numerous, usually less sophisticated, and generally download into sequential ASCII files for input to simple file handlers such as spreadsheets, graphics packages, etc.

Among the most frequently cited mainframe database load/unload software bridges are IMS's Extract, which unloads portions from an IMS database and loads them as an equivalent SQL/DS or DB2 database [22], and Honeywell's PDQ facility, which unloads portions from an IDS/DM IV database to a relational IQ database (essentially SQL) [19].

There is a growing number of mainframe-PC data downloading packages. In fact, a growing number of DBMS vendors now offer such capability from their DBMS to sequential files for use at the PC level. In a number of cases the download may be invoked from the PC, and data are then downloaded from the database into the following:
a) ASCII or DIF format for use with popular spreadsheets, word processors, and even the dBase relational micro DBMS; an example is Informatics' Answer/DB for downloading IMS data [23]-[25], and its 123/Answer and dBase/Answer packages that translate the ASCII files retrieved by Answer/DB into the proper internal format of these packages.
b) Special vendor format for use with the vendor's own PC software packages; an example is Cullinet's facility to download data from its IDMS/R database into its PC Goldengate software packages (including graphics, spreadsheets, etc.) [12].

A fundamental problem with downloaded data, since it is a redundant copy of mainframe data, is the maintenance of consistency or synchronization in an updating environment. The usual current commercial approach is to download and propagate pertinent updates from the main database periodically, and either a) not permit updating from the PC level or b) permit updating on the PC level and not reflect the updates "upstream."

B. Relational Interfaces to DBMS

One of the hopes of the advocates of relational database management is that it will be widely adopted. The success and attractiveness of relational data management has led to many commercial relational DBMS. Relational micro DBMS now predominate at the PC level. Furthermore, there is a tendency for various vendors to provide a relational interface or view on top of their existing nonrelational DBMS. Examples of this are Cullinet's IDMS/R, providing a relational interface to the internal CODASYL IDMS database [11], and Honeywell's PDQ, permitting a relational interface to operate directly on the native CODASYL IDS/DM IV database (or, as another option, on a copy of the IDS/DM IV database) [19]. A few vendors have first provided a relational interface on a native nonrelational system and then have redone it into a more native relational system.

Unfortunately, there is no relational standard. Although the relational structure and relational calculus and algebra are the common thread, and IBM's SQL and QBE may be largely taken as de facto standards, the fact is that there are many variations of SQL and QBE. Except for the simpler read-only command SELECT ... FROM ... WHERE ..., there are noticeable variations in other areas, such as the updating commands, whose semantics and integrity controls vary greatly among implementations. Thus a standard relational interface among DBMS vendors has not evolved, and it appears that it will not evolve. Nevertheless, there will be a common general way of structuring databases and supporting operations, specifically project and join. Furthermore, relational interfaces for DBMS and mainframe-to-PC bridges may be exercised jointly in some cases: for example, data may be downloaded from the mainframe nonrelational database via its relational interface into the PC environment, where the copy might be manipulated via another relational DBMS (like the dBase family).

In spite of the relational DBMS differences, the interconnection of DBMS with relational interfaces may be augmented further by a network-wide "generalized relational" interface that may provide a user at any node in Fig. 1 transparency over the relational DBMS differences. Such a "generalized relational" interface will not have the challenge of mapping schemas between different models and translating between widely different database access languages; it only has to be concerned with the relatively simpler differences between the various relational interfaces of each DBMS in the network. Such a generalized relational interface or front-end to a distributed relational DBMS network is exemplified by the SDC project outlined in the article by Templeton et al. in this issue.

C. Research and Prototype Projects

A number of longer range R&D and prototype projects are aimed at achieving the goals cited in the Introduction. They do not entail data unload/load or download, nor the existence of relational DBMS or relational interfaces to every DBMS in the network. Such long-range projects address and perform in various ways the mapping or translation of database structures and corresponding data-accessing language commands illustrated in Fig. 2. Most projects approach this by introducing intermediate database model and database access language levels. Both the types of intermediate models and languages and the number of levels vary, with the number of levels usually ranging from three to five depending on the project. Major efforts include UCLA's HD-DBMS project [4], the main focus of Section III; Computer Corporation of America's MULTIBASE [13], [14], [29], [31], [MI]; INRIA's heterogeneous SIRIUS-DELTA [17]; and Informatics' MARK V DAG [24]. In addition to these projects, a number of authors have also addressed the challenge [1], [18], [26]-[28], [30], [33], [WI], [45], [46].

Fig. 2. Relationship between schema translation and DML translation.

The majority of the current research and development efforts, and the initial commercial support expected, simplify the task by requiring every user to communicate using a common language and data model [MULTIBASE, DAG, SIRIUS-DELTA]. A frequent choice is a relational model [SIRIUS-DELTA]. MULTIBASE further simplifies the task for a more near-term achievable system by handling only read-type global database requests; all updates are managed locally by individual sites. The complexity and restrictions of updating through user views in relational DBMS are acknowledged. The initial commercial version of MULTIBASE may be available in the near future. It will provide distribution transparency and heterogeneity transparency for read-only global queries using DAPLEX as a common language and data model [43]. (See the article by Chan et al. in this issue, which includes a synopsis of DAPLEX.) In contrast, HD-DBMS provides a multilingual interface to heterogeneous distributed databases, while these other systems provide only a monolingual interface to heterogeneous distributed databases.

DAG (Distributed Application Generator) [24] is intended to be a generator of applications and also of the necessary DBMS commands, embedded in the application program, to access databases managed by IMS and/or SQL/DS. The database view presented to the application is a logically integrated hierarchical IBM database, although it may be composed of portions residing in several separate IMS and/or SQL/DS databases at different sites and under different IBM operating systems and data communications software (CICS, IMS/DC).

III. THE UCLA HD-DBMS PROJECT

A. Overall Architecture

The UCLA HD-DBMS project is a multi-year, long-range project started in the late 1970s. Since 1983 part of the project has involved collaboration and support from Informatics General Corp. (now Sterling Software). Several publications on it have appeared in the open literature [4]-[6], [20], [35]. This section provides a status of the project, its progress, and near-term plans.

The HD-DBMS strives to achieve the major long-range goals cited in Section I, not constraining the user to a common arbitrary language nor to read-only queries; however, it is a very-long-range possibility, beyond the more achievable MULTIBASE and SIRIUS-DELTA tasks. Its primary focus is on the heterogeneity challenge, not on the database physical distribution challenge taken up by other efforts assuming a homogeneous or common DBMS environment.

The HD-DBMS approach entails a global (network-wide) conceptual model of data and a global internal model of data. The global conceptual model is a highly logical model of the information content of the integrated system. It is used as a vehicle in the process of understanding user queries and decomposing them to extract information from the individual databases. The global internal model is an access-path oriented model of the structure of the integrated system showing precisely the data structures and access paths actually available (e.g., network-wide access routes, local database relationships, inter-database relationships, etc.), but independent of a specific implementation. The global internal model is the union of the internal models of each participating database. It is used as a vehicle in the process of identifying the specific access paths through the different databases that should be followed to answer user queries, while shielding the user from the need to know the intricacies of the access path implementation and physical storage of data. The global internal model identifies major elements outside the realm or interests of each local DBMS: relationships between entities in different DBMS, logical replication, and perhaps physical replication of entities and relationships in heterogeneous databases.

An extension of the ER model proposed by Chen [7] is used for the conceptual level, rather than other models [1], [2]. Our model for the internal level [20] is an evolution of our earlier proposal [35]; it was inspired by and includes ingredients from DIAM (Data Independent Accessing Model) [32], [39], and [40].

Other significant efforts toward heterogeneous DBMS networks propose providing users with a new model view of every database, typically a relational view (MARK V DAG: a hierarchical view), and one query language to be eventually translated into search programs to access the actual databases. A crucial difference between our project and others is that we wish to permit each user or program at a node to view and access data in the database model and language desired, rather than force learning another language or reprogramming for another model and language. The desired languages would be constrained to a few, of course, but not to only one in a given database model.

Fig. 3 shows the proposed system architecture. The global query translator processes the query initially submitted by a user and, with the knowledge of the virtual database model associated with that query, translates it to the form acceptable by the global conceptual model (an ER model) and global internal model. The query is then decomposed by a query decomposer and access path selector, a translator, into the appropriate subquery(ies). The subquery(ies) will then have to be translated into the query language or data manipulation language of a specific DBMS, so as to then be processed by the corresponding node(s) to extract the information from the specific physical database(s) involved. The answers to the subquery(ies) are then joined together and reformatted by the query composer, a translator, according to the virtual database model. The result is the answer to the original query based on the user's virtual model.

Fig. 3. System architecture and building blocks to support communication in a heterogeneous database environment.

There will be, of course, many users who will confine their queries locally to a given physical database managed by a given DBMS. They will undoubtedly constitute the majority of the bulk volume applications. In this case, the local DBMS will process their queries directly and completely. The global query translator, the query decomposer and access path selector, and the query composer will not be needed for such cases.

HD-DBMS Layered Architecture: A number of important catalogs or directories and mapping or translation procedures for data structures and data access commands are necessary. Fig. 4 shows the five different layers of our architecture and their associated models. The local layer contains the physical databases actually stored. The outermost layer is the collection of virtual databases as seen by the users of the heterogeneous database network. The user deals with the outermost level, called the virtual model (VM), and the system should handle all the necessary mapping to extract information from the local physical databases.

Fig. 4. Layered architecture for the HD-DBMS.

Following Fig. 4:
1) An application program database view is defined using the data definition language of a host DBMS. This view is defined to the HD-DBMS at the virtual layer (VL).
2) An application program query (DML or query command) enters the virtual layer and is transformed by the HD-DBMS into a unified virtual layer (UVL) query. This layer is an ER representation of the application program's virtual layer view.
3) The UVL query is then mapped into a unified global layer (UGL) query. The UGL is an ER conceptual representation of the entire heterogeneous database. It represents the union of individual unified local layer (ULL) database views.
4) The UGL query is transformed into a set of one or more ULL queries and an access plan. A ULL definition exists for each physical database. Externally, a ULL definition of a physical database is an ER view of that database. Internally, ULL access path specifications exist for data within a single physical database and for each interdatabase relationship between two or more physical databases.
5) A ULL query is transformed into a local layer (LL) DBMS-dependent query, and then sent to the local DBMS. The ULL queries are performed according to the precedence established by the access plan.

Once the results of the original query are obtained, the data are translated back through the layered architecture into the form expected by the application program. This involves both structural and data translation.

B. Example Heterogeneous Database Network

The following is an example of a close-to-reality heterogeneous database network. It will be used in subsequent sections. The scenario consists of four databases under different DBMS: SQL, CODASYL (two databases), and IMS. Each of the databases is defined in Figs. 5-8. Fig. 9 presents the unified global conceptual ER model (UGCM) that covers the four databases; note that the partitioned global conceptual model shows the contribution of each of the four databases to the UGCM.

Fig. 5. DB1 definition: An SQL relational database.

Fig. 6. DB2 definition: A CODASYL network database.

    SCHEMA NAME IS DB2.
    AREA NAME IS DB-AREA.
    RECORD NAME IS PART.
        LOCATION MODE IS CALC HASH.P# USING P# IN PART.
        DUPLICATES NOT ALLOWED.
        WITHIN DB-AREA.
        02 P#  TYPE IS CHAR 5.
        02 PD  TYPE IS CHAR 25.
        02 CL  TYPE IS CHAR 2.
    RECORD NAME IS WH.
        WITHIN DB-AREA.
        02 W#  TYPE IS CHAR 5.
        02 WD  TYPE IS CHAR 25.
    SET NAME IS INVENTORY.
        OWNER IS PART.
        MEMBER IS WH.
        MANDATORY AUTOMATIC.
        ASCENDING KEY IS W# IN WH.
        DUPLICATES ARE NOT ALLOWED.
        SET OCCURRENCE SELECTION IS THRU LOCATION MODE OF OWNER.

Fig. 7. DB3 definition: An IMS/DB hierarchical database (segments PART (P#, PD, CL), ASSEMB-COMP (P#, QTY), and COMP-ASSEMB (P#, QTY)).

    DBD     NAME=DB3,ACCESS=HISAM
    DATASET DD1=DEPTDD1,DEVICE=3380,OVFLW=DEPTOVF
    SEGM    NAME=PART,BYTES=32
    LCHILD  NAME=(COMP-ASSEMB,DB3),PAIR=ASSEMB-COMP
    FIELD   NAME=(P#,SEQ),BYTES=5,START=1
    FIELD   NAME=PD,BYTES=25,START=6
    FIELD   NAME=CL,BYTES=2,START=31
    SEGM    NAME=ASSEMB-COMP,BYTES=10,POINTER=(LPARNT,TWIN,LTWIN),
            PARENT=((PART),(PART,PHYSICAL,DB3))
    FIELD   NAME=(P#,SEQ),BYTES=5,START=1
    FIELD   NAME=QTY,BYTES=5,START=6
    SEGM    NAME=COMP-ASSEMB,BYTES=10,POINTER=PAIRED,
            PARENT=PART,SOURCE=(ASSEMB-COMP,DB3)
    FIELD   NAME=(P#,SEQ),BYTES=5,START=1
    FIELD   NAME=QTY,BYTES=5,START=6

Fig. 8. DB4 definition: A CODASYL network database.

    SCHEMA NAME IS PART-WAREHOUSE
    AREA NAME IS DATA-AREA
    RECORD NAME IS PART
        LOCATION MODE IS CALC HASH.P# USING P# IN PART
        DUPLICATES ARE NOT ALLOWED
        WITHIN DATA-AREA
        02 P#    : TYPE IS CHAR 16
        02 PD    : TYPE IS CHAR [illegible]
        02 CLASS : TYPE IS CHAR 1
    RECORD NAME IS WH
        02 W#    : TYPE IS CHAR 5
        02 WD    : TYPE IS CHAR 16
        02 QTY   : TYPE IS DEC 6
    SET NAME IS INVENTORY
        OWNER IS PART
        MEMBER IS WH
        MANDATORY AUTOMATIC
        ASCENDING KEY IS W#
        DUPLICATES ARE ALLOWED
        SET OCCURRENCE SELECTION IS LOCATION MODE OF OWNER

Fig. 9. Global conceptual model in the HD-DBMS. Panels: the partitioned global conceptual model, showing the contribution of each database; and the unified global conceptual model (UGCM), formed by joining the ULCMs of DB(1) through DB(4).

A sample of queries issued at the virtual conceptual model is shown in Fig. 10, with a trace of the data accessed through the various heterogeneous databases.

Fig. 10. Sample queries (Query 1: find the stock status of part 112; Query 2). A note indicates that the data may be found in DB(1), DB(2), and DB(4), each under a different GDBMS.

C. Database Mapping/Translation

The UGCM is the conceptual model of the integrated database. It is formed by the union of the ULCMs of the participating databases and any inter-database relationships. A VM can be derived from the UGCM so that a VM
is independent of the organization or physical disposition of the underlying database(s). Thus a number of crucial database mappings or translation procedures are needed. In a few cases these translations may be more like reformatting. The mappings should be kept at least in the network data dictionary/catalog (Fig. 3).

The data model (schema or subschema) mappings or translations have been identified or developed thus far, from the user view through the various data model layers, to the individual DBMS and back to the user. We assessed our work and work by others in the field and opted for using the algorithms proposed by Dumpala and Arora [15] for the following specific translations:
Mapping Relational Schema into ER Schema
Mapping Network Schema into ER Schema
Mapping Hierarchical Schema into ER Schema
Mapping ER Schema into Relational Schema
Mapping ER Schema into Network Schema
Mapping ER Schema into Hierarchical Schema

These algorithms are ready for implementation. The following is just an example of the mapping between relational and ER schemas. A relation in a relational schema will correspond to one of the following ER constructs:
* an entity
* a k-ary relationship
* a binary relationship with attributes (1:N or N:1)
* an M:N binary relationship set without attributes
* an entity, plus key attributes of some other entities.

Thus a relational query targeted at a relation will be translated into different query commands at the ER level, depending on which of the above data constructs are involved. More on this in Section III-E.

D. Query/DML Translation

The terms "data manipulation language (DML)" and "query language" shall be used synonymously to refer to any of the data access languages of the major types of DBMS: CODASYL DML, relational SQL, or IMS DL/I. The terms "database request" and "query" will also be used synonymously. As per Fig.
4, the queries made by a user on a VM should be translated to the equivalent queries at the UGCM level, then at the UGIM level, then at the ULIM level, and finally at the LLM level for processing by the particular DBMS involved; the answer is then composed or reformatted to adhere to the original VM level. Thus a number of crucial mappings or translation procedures are needed. Fig. 2 shows the relationship between schema translation and DML translation. We report our progress in the following sections.

1) The ER DML Global Conceptual Language: The HD-DBMS architecture uses an ER DML as the global conceptual language (GCL) at the unified global conceptual level. This is the DML into which all virtual layer DMLs are translated. This is also the DML whose queries are decomposed and distributed to various local physical databases.

Two of the most important justifications for a GCL are the following. First, a GCL reduces the number of translations (both schema translations and DML translations) necessary within a distributed database system. Without a GCL, m × n translators would be needed in an HD-DBMS that has n physical databases and supports m virtual model databases, one for each pair of virtual model and physical database; with a GCL, each virtual model needs only a translator into the GCL and each physical database only a translator out of it, so only m + n translators are needed. Second, a GCL allows for a single, conceptual view of the whole database, which, in reality, consists of a group of heterogeneous physical databases.

Functional Requirements of a GCL: The single most important functional requirement of a GCL is that it be semantically "rich" enough to express queries from all the virtual level DMLs. This means that, for any existing virtual level DML, any DML statement can find its equivalent in the GCL. It is not necessary to have a one-to-one correspondence between the GCL and other virtual level DMLs so long as the GCL is able to express any statement expressed by a virtual level DML.
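The translator-count argument for a GCL can be checked with a small sketch. The names are hypothetical: three virtual models `VM1` to `VM3` (m = 3) and the four physical databases `DB1` to `DB4` of the example network (n = 4). A "translator" is modeled simply as a (source, target) pair, counting one direction only, as the text does; real translators would of course carry schema and DML mapping logic.

```python
from itertools import product

# Hypothetical names: m = 3 virtual models, n = 4 physical databases.
VIRTUAL_MODELS = ["VM1", "VM2", "VM3"]
PHYSICAL_DBS = ["DB1", "DB2", "DB3", "DB4"]


def direct_translators(vms, dbs):
    """Without a GCL: one translator for every (virtual model, physical database) pair."""
    return list(product(vms, dbs))


def gcl_translators(vms, dbs):
    """With a GCL: each virtual model maps into the GCL once,
    and the GCL maps out to each physical database once."""
    return [(vm, "GCL") for vm in vms] + [("GCL", db) for db in dbs]


def route(vm, db):
    """A request from vm to db is served by composing two GCL translators."""
    return [(vm, "GCL"), ("GCL", db)]


m, n = len(VIRTUAL_MODELS), len(PHYSICAL_DBS)
print(len(direct_translators(VIRTUAL_MODELS, PHYSICAL_DBS)), m * n)
print(len(gcl_translators(VIRTUAL_MODELS, PHYSICAL_DBS)), m + n)

# Every (vm, db) pair is reachable using only translators in the GCL set.
available = set(gcl_translators(VIRTUAL_MODELS, PHYSICAL_DBS))
assert all(set(route(vm, db)) <= available
           for vm, db in direct_translators(VIRTUAL_MODELS, PHYSICAL_DBS))
```

The two printed lines are `12 12` and `7 7`. The saving grows with the network: for m = n = 10 the counts are 100 versus 20.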
How do we know if a GCL meets this requirement? Despite various attempts, there has not been a satisfactory answer to this question. One such attempt is the introduction of the term "completeness" [3], [36]. Informally, a DML is complete if, for a database, any piece of information stored in the database can be retrieved using that DML. A GCL that is complete should meet this requirement. Unfortunately, there is no consensus on the definition of completeness for an ER-based DML.

In addition to the above requirement, a GCL should be as independent of the physical aspects of the database as possible, because a GCL is a DML against a conceptual database only. This requirement alone excludes the use of a procedural (record-at-a-time) DML as the GCL, since a procedural DML ties itself too closely to the physical aspects of a database. There are thus two choices for a GCL: 1) an algebraic type of DML, and 2) a calculus type of DML.

There have been several proposals for an "ER algebra" in the literature [8], [36]. All of them are clearly inspired by the relational algebra proposed by Codd [9], [10]. However, the situation in the ER model differs from that in the relational model. In the relational model, the only data entity is a relation. All the operations in the relational algebra apply to relations only, and the result of any relational algebraic operation is also a relation. In contrast, the ER data model has two basic data entities, entity and relationship, and they are semantically different. An algebra that applies to two data entities is considerably more difficult to define than one that applies to a single data entity. This is because, as the number of data entities increases, the types of the output data entity and their semantic meanings seem to grow rapidly. The area of ER algebra is still in its infancy.

PROCEEDINGS OF THE IEEE, VOL. 75, NO. 5, MAY 1987
More research is needed to find a good definition of ER algebra. Consequently, we have not adopted any existing ER algebra for the GCL.

The ER DML: Our choice for the GCL is a calculus-type language. Fig. 11 shows a summary of the GCL. We call it calculus type because there is a natural correspondence between this type of DML and the relational calculus. A fundamental aspect of a calculus-based DML is the notion of the tuple variable. In relational calculus, a tuple variable is a variable that ranges over some named relation. In a calculus-type ER DML (i.e., the proposed GCL), the 'a-list' in the GET statement plays a similar role. An 'a-list' is a variable that ranges over a specified set of 'paths,' where a path is a traversal of an ER diagram.

[Fig. 11. The ER DML global conceptual language.]

The results from our research have demonstrated that, with a few modifications, most DMLs (DL/I, CODASYL DML, SQL, and relational algebra) against the corresponding data model (hierarchical, network, relational) find their equivalents in this GCL. Therefore, this GCL satisfies the first requirement posed earlier. This GCL also has little, if anything at all, to do with the physical aspects of the database, which is the second requirement.

In arriving at our required ER DML, we also analyzed four earlier relational-type languages proposed by other authors: EAS-E [34], GORDAS [16], ERL [21], and DAPLEX [29], [43]. EAS-E is very English-like, but seems better suited to an interactive query language than to an intermediary language. DAPLEX is the query language based on the CCA functional data model. GORDAS is a read-only query language. However, the GET command it uses seems very powerful, and so our language patterns its GET command after GORDAS. Our language is very similar to ERL. ERL claims to be a complete query language (READ, INSERT, MODIFY, DELETE), but we dropped a few of its features. The language presented approaches a relational language, with the major addition of commands using inter-entity relations.

2) Query/DML Translation Algorithms: We have identified the characteristics and requirements of algorithms to translate between the various model and language layers of Fig. 4. Our major approach/objective is to develop a "DDL/DML compiler-compiler workbench" from which we can more easily develop the desired translations. Thus we have completed translation algorithms for:
• Hierarchical IMS DL/I (except logical database and Fast Path commands) into the ER DML
• CODASYL DML into the ER DML
• Relational algebra into the ER DML
• Relational SQL into the ER DML.
Some of this work, namely the translation from SQL to ER DML, is presented in [6]. Examples of it are provided in the next section. Translation algorithms for the following are now being developed:
• ER DML into relational SQL
• ER DML into hierarchical DL/I
• ER DML into CODASYL DML.
Our next task is to start prototype implementation of a subset of the following algorithms for proof of concept:
• CODASYL DML into ER DML into SQL
• SQL into ER DML into CODASYL DML.
Small programs in languages such as COBOL and C with CODASYL DML and SQL embedded in them would be used to test the translation paths.

CARDENAS: HETEROGENEOUS DISTRIBUTED DATABASE MANAGEMENT

E. SQL to ER DML Translation and Examples

Here we provide some insight into the translations involved by outlining the SQL/DS to ER DML translation. The translation environment and scheme from SQL/DS to ER DML have the following characteristics:
• It is composed of a set of 10 basic rules.
• Each SQL statement is one of six types of commands.
• Each SQL statement applies to one of five types of relations.
• A rule may, in turn, cause other rules to be invoked.
Fig. 12 outlines the translation matrix. It portrays the ten rules that compose the overall SQL to ER DML translation algorithm. Our translation covers all SQL DML commands except "group by" and "aggregate" functions, which we may add later.

[Fig. 12. SQL to ER DML translation scheme matrix.]
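The read-type (SELECT) side of this rule scheme can be sketched as a small dispatcher. This is a hypothetical illustration, not the project's implementation. The names `RelType`, `Mapping`, `select_rule`, `rules_invoked`, and `rule1_get` are introduced here for the sketch. The choice among Rules 1 through 4 follows the rules this section states for the five relation types, for multi-relation mappings, and for nested mappings. The exact GET surface syntax, in particular the `&` joining the two parts of the WHERE clause, is an assumption modeled loosely on the GET examples in Figs. 14 and 15.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, List, Optional, Set


class RelType(IntEnum):
    """The five kinds of relations distinguished by the translation scheme."""
    ENTITY = 1             # type 1: an entity
    K_ARY = 2              # type 2: a k-ary relationship
    BINARY_WITH_ATTRS = 3  # type 3: 1:N or N:1 binary relationship with attributes
    BINARY_N_M = 4         # type 4: N:M binary relationship set without attributes
    ENTITY_PLUS_KEY = 5    # type 5: target entity plus key attribute of the source entity


@dataclass
class Mapping:
    """Hypothetical representation of one SQL SELECT-FROM-WHERE block."""
    relations: Dict[str, RelType]                              # FROM-clause relations and their types
    clause_attrs: Set[str] = field(default_factory=set)        # attributes named in SELECT/FROM/WHERE
    source_keys: Dict[str, str] = field(default_factory=dict)  # type 5 relation -> source-entity key
    inner: Optional["Mapping"] = None                          # nested mapping in the WHERE clause


def select_rule(m: Mapping) -> int:
    """Pick the read-type rule (1-4) that governs this mapping at its own level."""
    if m.inner is not None:
        return 4                       # nested mapping
    if len(m.relations) > 1:
        return 3                       # single mapping over several relations
    name, rtype = next(iter(m.relations.items()))
    if rtype == RelType.ENTITY:
        return 1
    if rtype in (RelType.K_ARY, RelType.BINARY_WITH_ATTRS, RelType.BINARY_N_M):
        return 2                       # types 3 and 4 are special cases of k-ary
    # Type 5: rule 2 if the source entity's key appears in any clause, else rule 1.
    return 2 if m.source_keys.get(name) in m.clause_attrs else 1


def rules_invoked(m: Mapping) -> List[int]:
    """Rules applied inside out, as Rule 4 prescribes for nested mappings."""
    if m.inner is None:
        return [select_rule(m)]
    outer = Mapping(m.relations, m.clause_attrs, m.source_keys)  # outer block on its own
    return rules_invoked(m.inner) + [select_rule(outer), 4]


def rule1_get(a_list: List[str], entity: str, where: str = "") -> str:
    """Rule 1: a-list from the select-clause; WHERE = entity name plus the SQL condition."""
    # The '&' separator between the two WHERE parts is an assumed surface syntax.
    qualifier = f"{entity} & {where}" if where else entity
    return f"GET ({', '.join(a_list)}) WHERE ({qualifier})"


if __name__ == "__main__":
    emp = Mapping({"EMP": RelType.ENTITY})
    print(select_rule(emp))  # 1
    nested = Mapping({"R1": RelType.ENTITY}, inner=Mapping({"R3": RelType.BINARY_N_M}))
    print(rules_invoked(nested))  # [2, 1, 4]
    print(rule1_get(["NAME", "SAL"], "EMP", "DNO = 'D23'"))
    # GET (NAME, SAL) WHERE (EMP & DNO = 'D23')
```

Rule 4 is modeled as a wrapper. The inner block is translated first, and then the outer block is translated with its own single-mapping rule, mirroring the inside-out order described for nested mappings. Update-type commands are outside this sketch.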
Let us look at three example translations. Fig. 13 provides two sample ER schemas and the corresponding relational schemas. Figs. 14-16 provide three DML translation examples. The translation scheme for SQL read-type commands follows, explaining in detail the examples in Figs. 14 and 15 and much of Fig. 12. The translation detail for all SQL commands appears in [6].

[Fig. 13. Sample schema and corresponding relational schema.]
[Fig. 14. Example of SQL to ER DML translation (Example 1: single mapping with more than one relation; accessing sample schema 1; using rule 3).]
[Fig. 15. Example of SQL to ER DML translation (Example 2).]
[Fig. 16. Example of SQL to ER DML translation (Example 3: update of a type 5 relation; using rules 9 and 10).]

In our data model translation strategy, adapted from [15], a relation in a relational schema corresponds to one of the following five ER constructs:
1) an entity;
2) a k-ary relationship;
3) a binary relationship with attributes (1:N or N:1);
4) an N:M binary relationship set without attributes;
5) an entity, plus key attributes of some other entities.
We shall call the respective relations type 1, type 2, ..., and type 5 (see Fig. 13). We now discuss, for each type of relation, how a single mapping involving such a relation can be mapped into the ER DML.

Type 1 Relation: In this case, a relation R with attributes A1, A2, ..., An corresponds to exactly one entity E with attributes A1, A2, ..., An. For example, for the following relation in a relational schema:
Relation: EMP(EMPNO, NAME, DNO, SAL)
there exists an entity in the ER schema having the following format:
Entity: EMP(EMPNO, NAME, DNO, SAL).
The attribute names need not be exactly the same so long as their semantics remain the same, for example, SAL in the relation versus SALARY in the entity. The translation of an SQL query involving this type of relation into the ER DML is straightforward, since only one data entity is involved in both data models (a relation in the relational model and an entity in the ER model). The following rule is designed to guide such a translation:

Rule 1: For a single mapping involving a type 1 relation, generate a GET statement in the ER DML. The a-list in the GET statement takes the form of the select-clause in the single mapping. The WHERE clause in the GET statement includes two parts. The first is the name of the entity involved. The second takes the form of the WHERE-clause in the single mapping.

Type 2 Relation: A type 2 relation in the relational schema corresponds to a k-ary relationship in the ER schema. An attribute of a type 2 relation is either one of the attributes of that k-ary relationship or one of the key attributes of the entities connected by the k-ary relationship. The rule for translating a single mapping involving a type 2 relation into the ER DML is as follows:

Rule 2: For a single mapping involving a type 2 relation, generate a GET statement in the ER DML. The a-list in the GET statement takes the form of the select-clause in the single mapping. The WHERE clause in the GET statement includes two parts. The first consists of the corresponding relationship name and the names of the k entities connected by this relationship.
The second part takes the form of the WHERE-clause in the single mapping.

Type 3 Relation: A type 3 relation in the relational schema comes from (is mapped from) a binary relationship with attributes in the ER schema. The binary relationship is of either type 1:N or type N:1, but not of type N:M, which is mapped into a type 4 relation. An attribute of a type 3 relation is either one of the attributes of the binary relationship or one of the key attributes of the two entities this binary relationship connects. Clearly, the relationship in this case (binary) is a special case of that in the previous case (k-ary). Therefore, the translation of a single mapping involving a type 3 relation can be done by using rule 2.

Type 4 Relation: A type 4 relation in the relational schema corresponds to an N:M binary relationship in the ER schema. Again, this is a special case of a k-ary relationship. The translation of a single mapping involving a type 4 relation can, therefore, also be done by using rule 2.

Type 5 Relation: To understand the formation of a type 5 relation, the concepts of source and target entities need to be introduced. Let E1 and E2 be the entity sets involved in relationship set R, of type 1:N. Then E1 is referred to as the source entity set and E2 as the target entity set. When an ER schema is mapped into a relational schema, a type 5 relation is created in the relational schema for each type 1:N relationship set without attributes. The attributes of the type 5 relation consist of all the attributes of the target entity plus the key attribute of the source entity. To translate a single mapping involving a type 5 relation into the ER DML, the three clauses (SELECT, FROM, and WHERE) of the single mapping are examined first. If the key attribute of the source entity appears in one or more of the clauses, then rule 2 is used to guide the translation. Otherwise, rule 1 is used.

Single Mapping Involving More than One Relation: Using the rules developed so far, we are able to translate a single mapping involving a single relation into the ER DML. These rules alone have limited use, since most queries, when expressed in terms of SQL, involve more than one relation. Let us discuss how this kind of multi-relation mapping can be mapped into the ER DML. To start, we note that at the global conceptual model level we have an ER schema which is a connected ER diagram. By connected we mean that any two entity sets in the diagram are connected via some relationship sets and some entity sets. This is important to our development of the translation rules. It guarantees that there is at least some direct (through a single relationship set) or indirect (through more than one relationship set and some entity sets) relationship between any two relations in the relational schema. This suggests that we should try to find such a relationship when we have a single mapping that involves more than one relation.

As we have indicated earlier, a relation in the relational schema corresponds to one of the five ER segments (parts of an ER diagram) in the ER schema. For a single mapping involving more than one relation, we first find all the ER segments in the ER schema that correspond to the relations in the single mapping. Once we have all the ER segments, we find a traversal of the ER diagram that includes all of them. This traversal will then contain the relationship between the relations in the single mapping. The next step is to convert the qualifier (WHERE clause) in the single mapping into the qualifier on the traversal (the part of the ER diagram that encompasses all the ER segments). The following rule summarizes the above and can be used to guide the translation of a single mapping involving more than one relation into the ER DML.

Rule 3: For a single mapping involving more than one relation, each of which is of one of the five types, generate a GET statement in the ER DML. The a-list of the GET statement takes the form of the select-clause in the single mapping. The WHERE clause of the GET statement includes two parts. The first part contains the traversal of the ER schema. The second part takes the form of the WHERE-clause in the single mapping. The traversal of the ER schema is generated by first finding the corresponding ER segments for the relations in the single mapping and then taking the part of the ER diagram that includes all the ER segments involved.

Example: See the example in Fig. 15.

Nested Mapping: With SQL it is possible to use the result of a mapping in the WHERE clause of another mapping. This operation is called nested mapping. Nested mappings are not restricted to only two levels. When a nested mapping is processed, the innermost mapping is executed as though it were a single mapping. The result of that mapping is passed to the outer mapping, which then proceeds as though it were given a set of constants in place of the inner mapping. This continues from the innermost mapping outward until the outermost mapping is reached. Similarly, the ER DML (the Global Conceptual Language) allows a GET statement to be embedded in the WHERE clause of another GET statement. This nested GET statement feature makes it possible to map a nested mapping in SQL into the ER DML. The following rule guides such a translation.

Rule 4: For each nested mapping, generate a nested GET statement in the following manner. Working from the innermost mapping outward, generate a GET statement for each mapping seen, which is a single mapping, using the rules described earlier for single mappings. Because we work from the inside out, a single mapping in the WHERE clause of the current single mapping will already have been mapped into a GET statement. In that case, the WHERE clause of the current GET statement is combined with the inner GET statement to form the new WHERE clause.
This process continues until the outermost mapping is mapped into the ER DML.

Example: See the example in Fig. 16.

We stress that the overall translation approach in our HD-DBMS effort will hold even if the source relational language and the target ER DML were to vary. This has been one of our requirements. Thus the translation could be extended to other relational materializations. The same holds for the other types of DML and the corresponding translation schemes within our scope.

F. View Update

While we have stated the ideal long-range goals, we have identified problems that may limit the types of user views of the databases. In particular, they may limit the types of data accessing commands that may be issued from the VM user level. We have sorted out the various problems and assessed the possibility and cost of solving them. We have also identified the limitations on types of commands and data model mappings if such problems are not solved, and outlined possible solution approaches. As an example, the magnitude of foreseen and unsolved problems appears to have led most efforts not to permit updating a database, even while forcing each user wishing access to a heterogeneous database to abide by a new or common model and query language. The "view update" problem of relational systems is a major problem in the distributed heterogeneous case, even if relational systems are not involved. Constraining the differences permitted between user views and local logical models alleviates the problem and makes it more tractable.

We have now formally identified the rules of the game to permit 1) updating commands to various degrees and 2) differences in mapping between the user view and the underlying participating database schemas, while preserving integrity constraints. We first assessed actual view updating in IMS, SQL/DS, DB2, Oracle, Ingres, and QBE. We also analyzed approaches proposed on paper by various authors.
We are now designing the mechanisms that DBAs or users can use to express, logically and easily, various limitations or controls on the types of user views, data accessing commands, and updates. These controls are meant to preserve stated integrity constraints and various degrees of transparency of distribution and heterogeneity. The use of the Prolog language, or of some of its mechanisms, as an internal mechanism to formally express such controls is being considered. We are now identifying the translation of such controls into corresponding controls (DDL and/or application programs) on specific DBMS. We have identified the major issues referred to as the "view update problem." We have also identified most of the required integrity controls or database update decisions that DBAs or users must make to solve most, if not all, realistic view update problems.

G. Further Features

A very brief synopsis of work we have done in two major areas follows.

Protocols: We have identified the protocol information needed to implement the HD-DBMS. In developing these protocols, we also defined the logical components within the HD-DBMS that implement them. The protocols describe the information exchange that enables the various logical components of the HD-DBMS to handshake or communicate. This exchange maintains data integrity in the system and also handles the translations. The protocols allow the components to implement:
• queries and/or updates on data within the system;
• aborts on queried updates;
• delayed updates;
• broadcasting and handling system status (as in up/down/recovering).
In addition to the protocols themselves, the format in which they travel between the logical components was also defined. Ample example scenarios of events within the HD-DBMS have been created. Each scenario contains a detailed illustration of the protocols needed to handle the event and the sequence in which they are used.

Internal Model: A major model of the HD-DBMS is the internal model, both at the global and the local levels.
A generalized database access path model has been defined for the purpose of representing relationships between data entities in the HD-DBMS [20]. This data model, termed the Generalized Data Access Graph (GDAG), is a major architectural component. The HD-DBMS maintains the GDAG as part of the network data dictionary (catalog). The GDAG can model the access paths of the three major data models via a common, data-independent notation. A salient capability is the modeling of inter-database relationships using an equivalent notation.

IV. CONCLUDING REMARKS

We have outlined the language desiderata for data sharing and accessing in the increasing scenarios of heterogeneous databases. We have cited the major approaches to data sharing and accessing: from the primitive commercial file and database unload/load and PC download, to common interfaces on top of existing DBMS, to the R&D and prototype efforts toward the long-range goals. Commercial availability of the more encompassing thrusts may become a reality with the mounting problems, opportunity costs, and demand for data sharing in the heterogeneous world.

The HD-DBMS project is highlighted herein, with a presentation of its status, progress, and plans. It is a longer range project with the unique feature of allowing any user in the network to use his preferred database model and DML to access any data in the heterogeneous network. Another distinguishing feature, thus far, is its support for updating, not only for read-type accessing.

Prototype implementation of the HD-DBMS for proof of concept will follow. The first thread probably will be to translate:
• from a CODASYL DML at the virtual level into ER DML into SQL;
• from SQL at the virtual level into ER DML into CODASYL DML.
Prototyping will first address read-only commands and immediately thereafter updating commands.
A robust data dictionary will be used, undoubtedly with an extended model, to implement the crucial network-wide dictionary/catalog. We intend to use graphical mouse-oriented tools to paint ER database models. ER data definitions and graphical ER diagrams should eventually be generated automatically from existing DDLs. Conversely, DDLs should also be generated automatically from ER data definitions and graphical ER diagrams. Schema integration into the global conceptual model should be semi-automated, and the reverse process should also be automated.

The presentation has a "bottom-up" flavor, that is, it starts with existing, individually designed heterogeneous databases. However, the system is also targeted for new databases that are designed globally from the start and then distributed in the heterogeneous environment. The latter case will grow as the flexibility of heterogeneous distributed systems becomes available.

ACKNOWLEDGMENT

The author wishes to acknowledge the contribution of the following past and current members of the HD-DBMS project: E. Nahouraii and M. H. Pirahesh (IBM Corp.), J. BenZvi and J. Horowitz (Informatics), G. Chen (Hughes Aircraft), W. Johnson (Lockheed), A. Chen, and G. Wang. The collaboration and support of Informatics General Corporation is appreciated. Finally, he wishes to thank the two anonymous reviewers for their comments.

REFERENCES

[1] M. Adiba and D. Portal, "A cooperation system for heterogeneous data base management systems," Informat. Syst., vol. 3, no. 3, pp. 209-215, 1978.
[2] J. R. Abrial, "Data semantics," in Proc. IFIP-TC2 Working Conf. on Data Base Management (Cargese, Corsica, Apr. 1974), J. W. Klimbie and L. Koffeman, Eds. Amsterdam, The Netherlands: North-Holland, 1974.
[3] P. Atzeni and P. P. Chen, "Completeness of query languages for the entity-relationship model," in Proc. 2nd Int. Conf. on Entity-Relationship Approach, P. P. Chen, Ed., ER Institute, 1981.
[4] A. F. Cardenas and M. H. Pirahesh, "Database communication in a heterogeneous database management system network," Informat. Syst., vol. 5, no. 1, pp. 55-79, 1980.
[5] A. F. Cardenas and M. H. Pirahesh, "The E-R model in a heterogeneous data base management system network architecture," in Proc. Int. Conf. on Entity-Relationship Approach to System Analysis and Design, P. Chen, Ed. Amsterdam, The Netherlands: North-Holland, 1980, pp. 577-583.
[6] A. F. Cardenas and G. Wang, "Translation of SQL/DS data access/update into entity/relationship data access/update," in Proc. 4th Int. Conf. on the E-R Approach (Chicago, IL, Oct. 28-30, 1985).
[7] P. P. Chen, "The entity-relationship model: Toward a unified view of data," ACM Trans. Database Syst., vol. 1, no. 1, Mar. 1976.
[8] P. P. Chen, "An algebra for a directional binary entity-relationship model," in Proc. 1st IEEE COMPDEC (Los Angeles, CA, Apr. 1984), pp. 37-40.
[9] E. F. Codd, "A relational model of data for large shared data banks," Commun. ACM, vol. 13, no. 6, 1970.
[10] E. F. Codd, "Relational completeness of data base sublanguages," in Data Base Systems, R. Rustin, Ed. Englewood Cliffs, NJ: Prentice-Hall, 1972.
[11] Cullinet Software Inc., "IDMS/R, summary description," Westwood, MA.
[12] Cullinet Software Inc., "Goldengate, summary description," Westwood, MA.
[13] U. Dayal and H. Y. Hwang, "View definition and generalization for database integration in a multidatabase system," IEEE Trans. Software Eng., vol. SE-10, no. 6, pp. 628-645, Nov. 1984.
[14] U. Dayal, "Query processing in a multidatabase system," in Query Processing in Data Systems, W. Kim, D. Reiner, and D. Batory, Eds. New York, NY: Springer-Verlag, 1985.
[15] S. R. Dumpala and S. K. Arora, "Schema translation using the entity-relationship approach," in Proc. 2nd Int. Conf. on Entity-Relationship Approach, P. P. Chen, Ed., ER Institute, 1981.
[16] R. Elmasri and G. Wiederhold, "GORDAS: A formal high-level query language for the entity-relationship model," in Proc. 2nd Int. Conf. on Entity-Relationship Approach (Washington, DC, 1981).
[17] A. Ferrier and C. Stangret, "Heterogeneity in the distributed database management systems SIRIUS-DELTA," in Proc. 8th Int. Conf. on Very Large Data Bases (Mexico City, Mexico, Sept. 8-10, 1982), pp. 45-53.
[18] V. D. Gligor and G. L. Luckenbaugh, "Interconnecting heterogeneous data base management systems," IEEE Computer, vol. 22, pp. 33-43, Jan. 1984.
[19] Honeywell Information Systems, "Relational query/interactive query reference manual," Manual #DR52.
[20] J. Horowitz and A. F. Cardenas, "Relationships in a heterogeneous distributed database environment," submitted for publication to Informat. Syst.
[21] H. Y. Hwang and U. Dayal, "Using the entity-relationship model for implementing multiple model database system," in Proc. 2nd Int. Conf. on Entity-Relationship Approach, P. P. Chen, Ed., 1981.
[22] IBM Corp., "SQL/DS, concepts and facilities," Reference Manual GH24-5013.
[23] Informatics General Corp., "Answer/DB reference manual," Canoga Park, CA.
[24] Informatics General Corp., "Distributed application generator, technical system description," Canoga Park, CA.
[25] Informatics General Corp., "Lotus/Answer," "Visi/Answer," and "dBase II/Answer," Reference Manuals, Canoga Park, CA.
[26] J. Iossiphidis, "A translation to convert the DDL of ERM to the DDL of System 2000," in Proc. Int. Conf. on Entity-Relationship Approach to System Analysis and Design, P. P. Chen, Ed. (Los Angeles, CA, 1979).
[27] B. E. Jacobs, "On database logic," J. ACM, vol. 29, no. 2, pp. 310-332, Apr. 1982.
[28] R. H. Katz, "Database design and translation for multiple data models," Ph.D. dissertation, UC Berkeley, 1980.
[29] R. Katz and N. Goodman, "View processing in Multibase: A heterogeneous database system," in Entity-Relationship Approach to Information Modeling and Analysis, P. P. Chen, Ed., ER Institute, 1981.
[30] R. H. Katz and E. Wong, "Decompiling CODASYL DML into relational queries," ACM Trans. Database Syst., vol. 7, no. 1, pp. 1-23, 1982.
[31] T. A. Landers and R. L. Rosenberg, "An overview of Multibase," in Distributed Databases, H.-J. Schneider, Ed. Amsterdam, The Netherlands: North-Holland, 1982.
[32] M. Levin, "The DIAM theory of algebraic access graphics," Sterling Systems, Inc., Denver, CO, 1980.
[33] Y. D. Lien, "Hierarchical schemata for relational databases," ACM Trans. Database Syst., vol. 6, no. 1, pp. 48-69, Mar. 1981.
[34] H. M. Markowitz, A. Malhotra, and D. P. Pazel, "The ER and EAS formalisms for system modeling, and the EAS-E language," in Proc. 2nd Int. Conf. on Entity-Relationship Approach (Washington, DC, 1981).
[35] E. Z. Nahouraii, L. O. Brooks, and A. F. Cardenas, "An approach to data communication between different GDBMS," in Proc. 2nd Int. Conf. on Very Large Data Bases (Brussels, Belgium, Sept. 1976).
[36] C. Parent and S. Spaccapietra, "An entity-relationship algebra," in Proc. 1st IEEE Conf. on Data Engineering (Los Angeles, CA, Apr. 24-27, 1984), pp. 500-507.
[37] L. S. Schneider, "A relational query compiler for distributed heterogeneous databases," IFIP TC 2.6, NASWG, Jan. 1977.
[38] SDDTG of CODASYL Systems Committee, "A stored data definition language for the translation of data," Informat. Syst., vol. 2, no. 3, 1977.
[39] M. E. Senko, E. B. Altman, M. M. Astrahan, and P. L. Fehder, "Data structures and accessing in database systems," IBM Syst. J., vol. 12, no. 1, 1973.
[40] M. E. Senko, "DIAM as a detailed example of the ANSI/SPARC architecture," in Proc. IFIP-TC2 Working Conf. Modeling in Data Base Management Systems (Freudenstadt, Germany, Jan. 1976), G. M. Nijssen, Ed. Amsterdam, The Netherlands: North-Holland, 1976.
[41] N. Shu, B. Housel, and V. Lum, "CONVERT: A high level translation definition language for data conversion," IBM Corp. Res. Rep. RJ 1500, San Jose, CA, Jan. 1975.
[42] N. Shu et al., "EXPRESS: A data extraction, processing and restructuring system," ACM Trans. Database Syst., vol. 2, no. 2, June 1977.
[43] D. W. Shipman, "The functional data model and the language DAPLEX," ACM Trans. Database Syst., vol. 6, no. 1, pp. 140-173, Mar. 1981.
[44] J. M. Smith et al., "MULTIBASE: Integrating heterogeneous distributed database systems," in Proc. 1981 Nat. Computer Conf. Reston, VA: AFIPS Press, pp. 487-499.
[45] G. Sockut, "A framework for logical-level changes within data base systems," IEEE Computer, vol. 23, pp. 9-27, May 1985.
[46] E. Wong and R. H. Katz, "Logical design and schema conversion for relational and DBTG databases," in Proc. Int. Conf. on Entity-Relationship Approach to System Analysis and Design, P. P. Chen, Ed., Los Angeles, CA, 1979.