6. xmldb
Advanced Databases
Lectures, November 2013
6. XML Databases
ZPR-FER - Zagreb, Advanced Databases 2013/2014

Overview
• XML databases: examples of application
• XML data model and XML document schema
• XML in database management systems
• Homework assignment

BBC dynamic publishing: semantic web and content stored in an XML database
• Static publishing (until 2010)
  • journalists entered all material for the articles, together with the indices/positions where the items would be placed on the BBC website, into an Oracle RDBMS
  • once published, pages could not be changed until the next publication
  • poor semantics of the stored and presented data
  • no way to connect articles based on semantic features
• Dynamic publishing (from 2010)
  • establishment of an ontology for the World Cup domain
  • processing of journalistic texts to extract data according to the established ontology
  • publication of metadata in accordance with the ontology
  • content stored in a native XML DBMS (MarkLogic)
  • pages generated dynamically
(Pictures taken from the BBC web pages, http://www.bbc.co.uk)

Clinical Document Architecture (CDA) and XML DBMS
• CDA is a document markup standard (ANSI) that specifies the structure and semantics of "clinical documents" for the purpose of exchange
• Mayo Clinic, like the BBC, uses a native XML DBMS (MarkLogic) for storing website content and for the semantic web
  • clinical notes and genomic data are stored in XML format
  • after processing, the data stored in XML format is associated with previously published genomic repositories
• They recognized the benefits of an XML database compared with other ways of storing and managing XML documents:
  • full-text search
  • the ability to access documents/nodes along the axes
  • flexible schema
  • …

XML - EXtensible Markup Language
• Defined by the World Wide Web Consortium (W3C); version 1.0 was released in 1998
• Based on Standard Generalized Markup Language (SGML)
• Developed for exchanging data on the Web
• Allows the meaning and storage of data to be described in textual format
• Self-documenting: XML documents carry markup that further describes parts of the document, e.g.
    <title>XML</title>
    <slide>What is XML…</slide>
• There are no predefined tags; anyone can define their own tags

XML document - example
Parts labelled on the slide: declaration (processing instruction), comment, root element (document), attribute, start tag and end tag, element, text.

    <?xml version='1.0' encoding="utf-8"?>
    <!-- Study programs on FER -->
    <studyPrograms>
      <studyProgram acYear="2013/2014">
        <study studyName="Computing">
          <course>
            <courseName>Databases</courseName>
            <description>This is the basic course on databases intended to…</description>
            <ects>7.5</ects>
          </course>
          ...
        </study>
        <study studyName="Power Engineering">
          <course>
            <courseName>Electrical installations</courseName>
            <description>The basics of the power system. Voltage and ...</description>
            <ects>8</ects>
          </course>
          ...
        </study>
      </studyProgram>
      ...
    </studyPrograms>

XML document - hierarchical structure
• Structured text in the form of a tree
• Connected acyclic graph
The XML document from the previous slide, represented as a tree in the BaseX XML DBMS:
[figure: tree view of the document]

XML data model
• The XML data model is a hierarchical data model
• An XML document is modelled as a tree with elements and attributes as tree nodes
  • element nodes can have child nodes: attributes or other elements
  • text in an element is modelled as a child node of type text
  • the order of child nodes in the tree corresponds to their order in the XML document
  • elements and attributes (except the root element) have exactly one parent, which is an element node
• Element nesting corresponds to the parent-child relationship
[figure labels: root node, child nodes, sibling nodes]

The reasons for the general acceptance of XML
XML
• is human-readable
• is computer-readable
• is internationalized (Unicode)
• is platform independent
• is approved by the World Wide Web Consortium
• is not just a format for storing information, but a set of technologies for storing, managing and presenting data

What is an XML document used for and what makes it popular?
• Used for:
  • data exchange between heterogeneous systems: the possibility of adding new tags and creating nested XML structures has made XML an ideal structure for data exchange
  • storing data
  • searching data
• XML has become the standard for data storage on the Web
• All major development frameworks are XML oriented (.NET, Java)
• Modern web architecture includes XML

XML document schema
• We use a database schema to place restrictions on the stored data:
  • data types, identifiers, relationships, business rules, ...
• An XML document does not have to comply with any particular schema
• Compliance of XML documents with an agreed schema allows the exchange of data
• Languages for defining XML schemas are used to describe the structure and content constraints of XML documents

Modelling the schema of the XML document
• The schema of an XML document is not as formal as the relational model
• It can be designed following rules similar to those we apply in relational database modelling:
  • use complex elements to represent entities
  • use elements or attributes to represent entity attributes
  • use references (key, keyref) to describe relationships
• Difficulties:
  • more degrees of freedom than with a relational database schema
  • the distinction between attributes and sub-elements is unclear
• Recommendation:
  • use an element whenever expansion of the structure is expected
  • use an attribute when it comes to 1:1 relationships

Languages for modelling the schema of the XML document
• Document Type Definition (DTD)
  • limited support for defining data types
  • is not an XML document
• XML Schema
  • richer than DTD because it can define
    • a broader set of data types
    • complex data types (including those that inherit properties of another complex type)
    • constraints (domain, primary key, foreign key)
    • ...
  • an XML Schema is itself a valid XML document
  • W3C recommendation since 2001
  • namespaces can be used
  • complex syntax

Valid XML document
• A valid XML document
  • is well formed
  • complies with the rules defined by a schema: DTD (Document Type Definition) or XML Schema (XSD)
• A well-formed XML document
  • has a single root element which contains all the other elements
  • each element has a start tag and an end tag
  • element tags are case-sensitive: the start and end tags must match exactly
  • elements must be properly nested, with none missing and none overlapping
  • can contain attributes, whose values are placed in quotation marks
  • contains only properly encoded legal Unicode characters
  • ...

XML Schema
Defines:
• elements that can appear in the document
• attributes that can appear in the document
• elements and attributes that are children of a complex element
• order of child elements
• number of child elements
• content of the element
• data types for elements and attributes
• default and fixed values for elements and attributes
• unique key constraint
• primary key constraint
• foreign key constraint
• ...

XML schema - example
note.xml (well-formed XML):

    <?xml version="1.0" encoding="UTF-8"?>
    <note>
      <to>Ana</to>
      <from>Marko</from>
      <heading>Reminder</heading>
      <body>Presentations are on Tuesday!</body>
    </note>

note.xsd (well-formed XML schema):

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               elementFormDefault="qualified">
      <xs:element name="note">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="to" type="xs:string"/>
            <xs:element name="from" type="xs:string"/>
            <xs:element name="heading" type="xs:string"/>
            <xs:element name="body" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>

XML schema - example
note.xml with a reference to the XML schema (valid XML):

    <?xml version="1.0"?>
    <note xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="note.xsd">
      <to>Ana</to>
      <from>Marko</from>
      <heading>Reminder</heading>
      <body>Presentations are on Tuesday!</body>
    </note>

XML schema/document and namespaces
• In the following XML document segment, xs is the prefix of a namespace. A namespace is identified by a unique URI (Uniform Resource Identifier).
• A namespace is defined by the xmlns attribute:

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               elementFormDefault="qualified">

• Namespaces are used primarily to unambiguously identify the "elements" of XML.
• Namespaces allow "elements" with equal names from different namespaces to be used in the same XML.
• If there is a namespace, then all the "elements" must be prefixed with the correct namespace (elementFormDefault="qualified").
• The alternative is to define a default namespace:

    <schema xmlns="http://www.w3.org/2001/XMLSchema">

• More than one namespace can be defined in an XML document.
• There is no obligation to use namespaces, even in XML Schema.
• The recommendation is to always use namespaces, to avoid ambiguity.

XML schema/document and namespaces
note.xsd:

    <?xml version="1.0" encoding="UTF-8"?>
    <schema xmlns="http://www.w3.org/2001/XMLSchema">
      <element name="note">
        <complexType>
          <sequence>
            <element name="to" type="string"/>
            <element name="from" type="string"/>
            <element name="heading" type="string"/>
            <element name="body" type="string"/>
          </sequence>
        </complexType>
      </element>
    </schema>

note.xml (valid):

    <?xml version="1.0"?>
    <note xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="note.xsd">
      <to>Ana</to>
      <from>Marko</from>
      <heading>Reminder</heading>
      <body>Presentations are on Tuesday!</body>
    </note>

XML Schema components
• Primary:
  • simple type definition
  • complex type definition
  • attribute declaration
  • element declaration
• Secondary:
  • attribute group definitions
  • identity-constraint definitions
  • model group definitions
  • notation declarations
• Helper:
  • annotations
  • model groups
  • particles
  • wildcards
  • attribute uses
Declaration and definition:
• Declaration components are associated by (qualified) names with the information items being validated.
• Definition components define internal schema components that can be used in other schema components.

Declaration and definition - example
XML instance:

    <student JMBAG="0036000001">
      <fName>Ana</fName>
      ...
    </student>

Declaration:

    <xs:element name="student">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="fName" type="xs:string"/>
          ...
        </xs:sequence>
        <xs:attribute name="JMBAG" type="xs:string"/>
      </xs:complexType>
    </xs:element>

Type definition:

    <xs:element name="student" type="studentType"/>
    <xs:complexType name="studentType">
      <xs:sequence>
        <xs:element name="fName" type="fNameType"/>
        ...
      </xs:sequence>
      <xs:attribute name="JMBAG" type="xs:string"/>
    </xs:complexType>

Type definition
1. Simple type
  • for attributes and for elements without element children
  • cannot contain other elements
2. Complex type
  • can contain other elements
  • can be empty

Simple type definition
Can be:
• a restriction of some other simple type
• a list or union of simple type definitions
• a built-in primitive data type

    <xs:simpleType name="workingAge">
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="18"/>
        <xs:maxInclusive value="65"/>
      </xs:restriction>
    </xs:simpleType>

    <xs:simpleType name="listOfNumbers">
      <xs:list itemType="xs:integer"/>
    </xs:simpleType>

    <xs:simpleType name="ECTSGrade">
      <xs:restriction base="xs:string">
        <xs:enumeration value="A"/>
        <xs:enumeration value="B"/>
        <xs:enumeration value="C"/>
        <xs:enumeration value="D"/>
        <xs:enumeration value="E"/>
        <xs:enumeration value="F"/>
        <xs:enumeration value="FX"/>
      </xs:restriction>
    </xs:simpleType>

Complex type definition
Can be:
• a restriction of a complex type definition
• an extension of a simple or complex type definition

    <xs:complexType name="studentType">
      <xs:sequence>
        <xs:element name="fName" type="fNameType"/>
        ...
      </xs:sequence>
      <xs:attribute name="JMBAG" type="xs:string"/>
    </xs:complexType>

    <xs:complexType name="extendedStudentType">
      <xs:complexContent>
        <xs:extension base="studentType">
          <xs:sequence>
            <xs:element name="currStudy" type="xs:string"/>
            ...
          </xs:sequence>
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>

    <xs:element name="student" type="extendedStudentType"/>

Simple element declaration
XML elements that cannot contain other elements and/or attributes

    <xs:element name="name" type="type"/>

• name: the element name
• type: the most common data types are xs:boolean, xs:integer, xs:date, xs:string, xs:decimal, xs:time
• Some additional attributes that can be defined while declaring a simple element:
  • default: default value
  • fixed: fixed value
  • minOccurs: the minimum number of times this element can occur in the parent element (default value is 1)
  • maxOccurs: the maximum number of …

Simple element declaration - example

    <employee>
      <lName>Matišić</lName>
      <age>50</age>
      <gender>M</gender>
      <birthDate>1962-06-03</birthDate>
      <adult>Yes</adult>
    </employee>

    <xs:element name="lName" type="xs:string"/>
    <xs:element name="age" type="xs:integer"/>
    <xs:element name="gender" type="xs:string"/>
    <xs:element name="birthDate" type="xs:date"/>
    <xs:element name="adult" type="xs:string"/>

employee is not a simple element!
Default value declaration for a simple element:

    <xs:element name="gender" type="xs:string" default="F"/>

The default value is assigned to the element when no other value is given.
A fixed value is automatically assigned to the element, and no other value can be assigned.
    <xs:element name="adult" type="xs:string" fixed="Yes"/>

Attribute declaration

    <xs:attribute name="name" type="type"/>

• name and type have the same meaning as for xs:element
• Some additional attributes that can be defined while declaring an attribute:
  • default: default value
  • fixed: fixed value
  • use: can be
    • "optional": the attribute is optional
    • "required": the attribute is required

Attribute declaration - example

    <predmet>
      <naziv lang="HR">Advanced Databases</naziv>
    </predmet>

    <xs:attribute name="lang" type="xs:string" default="HR"/>

Complex element declaration (1)
XML elements that, besides text, can contain other elements and/or attributes
There are four kinds of complex elements:
1. empty element

    <country code="HR"/>

2. element that contains text only

    <country code="HR">Croatia</country>

3. element that contains other elements only

    <country>
      <code>HR</code>
      <countryName>Croatia</countryName>
    </country>

4. element that contains text and other elements

    <country capital="Zagreb">Croatia
      <code>HR</code>
      <countryName>Croatia</countryName>
    </country>

All kinds of complex elements may, but need not, contain attributes.

Complex element declaration (2)

    <xs:element name="name">
      <xs:complexType>
        ... Complex type information ...
      </xs:complexType>
    </xs:element>

Example:

    <xs:element name="person">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="fName" type="xs:string"/>
          <xs:element name="lName" type="xs:string"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>

Order indicators: all, choice, sequence
• <xs:all> … </xs:all>: the child elements can appear in any order, and each child element must occur only once
• <xs:choice> … </xs:choice>: either one child element or another can occur
• <xs:sequence> … </xs:sequence>: the child elements must appear in the same order as they are declared

Declaration of an element that contains only text (and attributes)
• In the declaration of this kind of complex element, the content is enclosed in <xs:simpleContent> </xs:simpleContent> tags in the XML Schema.
• Inside <xs:simpleContent> we must use an extension or a restriction of the data type that the value of the element can take.

    <xs:element name="nekoIme">
      <xs:complexType>
        <xs:simpleContent>
          <xs:extension base="basetype">
            ...
          </xs:extension>
        </xs:simpleContent>
      </xs:complexType>
    </xs:element>

    <xs:element name="nekoIme">
      <xs:complexType>
        <xs:simpleContent>
          <xs:restriction base="basetype">
            ...
          </xs:restriction>
        </xs:simpleContent>
      </xs:complexType>
    </xs:element>

Complex element declaration - example

    <course>
      <courseName lang="HR">Advanced Databases</courseName>
    </course>

    <xs:element name="course">
      <xs:complexType>
        <xs:sequence>
          <xs:element ref="courseName"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>

    <xs:element name="courseName">
      <xs:complexType>
        <xs:simpleContent>
          <xs:extension base="xs:string">
            <xs:attribute name="lang" type="xs:string" fixed="HR"/>
          </xs:extension>
        </xs:simpleContent>
      </xs:complexType>
    </xs:element>

XML Schema - referencing
Once defined, an element or attribute can be referenced in the XML schema

    <xs:element name="note">
      <xs:complexType>
        <xs:sequence>
          <xs:element ref="to"/>
          <xs:element ref="from"/>
          <xs:element ref="heading"/>
          <xs:element ref="body"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>

    <xs:element name="to" type="xs:string"/>
    <xs:element name="from" type="xs:string"/>
    <xs:element name="heading" type="xs:string"/>
    <xs:element name="body" type="xs:string"/>

XML Schema - domain constraints
Examples of domain constraints on the values of elements and attributes:
• restriction to values between a declared minimum and maximum value
• restriction to a finite set of declared values
• restriction of the length of allowable values to between a declared minimum and maximum length
• restriction of the allowable values, extended with an operator that declares the allowable number of occurrences of the declared value

XML Schema - domain constraints
The general form for setting domain integrity constraints (the same for xs:attribute):

    <xs:element name="name">
      <xs:simpleType>
        <xs:restriction base="type">
          ... constraints ...
        </xs:restriction>
      </xs:simpleType>
    </xs:element>

    <xs:element name="totalPoints">
      <xs:simpleType>
        <xs:restriction base="xs:decimal">
          <xs:minInclusive value="0.0"/>
          <xs:maxInclusive value="100.0"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:element>

    <xs:element name="gender">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="F"/>
          <xs:enumeration value="M"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:element>

    <xs:element name="OIB">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:pattern value="[0-9]{11}"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:element>

XML Schema - integrity constraints
• The following integrity constraints can be expressed in XML Schema:
  • unique constraint (the tag unique is used)
  • primary key constraint (key)
  • foreign key constraint (keyref)
• XPath expressions are used to define integrity constraints
• Using XPath expressions we set values for:
  • selector
  • field
• An XPath expression takes a tree (XML file) as input and returns a set of tree nodes as a result

XML Schema - integrity constraints
Primary key constraint:

    <xs:key name="countryPK">
      <xs:selector xpath=".//countries/country"/>
      <xs:field xpath="@countryCode"/>
    </xs:key>

Foreign key constraint:

    <xs:keyref name="personCountryFK" refer="countryPK">
      <xs:selector xpath=".//persons/person"/>
      <xs:field xpath="@countryCode"/>
    </xs:keyref>

Unique constraint:

    <xs:unique name="personUI">
      <xs:selector xpath=".//persons/person"/>
      <xs:field xpath="personId"/>
    </xs:unique>

Integrity constraints - example

Primary key and unique constraint
Relational schema:

    CREATE TABLE student
      (JMBAG CHAR(10) PRIMARY KEY CONSTRAINT pkStud,
       JMBG CHAR(13) UNIQUE CONSTRAINT uiJMBG,
       fName NCHAR(50) NOT NULL,
       lName NCHAR(50) NOT NULL,
       CHECK (JMBAG MATCHES '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]')
      )

XML schema:

    <xs:element name="student">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="JMBAG">
            <xs:simpleType>
              <xs:restriction base="xs:string">
                <xs:pattern value="[0-9]{10}"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:element>
          <xs:element name="JMBG" type="xs:string">
            <xs:unique name="uiJMBG">
              <xs:selector xpath=".//student"/>
              <xs:field xpath="JMBG"/>
            </xs:unique>
          </xs:element>
          <xs:element name="fName" type="xs:string"/>
          <xs:element name="lName" type="xs:string"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
    <xs:key name="pkStud">
      <xs:selector xpath=".//student"/>
      <xs:field xpath="JMBAG"/>
    </xs:key>

• The value of the element named JMBAG must be
  • known for each element of type student
  • unique for each element of type student
• The value of the element named JMBG must be unique for all elements of type student; it need not be known, but only one value may be unknown if uniqueness is to hold
• Note on the slide: put inside the root element <studAdmin>

Referential integrity
Relational schema:

    CREATE TABLE acYearEnrollment
      (acYear SMALLINT,
       JMBAG CHAR(10) REFERENCES student(JMBAG) CONSTRAINT fkAcYearEnrStud,
       enrollDate DATE,
       PRIMARY KEY (JMBAG, acYear) CONSTRAINT pkAcYearEnr
      )

XML schema:

    <xs:element name="acYearEnrollment">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="acYear" type="xs:integer"/>
          <xs:element name="JMBAG" type="xs:string"/>
          <xs:element name="enrollDate" type="xs:date"/>
          <xs:element ref="courseEnrollment" maxOccurs="unbounded"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
    <xs:key name="pkAcYearEnr">
      <xs:selector xpath=".//acYearEnrollment"/>
      <xs:field xpath="JMBAG"/>
      <xs:field xpath="acYear"/>
    </xs:key>
    <xs:keyref name="fkAcYearEnrStud" refer="pkStud">
      <xs:selector xpath=".//acYearEnrollment"/>
      <xs:field xpath="JMBAG"/>
    </xs:keyref>

• The value of the element named JMBAG in an acYearEnrollment element must match exactly one JMBAG element in a student element
• Note on the slide: put inside the root element <studAdmin>
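The referential-integrity rule above (every JMBAG in acYearEnrollment must match the JMBAG of some student) can be emulated by hand to see what a validator checks for fkAcYearEnrStud. The following is a minimal Python sketch, not an XSD validator: it assumes only the standard library module xml.etree.ElementTree, and the sample document, its data values, and the helper names key_tuples and dangling_references are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; element names follow the slide, values are made up.
DOC = """
<studAdmin>
  <student><JMBAG>0036000001</JMBAG></student>
  <student><JMBAG>0036000002</JMBAG></student>
  <acYearEnrollment>
    <acYear>2013</acYear><JMBAG>0036000001</JMBAG><enrollDate>2013-09-15</enrollDate>
  </acYearEnrollment>
  <acYearEnrollment>
    <acYear>2013</acYear><JMBAG>0036000009</JMBAG><enrollDate>2013-09-16</enrollDate>
  </acYearEnrollment>
</studAdmin>
"""


def key_tuples(root, selector, fields):
    """Collect one tuple of field texts per node matched by the selector."""
    return [tuple(node.findtext(f) for f in fields) for node in root.findall(selector)]


def dangling_references(root, key, keyref):
    """Return keyref tuples that match no key tuple.

    key and keyref are (selector, fields) pairs, mirroring xs:selector / xs:field.
    """
    known = set(key_tuples(root, *key))
    return [ref for ref in key_tuples(root, *keyref) if ref not in known]


root = ET.fromstring(DOC)
pk_stud = (".//student", ["JMBAG"])                # plays the role of pkStud
fk_enr_stud = (".//acYearEnrollment", ["JMBAG"])   # plays the role of fkAcYearEnrStud
dangling = dangling_references(root, pk_stud, fk_enr_stud)
print(dangling)
```

With this sample data the second enrollment refers to a JMBAG that no student has, so it is the only entry reported. A real deployment would rely on a schema-aware validator instead of this hand-written check.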
Referential integrity
Relational schema:

    CREATE TABLE courseEnrollment
      (acYear SMALLINT,
       JMBAG CHAR(10),
       courseID INTEGER,
       semester SMALLINT,
       PRIMARY KEY (JMBAG, acYear, courseID, semester) CONSTRAINT pkCourseEnr,
       FOREIGN KEY (JMBAG, acYear) REFERENCES acYearEnrollment(JMBAG, acYear) CONSTRAINT fkCourseEnrYearEnr,
       FOREIGN KEY (courseID) REFERENCES course(courseID) CONSTRAINT fkCourseEnrCourse
      )

XML schema:

    <xs:element name="courseEnrollment">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="courseID" type="xs:integer">
            <xs:keyref name="CourseEnrCourse" refer="pkCourse">
              <xs:selector xpath=".//course"/>
              <xs:field xpath="courseID"/>
            </xs:keyref>
          </xs:element>
          <xs:element name="semester" type="xs:integer"/>
        </xs:sequence>
      </xs:complexType>
      <xs:key name="pkCourseEnr">
        <xs:selector xpath=".//courseEnrollment"/>
        <xs:field xpath="courseID"/>
        <xs:field xpath="semester"/>
      </xs:key>
    </xs:element>

Why is the constraint fkCourseEnrYearEnr omitted? Pay attention to the implementation of the pkCourseEnr constraint.
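The key and unique constraints used on the preceding slides (pkStud, uiJMBG and the two-field pkAcYearEnr) can be emulated in the same hand-written way. This is a sketch under stated assumptions rather than real schema validation: it uses only Python's standard xml.etree.ElementTree, the sample document and the helper names field_tuples, duplicates, check_key and check_unique are hypothetical, and it assumes that a node lacking the field is simply skipped by the unique check while the same situation is an error for a key.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample data; element names follow the slides, values are made up.
DOC = """
<studAdmin>
  <student><JMBAG>0036000001</JMBAG><JMBG>0101990330001</JMBG></student>
  <student><JMBAG>0036000002</JMBAG></student>
  <student><JMBAG>0036000003</JMBAG><JMBG>0101990330001</JMBG></student>
  <acYearEnrollment><acYear>2013</acYear><JMBAG>0036000001</JMBAG></acYearEnrollment>
  <acYearEnrollment><acYear>2013</acYear><JMBAG>0036000001</JMBAG></acYearEnrollment>
  <acYearEnrollment><acYear>2014</acYear><JMBAG>0036000001</JMBAG></acYearEnrollment>
</studAdmin>
"""


def field_tuples(root, selector, fields):
    """One tuple of field texts per selected node; None marks an absent field."""
    return [tuple(node.findtext(f) for f in fields) for node in root.findall(selector)]


def duplicates(tuples):
    """Tuples that occur more than once, in first-seen order."""
    return [t for t, n in Counter(tuples).items() if n > 1]


def check_key(root, selector, fields):
    """Key-like check: every field must be present and the tuples must be distinct."""
    tuples = field_tuples(root, selector, fields)
    missing = [t for t in tuples if None in t]
    return missing, duplicates([t for t in tuples if None not in t])


def check_unique(root, selector, fields):
    """Unique-like check: nodes with an absent field are skipped (assumption)."""
    tuples = field_tuples(root, selector, fields)
    return duplicates([t for t in tuples if None not in t])


root = ET.fromstring(DOC)
pk_stud = check_key(root, ".//student", ["JMBAG"])
ui_jmbg = check_unique(root, ".//student", ["JMBG"])
pk_enr = check_key(root, ".//acYearEnrollment", ["JMBAG", "acYear"])
print(pk_stud)
print(ui_jmbg)
print(pk_enr)
```

In this sample every student has a distinct JMBAG, two students share a JMBG, and one (JMBAG, acYear) pair is enrolled twice, so only the last two checks report violations. Treating the fields as a tuple is what makes the composite key work: the same JMBAG in a different academic year is not a duplicate.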
Assignment:
Create an XML Schema for the following XML document:

    <?xml version="1.0" encoding="UTF-8"?>
    <topicsAndAssignments
        xsi:noNamespaceSchemaLocation="topicsAndAssignments.xsd"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <topics>
        <topic ordinal="1">
          <topicName>Advanced SQL</topicName>
        </topic>
        <topic ordinal="2">
          <topicName>Object-oriented databases</topicName>
        </topic>
        <topic ordinal="3">
          <topicName>Object/relational databases</topicName>
        </topic>
        <topic ordinal="4">
          <topicName>XML databases</topicName>
        </topic>
      </topics>
      <assignments>
        <assignment id="1">
          <assignmentName>Midterm exam</assignmentName>
          <points>4</points>
        </assignment>
        <assignment id="2">
          <assignmentName>Final exam</assignmentName>
          <points>15</points>
        </assignment>
      </assignments>
    </topicsAndAssignments>

Assignment - solution
topicsAndAssignments.xsd:

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="topicsAndAssignments">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="topics"/>
            <xs:element ref="assignments"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="topics">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="topic" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="topic">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="topicName"/>
          </xs:sequence>
          <xs:attribute name="ordinal" type="xs:integer" use="required"/>
        </xs:complexType>
      </xs:element>
      <xs:element name="topicName" type="xs:string"/>

Assignment - solution (continued)

      <xs:element name="assignments">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="assignment" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="assignment">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="assignmentName"/>
            <xs:element ref="points"/>
          </xs:sequence>
          <xs:attribute name="id" type="xs:integer" use="required"/>
        </xs:complexType>
      </xs:element>
      <xs:element name="assignmentName" type="xs:string"/>
      <xs:element name="points">
        <xs:simpleType>
          <xs:restriction base="xs:decimal">
            <xs:minInclusive value="0.00"/>
            <xs:maxInclusive value="100.00"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
    </xs:schema>

XML in database management systems

XML in DBMSs today
Well-defined concepts are the result of many years of development.
1. XML-enabled DBMS
  • a relational database management system that supports the XML data type and knows how to work with it efficiently
2. Native XML DBMS
  • custom data structures optimised for the X-languages (XPath, XQuery, …)

Relational DBMSs with XML support
Some relational DBMSs with support for XML:
• Oracle 9i+
• Microsoft SQL Server 2000+
• Microsoft Access XP
• IBM DB2
• PostgreSQL 8.4+

Microsoft SQL Server 2012 and XML - example

    <diplomaThesis thesisId="34567">
      <title>Algorithms for finding the shortest path in the graph</title>
      <assignment>In this thesis, it is necessary to describe the basic idea, show pseudocode and analyze the complexity of the two algorithms for finding the shortest path in the graph: Dijkstra and Floyd. Implement both algorithms, and compare the performance time on specific default graphs.
      </assignment>
    </diplomaThesis>

The document stored in a table:

    diplomaId   thesisXML
    1           <diplomaThesis thesisId="34567">
                  <title>Algorithms for finding the shortest path in the graph</title>
                  <assignment>In this thesis, it is necessary to describe the basic idea, show pseudocode and analyze the complexity of the two algorithms for finding the shortest path in the graph: Dijkstra and Floyd. Implement both algorithms, and compare the performance time on specific default graphs.
                  </assignment>
                </diplomaThesis>
    …           …

Microsoft SQL Server 2012 and XML - example

    CREATE XML SCHEMA COLLECTION diplomaThesisSchema AS
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
       <xs:element name="diplomaThesis">
         <xs:complexType>
           <xs:sequence>
             <xs:element name="title" type="xs:string"/>
             <xs:element name="assignment" type="xs:string"/>
           </xs:sequence>
           <xs:attribute name="thesisId" type="xs:integer"/>
         </xs:complexType>
       </xs:element>
     </xs:schema>'

    CREATE TABLE diplomaThesis
      (diplomaId INT PRIMARY KEY,
       thesisXML XML (diplomaThesisSchema))

    INSERT INTO diplomaThesis VALUES (1,
    '<diplomaThesis thesisId="34567">
       <title>Algorithms for …</title>
       <assignment>In this …</assignment>
     </diplomaThesis>')

    INSERT INTO diplomaThesis VALUES (2,
    '<diplomaThesis thesisId="34444">
       <title>Generators of the …</title>
       <assignment>In this …</assignment>
       <thesisContent/>
     </diplomaThesis>')

The second INSERT is rejected:

    XML Validation: Unexpected element(s): thesisContent. Location: /*:diplomaThesis[1]/*:thesisContent[1]

PostgreSQL and XML - example
It is not possible to validate XML content

    CREATE TABLE diplomaThesis
      (diplomaId INT PRIMARY KEY,
       thesisXML XML)

    INSERT INTO diplomaThesis VALUES (1,
    '<diplomaThesis thesisId="34567">
       <title>Algorithms for …</title>
       <assignment>In this …</assignment>
     </diplomaThesis>')

    INSERT INTO diplomaThesis VALUES (2,
    '<diplomaThesis thesisId="34444">
       <title>Generators of the …</title>
       <assignment>In this …</assignment>
       <thesisContent/>
     </diplomaThesis>')

    SELECT xpath('/diplomaThesis/title', thesisXML)
    FROM diplomaThesis

Result:

    title XML[ ]
    "{"<title>Algorithms for …</title>"}"
    "{"<title>Generators of the high…</title>"}"

Native XML database (NXD) (1)
Definition 1. (Gavin Powell: Beginning XML Database, 2007)
A native XML database is essentially any method of storing XML data as an XML document.
Definition 2. (http://www.ibm.com/developerworks/xml/library/x-mxd4.html)
A native XML database is one that treats XML documents and elements as the fundamental structures rather than tables, records, and fields
Definition 3. (XML:DB Initiative)
A native XML database...
• Defines a (logical) model for an XML document -- as opposed to the data in that document -- and stores and retrieves documents according to that model. At a minimum, the model must include elements, attributes, PCDATA, and document order. Examples of such models are the XPath data model, the XML Infoset, and the models implied by the DOM and the events in SAX 1.0.
• Has an XML document as its fundamental unit of (logical) storage, just as a relational database has a row in a table as its fundamental unit of (logical) storage.
• Is not required to have any particular underlying physical storage model. For example, it can be built on a relational, hierarchical, or object-oriented database, or use a proprietary storage format such as indexed, compressed files.

Native XML database (NXD) (2)
• Only XML can be stored in the database, and only XML can be retrieved from it
• XML is stored in a form optimized for working with the hierarchical structure of an XML document
• The basic logical unit for storing data is the document
  • the XML data type is a special data type
  • the equivalent structure in an RDBMS is the tuple
• System architecture for storing XML documents in BaseX:
  • Meta Data: common information, statistics
  • Main Table: tabular representation of each node
  • Directory: additional information for speeding up updates

Indexes in NXD
Three types of indexes
1. Value indexes
  • They index text and attribute values
  • Used to resolve queries such as, “Find all elements or attributes whose value is ‘Advanced Databases’”
2. Structural indexes
  • They index the location of elements and attributes
  • Used to resolve queries such as, “Find all Course elements”
  • Value and structural indexes are combined to resolve queries such as, “Find all Course elements whose value is ‘Advanced Databases’”
3. Full-text indexes
  • They index individual tokens in text and attribute values
  • Used to resolve queries such as, “Find all documents that contain the words ‘Advanced Databases’”
  • Used with structural indexes for queries like, “Find all documents that contain the words ‘Advanced Databases’ inside a Course element”

NXD - query languages, updates
• Query languages
  • Most NXDs support XPath and XQuery for querying
  • XPath is the most commonly used query language for NXDs, but it lacks functionality such as grouping, sorting and cross-document joins
  • XQuery has overcome these shortcomings and has become a standard
• Updates
  • Some databases replace entire documents
  • Strategies:
    • changes through the DOM (object representation of XML trees)
    • a special language for updates: use XPath to find a node, then perform an insert (before or after the node), an update or a deletion of the node
    • extending XQuery

XPath
• XPath is a language for extracting parts of XML documents.
• Path expressions (like the ones in the Unix or Linux OS) are used to navigate through the XML document.
XPath examples:

    <?xml version='1.0' encoding="utf-8"?>
    <studyProgrames>
      <studyProgram acYear="2013/2014">
        <study studyName="Computing">
          <course>
            <courseName>Databases</courseName>
            <description>This is the …</description>
            <ects>7.5</ects>
          </course>
          ...
        </study>
        <study studyName="Power Engineering">
          ...
        </study>
      </studyProgram>
      ...
  </studyProgrames>

  //course[description]
    All elements named course that have a subelement named description
  //study[@studyName]
    All elements named study that have an attribute named studyName
  //studyProgram[@acYear="2013/2014"]
    All elements named studyProgram that have an attribute named acYear with the value "2013/2014"
  //course[courseName="Advanced Databases"][ects = 4]
    All elements named course that have an element named courseName with the value "Advanced Databases" and an element named ects with the value 4
  //course[courseName="Advanced Databases" or courseName="C#"]
    All elements named course that have an element named courseName with the value "Advanced Databases" or "C#"

XQuery
Language for querying data stored as XML documents, with or without a defined schema (XML Schema or DTD)
Uses XPath 2.0
XQuery is to XML what SQL is to relational databases
Supported by most XML-enabled DBMSs (IBM, Oracle, Microsoft, ...) and by most NXDs

FLWOR expressions
Similar to SQL SELECT-FROM-WHERE-ORDER BY expressions
  FOR: iterates through a sequence, binding a variable to each item
  LET: binds a variable to a sequence
  WHERE: eliminates items of the iteration
  ORDER BY: sorts the query results
  RETURN: constructs the query results
Example:
  for $s in //study
  let $p := $s/course
  where $s/@studyName="Computing"
  order by $s/../@acYear
  return <study>{ concat($s/@studyName, " ", $s/../@acYear, " cnt=", count($p)) }</study>
Result (BaseX): [figure]

When and why to use a native XML database?
When storing unstructured or semi-structured data
When the data schema changes frequently
  For XML documents stored in an NXD, an XML schema can be defined but is not required
When it is important to preserve the original layout of XML documents
Because of the dedicated query language, optimised to work efficiently with the XML data type
Because of the query processing speed, which relies on index structures tailored to the XML data type

MarkLogic Server: example of an NXD
Besides NXD characteristics, it includes the majority of RDBMS features:
  MarkLogic Server is a native XML database that can convert Word, PowerPoint, Excel, PDF, and HTML documents to XML
  Supports XQuery and full-text search using wildcards, stemming, spell checking, and so on
  It has a single index that combines the full-text, value, and structural indexes
  Supports transactions, triggers, journaling, role-based security, clustering, and backup
http://www.marklogic.com/product/demos.html

The best-known NXD implementations
Open source
  eXist - http://exist.sourceforge.net/
  BaseX - basex.org
  Sedna - http://www.sedna.org/
  Xindice (Apache) - http://xml.apache.org/xindice/
Commercial
  MarkLogic - http://www.marklogic.com/
  Tamino XML Server

Homework: XML data model
Choose an arbitrary segment of the real world and store information about it in an XML document (LastName.xml).
Design the schema of the document with XML Schema (LastName.xsd).
Briefly describe the modelled segment of the real world (in the file LastName.txt).
In the XML schema, demonstrate the definition of at least one:
1. element or attribute of each of the following data types: string, integer, boolean, date (other data types may be used as you see fit)
2. default value for an element or attribute
3. domain constraint (one that cannot be expressed using only a primitive data type) for an element or attribute
4. unique constraint
5. primary key constraint
6.
foreign key constraint
Note: to make writing the XML and XSD documents easier, use one of the XML editors (e.g. free or trial versions of EditiX, oXygen, etc.)
Additionally, in the file LastName.txt, for each of points 1-6 specify which change to a valid XML document would violate the respective constraint.

Homework: eXist native XML DBMS
Download and install the Java SE Development Kit (JDK)
  http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html
Download and install eXist, a native XML DBMS
  http://exist-db.org/exist/apps/homepage/index.html
Create a collection in the eXist DBMS and place the file 0036000001.xsd in it.
Ensure that eXist allows the XML document LastName.xml to be stored only if it complies with the rules defined in the schema LastName.xsd.

Configure implicit validation in eXist
Configure implicit validation of XML documents (http://exist-db.org/exist/apps/doc/validation.xml):
1. In the $EXIST_HOME/conf.xml file, set the mode attribute of the validation element to "auto" (or to "yes"):
  segment of the conf.xml file
    ...
    <validation mode="auto">
      <entity-resolver>
        <catalog uri="${WEBAPP_HOME}/WEB-INF/catalog.xml"/>
      </entity-resolver>
    </validation>
    ...
2. Modify the file /db/system/config/db/collection.xconf (using the eXist interface) to include the following element:
    <validation mode="yes"/>
3. Modify the file $EXIST_HOME\webapp\WEB-INF\entities\catalog.xml so that it contains the following element:
  segment of the catalog.xml file
    <uri name="http://www.fer.hr/0036000001" uri="xmldb:exist:///db/0036000001/0036000001.xsd"/>
4.
Place the file 0036000001.xsd into the folder $EXIST_HOME\tools\wrapper\bin

Homework: storing XML in eXist
An attempt to store an XML document that does not conform to the defined XML schema will produce an error such as the one shown in the picture: [picture]
5 points
Send the solution (files 0036000001.xml, 0036000001.xsd and 0036000001.txt) as email attachments to [email protected]
Deadline: 03.12.2013 at 9:00.

Literature
A. B. Chaudhri, A. Rashid, R. Zicari: XML Data Management: Native XML and XML-Enabled Database Systems, Addison-Wesley, 2003
G. Powell: Beginning XML Databases, Wiley Publishing, Inc., 2007
K. Williams: Professional XML Databases, Wrox Press Ltd, 2000
http://xmldb-org.sourceforge.net/faqs.html
http://www.rpbourret.com/xml/XMLDatabaseProds.htm
http://www.w3.org/XML
http://www.xml.com
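
Supplement: trying out the XPath examples
The simpler path expressions from the XPath slide can be tried without installing an XML database. The following is only a minimal sketch, assuming Python's standard xml.etree.ElementTree module. That module implements just a subset of XPath, so the numeric comparison [ects = 4] and the "or" predicate are left out, and expressions are written relative to the root (".//" instead of "//"). The sample document is a shortened version of the one on the slide; the second course, its ects value, and the helper course_names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened version of the slide's sample document.
# The "Advanced Databases" course and its ects value are illustrative assumptions.
DOC = """<studyProgrames>
  <studyProgram acYear="2013/2014">
    <study studyName="Computing">
      <course>
        <courseName>Databases</courseName>
        <description>This is the basic course</description>
        <ects>7.5</ects>
      </course>
      <course>
        <courseName>Advanced Databases</courseName>
        <ects>4</ects>
      </course>
    </study>
    <study studyName="Power Engineering"/>
  </studyProgram>
</studyProgrames>"""

root = ET.fromstring(DOC)


def course_names(courses):
    """Hypothetical helper: return the courseName text of each course element."""
    return [c.findtext("courseName") for c in courses]


# //course[description]: courses that have a description subelement
with_description = course_names(root.findall(".//course[description]"))

# //study[@studyName]: study elements that carry a studyName attribute
study_names = [s.get("studyName") for s in root.findall(".//study[@studyName]")]

# //studyProgram[@acYear="2013/2014"]: programs for one academic year
programs_2013 = root.findall(".//studyProgram[@acYear='2013/2014']")

# //course[courseName="Advanced Databases"]: match on a child element's text
advanced = course_names(root.findall(".//course[courseName='Advanced Databases']"))

print(with_description)
print(study_names)
print(len(programs_2013))
print(advanced)
```

For the full XPath 2.0 and XQuery feature set (including the FLWOR example), a native XML DBMS such as BaseX or eXist is still needed; this sketch only illustrates how the predicates select nodes.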