Advanced Databases
Lectures
November 2013.
6. XML Databases
 ZPR-FER - Zagreb
Advanced Databases 2013/2014
Overview
XML databases – examples of application
XML data model and XML document schema
XML in the Database Management Systems
Homework assignment
BBC dynamic publishing – semantic web and content stored in an XML database
Static publishing (until 2010)
journalists entered all material for the articles, and the indices/positions where the items would be placed on the BBC website, into the Oracle RDBMS
once published, pages could not be changed until the next publication
poor semantics of the stored/presented data
inability to connect articles based on semantic features
Dynamic publishing (from 2010)
establishment of an ontology for the domain of the World Cup
processing of the journalistic texts to extract data according to the established ontology
publication of metadata in accordance with the ontology
storing content in a native XML DBMS (MarkLogic)
dynamic generation of pages
Pictures taken from BBC web pages http://www.bbc.co.uk
Clinical Document Architecture (CDA) and XML DBMS
CDA is an (ANSI) document markup standard that specifies the structure and semantics of "clinical documents" for the purpose of exchange
Mayo Clinic, like the BBC, uses a native XML DBMS (MarkLogic) for storing web site content and the semantic web
They store clinical notes and genomic data in XML format
After processing, the data stored in XML format is linked with previously published genomic repositories
They recognized the benefits of using an XML database compared to other ways of storing and managing XML documents:
full-text search
the ability to access documents/nodes along the axes
flexible schema
…
XML - EXtensible Markup Language
Defined by the World Wide Web Consortium (W3C) – version 1.0 released in 1998
Based on the Standard Generalized Markup Language (SGML)
Developed for exchanging data on the Web
Allows the meaning and the storage of data to be described in textual format
Self-documenting - XML documents carry markup that further describes parts of the document
e.g. <title>XML</title>
<slide>What is XML…</slide>
There are no predefined tags – anyone can define their own tags
XML document - example
<?xml version='1.0' encoding="utf-8"?>
<!-- Study programs on FER -->
<studyPrograms>
  <studyProgram acYear="2013/2014">
    <study studyName="Computing">
      <course>
        <courseName>Databases</courseName>
        <description>This is the basic course on databases intended to… </description>
        <ects>7.5</ects>
      </course>
      ...
    </study>
    <study studyName="Power Engineering">
      <course>
        <courseName>Electrical installations</courseName>
        <description>The basics of the power system. Voltage and ... </description>
        <ects>8</ects>
      </course>
      ...
    </study>
  </studyProgram>
  ...
</studyPrograms>
Callouts on the slide mark:
declaration – processing instruction: <?xml version='1.0' encoding="utf-8"?>
comment: <!-- Study programs on FER -->
root element (document): studyPrograms
attribute: acYear="2013/2014"
start tag and end tag: <courseName> and </courseName>
element: start tag, content and end tag together
text: the character content of an element
XML document – hierarchical structure
• Structured text - in the form of a tree
• Connected acyclic graph
The XML document from the previous slide represented as a tree in the BaseX XML DBMS (figure not reproduced)
XML data model
The XML data model is a hierarchical data model
An XML document is modelled as a tree, with elements and attributes as tree nodes
element nodes can have child nodes – attributes or other elements
text in an element is modelled as a child node of text type
the order of child nodes in the tree corresponds to their order in the XML document
elements and attributes (except the root element) have exactly one parent, which is an element node
Element nesting corresponds to the parent-child relationship
Figure labels: root node, child nodes, sibling nodes
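The tree model above can be explored with a few lines of Python. This is only a sketch using the standard xml.etree.ElementTree module; the document fragment and the helper name describe are made up for illustration and are not part of the lecture.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment modelled on the study-program example.
DOC = """<studyPrograms>
  <studyProgram acYear="2013/2014">
    <study studyName="Computing">
      <course>
        <courseName>Databases</courseName>
        <ects>7.5</ects>
      </course>
    </study>
  </studyProgram>
</studyPrograms>"""


def describe(node, depth=0):
    """Return one line per element: indented tag, attributes and own text."""
    text = (node.text or "").strip()
    lines = ["  " * depth + f"{node.tag} attrs={node.attrib} text={text!r}"]
    for child in node:  # children are visited in document order
        lines.extend(describe(child, depth + 1))
    return lines


root = ET.fromstring(DOC)
tree_lines = describe(root)
print("\n".join(tree_lines))
```

Each level of indentation in the output corresponds to one parent-child step, and siblings appear in the same order as in the document.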
The reasons for the general acceptance of XML
XML
is human-readable
is computer-readable
is internationalized (Unicode)
is platform independent
is approved by the World Wide Web Consortium
is not just a format for storing information, but a set of technologies for storing, managing and presenting data
What is an XML document used for and what makes it popular?
Used for:
data exchange between heterogeneous systems
the possibility of adding new tags and creating nested structures has made XML an ideal format for data exchange
storing data
searching data
XML has become the standard for data storage on the Web
All major development frameworks are XML oriented (.NET, Java)
The modern architecture of the web includes XML
XML document schema
We use a database schema to place restrictions on the stored data: data types, identifiers, relationships, business rules, ...
An XML document does not have to conform to a particular schema
Conformance of XML documents to an agreed schema enables the exchange of data
Languages for defining XML schemas are used to describe the structure and content constraints of XML documents
Modelling the schema of the XML document
The schema of an XML document is not as formal as the relational model
It can be designed following rules similar to those we apply in relational database modelling:
Use complex elements to represent entities
Use elements or attributes to represent entity attributes
Use references (key, keyref) to describe relationships
Difficulties:
More degrees of freedom than with a relational database schema
The distinction between attributes and sub-elements is unclear
Recommendation:
Use an element whenever the structure is expected to expand
Use an attribute when it comes to 1:1 relationships
Languages for modelling the schema of the XML document
Document Type Definition (DTD)
Limited support for defining data types
A DTD is not itself an XML document
XML Schema
Richer than DTD because it allows defining
a broader set of data types
complex data types (including those that inherit properties of another complex type)
constraints (domain, primary key, foreign key)
...
An XML Schema is itself a valid XML document
W3C recommendation since 2001
Namespaces can be used
Complex syntax
Valid XML document
A valid XML document
is well formed
complies with the rules defined by a schema - DTD (Document Type Definition) or XML Schema (XSD)
A well-formed XML document
has a single root element which contains all the other elements
has a start tag and an end tag for each element
has case-sensitive element tags - the start and end tags must match exactly
has properly nested elements, with none missing and none overlapping
can contain attributes, whose values are placed in quotation marks
contains only properly encoded, legal Unicode characters
...
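The well-formedness rules can be tried out with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree; the helper is_well_formed and the sample strings are illustrative assumptions. It checks well-formedness only, not validity against a schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One sample per rule from the list above (hypothetical documents).
samples = {
    "ok": "<note><to>Ana</to></note>",
    "overlapping elements": "<note><to>Ana</note></to>",
    "case mismatch": "<note><to>Ana</To></note>",
    "two root elements": "<note/><note/>",
    "unquoted attribute value": "<note lang=hr/>",
}
results = {name: is_well_formed(xml) for name, xml in samples.items()}
print(results)
```

Only the first sample is accepted; every other sample breaks exactly one of the rules listed on the slide.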
XML Schema
Defines:
elements that can appear in the document
attributes that can appear in the document
elements and attributes that are children of a complex element
order of child elements
number of child elements
content of the element
data types for elements and attributes
default and fixed values for elements and attributes
unique key constraint
primary key constraint
foreign key constraint
...
XML schema – example
note.xml (well-formed XML):
<?xml version="1.0" encoding="UTF-8"?>
<note>
  <to>Ana</to>
  <from>Marko</from>
  <heading>Reminder</heading>
  <body>Presentations are on Tuesday!</body>
</note>
note.xsd (XML schema, itself well-formed XML):
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XML schema – example
note.xml linked to the XML schema note.xsd (valid XML):
<?xml version="1.0"?>
<note xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="note.xsd">
  <to>Ana</to>
  <from>Marko</from>
  <heading>Reminder</heading>
  <body>Presentations are on Tuesday!</body>
</note>
XML schema/document and namespaces
In the following XML document segment, xs is a namespace prefix. A namespace is identified by a unique URI (Uniform Resource Identifier).
A namespace is declared with the xmlns attribute:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
Namespaces are used primarily to identify the "elements" of XML unambiguously.
Namespaces allow "elements" with equal names from different namespaces to be used in the same XML document.
If a namespace prefix is declared, all the "elements" from that namespace must carry the correct prefix (elementFormDefault="qualified").
The alternative is to define a default namespace:
<schema xmlns="http://www.w3.org/2001/XMLSchema">
More than one namespace can be defined in an XML document.
There is no obligation to use namespaces, even in an XML Schema.
The recommendation is to always use namespaces to avoid ambiguity.
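That the prefix is only a shorthand for the namespace URI can be seen by parsing the same schema fragment once with the xs prefix and once with a default namespace. The sketch assumes Python's standard xml.etree.ElementTree, which expands every name to the form {namespace-URI}local-name; the two fragments are shortened for illustration.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# The same (shortened) schema written with a prefix and with a default namespace.
prefixed = f'<xs:schema xmlns:xs="{XS}"><xs:element name="note"/></xs:schema>'
default = f'<schema xmlns="{XS}"><element name="note"/></schema>'

a = ET.fromstring(prefixed)
b = ET.fromstring(default)

# Both roots carry the same expanded name "{namespace-URI}schema".
print(a.tag)
print(b.tag)

# When searching, any prefix can be mapped to the URI; "xs" here is arbitrary.
found = b.findall("xs:element", {"xs": XS})
print([e.get("name") for e in found])
```

The parser treats both documents identically, because element identity is determined by the URI and the local name, not by the prefix text.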
XML schema/document and namespaces
note.xsd (default namespace):
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="note">
    <complexType>
      <sequence>
        <element name="to" type="string"/>
        <element name="from" type="string"/>
        <element name="heading" type="string"/>
        <element name="body" type="string"/>
      </sequence>
    </complexType>
  </element>
</schema>
note.xml (valid):
<?xml version="1.0"?>
<note xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="note.xsd">
  <to>Ana</to>
  <from>Marko</from>
  <heading>Reminder</heading>
  <body>Presentations are on Tuesday!</body>
</note>
XML Schema components
Primary:
simple type definitions
complex type definitions
attribute declarations
element declarations
Secondary:
attribute group definitions
identity-constraint definitions
model group definitions
notation declarations
Helper:
annotations
model groups
particles
wildcards
attribute uses
Declaration and definition:
Declaration components are associated by (qualified) names to information items being validated.
Definition components define internal schema components that can be used in other schema components.
Declaration and definition - example
XML document fragment:
<student JMBAG="0036000001">
  <fName>Ana</fName>
  ...
</student>
Declaration:
<xs:element name="student">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="fName" type="xs:string"/>
      ...
    </xs:sequence>
    <xs:attribute name="JMBAG" type="xs:string"/>
  </xs:complexType>
</xs:element>
Type definition:
<xs:element name="student" type="studentType"/>
<xs:complexType name="studentType">
  <xs:sequence>
    <xs:element name="fName" type="fNameType"/>
    ...
  </xs:sequence>
  <xs:attribute name="JMBAG" type="xs:string"/>
</xs:complexType>
Type definition
1. Simple type
for attributes and for elements without element children
cannot contain other elements
2. Complex type
can contain other elements
can be empty
Simple Type definition
Can be:
a restriction of some other simple type
a list or union of simple type definitions
a built-in primitive data type
<xs:simpleType name="workingAge">
  <xs:restriction base="xs:integer">
    <xs:minInclusive value="18"/>
    <xs:maxInclusive value="65"/>
  </xs:restriction>
</xs:simpleType>
<xs:simpleType name="listOfNumbers">
  <xs:list itemType="xs:integer"/>
</xs:simpleType>
<xs:simpleType name="ECTSGrade">
  <xs:restriction base="xs:string">
    <xs:enumeration value="A"/>
    <xs:enumeration value="B"/>
    <xs:enumeration value="C"/>
    <xs:enumeration value="D"/>
    <xs:enumeration value="E"/>
    <xs:enumeration value="F"/>
    <xs:enumeration value="FX"/>
  </xs:restriction>
</xs:simpleType>
Complex type definition
Can be:
a restriction of a complex type definition
an extension of a simple or complex type definition
<xs:complexType name="studentType">
  <xs:sequence>
    <xs:element name="fName" type="fNameType"/>
    ...
  </xs:sequence>
  <xs:attribute name="JMBAG" type="xs:string"/>
</xs:complexType>
<xs:complexType name="extendedStudentType">
  <xs:complexContent>
    <xs:extension base="studentType">
      <xs:sequence>
        <xs:element name="currStudy" type="xs:string"/>
        ...
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
<xs:element name="student" type="extendedStudentType"/>
Simple element declaration
XML elements that cannot contain other elements and/or attributes
<xs:element name="name" type="type"/>
name – element name
type – element data type; the most common data types:
xs:boolean
xs:integer
xs:date
xs:string
xs:decimal
xs:time
Some additional attributes that can be defined while declaring a simple element:
default - default value
fixed - fixed value
minOccurs - the minimum number of times this element can occur in the parent element (default value is 1)
maxOccurs - the maximum number of …
Simple element declaration - example
<employee>
  <lName>Matišić</lName>
  <age>50</age>
  <gender>M</gender>
  <birthDate>1962-06-03</birthDate>
  <adult>Yes</adult>
</employee>
<xs:element name="lName" type="xs:string"/>
<xs:element name="age" type="xs:integer"/>
<xs:element name="gender" type="xs:string"/>
<xs:element name="birthDate" type="xs:date"/>
<xs:element name="adult" type="xs:string"/>
employee is not a simple element!
Default value declaration for a simple element:
<xs:element name="gender" type="xs:string" default="F"/>
The default value is assigned to the element when no other value is specified.
A fixed value is assigned to the element automatically, and no other value can be assigned.
<xs:element name="adult" type="xs:string" fixed="Yes"/>
Attribute declaration
<xs:attribute name="name" type="type"/>
name and type have the same meaning as for xs:element
Some additional attributes that can be defined while declaring an attribute:
default - default value
fixed - fixed value
use - can be:
"optional" - attribute is optional
"required" - attribute is required
Attribute declaration - example
<predmet>
  <naziv lang="HR">Advanced Databases</naziv>
</predmet>
<xs:attribute name="lang" type="xs:string" default="HR"/>
Complex element declaration (1)
XML elements that, besides text, can contain other elements and/or attributes
There are four kinds of complex elements:
1. empty element
<country code="HR"/>
2. element that contains text only
<country code="HR">Croatia</country>
3. element that contains other elements only
<country>
  <code>HR</code>
  <countryName>Croatia</countryName>
</country>
4. element that contains text and other elements
<country capital="Zagreb">Croatia
  <code>HR</code>
  <countryName>Croatia</countryName>
</country>
All kinds of complex elements may, but need not, contain attributes.
Complex element declaration (2)
<xs:element name="name">
  <xs:complexType>
    ... complex type information ...
  </xs:complexType>
</xs:element>
Example:
<xs:element name="person">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="fName" type="xs:string"/>
      <xs:element name="lName" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
Order indicators: all, choice, sequence
<xs:all> … </xs:all>
the child elements can appear in any order, and each child element must occur only once
<xs:choice> … </xs:choice>
either one child element or another can occur
<xs:sequence> … </xs:sequence>
the child elements must appear in the same order as they are declared
Declaration of an element that contains only text (and attributes)
In the declaration of this kind of complex element, the content is enclosed in <xs:simpleContent> </xs:simpleContent> tags in the XML Schema.
Inside <xs:simpleContent> we must use an extension or a restriction of the base data type from which the value of the element is taken.
<xs:element name="nekoIme">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="basetype">
        ...
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>
<xs:element name="nekoIme">
  <xs:complexType>
    <xs:simpleContent>
      <xs:restriction base="basetype">
        ...
      </xs:restriction>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>
Complex element declaration - example
<course>
  <courseName lang="HR">Advanced Databases</courseName>
</course>
<xs:element name="course">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="courseName"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
<xs:element name="courseName">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute name="lang" type="xs:string" fixed="HR"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>
XML Schema – referencing
Once defined, an element or attribute can be referenced elsewhere in the XML schema
<xs:element name="note">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="to"/>
      <xs:element ref="from"/>
      <xs:element ref="heading"/>
      <xs:element ref="body"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
<xs:element name="to" type="xs:string"/>
<xs:element name="from" type="xs:string"/>
<xs:element name="heading" type="xs:string"/>
<xs:element name="body" type="xs:string"/>
XML Schema – domain constraints
Examples of domain constraints on the values of elements and attributes:
restriction of the values to the range between a declared minimum and maximum value
restriction of the values to a finite set of declared values
restriction of the length of allowable values to the range between a declared minimum and maximum length
restriction of the allowable values by a pattern, extended with an operator that declares the allowable number of occurrences of the declared value
XML Schema – domain constraints
The general form for setting domain integrity constraints (the same for xs:attribute):
<xs:element name="name">
  <xs:simpleType>
    <xs:restriction base="type">
      ... constraints ...
    </xs:restriction>
  </xs:simpleType>
</xs:element>
<xs:element name="totalPoints">
  <xs:simpleType>
    <xs:restriction base="xs:decimal">
      <xs:minInclusive value="0.0"/>
      <xs:maxInclusive value="100.0"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
<xs:element name="gender">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:enumeration value="F"/>
      <xs:enumeration value="M"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
<xs:element name="OIB">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{11}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
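To get a feeling for what the three restrictions accept and reject, they can be mimicked in plain Python. This is only a rough analogue written for illustration, not a schema validator; the function names are made up. One detail worth noting: an xs:pattern must match the whole value, which corresponds to re.fullmatch rather than re.search.

```python
import re
from decimal import Decimal

# Lexical form of xs:decimal: optional sign, digits, optional fraction.
DECIMAL_LEXICAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def valid_total_points(text):
    """Analogue of xs:decimal with minInclusive 0.0 and maxInclusive 100.0."""
    if not DECIMAL_LEXICAL.fullmatch(text):
        return False
    return Decimal("0.0") <= Decimal(text) <= Decimal("100.0")


def valid_gender(text):
    """Analogue of the enumeration restriction with values F and M."""
    return text in {"F", "M"}


def valid_oib(text):
    """Analogue of the pattern [0-9]{11}; the whole value must match."""
    return re.fullmatch(r"[0-9]{11}", text) is not None


print(valid_total_points("55.5"), valid_total_points("100.01"))
print(valid_gender("F"), valid_gender("X"))
print(valid_oib("12345678901"), valid_oib("123456789012"))
```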
XML Schema – integrity constraints
The following integrity constraints can be expressed in XML Schema:
unique constraint (the tag unique is used)
primary key constraint (key)
foreign key constraint (keyref)
XPath expressions are used to define integrity constraints
Using XPath expressions we set values for:
selector
field
An XPath expression takes a tree (XML file) as input and returns a set of tree nodes as a result
XML Schema – integrity constraints
Primary key constraint:
<xs:key name="countryPK">
  <xs:selector xpath=".//countries/country"/>
  <xs:field xpath="@countryCode"/>
</xs:key>
Foreign key constraint:
<xs:keyref name="personCountryFK" refer="countryPK">
  <xs:selector xpath=".//persons/person"/>
  <xs:field xpath="@countryCode"/>
</xs:keyref>
Unique constraint:
<xs:unique name="personUI">
  <xs:selector xpath=".//persons/person"/>
  <xs:field xpath="personId"/>
</xs:unique>
Integrity constraints - example (figure not reproduced)
Primary key and unique constraint
Relational schema:
CREATE TABLE student
(JMBAG CHAR(10) PRIMARY KEY CONSTRAINT pkStud,
 JMBG CHAR(13) UNIQUE CONSTRAINT uiJMBG,
 fName NCHAR(50) NOT NULL,
 lName NCHAR(50) NOT NULL,
 CHECK (JMBAG MATCHES '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]')
)
XML schema:
<xs:element name="student">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="JMBAG">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:pattern value="[0-9]{10}"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="JMBG" type="xs:string">
        <xs:unique name="uiJMBG">
          <xs:selector xpath=".//student"/>
          <xs:field xpath="JMBG"/>
        </xs:unique>
      </xs:element>
      <xs:element name="fName" type="xs:string"/>
      <xs:element name="lName" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
<xs:key name="pkStud">
  <xs:selector xpath=".//student"/>
  <xs:field xpath="JMBAG"/>
</xs:key>
The xs:key is put inside the root element <studAdmin>.
Notes:
The value of the element named JMBAG must be known for each element of type student, and unique across all elements of type student.
The value of the element named JMBG must be unique across all elements of type student; it need not be known, but only one value may be unknown for uniqueness to hold.
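What a validator checks for a key and for a unique constraint can be sketched in a few lines of Python. The document, the helper check_identity and its required flag are assumptions made for illustration; a real XSD processor does considerably more. The selector picks the student nodes and the field names the child element whose value is compared.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document with a duplicate JMBAG and one missing JMBG.
DOC = """<studAdmin>
  <student><JMBAG>0036000001</JMBAG><JMBG>0101990330011</JMBG></student>
  <student><JMBAG>0036000002</JMBAG></student>
  <student><JMBAG>0036000001</JMBAG><JMBG>0202991330022</JMBG></student>
</studAdmin>"""


def check_identity(root, selector, field, required):
    """Report missing field values (only if required) and duplicate values."""
    seen, problems = set(), []
    for node in root.findall(selector):
        value = node.findtext(field)
        if value is None:
            if required:  # a key needs a value for every selected node
                problems.append("missing " + field)
            continue      # nodes without the field are skipped otherwise
        if value in seen:
            problems.append("duplicate " + field + "=" + value)
        seen.add(value)
    return problems


root = ET.fromstring(DOC)
key_problems = check_identity(root, ".//student", "JMBAG", required=True)
unique_problems = check_identity(root, ".//student", "JMBG", required=False)
print(key_problems)
print(unique_problems)
```

In this sample the key check reports the repeated JMBAG, while the unique check on JMBG passes because the two values that are present differ.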
Referential integrity
Relational schema:
CREATE TABLE acYearEnrollment
(acYear SMALLINT,
 JMBAG CHAR(10) REFERENCES stud(JMBAG) CONSTRAINT fkAcYearEnrStud,
 enrollDate DATE,
 PRIMARY KEY (JMBAG, acYear) CONSTRAINT pkAcYearEnr
)
XML schema:
<xs:element name="acYearEnrollment">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="acYear" type="xs:integer"/>
      <xs:element name="JMBAG" type="xs:string"/>
      <xs:element name="enrollDate" type="xs:date"/>
      <xs:element ref="courseEnrollment" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
<xs:key name="pkAcYearEnr">
  <xs:selector xpath=".//acYearEnrollment"/>
  <xs:field xpath="JMBAG"/>
  <xs:field xpath="acYear"/>
</xs:key>
<xs:keyref name="fkAcYearEnrStud" refer="pkStud">
  <xs:selector xpath=".//acYearEnrollment"/>
  <xs:field xpath="JMBAG"/>
</xs:keyref>
The identity constraints are put inside the root element <studAdmin>.
Note: the value of the element named JMBAG in an acYearEnrollment element must match exactly one JMBAG element in a student element.
Referential integrity
Relational schema:
CREATE TABLE courseEnrollment
(acYear SMALLINT,
 JMBAG CHAR(10),
 courseID INTEGER,
 semester SMALLINT,
 PRIMARY KEY (JMBAG, acYear, courseID, semester) CONSTRAINT pkCourseEnr,
 FOREIGN KEY (JMBAG, acYear) REFERENCES acYearEnrollment(JMBAG, acYear) CONSTRAINT fkCourseEnrYearEnr,
 FOREIGN KEY (courseID) REFERENCES course(courseID) CONSTRAINT fkCourseEnrCourse
)
XML schema:
<xs:element name="courseEnrollment">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="courseID" type="xs:integer">
        <xs:keyref name="CourseEnrCourse" refer="pkCourse">
          <xs:selector xpath=".//course"/>
          <xs:field xpath="courseID"/>
        </xs:keyref>
      </xs:element>
      <xs:element name="semester" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>
  <xs:key name="pkCourseEnr">
    <xs:selector xpath=".//courseEnrollment"/>
    <xs:field xpath="courseID"/>
    <xs:field xpath="semester"/>
  </xs:key>
</xs:element>
Why is the constraint fkCourseEnrYearEnr omitted?
Pay attention to the implementation of the pkCourseEnr constraint.
Assignment: Create an XML Schema for the following XML document:
<?xml version="1.0" encoding="UTF-8"?>
<topicsAndAssignments xsi:noNamespaceSchemaLocation="topicsAndAssignments.xsd"
                      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <topics>
    <topic ordinal="1">
      <topicName>Advanced SQL</topicName>
    </topic>
    <topic ordinal="2">
      <topicName>Object-oriented databases</topicName>
    </topic>
    <topic ordinal="3">
      <topicName>Object/relational databases</topicName>
    </topic>
    <topic ordinal="4">
      <topicName>XML databases</topicName>
    </topic>
  </topics>
  <assignments>
    <assignment id="1">
      <assignmentName>Midterm exam</assignmentName>
      <points>4</points>
    </assignment>
    <assignment id="2">
      <assignmentName>Final exam</assignmentName>
      <points>15</points>
    </assignment>
  </assignments>
</topicsAndAssignments>
Assignment – solution
topicsAndAssignments.xsd:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="topicsAndAssignments">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="topics"/>
        <xs:element ref="assignments"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="topics">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="topic" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="topic">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="topicName"/>
      </xs:sequence>
      <xs:attribute name="ordinal" type="xs:integer" use="required"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="topicName" type="xs:string"/>
Assignment – solution
  <xs:element name="assignments">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="assignment" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="assignment">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="assignmentName"/>
        <xs:element ref="points"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:integer" use="required"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="assignmentName" type="xs:string"/>
  <xs:element name="points">
    <xs:simpleType>
      <xs:restriction base="xs:decimal">
        <xs:minInclusive value="0.00"/>
        <xs:maxInclusive value="100.00"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
XML in database management systems
XML in DBMSs today
Well-defined concepts are the result of many years of development.
1. XML-enabled DBMS
a relational database management system that supports an XML data type and knows how to work with it efficiently
2. Native XML DBMS
custom data structures
optimised for X-languages (XPath, XQuery, …)
Relational DBMSs with XML support
Some relational DBMS with support for XML:
Oracle 9i+
Microsoft SQL Server 2000+
Microsoft Access XP
IBM DB2
PostgreSQL 8.4+
Microsoft SQL Server 2012 and XML - example
<diplomaThesis thesisId="34567">
  <title>Algorithms for finding the shortest path in the graph</title>
  <assignment>In this thesis, it is necessary to describe the basic idea, show
    pseudocode and analyze the complexity of the two algorithms for
    finding the shortest path in the graph: Dijkstra and Floyd.
    Implement both algorithms, and compare the performance time
    on specific default graphs.
  </assignment>
</diplomaThesis>
The document is stored in an XML column of a table:
diplomaId | thesisXML
1 | <diplomaThesis thesisId="34567"> … </diplomaThesis> (the document above)
… | …
Microsoft SQL Server 2012 and XML - example
CREATE XML SCHEMA COLLECTION diplomaThesisSchema AS
'<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="diplomaThesis">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="assignment" type="xs:string"/>
      </xs:sequence>
      <xs:attribute name="thesisId" type="xs:integer"/>
    </xs:complexType>
  </xs:element>
</xs:schema>'
CREATE TABLE diplomaThesis
(diplomaId INT PRIMARY KEY
, thesisXML XML (diplomaThesisSchema))
INSERT INTO diplomaThesis VALUES (1,
'<diplomaThesis thesisId="34567">
  <title>Algorithms for …</title>
  <assignment>In this …</assignment>
</diplomaThesis>'
)
INSERT INTO diplomaThesis VALUES (2,
'<diplomaThesis thesisId="34444">
  <title> Generators of the …</title>
  <assignment>In this …</assignment>
  <thesisContent/>
</diplomaThesis>' )
The second INSERT is rejected:
XML Validation: Unexpected element(s): thesisContent.
Location: /*:diplomaThesis[1]/*:thesisContent[1]
PostgreSQL and XML - example
XML content cannot be validated against a schema
CREATE TABLE diplomaThesis
(diplomaId INT PRIMARY KEY
, thesisXML XML)
INSERT INTO diplomaThesis VALUES (1,
'<diplomaThesis thesisId="34567">
  <title>Algorithms for …</title>
  <assignment>In this …</assignment>
</diplomaThesis>'
)
INSERT INTO diplomaThesis VALUES (2,
'<diplomaThesis thesisId="34444">
  <title>Generators of the …</title>
  <assignment>In this …</assignment>
  <thesisContent/>
</diplomaThesis>' )
SELECT xpath('/diplomaThesis/title', thesisXML)
FROM diplomaThesis
Result:
title (xml[])
{"<title>Algorithms for …</title>"}
{"<title>Generators of the high…</title>"}
Native XML database (NXD) (1)
Definition 1.
(Gavin Powell: Beginning XML Database, 2007)
A native XML database is essentially any method of storing XML data as an XML document.
Definition 2.
(http://www.ibm.com/developerworks/xml/library/x-mxd4.html)
A native XML database is one that treats XML documents and elements as the fundamental structures rather than tables, records, and fields.
Definition 3.
(XML:DB Initiative)
A native XML database...
Defines a (logical) model for an XML document -- as opposed to the data in that document -- and stores and retrieves documents according to that model. At a minimum, the model must include elements, attributes, PCDATA, and document order. Examples of such models are the XPath data model, the XML Infoset, and the models implied by the DOM and the events in SAX 1.0.
Has an XML document as its fundamental unit of (logical) storage, just as a relational database has a row in a table as its fundamental unit of (logical) storage.
Is not required to have any particular underlying physical storage model. For example, it can be built on a relational, hierarchical, or object-oriented database, or use a proprietary storage format such as indexed, compressed files.
Native XML database (NXD) (2)
Only XML can be stored in the database, and only XML can be retrieved from it
XML is stored in a form optimized for working with the hierarchical structure of an XML document
The basic logical unit for storing data is the document
the XML data type is a special data type
the equivalent structure in an RDBMS is a tuple
System architecture for storing XML documents in BaseX:
Meta Data: common information, statistics
Main Table: tabular representation of each node
Directory: additional information for speeding up updates
Indexes in NXD
Three types of indexes
1. Value indexes
They index text and attribute values
Used to resolve queries such as, “Find all elements or attributes whose value is ‘Advanced
Databases’”
2. Structural indexes
They index location of elements and attributes
Used to resolve queries such as, “Find all Course elements”
Value and structural indexes are combined to resolve queries such as, “Find all Course
elements whose value is ‘Advanced Databases’”
3. Full-text indexes
They index individual tokens in text and attribute values
Used to resolve queries such as, “Find all documents that contain the words ‘Advanced
Databases’”
Used with structural indexes for queries like, “Find all documents that contain the words
‘Advanced Databases’ inside a Course element”
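The three index types above can be sketched as plain in-memory dictionaries. This is only a toy illustration, not how any particular NXD stores its indexes: the document names, the sample XML, and the node identifiers (document name plus a positional path) are all assumptions made for the example, and attribute values are left out for brevity.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample documents (not from the lecture).
DOCS = {
    "doc1.xml": "<Catalog><Course>Advanced Databases</Course>"
                "<Course>Databases</Course></Catalog>",
    "doc2.xml": "<Catalog><Note>Advanced Databases is taught in winter</Note></Catalog>",
}

value_index = defaultdict(set)       # exact text value -> {(doc, node path)}
structural_index = defaultdict(set)  # element name     -> {(doc, node path)}
fulltext_index = defaultdict(set)    # lowercase token  -> {(doc, node path)}


def index_document(doc_id, xml_text):
    """Walk the element tree once and fill all three indexes."""
    def walk(elem, path):
        node = (doc_id, path)
        structural_index[elem.tag].add(node)
        text = (elem.text or "").strip()
        if text:
            value_index[text].add(node)
            for token in re.findall(r"\w+", text.lower()):
                fulltext_index[token].add(node)
        for pos, child in enumerate(elem, start=1):
            # The positional path is just a simple node identifier here.
            walk(child, f"{path}/{child.tag}[{pos}]")

    root = ET.fromstring(xml_text)
    walk(root, f"/{root.tag}")


for doc_id, xml_text in DOCS.items():
    index_document(doc_id, xml_text)

# 1. Value index: nodes whose value is 'Advanced Databases'
by_value = value_index["Advanced Databases"]
# 2. Structural index: all Course elements
all_courses = structural_index["Course"]
# Value + structural: Course elements whose value is 'Advanced Databases'
course_by_value = all_courses & by_value
# 3. Full-text index: documents containing both words
nodes_with_words = fulltext_index["advanced"] & fulltext_index["databases"]
docs_with_words = {doc for doc, _ in nodes_with_words}
# Full-text + structural: the words must occur inside a Course element
docs_with_words_in_course = {doc for doc, _ in nodes_with_words & all_courses}
```

Combined queries reduce to set intersections over node identifiers, which is why the slide describes the indexes as being used together.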
NXD – query languages, updates
Query languages
Most NXDs support XPath and XQuery for querying
XPath is the most commonly used query language for NXDs, but it
lacks functionality such as grouping, sorting, cross-document joins, etc.
XQuery overcomes these shortcomings and has become a standard
Updates
Some databases replace entire documents
Strategies:
Changes through the DOM (object representation of XML trees)
Special language for updates – use XPath to find a node, perform insert
(before or after the node), update or deletion of the node
Extending XQuery
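As a small illustration of the first strategy (changes through the DOM), the sketch below uses Python's standard xml.dom.minidom to insert a node before an existing one, update a value, and delete a node. The sample document and the new course are hypothetical; a real NXD would expose its own DOM or update API.

```python
from xml.dom import minidom

# Hypothetical sample document (no whitespace between tags, to keep the DOM small).
doc = minidom.parseString(
    "<study><course><courseName>Databases</courseName>"
    "<description>Intro</description><ects>7.5</ects></course></study>"
)
study = doc.documentElement
course = study.getElementsByTagName("course")[0]

# Insert: build a new course element and place it before the existing one.
new_course = doc.createElement("course")
new_name = doc.createElement("courseName")
new_name.appendChild(doc.createTextNode("Advanced Databases"))
new_course.appendChild(new_name)
study.insertBefore(new_course, course)

# Update: change the text value of the existing ects element.
ects = course.getElementsByTagName("ects")[0]
ects.firstChild.data = "8"

# Delete: remove the description element from the existing course.
description = course.getElementsByTagName("description")[0]
course.removeChild(description)

updated = study.toxml()
```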
XPath
XPath is a language for extracting parts
of XML documents.
Path expressions (like file paths in
Unix or Linux) are used to navigate
through the XML document.
XPath examples:
<?xml version='1.0' encoding="utf-8"?>
<studyProgrames>
<studyProgram acYear="2013/2014">
<study studyName="Computing">
<course>
<courseName>Baze podataka</courseName>
<description>This is the … </description>
<ects>7.5</ects>
</course>
...
</study>
<study studyName="Power Engineering">
...
</study>
</studyProgram>
...
</studyProgrames>
//course[description]
All elements named course, having sub element named
description
//study[@studyName]
All elements named study, having attribute named
studyName
//studyProgram[@acYear="2013/2014"]
All elements named studyProgram, having an attribute named
acYear with the value equal to "2013/2014"
//course[courseName="Advanced Databases"][ects = 4]
All elements named course, having an element named
courseName with the value "Advanced Databases" and an
element named ects with the value equal to 4
//course[courseName="Advanced Databases" or courseName="C#"]
All elements named course, having an element named
courseName with the value "Advanced Databases" or "C#"
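These expressions can be tried out with Python's xml.etree.ElementTree, which implements only a subset of XPath. The sample document below is a hypothetical variant of the one on the slide (course names and ECTS values are invented so that every query returns something). Because ElementTree supports neither numeric comparison nor the or operator inside a predicate, ects is compared as a string and the last query is expressed in Python instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data modelled on the slide's document structure.
XML = """<studyProgrames>
  <studyProgram acYear="2013/2014">
    <study studyName="Computing">
      <course>
        <courseName>Advanced Databases</courseName>
        <description>Hypothetical description</description>
        <ects>4</ects>
      </course>
      <course>
        <courseName>C#</courseName>
        <ects>5</ects>
      </course>
    </study>
    <study studyName="Power Engineering"/>
  </studyProgram>
</studyProgrames>"""

root = ET.fromstring(XML)


def names(courses):
    """Return the courseName text of each course element."""
    return [c.findtext("courseName") for c in courses]


# //course[description]
with_description = names(root.findall(".//course[description]"))

# //study[@studyName]
studies_with_name = [s.get("studyName") for s in root.findall(".//study[@studyName]")]

# //studyProgram[@acYear="2013/2014"]
programs_2013 = root.findall(".//studyProgram[@acYear='2013/2014']")

# //course[courseName="Advanced Databases"][ects = 4]
# (string comparison on ects; ElementTree has no numeric predicates)
adb_4_ects = names(root.findall(".//course[courseName='Advanced Databases'][ects='4']"))

# //course[courseName="Advanced Databases" or courseName="C#"]
# ("or" is not supported by ElementTree, so the filter is written in Python)
adb_or_csharp = names(
    c for c in root.iter("course")
    if c.findtext("courseName") in {"Advanced Databases", "C#"}
)
```

A full XPath engine, such as the one built into an NXD, evaluates the original expressions directly.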
XQuery
Language for querying data stored in the form of XML documents,
with or without a defined schema (XML Schema or DTD)
Uses XPath 2.0
XQuery is to XML what SQL is to relational databases
Supported by most XML-enabled DBMSs (IBM, Oracle,
Microsoft, …) and by most NXDs
FLWOR expressions
Similar to SQL SELECT-FROM-WHERE-ORDER BY expressions
FOR – iterates through a sequence, binding a variable to each item
LET – binds a variable to a sequence
WHERE – eliminates items of the iteration
ORDER BY – sorts query results
RETURN – constructs query results
Example:
for $s in //study
let $p := $s/course
where $s/@studyName = "Computing"
order by $s/../@acYear
return <study>{ concat($s/@studyName, " ", $s/../@acYear, " cnt=", count($p)) }</study>
Result (baseX):
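The result screenshot from the slide is not reproduced in this transcript. As a rough cross-check of what the FLWOR clauses do, the same logic can be sketched in Python over a hypothetical document with two academic years (the data is invented, so the output is not the BaseX result from the slide). The sketch assumes that study elements are direct children of studyProgram, as in the example document.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: two academic years, deliberately out of order.
XML = """<studyProgrames>
  <studyProgram acYear="2013/2014">
    <study studyName="Computing">
      <course><courseName>Advanced Databases</courseName></course>
      <course><courseName>C#</courseName></course>
    </study>
    <study studyName="Power Engineering"/>
  </studyProgram>
  <studyProgram acYear="2012/2013">
    <study studyName="Computing">
      <course><courseName>Databases</courseName></course>
    </study>
  </studyProgram>
</studyProgrames>"""

root = ET.fromstring(XML)

rows = []
for program in root.iter("studyProgram"):
    for s in program.findall("study"):           # for $s in //study
        p = s.findall("course")                  # let $p := $s/course
        if s.get("studyName") == "Computing":    # where $s/@studyName = "Computing"
            rows.append((program.get("acYear"), s.get("studyName"), len(p)))

rows.sort(key=lambda row: row[0])                # order by $s/../@acYear

# return <study>{ concat(...) }</study>
result = [f"<study>{name} {year} cnt={cnt}</study>" for year, name, cnt in rows]
```

With this sample data, result holds one study element per academic year for the Computing study, sorted by acYear.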
When and why use a native XML database?
When storing unstructured or semi-structured data
When the data schema changes frequently
• For XML documents stored in an NXD, an XML schema can be defined,
but it is not required
When it is important to preserve the original layout of XML documents
Because the query language is optimised to work efficiently with
the XML data type
Because query processing is fast, using an index structure
tailored to the XML data type
MarkLogic Server – example of NXD
Besides NXD characteristics, includes the majority of RDBMS
features:
MarkLogic Server is a native XML database that can convert
Word, PowerPoint, Excel, PDF, and HTML documents to XML
Supports XQuery and full-text search using wildcards, stemming,
spell checking and so on
It has a single index that combines full-text index, value index and
structural index
Supports transactions, triggers, journaling, role-based security,
clustering, and backup
http://www.marklogic.com/product/demos.html
The best-known NXD implementations
Open source
eXist - http://exist.sourceforge.net/
BaseX - basex.org
Sedna - http://www.sedna.org/
Xindice (Apache) - http://xml.apache.org/xindice/
Commercial
MarkLogic http://www.marklogic.com/
Tamino XML Server
Homework – XML data model
Choose an arbitrary segment of the real world and store information about it in an XML
document (LastName.xml). Design the schema of the document using XML Schema
(LastName.xsd).
Briefly describe the modelled segment of the real world (in the file LastName.txt).
In the XML schema, demonstrate the definition of at least one:
1. element or attribute of each of the following data types: string, integer, boolean, date
(other data types may be used as needed)
2. default value for an element or attribute
3. domain constraint (one that cannot be expressed using only a primitive data type) for an
element or attribute
4. unique constraint
5. primary key constraint
6. foreign key constraint
Note: to make writing the XML and XSD documents easier, use one of the XML editors (e.g. free or
trial versions of EditiX, oXygen, etc.)
Additionally, in the file LastName.txt, for each of items 1-6 specify what change to a valid
XML document would violate the corresponding constraint.
Homework – eXist native XML DBMS
Download and install the Java SE Development Kit (JDK)
http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html
Download and install eXist, a native XML DBMS
http://exist-db.org/exist/apps/homepage/index.html
Create a collection in the eXist DBMS and place the file 0036000001.xsd in it.
Ensure that eXist stores the XML document LastName.xml only if it
complies with the rules defined in the schema LastName.xsd.
Configure implicit validation in eXist
Configure implicit validation of XML documents (http://exist-db.org/exist/apps/doc/validation.xml):
1. In the $EXIST_HOME/conf.xml file, set the attribute mode of the element validation to the
value "auto" (or to the value "yes"):
Segment of the conf.xml file:
...
<validation mode="auto">
<entity-resolver>
<catalog uri="${WEBAPP_HOME}/WEB-INF/catalog.xml"/>
</entity-resolver>
</validation>
...
Configure implicit validation in eXist
2. Modify the file /db/system/config/db/collection.xconf (using the eXist interface)
to include the following element:
<validation mode="yes"/>
Configure implicit validation in eXist
3. Modify the file
$EXIST_HOME\webapp\WEB-INF\entities\catalog.xml
so that it contains the following element:
Segment of the catalog.xml file:
<uri name="http://www.fer.hr/0036000001"
uri="xmldb:exist:///db/0036000001/0036000001.xsd" />
4. Place the file 0036000001.xsd into the folder
$EXIST_HOME\tools\wrapper\bin
Homework – storing XML in eXist
An attempt to store an XML document that does not conform to the
defined XML schema will produce errors such as the one shown in the
picture:
5 points
Send the solution (files 0036000001.xml, 0036000001.xsd and 0036000001.txt)
as email attachments to [email protected]
Deadline: 03.12.2013 at 9:00.
Literature
A.B. Chaudhri, A. Rashid, R. Zicari: XML Data Management:
Native XML and XML-Enabled Database Systems, Addison-Wesley, 2003
G. Powell: Beginning XML Databases, Wiley Publishing, Inc., 2007
K. Williams: Professional XML Databases, Wrox Press Ltd., 2000
http://xmldb-org.sourceforge.net/faqs.html
http://www.rpbourret.com/xml/XMLDatabaseProds.htm
http://www.w3.org/XML
http://www.xml.com