Universität
Konstanz
Database & Information Systems Group
Prof. Marc H. Scholl
Tutorial XML & Databases WS 05/06 – Christian Grün
XML & Databases
Tutorial
6. Building a Database
Christian Grün, Database & Information Systems Group
University of Konstanz, Winter 2005/06

XML Database – Table Storage
Ideas of the Pathfinder Approach:
• 30 years of experience in relational DBMS:
– scalability (huge amounts of data can be stored in simple tables)
– automatic index structures (B-Trees, Hash Tables)
– query optimization, transaction management etc.
• XML documents can be broken down to table entries
• tree representation: store child / parent nodes?
• document order is essential for XML documents:
– sequential document reading corresponds to a depth-first, left-to-right traversal of the tree
– opening elements will be numbered as PRE value
– ending elements will be numbered as POST value
– current level depth is stored as LEVEL value
Paper: Torsten Grust, Accelerating XPath Location Steps. SIGMOD Conference, 2002
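
The numbering scheme above can be tried out on a small in-memory tree. The following Python sketch is only an illustration and not the tutorial's implementation: the nested-tuple tree format and the function name number_nodes are assumptions made for this example. It numbers the tutorial's example document in a depth-first, left-to-right traversal.

```python
from typing import List, Tuple, Union

# Hypothetical tree format: an element is (name, [children]), a text node is a str.
Node = Union[str, Tuple[str, list]]


def number_nodes(root: Node) -> List[tuple]:
    """Return (pre, post, level, type, token) rows in document order."""
    rows: List[list] = []
    post_counter = 0

    def visit(node: Node, level: int) -> None:
        nonlocal post_counter
        is_text = isinstance(node, str)
        # PRE is assigned when the node is opened (1-based, document order).
        row = [len(rows) + 1, None, level,
               "text" if is_text else "elem",
               node if is_text else node[0]]
        rows.append(row)
        if not is_text:
            for child in node[1]:
                visit(child, level + 1)
        # POST is assigned when the node is closed.
        post_counter += 1
        row[1] = post_counter

    visit(root, 1)
    return [tuple(r) for r in rows]


doc = ("html", [
    ("head", [("title", ["XML"])]),
    ("body", [
        ("h1", ["Databases &amp; XML"]),
        ("div", [
            ("b", ["Assignments"]),
            ("ul", [("li", ["Exercise 1"]), ("li", ["Exercise 2"])]),
        ]),
    ]),
])

for row in number_nodes(doc):
    print(row)
```

Text nodes are opened and closed in one step, so they receive their PRE and POST values in the same visit.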
XML Database
State-of-the-Art:
• we have created an XMLScanner and an XMLParser
• tokens will now be processed in an XML Database (XMLDB)
• XMLDB implements an Events interface & XMLParser sends notifications:
[Figure: class diagram. Boxes: XMLConstants; XMLScanner ("// scan tokens", scan(), ...); XMLParser ("// parse tokens", parse(), ...); XMLDB ("// build database", build(), ...); XMLParserEvents. Edge labels: "extends" (three times), "implements", "new() getToken()" and "new() startElement() endElement() content()".]

XML Database – Table Storage
Tree Structure:
[Figure: tree of the example document. html has the children head and body; head contains title ("XML"); body contains h1 ("Databases & XML") and div; div contains b ("Assignments") and ul with two li children ("Exercise 1", "Exercise 2").]
<html>
  <head>
    <title>XML</title>
  </head>
  <body bgcolor="#FFFFFF" text = "#000000">
    <h1>Databases &amp; XML</h1>
    <div align="right">
      <b>Assignments</b>
      <ul>
        <li>Exercise 1</li>
        <li>Exercise 2</li>
      </ul>
    </div>
  </body>
</html>
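
The notification scheme from the State-of-the-Art slide (the parser calls startElement(), endElement() and content() on an events interface) might look roughly as follows in Python. This is a sketch under assumptions: the tutorial's own XMLScanner and XMLParser are not shown in the slides, so Python's expat bindings stand in for them, the method names are adapted to Python style, and EventLog is a hypothetical listener that only records what it receives.

```python
from abc import ABC, abstractmethod
import xml.parsers.expat


class XMLParserEvents(ABC):
    """Callbacks the parser invokes (names adapted from the slide)."""

    @abstractmethod
    def start_element(self, name, attributes): ...

    @abstractmethod
    def end_element(self, name): ...

    @abstractmethod
    def content(self, text): ...


class EventLog(XMLParserEvents):
    """Hypothetical listener standing in for XMLDB: it only records events."""

    def __init__(self):
        self.events = []

    def start_element(self, name, attributes):
        self.events.append(("start", name, dict(attributes)))

    def end_element(self, name):
        self.events.append(("end", name))

    def content(self, text):
        if text.strip():  # skip whitespace between tags
            self.events.append(("content", text))


def parse(document: str, listener: XMLParserEvents) -> None:
    """Stand-in for the tutorial's parser, built on expat (an assumption)."""
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver each text run in one callback
    parser.StartElementHandler = listener.start_element
    parser.EndElementHandler = listener.end_element
    parser.CharacterDataHandler = listener.content
    parser.Parse(document, True)


log = EventLog()
parse('<div align="right"><b>Assignments</b><ul/></div>', log)
for event in log.events:
    print(event)
```

Because the listener is passed in as an interface, the same parser can feed a logger, a table builder or any other consumer without being changed.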
XML Database – Table Storage
Tree Structure:
[Figure: the tree of the example document, annotated step by step: first with PRE numbers 1 to 15 in document order, then with POST numbers in the order in which the nodes are closed.]
Pre/Post Table:
PRE  POST  LEVEL  TYPE  TOKEN
1    15    1      elem  html
2    3     2      elem  head
3    2     3      elem  title
4    1     4      text  XML
5    14    2      elem  body
6    5     3      elem  h1
7    4     4      text  Databases &amp; XML
8    13    3      elem  div
9    7     4      elem  b
10   6     5      text  Assignments
11   12    4      elem  ul
12   9     5      elem  li
13   8     6      text  Exercise 1
14   11    5      elem  li
15   10    6      text  Exercise 2
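
The PRE and POST columns alone are enough to decide how two nodes relate in the tree. The following Python sketch illustrates the comparison rules described in the paper by Grust cited earlier; the function names and the tuple layout are assumptions made for this example, and the data is copied from the table of the example document.

```python
# (pre, post, token) rows of the example document
TABLE = [
    (1, 15, "html"), (2, 3, "head"), (3, 2, "title"), (4, 1, "XML"),
    (5, 14, "body"), (6, 5, "h1"), (7, 4, "Databases &amp; XML"),
    (8, 13, "div"), (9, 7, "b"), (10, 6, "Assignments"),
    (11, 12, "ul"), (12, 9, "li"), (13, 8, "Exercise 1"),
    (14, 11, "li"), (15, 10, "Exercise 2"),
]


def descendants(table, pre):
    """Nodes opened after and closed before the context node."""
    _, post, _ = table[pre - 1]
    return [n for n in table if n[0] > pre and n[1] < post]


def ancestors(table, pre):
    """Nodes opened before and closed after the context node."""
    _, post, _ = table[pre - 1]
    return [n for n in table if n[0] < pre and n[1] > post]


def following(table, pre):
    """Nodes opened and closed after the context node."""
    _, post, _ = table[pre - 1]
    return [n for n in table if n[0] > pre and n[1] > post]


def preceding(table, pre):
    """Nodes opened and closed before the context node."""
    _, post, _ = table[pre - 1]
    return [n for n in table if n[0] < pre and n[1] < post]


# Context node: div (pre = 8, post = 13)
print([token for _, _, token in descendants(TABLE, 8)])
print([token for _, _, token in ancestors(TABLE, 8)])
print([token for _, _, token in preceding(TABLE, 8)])
```

The linear scans are only for illustration; in a relational system these comparisons become range predicates that an index on PRE and POST can support.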
XML Database – Table Storage
Pre/Post Plane:
[Figure: pre/post plane]

XML Database – Table Storage
Attributes:
• attributes are "owned" by their elements → no table storage
• another table field is introduced which references attribute arrays
PRE  POST  LEVEL  TYPE  TOKEN         ATTS
1    15    1      elem  html
2    3     2      elem  head
3    2     3      elem  title
4    1     4      text  XML
5    14    2      elem  body          atts
6    5     3      elem  h1
7    4     4      text  Databases...
8    13    3      elem  div           atts
9    7     4      elem  b
...  ...   ...    ...   ...
Attribute array referenced by body:
ATTRIBUTE  VALUE
bgcolor    #FFFFFF
text       #000000
Attribute array referenced by div:
ATTRIBUTE  VALUE
align      right
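
One way to picture the extra table field is a node record with an optional reference to its attribute array. The following Python sketch is an assumption about the layout, not the tutorial's actual XMLNode or XMLAttributes classes; the helper attribute() is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class XMLNode:
    pre: int
    post: int
    level: int
    type: str
    token: str
    # Reference to the attribute array; None if the element has no attributes.
    atts: Optional[List[Tuple[str, str]]] = None


def attribute(node: XMLNode, name: str) -> Optional[str]:
    """Look up an attribute value in the array owned by the node."""
    for key, value in node.atts or []:
        if key == name:
            return value
    return None


body = XMLNode(5, 14, 2, "elem", "body",
               [("bgcolor", "#FFFFFF"), ("text", "#000000")])
h1 = XMLNode(6, 5, 3, "elem", "h1")
div = XMLNode(8, 13, 3, "elem", "div", [("align", "right")])

print(attribute(body, "text"))
print(attribute(div, "align"))
print(attribute(h1, "align"))
```

Keeping attributes out of the node table means they consume no PRE or POST numbers, so the numbering of elements and text nodes stays as shown in the table.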
XML Database – Table Storage
Building the Table:
• handling startElement events:
– new XMLNode is created and stored in the XMLTable
– node is pushed to the stack (XMLNodeStack)
– current level value is assigned and increased
• handling endElement events:
– last XMLNode is popped from the stack
– new token is compared with token from stack node
  → error is dumped if names of opening and closing tag differ
– current post value is assigned and increased
– current level value is decreased
• handling content events:
– new XMLNode is created and stored in the XMLTable
– current level value is assigned
– current post value is assigned and increased
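
The three event handlers above can be sketched in Python as follows. The class name XMLTableBuilder, the list-based row layout and the use of ValueError for mismatched tags are assumptions made for this example; the slides only state that an error is dumped.

```python
class XMLTableBuilder:
    """Builds rows [pre, post, level, type, token] from parser events."""

    def __init__(self):
        self.table = []   # plays the role of XMLTable
        self.stack = []   # plays the role of XMLNodeStack
        self.level = 1
        self.post = 1

    def start_element(self, name):
        # New node is stored, pushed, and receives the current level.
        node = [len(self.table) + 1, None, self.level, "elem", name]
        self.table.append(node)
        self.stack.append(node)
        self.level += 1

    def end_element(self, name):
        if not self.stack:
            raise ValueError(f"unexpected closing tag </{name}>")
        node = self.stack.pop()
        # Names of opening and closing tag must match.
        if node[4] != name:
            raise ValueError(f"<{node[4]}> closed by </{name}>")
        node[1] = self.post
        self.post += 1
        self.level -= 1

    def content(self, text):
        # Text nodes are never pushed: they get level and post at once.
        self.table.append(
            [len(self.table) + 1, self.post, self.level, "text", text])
        self.post += 1


builder = XMLTableBuilder()
builder.start_element("div")
builder.start_element("b")
builder.content("Assignments")
builder.end_element("b")
builder.start_element("ul")
builder.start_element("li")
builder.content("Exercise 1")
builder.end_element("li")
builder.end_element("ul")
builder.end_element("div")

for row in builder.table:
    print(row)
```

The PRE value is simply the row's position in the table, so only the POST and LEVEL counters need to be tracked explicitly.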
XML Database – Implementation
Architecture (simplified):
[Figure: class diagram with XMLConstants, XMLScanner, XMLParser, XMLParserEvents, XMLDB, XMLTable, XMLNodeStack, XMLNode, XMLAttributes and XMLToken, connected by "extends" and "implements" edges.]
XML – Full Output
State-of-the-Art:
• we have created a scanner, a parser and an XML table representation
• we need to handle XML outputs
• before starting the output, we need to specify a proper file encoding
• the encoding of the output stream should match the encoding named in the XML declaration
Back to the XML representation:
• we sequentially read the table
• a stack is used to remember opened tags
• the current tag is printed and pushed to the stack
• before the current tag is processed, all stack nodes with post of stack node < post of current node are popped from the stack and their closing tags are printed
• when all nodes are processed, the closing tags of the remaining stack nodes must be printed

XML – Partial Output
Partial Output:
• XPath queries find partial matches inside the document tree
• to support partial matches, we stop output when the table end is reached or post value of initial node < post value of current node
Formatted Output:
• empty tags are directly closed (e.g. <br/>) by checking the next node's post value
• nodes are indented relative to the first output node
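
The stack-based full output described under "Back to the XML representation" can be sketched in Python as follows. The Row type and the function name full_output are assumptions made for this example. Text tokens are assumed to be stored as they appeared in the source (for instance with &amp;), as the Pre/Post table suggests, so the sketch prints them without further escaping.

```python
from typing import List, NamedTuple, Tuple


class Row(NamedTuple):
    pre: int
    post: int
    level: int
    type: str
    token: str
    atts: Tuple[Tuple[str, str], ...] = ()


def full_output(table: List[Row]) -> str:
    out: List[str] = []
    stack: List[Row] = []
    for node in table:
        # Close every open element that ended before the current node.
        while stack and stack[-1].post < node.post:
            out.append(f"</{stack.pop().token}>")
        if node.type == "text":
            out.append(node.token)
        else:
            atts = "".join(f' {k}="{v}"' for k, v in node.atts)
            out.append(f"<{node.token}{atts}>")
            stack.append(node)
    # Close whatever is still open at the end of the table.
    while stack:
        out.append(f"</{stack.pop().token}>")
    return "".join(out)


table = [
    Row(1, 6, 1, "elem", "div", (("align", "right"),)),
    Row(2, 2, 2, "elem", "b"),
    Row(3, 1, 3, "text", "Assignments"),
    Row(4, 5, 2, "elem", "ul"),
    Row(5, 4, 3, "elem", "li"),
    Row(6, 3, 4, "text", "Exercise 1"),
]
print(full_output(table))
```

When the result is written to a file, the file would be opened with the same encoding that the declaration names, for example open(path, "w", encoding="UTF-8") for a UTF-8 document.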
XML – Full Output
Pseudo Code:
initialize stack
for all nodes:
    get current node
    while stack not empty and post value of stack's top node < post value of node:
        pop node from stack and print closing tag
    if node type is TEXT:
        print content of node
    if node type is ELEMENT or DOCUMENT:
        print opening tag and (if available) node attributes
        push node to the stack
while stack not empty:
    pop node from stack and print closing tag

write node (initial pre value, initial level depth):
    initialize stack
    remember post value of initial node
    while pre < table size:
        get node of current pre value
        break when post < post value of current node
        while stack not empty and post value of stack's top node < post value of node:
            pop node from stack and print indented closing tag
        if node type is TEXT:
            print content of node
        if node type is ELEMENT or DOCUMENT:
            if post value of next node > post value of current node:
                print indented empty tag and (if available) node attributes
            otherwise:
                print indented opening tag and (if available) node attributes
                push node to the stack
        increase pre
    while stack not empty:
        pop node from stack and print indented closing tag
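
The "write node" pseudo code can be turned into a runnable Python sketch as follows. Several details are assumptions made for this example: the Row type, the 1-based pre argument, two spaces per indentation level, and text content being printed on its own indented line. The sketch also treats an element in the last table row as empty, a case the pseudo code does not spell out.

```python
from typing import List, NamedTuple, Tuple


class Row(NamedTuple):
    pre: int
    post: int
    level: int
    type: str
    token: str
    atts: Tuple[Tuple[str, str], ...] = ()


def write_node(table: List[Row], pre: int, indent: str = "  ") -> str:
    """Serialize the subtree rooted at the node with the given PRE value."""
    first = table[pre - 1]
    out: List[str] = []
    stack: List[Row] = []

    def pad(node: Row) -> str:
        # Indentation is relative to the first output node.
        return indent * (node.level - first.level)

    i = pre - 1
    while i < len(table):
        node = table[i]
        if first.post < node.post:  # left the subtree of the initial node
            break
        while stack and stack[-1].post < node.post:
            closed = stack.pop()
            out.append(f"{pad(closed)}</{closed.token}>")
        if node.type == "text":
            out.append(pad(node) + node.token)
        else:
            atts = "".join(f' {k}="{v}"' for k, v in node.atts)
            nxt = table[i + 1] if i + 1 < len(table) else None
            if nxt is None or nxt.post > node.post:
                # The next node is not a child, so the element is empty.
                out.append(f"{pad(node)}<{node.token}{atts}/>")
            else:
                out.append(f"{pad(node)}<{node.token}{atts}>")
                stack.append(node)
        i += 1
    while stack:
        closed = stack.pop()
        out.append(f"{pad(closed)}</{closed.token}>")
    return "\n".join(out)


# Table for <div><b>Assignments</b><br/><ul><li>Exercise 1</li></ul></div>
table = [
    Row(1, 7, 1, "elem", "div"),
    Row(2, 2, 2, "elem", "b"),
    Row(3, 1, 3, "text", "Assignments"),
    Row(4, 3, 2, "elem", "br"),
    Row(5, 6, 2, "elem", "ul"),
    Row(6, 5, 3, "elem", "li"),
    Row(7, 4, 4, "text", "Exercise 1"),
]
print(write_node(table, 5))  # partial output: only the ul subtree
```

Calling write_node(table, 1) produces the formatted output of the whole table, so the full output is the special case of starting at the first node.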