Universität Konstanz – Database & Information Systems Group – Prof. Marc H. Scholl
Tutorial XML & Databases WS 05/06 – Christian Grün

XML & Databases
Tutorial 6. Building a Database
Christian Grün, Database & Information Systems Group
University of Konstanz, Winter 2005/06

XML Database

State-of-the-Art:
• we have created an XMLScanner and an XMLParser
• the tokens will now be processed in an XML Database (XMLDB)
• XMLDB implements an events interface, and the XMLParser sends notifications to it:

[Diagram: XMLConstants, XMLScanner (// scan tokens: scan(), getToken(), ...) and XMLParser (// parse tokens: parse(), ...) are linked by "extends" relations; XMLDB (// build database: build(), ...) implements the XMLParserEvents interface (startElement(), endElement(), content()).]

XML Database – Table Storage

Ideas of the Pathfinder Approach:
• 30 years of experience in relational DBMS:
  – scalability (huge amounts of data can be stored in simple tables)
  – automatic index structures (B-Trees, Hash Tables)
  – query optimization, transaction management etc.
• XML documents can be broken down into table entries
• tree representation: store child/parent nodes?
• document order is essential for XML documents:
  – sequential document reading corresponds to a depth-first, left-to-right traversal of the tree
  – opening elements are numbered with a PRE value
  – closing elements are numbered with a POST value
  – the current depth is stored as a LEVEL value
Paper: Torsten Grust, Accelerating XPath Location Steps. SIGMOD Conference, 2002

XML Database – Table Storage

Tree Structure:

```xml
<html>
  <head>
    <title>XML</title>
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <h1>Databases &amp; XML</h1>
    <div align="right">
      <b>Assignments</b>
      <ul>
        <li>Exercise 1</li>
        <li>Exercise 2</li>
      </ul>
    </div>
  </body>
</html>
```

The nodes of this tree, numbered in opening order (PRE) and closing order (POST):

```text
html (PRE 1, POST 15)
├── head (PRE 2, POST 3)
│   └── title (PRE 3, POST 2)
│       └── "XML" (PRE 4, POST 1)
└── body (PRE 5, POST 14)
    ├── h1 (PRE 6, POST 5)
    │   └── "Databases & XML" (PRE 7, POST 4)
    └── div (PRE 8, POST 13)
        ├── b (PRE 9, POST 7)
        │   └── "Assignments" (PRE 10, POST 6)
        └── ul (PRE 11, POST 12)
            ├── li (PRE 12, POST 9)
            │   └── "Exercise 1" (PRE 13, POST 8)
            └── li (PRE 14, POST 11)
                └── "Exercise 2" (PRE 15, POST 10)
```
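To make the PRE/POST/LEVEL numbering concrete, here is a minimal Java sketch (the tutorial's class names suggest Java, but this is not the tutorial's code). It assumes the document is already available as an in-memory tree; `PrePostNumbering`, `Node`, `elem` and `text` are hypothetical names. A single recursive depth-first, left-to-right pass assigns the PRE value when a node is entered and the POST value when it is left; LEVEL is the recursion depth.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch with hypothetical names: assigns PRE, POST and LEVEL values to the
// nodes of an in-memory tree in one depth-first, left-to-right traversal.
class PrePostNumbering {

    /** Hypothetical tree node; text nodes have no children. */
    static final class Node {
        final String token;
        final boolean text;
        final List<Node> children = new ArrayList<>();
        int pre, post, level;

        Node(String token, boolean text, Node... kids) {
            this.token = token;
            this.text = text;
            children.addAll(List.of(kids));
        }
    }

    static Node elem(String name, Node... kids) { return new Node(name, false, kids); }

    static Node text(String value) { return new Node(value, true); }

    final List<Node> table = new ArrayList<>();   // nodes in PRE (document) order
    private int preCounter = 0;
    private int postCounter = 0;

    /** PRE is assigned when a node is opened, POST when it is closed. */
    void number(Node node, int level) {
        node.pre = ++preCounter;
        node.level = level;
        table.add(node);
        for (Node child : node.children) {
            number(child, level + 1);             // children left to right
        }
        node.post = ++postCounter;                // all descendants are closed now
    }

    /** The example document from the slides (attributes omitted). */
    static Node example() {
        return elem("html",
            elem("head", elem("title", text("XML"))),
            elem("body",
                elem("h1", text("Databases & XML")),
                elem("div",
                    elem("b", text("Assignments")),
                    elem("ul",
                        elem("li", text("Exercise 1")),
                        elem("li", text("Exercise 2"))))));
    }

    public static void main(String[] args) {
        PrePostNumbering numbering = new PrePostNumbering();
        numbering.number(example(), 1);
        System.out.println("PRE POST LEVEL TYPE TOKEN");
        for (Node n : numbering.table) {
            System.out.printf("%3d %4d %5d %-4s %s%n",
                n.pre, n.post, n.level, n.text ? "text" : "elem", n.token);
        }
    }
}
```

For the example document this yields, e.g., PRE 4 / POST 1 / LEVEL 4 for the text node "XML" and PRE 1 / POST 15 / LEVEL 1 for html. The XMLDB on the slides does not build such a tree first; it derives the same values directly from the parser's events.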
XML Database – Table Storage

Pre/Post Table:

```text
PRE  POST  LEVEL  TYPE  TOKEN
  1    15      1  elem  html
  2     3      2  elem  head
  3     2      3  elem  title
  4     1      4  text  XML
  5    14      2  elem  body
  6     5      3  elem  h1
  7     4      4  text  Databases & XML
  8    13      3  elem  div
  9     7      4  elem  b
 10     6      5  text  Assignments
 11    12      4  elem  ul
 12     9      5  elem  li
 13     8      6  text  Exercise 1
 14    11      5  elem  li
 15    10      6  text  Exercise 2
```

Pre/Post Plane:
[Figure: Pre/Post Plane of the example document]

XML Database – Table Storage

Building the Table:
• handling startElement events:
  – a new XMLNode is created and stored in the XMLTable
  – the node is pushed onto the stack (XMLNodeStack)
  – the current level value is assigned and increased
• handling endElement events:
  – the last XMLNode is popped from the stack
  – the new token is compared with the token of the stack node
    ⇒ an error is reported if the names of the opening and closing tags differ
  – the current post value is assigned and increased
  – the current level value is decreased
• handling content events:
  – a new XMLNode is created and stored in the XMLTable
  – the current level value is assigned
  – the current post value is assigned and increased

XML Database – Table Storage

Attributes:
• attributes are "owned" by their elements ⇒ they get no rows of their own in the table
• another table field (ATTS) is introduced which references attribute arrays

```text
PRE  POST  LEVEL  TYPE  TOKEN            ATTS
  1    15      1  elem  html
  2     3      2  elem  head
  3     2      3  elem  title
  4     1      4  text  XML
  5    14      2  elem  body             (1)
  6     5      3  elem  h1
  7     4      4  text  Databases...
  8    13      3  elem  div              (2)
  9     7      4  elem  b
...

(1) attribute array of body:
ATTRIBUTE  VALUE
bgcolor    #FFFFFF
text       #000000

(2) attribute array of div:
ATTRIBUTE  VALUE
align      right
```

XML Database – Implementation

Architecture (simplified):
[Diagram: classes XMLConstants, XMLScanner, XMLParser, XMLParserEvents and XMLDB, linked by "extends"/"implements" relations; XMLDB stores XMLNodes in an XMLTable and uses an XMLNodeStack; XMLNode, XMLAttributes and XMLToken hold the node data.]
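The event handling described under "Building the Table" can be sketched in Java as follows. This is an illustration under stated assumptions, not the tutorial's XMLDB: `TableBuilder`, `ParserEvents` and `Row` are hypothetical stand-ins for XMLDB, XMLParserEvents and XMLNode, whose real signatures are not shown on the slides; attributes are assumed to arrive as a map and are kept as the row's attribute array; whitespace-only text between tags is assumed to be skipped by the parser. `buildExample` replays by hand the events a parser would report for the example document.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: TableBuilder, ParserEvents and Row are hypothetical names; the
// event signatures of the tutorial's XMLParserEvents are not shown on the slides.
class TableBuilder {

    /** Hypothetical event interface, modeled after XMLParserEvents. */
    interface ParserEvents {
        void startElement(String name, Map<String, String> atts);
        void endElement(String name);
        void content(String text);
    }

    /** One table row (XMLNode); post is filled in when the node is closed. */
    static final class Row {
        final int pre, level;
        final boolean text;
        final String token;
        final Map<String, String> atts;   // "attribute array"; null if none
        int post;

        Row(int pre, int level, boolean text, String token, Map<String, String> atts) {
            this.pre = pre; this.level = level; this.text = text;
            this.token = token; this.atts = atts;
        }
    }

    final List<Row> table = new ArrayList<>();            // XMLTable
    private final Deque<Row> stack = new ArrayDeque<>();  // XMLNodeStack
    private int level = 1;
    private int post = 1;

    final ParserEvents events = new ParserEvents() {
        @Override public void startElement(String name, Map<String, String> atts) {
            // PRE = position in the table; level is assigned, then increased
            Row row = new Row(table.size() + 1, level++, false, name,
                atts.isEmpty() ? null : new LinkedHashMap<>(atts));
            table.add(row);
            stack.push(row);
        }

        @Override public void endElement(String name) {
            Row open = stack.pop();   // NoSuchElementException if no tag is open
            if (!open.token.equals(name)) {
                throw new IllegalStateException(
                    "</" + name + "> does not match <" + open.token + ">");
            }
            open.post = post++;       // post is assigned, then increased
            level--;
        }

        @Override public void content(String text) {
            // text nodes are leaves: level assigned (not increased), post assigned at once
            Row row = new Row(table.size() + 1, level, true, text, null);
            row.post = post++;
            table.add(row);
        }
    };

    /** Replays the events a parser would report for the example document. */
    static TableBuilder buildExample() {
        TableBuilder b = new TableBuilder();
        ParserEvents e = b.events;
        Map<String, String> none = Map.of();
        Map<String, String> bodyAtts = new LinkedHashMap<>();
        bodyAtts.put("bgcolor", "#FFFFFF");
        bodyAtts.put("text", "#000000");

        e.startElement("html", none);
        e.startElement("head", none);
        e.startElement("title", none); e.content("XML"); e.endElement("title");
        e.endElement("head");
        e.startElement("body", bodyAtts);
        e.startElement("h1", none); e.content("Databases & XML"); e.endElement("h1");
        e.startElement("div", Map.of("align", "right"));
        e.startElement("b", none); e.content("Assignments"); e.endElement("b");
        e.startElement("ul", none);
        e.startElement("li", none); e.content("Exercise 1"); e.endElement("li");
        e.startElement("li", none); e.content("Exercise 2"); e.endElement("li");
        e.endElement("ul");
        e.endElement("div");
        e.endElement("body");
        e.endElement("html");
        return b;
    }

    public static void main(String[] args) {
        for (Row r : buildExample().table) {
            System.out.printf("%3d %4d %5d %-4s %s %s%n", r.pre, r.post, r.level,
                r.text ? "text" : "elem", r.token, r.atts == null ? "" : r.atts);
        }
    }
}
```

A mismatched closing tag (e.g. `<a>` closed by `</b>`) raises an `IllegalStateException`, mirroring the tag comparison on the slide; a closing tag with no open element surfaces as the `NoSuchElementException` thrown by `ArrayDeque.pop()`.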
XML – Full Output

State-of-the-Art:
• we have created a scanner, a parser and an XML table representation
• next, we need to handle XML output
• before starting the output, a proper file encoding must be specified
• the encoding of the output stream should match the encoding named in the XML declaration

Full Output (back to the XML representation):
• we read the table sequentially
• a stack is used to remember opened tags
• the current tag is printed and pushed onto the stack
• before the current node is processed, all stack nodes whose post value is smaller than the post value of the current node are popped, and their closing tags are printed
• when all nodes have been processed, the closing tags of the remaining stack nodes must be printed

Pseudo Code:

```text
initialize stack
for all nodes:
  get current node
  while stack not empty and post value of stack's top node < post value of node:
    pop node from stack and print closing tag
  if node type is TEXT:
    print content of node
  if node type is ELEMENT or DOCUMENT:
    print opening tag and (if available) node attributes
    push node to the stack
while stack not empty:
  pop node from stack and print closing tag
```

XML – Partial Output

Partial Output:
• XPath queries find partial matches inside the document tree
• to support partial matches, output stops when the end of the table is reached or when the post value of the initial node is smaller than the post value of the current node (the current node then lies outside the initial node's subtree)

Formatted Output:
• empty tags are closed directly (e.g. <br/>) by checking the next node's post value: if it is greater than the current node's post value, the current node has no children
• nodes are indented relative to the first output node

Pseudo Code:

```text
write node (initial pre value, initial level depth):
  remember post value of initial node
  initialize stack
  while pre < table size:
    get node of current pre value
    break when post value of initial node < post value of current node
    while stack not empty and post value of stack's top node < post value of node:
      pop node from stack and print indented closing tag
    if node type is TEXT:
      print content of node
    if node type is ELEMENT or DOCUMENT:
      if there is no next node or post value of next node > post value of current node:
        print indented empty tag and (if available) node attributes
      otherwise:
        print indented opening tag and (if available) node attributes
        push node to the stack
    increase pre
  while stack not empty:
    pop node from stack and print indented closing tag
```
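A Java sketch of the partial-output pseudo code, which also produces the full output when started at the root (PRE 1). `TableWriter` and its nested `Row` record are hypothetical names (records and `String.repeat` need Java 16 or later); the table is the example table from the slides, with attributes stored as {name, value} pairs. Two details are choices of this sketch, not of the slides: text nodes are printed on their own indented line, and `&`, `<` and `"` are escaped on output so that the text node "Databases & XML" yields well-formed XML.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the "write node" pseudo code over a pre/post table. TableWriter and
// Row are hypothetical names; PRE values are 1-based, the array is 0-based.
class TableWriter {

    /** One table row; atts holds {name, value} pairs or is null. */
    record Row(int pre, int post, int level, boolean text, String token, String[][] atts) {}

    private final Row[] table;

    TableWriter(Row[] table) { this.table = table; }

    /** Serializes the node with the given PRE value and all its descendants. */
    String writeNode(int initialPre) {
        StringBuilder out = new StringBuilder();
        Deque<Row> stack = new ArrayDeque<>();
        int initialPost = table[initialPre - 1].post();
        int baseLevel = table[initialPre - 1].level();  // indentation is relative to it

        for (int i = initialPre - 1; i < table.length; i++) {
            Row node = table[i];
            if (initialPost < node.post()) break;       // node lies outside the subtree
            while (!stack.isEmpty() && stack.peek().post() < node.post()) {
                closeTag(out, stack.pop(), baseLevel);  // these elements ended before node
            }
            indent(out, node.level() - baseLevel);
            if (node.text()) {
                out.append(escape(node.token())).append('\n');
                continue;
            }
            out.append('<').append(node.token());
            if (node.atts() != null) {
                for (String[] a : node.atts()) {
                    out.append(' ').append(a[0]).append("=\"").append(escape(a[1])).append('"');
                }
            }
            // empty element: there is no next row, or the next row is not a descendant
            if (i + 1 == table.length || table[i + 1].post() > node.post()) {
                out.append("/>\n");
            } else {
                out.append(">\n");
                stack.push(node);
            }
        }
        while (!stack.isEmpty()) closeTag(out, stack.pop(), baseLevel);
        return out.toString();
    }

    private static void closeTag(StringBuilder out, Row row, int baseLevel) {
        indent(out, row.level() - baseLevel);
        out.append("</").append(row.token()).append(">\n");
    }

    private static void indent(StringBuilder out, int depth) {
        out.append("  ".repeat(depth));
    }

    // Escaping is an addition not covered on the slides.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    /** The example table from the slides, including the two attribute arrays. */
    static Row[] example() {
        return new Row[] {
            new Row(1, 15, 1, false, "html", null),
            new Row(2, 3, 2, false, "head", null),
            new Row(3, 2, 3, false, "title", null),
            new Row(4, 1, 4, true, "XML", null),
            new Row(5, 14, 2, false, "body",
                new String[][] {{"bgcolor", "#FFFFFF"}, {"text", "#000000"}}),
            new Row(6, 5, 3, false, "h1", null),
            new Row(7, 4, 4, true, "Databases & XML", null),
            new Row(8, 13, 3, false, "div", new String[][] {{"align", "right"}}),
            new Row(9, 7, 4, false, "b", null),
            new Row(10, 6, 5, true, "Assignments", null),
            new Row(11, 12, 4, false, "ul", null),
            new Row(12, 9, 5, false, "li", null),
            new Row(13, 8, 6, true, "Exercise 1", null),
            new Row(14, 11, 5, false, "li", null),
            new Row(15, 10, 6, true, "Exercise 2", null),
        };
    }

    public static void main(String[] args) {
        TableWriter w = new TableWriter(example());
        System.out.print(w.writeNode(1));   // full output
        System.out.print(w.writeNode(11));  // partial output: the ul subtree
    }
}
```

For `writeNode(11)` the loop runs to the end of the table without leaving the `ul` subtree, so exactly this subtree is written, indented relative to `ul`:

```text
<ul>
  <li>
    Exercise 1
  </li>
  <li>
    Exercise 2
  </li>
</ul>
```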