XML Document
Cost of Downtime

Hard to estimate; includes the cost of:
• lost business
• recovery
• loss of customer loyalty
• penalties
• inventory errors
• …

Estimated cost per hour (as of 2002!):

    Business area              Cost per hour
    Stock trading              $6.45 million
    Credit card companies      $2.6 million
    Home Shopping Channel      $113,750
    Catalog sales center       $90,000
    Airline reservations       $89,500
    Shipping service           $28,250
    ATM service                $14,500

After: Craig S. Mullins, Database Administration – The Complete Guide to Practices and Procedures, Addison-Wesley, 2002, ISBN 0-201-74129-6

WS2008/2009 DBIS/Dr. Karsten Tolle

Cost of Downtime: downtime per year (as of 2002!)

    Availability    Minutes    Hours    Cost per year*
    99.999%               5     0.08    $8,000
    99.99%               53     0.88    $88,000
    99.9%               526     8.77    $877,000
    99.5%              2628     43.8    $4,380,000
    99%                5256     87.6    $8,760,000

* Assuming one hour of downtime costs $100,000.

After: Craig S. Mullins, Database Administration – The Complete Guide to Practices and Procedures, Addison-Wesley, 2002, ISBN 0-201-74129-6

eXtensible Markup Language (XML)

Working process at W3C

XML is hosted by the World Wide Web Consortium (W3C), www.w3.org.

Motivation II
• XML is also used for defining O/R mappings
  – EJB (deployment descriptor)
  – JDO
  – Hibernate
  – …
• XML is the basis for the Semantic Web

Excerpt from an EJB deployment descriptor:

    <entity>
      <ejb-name>Person</ejb-name>
      …
      <persistence-type>Container</persistence-type>
      <cmp-version>2.x</cmp-version>
      …
      <cmp-field>
        <field-name>id</field-name>
      </cmp-field>
      <cmp-field>
        <field-name>name</field-name>
      </cmp-field>
      <primkey-field>id</primkey-field>
      …
    </entity>
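The downtime-per-year table near the start of these slides follows from simple arithmetic: a year has 365 × 24 × 60 = 525,600 minutes, and the unavailable fraction of that is the expected downtime. The sketch below recomputes the rows under that assumption. The class and method names are illustrative only, and the cost column on the slide is rounded more coarsely than what this code prints.

```java
// Sketch: recompute the availability table from first principles.
// Names (DowntimeCost, downtimeMinutes, yearlyCost) are hypothetical.
class DowntimeCost {
    // Minutes in a non-leap year: 365 * 24 * 60 = 525,600
    static final double MINUTES_PER_YEAR = 365 * 24 * 60;

    // Expected downtime in minutes per year for an availability given in percent.
    static double downtimeMinutes(double availabilityPercent) {
        return (1.0 - availabilityPercent / 100.0) * MINUTES_PER_YEAR;
    }

    // Yearly cost, assuming a flat cost per hour of downtime.
    static double yearlyCost(double availabilityPercent, double costPerHour) {
        return downtimeMinutes(availabilityPercent) / 60.0 * costPerHour;
    }

    public static void main(String[] args) {
        double[] levels = {99.999, 99.99, 99.9, 99.5, 99.0};
        for (double a : levels) {
            System.out.printf("%.3f%% -> %.0f min, %.2f h, %.0f $%n",
                    a, downtimeMinutes(a), downtimeMinutes(a) / 60.0,
                    yearlyCost(a, 100_000));
        }
    }
}
```

Each additional "nine" divides the downtime, and therefore the cost, by ten.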
A first example

    <?xml version="1.0" ?>
    <contact>
      <address type="business">
        <name>Tolle</name>
        <firstname>Karsten</firstname>
        <street>Robert-Mayer-Str.</street>
        <town>Frankfurt</town>
      </address>
    </contact>

XML document classification

XML documents can be classified by the kind of data they contain:
• Data-centric – capture structured data, e.g. a product catalog
• Document-centric – capture unstructured data, as in articles, books, or e-mails
• Hybrid documents – are both data-centric and document-centric

History of XML
• 1969 GML – Goldfarb, Mosher, Lorie (at IBM)
• 1986 SGML (ISO 8879)
• 1989 HTML – Tim Berners-Lee (CERN); W3C
• 1997 XML 1.0 (W3C: Jon Bosak, James Clark, and others)
• 1999 Namespaces in XML 1.0
• 2000 XML 1.0 (Second Edition)
• …
• 2006 XML 1.0 (Fourth Edition) | XML 1.1 (Second Edition)
• 2007 XSLT 2.0

Current status of XML

There are two W3C Recommendations (both from 16 August 2006):
– Extensible Markup Language (XML) 1.0 (Fourth Edition)
– Extensible Markup Language (XML) 1.1 (Second Edition)

Difference 1.0 and 1.1

The overall philosophy of names has changed since XML 1.0. Whereas XML 1.0 provided a rigid definition of names, wherein everything that was not permitted was forbidden, XML 1.1 names are designed so that everything that is not forbidden (for a specific reason) is permitted. Since Unicode will continue to grow past version 4.0, further changes to XML can be avoided by allowing almost any character, including those not yet assigned, in names.

Additional differences …
• line-end conventions
• XML 1.1 allows the use of character references to the control characters #x1 through #x1F
• XML 1.1 defines a set of constraints called "full normalization" on XML documents
… see the XML 1.1 specification for details.

Unless stated otherwise, the following slides refer to XML 1.0!
Relation of SGML, HTML and XML
• HTML is an application of SGML
• XML is a subset of SGML

Main design goals of XML
• XML shall support a wide variety of applications.
• It shall be easy to write programs which process XML documents.
• XML documents should be human-legible and reasonably clear.
• The design of XML shall be formal and concise.
• XML documents shall be easy to create.

XML Processor
• An XML processor is a software module used to read XML documents and provide access to their content and structure. XML parser ≡ XML processor
• The XML specification describes the required behaviour of an XML processor in terms of how it must read XML data and the information it must provide to the application.

XML Tools
• http://www.garshol.priv.no/download/xmltools/

XML Syntax and Grammar

    [1]  document ::= prolog element Misc*
    ...
    [3]  S        ::= (#x20 | #x9 | #xD | #xA)+      (spaces, carriage returns, line feeds, or tabs)
    [4]  NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender
    [5]  Name     ::= (Letter | '_' | ':') (NameChar)*
    [6]  Names    ::= Name (#x20 Name)*
    [7]  Nmtoken  ::= (NameChar)+
    [8]  Nmtokens ::= Nmtoken (#x20 Nmtoken)*
    ...
    [89] Extender ::= #x00B7 | #x02D0 | #x02D1 | #x0387 | #x0640 | #x0E46 | #x0EC6 | #x3005
                      | [#x3031-#x3035] | [#x309D-#x309E] | [#x30FC-#x30FE]

XML Document
• A data object is an XML document if it is well-formed. A well-formed XML document may in addition be valid if it meets certain further constraints.
• [1] document ::= prolog element Misc*
• Quantifiers: ? for 0 or 1, * for 0 or more, + for 1 or more

well-formed

The most important well-formedness constraints (there are more):
– At least one element.
– There is exactly one document element (root) containing the rest of the document.
– Elements are properly nested: if a start-tag is in the content of an element, its end-tag is in the content of the same element.
– Empty elements without an end-tag must be closed by "/>".
– All attribute values are quoted (single or double quotes).
– No duplicate attributes on the same element.

Documents that are not well-formed are not XML documents!

A first example – well-formed?

    <?xml version="1.0" ?>
    <contact>
      <address type="business">
        <name>Tolle</name>
        <firstname value="Karsten">
        <street>Robert-Mayer-Str.</street>
        <town>Frankfurt</town>
      </address>
    </contact>

A first example – well-formed!

    <?xml version="1.0" ?>
    <contact>
      <address type="business">
        <name>Tolle</name>
        <firstname value="Karsten"/>
        <street>Robert-Mayer-Str.</street>
        <town>Frankfurt</town>
      </address>
    </contact>

Element

The element is the basic structural unit of an XML document. It consists of a start-tag (which may contain attributes), an end-tag, and all the content enclosed between the two.

    [39] element      ::= EmptyElemTag | STag content ETag    [WFC: Element Type Match] [VC: Element Valid]
    [40] STag         ::= '<' Name (S Attribute)* S? '>'
    [42] ETag         ::= '</' Name S? '>'
    [44] EmptyElemTag ::= '<' Name (S Attribute)* S? '/>'

Element content

An element may contain:
– nothing (empty element; without an end-tag it must be closed by "/>")
– text
– elements
– a combination of text and elements (mixed content)
– CDATA sections
– processing instructions
– comments

    [43] content ::= CharData? ((element | Reference | CDSect | PI | Comment) CharData?)*
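The well-formedness rules above are exactly what a non-validating parser checks. A minimal sketch, assuming the JDK's built-in JAXP parser: parsing either raises a fatal error (not well-formed) or succeeds. The class name WellFormedCheck and the shortened test documents are illustrative and not part of the slides.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Sketch: treat "the parser reports a fatal error" as "not well-formed".
class WellFormedCheck {
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // violated a well-formedness constraint
        } catch (Exception e) {
            throw new IllegalStateException(e); // parser setup or I/O problem
        }
    }

    public static void main(String[] args) {
        String broken = "<contact><address type=\"business\">"
                + "<firstname value=\"Karsten\"></address></contact>";
        String fixed = "<contact><address type=\"business\">"
                + "<firstname value=\"Karsten\"/></address></contact>";
        System.out.println("unclosed firstname: " + isWellFormed(broken));
        System.out.println("closed with />:     " + isWellFormed(fixed));
    }
}
```

The two strings mirror the "well-formed?" and "well-formed!" slides: the only difference is whether the empty firstname element is closed with "/>".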
Element Examples

    <data></data>  ≡  <data/>                  empty element

    <firstName>Karsten</firstName>              element containing text

    <phoneNumber>                               element containing elements
      <areaCode>069</areaCode>
      <local>798-28212</local>
    </phoneNumber>

Attributes
• Always applied to the start-tag of an element
• Must always have an equals sign followed by a quoted value
  – the value may be quoted with single or double quotes, but not mixed
  – the value can be empty
  – the value may contain only text (no elements, no comments, etc.)
• Order is not significant

    [41] Attribute ::= Name Eq AttValue
    [10] AttValue  ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"

Attribute Examples

    <phoneNumber areaCode="069" local="798-28434"/>
    <phoneNumber local="798-28434" areaCode="069" />
    <data export='true' expires="2004-01-09"></data>

Naming elements and attributes

    [4] NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender
    [5] Name     ::= (Letter | '_' | ':') (NameChar)*

• Case sensitive
• Must start with a letter, underscore, or colon
• No defined size limit
• Avoid colons (not allowed in some applications, e.g. IE)
  – Colons are used in XML Namespaces http://www.w3.org/TR/REC-xml-names/

XML 1.1

    [4]  NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF]
                           | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F]
                           | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD]
                           | [#x10000-#xEFFFF]
    [4a] NameChar      ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
    [5]  Name          ::= NameStartChar (NameChar)*

Prolog

    [1]  document    ::= prolog element Misc*
    [22] prolog      ::= XMLDecl? Misc* (doctypedecl Misc*)?
    [23] XMLDecl     ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
    [24] VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
    [25] Eq          ::= S? '=' S?
    [26] VersionNum  ::= ([a-zA-Z0-9_.:] | '-')+
    [27] Misc        ::= Comment | PI | S

• Prolog (all parts optional)
  – XML declaration (must be at the beginning of the document)
  – Processing instructions
  – Document type declaration
  – Comments

Prolog Example

    <?xml version="1.0"?>                                    XML declaration
    <?xml-stylesheet type="text/css" href="style.css"?>      processing instruction
    <!DOCTYPE demo SYSTEM "demo.dtd">                        document type declaration
    <!-- This is a demonstration -->                         comment
    <demo/>

XML Declaration
• If it appears, it must be at the very beginning of the document
• "Attributes" (the exact order is important)
  – version is required (values: 1.0, 1.1)
  – encoding is optional (values: UTF-8, UTF-16, ISO-8859-1, ISO-8859-2, etc.)
  – standalone is optional (values: yes or no)
• Examples:

    <?xml version="1.0" encoding="UTF-8" ?>
    <?xml version="1.0" standalone="yes" ?>

Sample XML Document in Japanese

Processing Instructions (PI)

    [16] PI       ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
    [17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))

• Allow documents to contain instructions for applications
• A PI has two parts
  – Target – a valid name (same rules as for element names)
  – Instructions – any sequence of characters

Examples:

    <?xml-stylesheet type="text/xsl" href="convert.xsl"?>
    <?myTarget This part contains my instructions?>

Document Type Declaration

    [28] doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' (markupdecl | DeclSep)* ']' S?)? '>'

• Used for DTD validation (more later)
• Can be embedded or external
• An external DTD can be marked as
  – SYSTEM (retrievable using a URL)
  – PUBLIC (allows caching/hardcoding)
• Example:

    <!DOCTYPE greeting [
      <!ELEMENT greeting (#PCDATA)>
    ]>
    <greeting>Hello, world!</greeting>

Comments

    [15] Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'

• Comments can be used to hide tags and data

    <!-- <hiddenTag> data <notProcessed /> </hiddenTag> -->

• Comments may be stripped out by the processor
• Comments may appear before or after the root element or inside an element's content

    <root>data<!--comment--></root>
    <!--comment after the root element-->

References

    [67] Reference ::= EntityRef | CharRef

A character reference (decimal or hexadecimal) refers to a character of the ISO/IEC 10646 character set:

    [66] CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'

e.g.: &#64; ≡ &#x40; ≡ @

An entity reference refers to the content of a named entity, using ampersand (&) and semicolon (;) as delimiters.

    [68] EntityRef ::= '&' Name ';'

Entity References
• Predefined entities

    &lt;      <    less than
    &gt;      >    greater than
    &amp;     &    ampersand
    &quot;    "    quote
    &apos;    '    apostrophe

Entity Examples

    <root>
      <if test='size&lt;5'>
        <!-- Using a less-than sign in an attribute -->
      </if>
      <message title='Don&apos;t forget to &quot;backup&quot;.'/>
      <!-- Using single or double quotes in an attribute -->
      <body>If the child's age &lt; 12 then give them M&amp;M's.</body>
      <!-- Using an ampersand or less-than sign as text content of an element -->
      <german umlaute=" &#196; &#214; &#220; &#228; &#246; &#252; "/>
      <!-- Using language-specific characters -->
    </root>

External entity references

    [71] GEDecl     ::= '<!ENTITY' S Name S EntityDef S? '>'
    [73] EntityDef  ::= EntityValue | (ExternalID NDataDecl?)
    [75] ExternalID ::= 'SYSTEM' S SystemLiteral | 'PUBLIC' S PubidLiteral S SystemLiteral

Example:

    <!ENTITY open-hatch SYSTEM "http://www.textuality.com/boilerplate/OpenHatch.xml">

Internal entity references example

    <?xml version="1.0"?>
    <!DOCTYPE rdf:RDF [
      <!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#">
    ]>
    <!-- Universal RDF namespace written by Karsten Tolle, 05.12.2002. -->
    <rdf:RDF xmlns="&rdf;" xmlns:rdf="&rdf;" xmlns:rdfs="&rdfs;">
    ...

What else do we need?
• How to specify the structure of a document?
  – DTD
  – XML Schema
• How to mix sets of elements?
  – XML Namespaces

DTD – Document Type Definition

Document Type Definition
• A Document Type Definition (DTD) uses a formal grammar, which is part of the XML specification (markup declarations), to define:
  – the elements,
  – the structure in which these elements can appear,
  – the attributes the elements can or must have,
  – possible and default values for attributes,
  – ... and more.
• An XML document is valid if it has an associated document type declaration and if the document complies with the constraints expressed in it.

Valid XML

    <!DOCTYPE greeting [
      <!ELEMENT greeting (#PCDATA)>
    ]>
    <greeting>Hello, world!</greeting>

or

    <!DOCTYPE greeting SYSTEM "hello.dtd">
    <greeting>Hello, world!</greeting>

... and hello.dtd:

    <!ELEMENT greeting (#PCDATA)>

Element Declaration

    [45] elementdecl ::= '<!ELEMENT' S Name S contentspec S? '>'
    [46] contentspec ::= 'EMPTY' | 'ANY' | Mixed | children

Validity constraint: Unique Element Type Declaration! No element type may be declared more than once.

Examples of element type declarations:

    <!ELEMENT br EMPTY>
    <!ELEMENT %name.para; %content.para; >      (parameter-entity references, written with %)
    <!ELEMENT container ANY>

Element Content: children

    [47] children ::= (choice | seq) ('?' | '*' | '+')?
    [48] cp       ::= (Name | choice | seq) ('?' | '*' | '+')?
    [49] choice   ::= '(' S? cp ( S? '|' S? cp )+ S? ')'
    [50] seq      ::= '(' S? cp ( S? ',' S? cp )* S? ')'

• Sequence:

    <!ELEMENT article (title, subject, date)>

• Choice (OR):

    <!ELEMENT article (title|subject|author)>

• Quantifiers (? for 0 or 1, * for 0 or more, + for 1 or more):

    <!ELEMENT article (title,author+)>
    <!ELEMENT article (title,subject?,author*)>

Element Content: mixed

    [51] Mixed ::= '(' S? '#PCDATA' (S? '|' S? Name)* S? ')*' | '(' S? '#PCDATA' S? ')'

Examples:

    <!ELEMENT p (#PCDATA|a|ul|b|i|em)*>
    <!ELEMENT p (#PCDATA | %font; | %phrase; | %special; | %form;)* >
    <!ELEMENT b (#PCDATA)>

Note: The keyword #PCDATA derives historically from the term "parsed character data."

Attribute Declaration

    [52] AttlistDecl ::= '<!ATTLIST' S Name AttDef* S? '>'
    [53] AttDef      ::= S Name S AttType S DefaultDecl

Attribute Types

    [54] AttType       ::= StringType | TokenizedType | EnumeratedType
    [55] StringType    ::= 'CDATA'
    [56] TokenizedType ::= 'ID' | 'IDREF' | 'IDREFS' | 'ENTITY' | 'ENTITIES' | 'NMTOKEN' | 'NMTOKENS'

Enumerated Attributes

Definition: Enumerated attributes can take one of a list of values provided in the declaration. There are two kinds of enumerated types:

Enumerated Attribute Types

    [57] EnumeratedType ::= NotationType | Enumeration
    [58] NotationType   ::= 'NOTATION' S '(' S? Name (S? '|' S? Name)* S? ')'
    [59] Enumeration    ::= '(' S? Nmtoken (S? '|' S? Nmtoken)* S? ')'

Default Declaration

An attribute declaration provides information on whether the attribute's presence is required, and if not, how an XML processor should react if a declared attribute is absent in a document.
Attribute Defaults

    [60] DefaultDecl ::= '#REQUIRED' | '#IMPLIED' | (('#FIXED' S)? AttValue)

Default Declaration Examples

    <!ATTLIST termdef
        id    ID     #REQUIRED
        name  CDATA  #IMPLIED>
    <!ATTLIST list
        type  (bullets|ordered|glossary)  "ordered">
    <!ATTLIST form
        method  CDATA  #FIXED "POST">

    [52] AttlistDecl ::= '<!ATTLIST' S Name AttDef* S? '>'
    [53] AttDef      ::= S Name S AttType S DefaultDecl

Example Element and Attributes

    <!DOCTYPE articles [                        (articles is the root element)
      <!ELEMENT articles (article*)>
      <!ELEMENT article EMPTY>
      <!ATTLIST article
          title   CDATA  #REQUIRED
          author  CDATA  #IMPLIED
      >
    ]>
    <articles>
      <article title="Extensible Markup Language" author="Karsten Tolle" />
    </articles>

IMPLIED or Default

    <!DOCTYPE articles [
      ...
      <!ATTLIST article
          title   CDATA  #REQUIRED
          author  CDATA  #IMPLIED               (or: author CDATA "Karsten Tolle")
      >
    ]>
    <articles>
      <article title="Extensible Markup Language"/>
    </articles>

Note: If a default value is given and the attribute does not appear, the processor includes it with the default value. If, on the other hand, #IMPLIED was used, omitted attributes are not included.

external vs local

External DTD:

    <?xml version="1.0"?>
    <!DOCTYPE greeting SYSTEM "hello.dtd">
    <greeting>Hello, world!</greeting>

Local DTD:

    <?xml version="1.0" encoding="UTF-8" ?>
    <!DOCTYPE greeting [
      <!ELEMENT greeting (#PCDATA)>
    ]>
    <greeting>Hello, world!</greeting>

TV Schedule DTD

By David Moisan. Copied from his website: http://www.davidmoisan.org/

    <!DOCTYPE TVSCHEDULE [
      <!ELEMENT TVSCHEDULE (CHANNEL+)>
      <!ELEMENT CHANNEL (BANNER,DAY+)>
      <!ELEMENT BANNER (#PCDATA)>
      <!ELEMENT DAY (DATE,(HOLIDAY|PROGRAMSLOT+)+)>
      <!ELEMENT HOLIDAY (#PCDATA)>
      <!ELEMENT DATE (#PCDATA)>
      <!ELEMENT PROGRAMSLOT (TIME,TITLE,DESCRIPTION?)>
      <!ELEMENT TIME (#PCDATA)>
      <!ELEMENT TITLE (#PCDATA)>
      <!ELEMENT DESCRIPTION (#PCDATA)>
      <!ATTLIST TVSCHEDULE NAME CDATA #REQUIRED>
      <!ATTLIST CHANNEL CHAN CDATA #REQUIRED>
      <!ATTLIST PROGRAMSLOT VTR CDATA #IMPLIED>
      <!ATTLIST TITLE RATING CDATA #IMPLIED>
      <!ATTLIST TITLE LANGUAGE CDATA #IMPLIED>
    ]>

DTD – Summary
• An XML file is valid if it conforms to its DTD
• This can be tested with a validating XML parser
• For most applications it is useful to test whether an XML input file is valid according to the expected format/interpretation.

[Figure: DTD, validating XML parser, XML, application]

Drawbacks DTD
• DTD uses cryptic SGML syntax
  – difficult to write
  – difficult to read
  – differs from the XML syntax
• DTD by default provides just a small set of data types
• Each XML file can only be based on one DTD!

Namespaces in XML

Namespaces
• Help identify the origin or meaning of an element or attribute
• Allow two sets of elements to be combined even if there are duplicate element names

Example:

    <data xmlns:fruit="http://www.thirdm.com/fruit"
          xmlns:corp="http://www.thirdm.com/corporations">
      <fruit:apple qty="5" type="Granny Smith"/>
      <corp:apple stockticker="AAPL" exchange="NASDAQ"/>
    </data>

Namespace URI
• URI = Uniform Resource Identifier
• Used to uniquely identify the namespace
• As far as XML is concerned, the URI does not need to point to an existing resource; for other applications, such as RDF, this might differ!
Example:

    <food xmlns:fruit="http://www.thirdm.com/fruit"
          xmlns:veg="http://www.thirdm.com/vegetables">
      <fruit:apple qty="5"/>
      <fruit:pear qty="6"/>
      <veg:potato qty="7"/>
    </food>

Namespace Prefix
• Used to refer to the namespace
• Typically short, often three letters

Example:

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
      <rdfs:Class rdf:ID="Book">
      </rdfs:Class>
    </rdf:RDF>

Note: The '#' anchor sign is used in RDF to point directly to resources inside a namespace.

Another example …

    <?xml version="1.0" encoding="UTF-8" ?>
    <Order xmlns:qdt="urn:oasis:names:specification:ubl:schema:xsd:QualifiedDatatypes-2"
           xmlns:ccts="urn:oasis:names:specification:ubl:schema:xsd:CoreComponentParameters-2"
           xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
           xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
           xmlns:udt="urn:un:unece:uncefact:data:draft:UnqualifiedDataTypesSchemaModule:2"
           xmlns="urn:oasis:names:specification:ubl:schema:xsd:Order-2">
      <cbc:UBLVersionID>2.0</cbc:UBLVersionID>
      <cbc:CustomizationID>urn:oasis:names:specification:ubl:xpath:Order-2.0:sbs-1.0-draft</cbc:CustomizationID>
      <cbc:ProfileID>bpid:urn:oasis:names:draft:bpss:ubl-2-sbs-order-with-simple-response-draft</cbc:ProfileID>
      <cbc:ID>AEG012345</cbc:ID>
      <cbc:SalesOrderID>CON0095678</cbc:SalesOrderID>
      <cbc:CopyIndicator>false</cbc:CopyIndicator>
      …

Default Namespace
• Used to identify the namespace for elements without a prefix

Example:

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns="http://www.w3.org/2000/01/rdf-schema#">
      <Class rdf:ID="Book">
      </Class>
    </rdf:RDF>

Attributes and Namespaces
• Never associated with the default namespace!
• Can have an explicit namespace prefix

Example:

    <!-- http://www.w3.org is bound to n1 and is the default -->
    <x xmlns:n1="http://www.w3.org"
       xmlns="http://www.w3.org" >
      <good a="1" b="2" />
      <good a="1" n1:a="2" />
    </x>

Namespace Scope
• The scope is limited to the element in which the namespace is declared (including its content)
• May be overridden by a child element

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
      <rdfs:Class xmlns:rdfs="http://www.myrdfs.com" rdf:ID="Book">
      </rdfs:Class>
    </rdf:RDF>

Notes about Namespaces
• Namespaces are not part of XML 1.0
• An XML parser (processor) may or may not support XML Namespaces
• Some parsers let you check at runtime whether they support namespaces
• DTDs and Namespaces are compatible but do not work well together
  – For example, the namespace prefix must be static if elements are declared in the DTD

Quote from Namespaces in XML 1.0:

Note that DTD-based validation is not namespace-aware in the following sense: a DTD constrains the elements and attributes that may appear in a document by their uninterpreted names, not by (namespace name, local name) pairs. To validate a document that uses namespaces against a DTD, the same prefixes must be used in the DTD as in the instance. A DTD may however indirectly constrain the namespaces used in a valid document by providing #FIXED values for attributes that declare namespaces.

XML Schema

XML Schema – Why?
• DTD uses cryptic SGML syntax
  – difficult to write
  – difficult to read
  – differs from the XML syntax
• DTD provides just a small set of data types
• Each XML file can only be based on one DTD!
• DTD and XML Namespaces do not work well together
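The namespace slides above say that two elements with the same local name can be told apart by their namespace, and that a parser may or may not be namespace-aware. A minimal sketch with the JDK's DOM parser, which is not namespace-aware unless asked: the document is the fruit/corp example from the Namespaces slide, while the class and method names are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Sketch: look elements up by (namespace name, local name), ignoring the prefix.
class NamespaceDemo {
    static final String XML =
            "<data xmlns:fruit='http://www.thirdm.com/fruit' "
          + "xmlns:corp='http://www.thirdm.com/corporations'>"
          + "<fruit:apple qty='5' type='Granny Smith'/>"
          + "<corp:apple stockticker='AAPL' exchange='NASDAQ'/>"
          + "</data>";

    static int count(String xml, String namespaceUri, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in JAXP
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagNameNS(namespaceUri, localName).getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("fruit apples: "
                + count(XML, "http://www.thirdm.com/fruit", "apple"));
        System.out.println("corp apples:  "
                + count(XML, "http://www.thirdm.com/corporations", "apple"));
    }
}
```

The lookup would give the same counts if the document used different prefixes bound to the same URIs, which is the point of identifying a namespace by URI rather than by prefix.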
Schema Root Element and targetNamespace

    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                targetNamespace="http://www.dbis.de">
      <!-- element and attribute declarations go here -->
    </xsd:schema>

Element Declaration

    <xsd:element name="notice" type="xsd:string"/>

• Compare to:

    <!ELEMENT notice (#PCDATA)>

• Note that using a prefix inside an attribute value might not work with every XML application. In RDF, for example, it would not be allowed; entity references should be used instead.

Attribute Declaration

    <xsd:element name="article">
      <xsd:complexType>
        <xsd:attribute name="title" type="xsd:string" use="required"/>
        <xsd:attribute name="author" type="xsd:string" use="required"/>
      </xsd:complexType>
    </xsd:element>

Data Types
• XML Schema Part 2: Datatypes Second Edition
  – W3C Recommendation 28 October 2004
  – http://www.w3.org/TR/xmlschema-2/
• Built-in datatypes are those which are defined in this specification, and can be either primitive or derived;
• User-derived datatypes are those derived datatypes that are defined by individual schema designers.

Element Declaration with Children

    <xsd:element name="publications">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:choice minOccurs="0" maxOccurs="unbounded">
            <xsd:element ref="article"/>
            <xsd:element ref="book"/>
          </xsd:choice>
          <xsd:element ref="notice" minOccurs="0"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>

Compare to DTD:

    <!ELEMENT publications ((article | book)*, notice?)>

Separation into logical parts

An XML Schema can become very large. It is therefore useful to separate the definitions of logical parts, such as the definition of an address, from the other parts. This makes the schema easier to maintain and reuse.
    <schema targetNamespace="http://www.example.com/IBEST"
            xmlns="http://www.w3.org/2001/XMLSchema"
            xmlns:ibest="http://www.example.com/IBEST">
      <annotation>
        <documentation xml:lang="en">
          Addresses for the international book order schema for Example.com.
          Copyright 2001 Example.com. All rights reserved.
        </documentation>
      </annotation>

Another example … (lines are cut off at the slide edge)

    <?xml version="1.0" encoding="UTF-8"?>
    <!--
      Document Type: Order
      Generated On: Tue Oct 03 2:26:38 P3 2006
    -->
    <!-- ===== xsd:schema Element With Namespaces Declarations ===== -->
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
        targetNamespace="urn:oasis:names:specification:ubl:schema:xsd:Order-2"
        xmlns="urn:oasis:names:specification:ubl:schema:xsd:Order-2"
        xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
        xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
        xmlns:udt="urn:un:unece:uncefact:data:specification:UnqualifiedDataTypesSchemaModule:2"
        xmlns:ccts="urn:un:unece:uncefact:documentation:2"
        xmlns:ext="urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2"
        xmlns:qdt="urn:oasis:names:specification:ubl:schema:xsd:QualifiedDatatypes-2"
        elementFormDefault="qualifi
    <!-- ===== Imports ===== -->
    <xsd:import namespace="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" sch
    <xsd:import namespace="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" schemaL
    <xsd:import namespace="urn:un:unece:uncefact:data:specification:UnqualifiedDataTypesSchemaModule:2" sc
    <xsd:import namespace="urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2" sch
    <xsd:import namespace="urn:oasis:names:specification:ubl:schema:xsd:QualifiedDatatypes-2" schemaLocation
    <!-- ===== Root Element ===== -->
    <xsd:element name="Order" type="OrderType">
      <xsd:annotation>
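The schema fragments above can be exercised with the JDK's javax.xml.validation API. A minimal sketch, assuming an inline schema built around the publications declaration from the "Element Declaration with Children" slide; the simple string types given to article, book, and notice are an assumption made only to keep the schema self-contained, and the class name is illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Sketch: validate instance documents against ((article | book)*, notice?).
class SchemaCheck {
    // article, book and notice are declared as plain strings (assumption).
    static final String XSD =
            "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
          + "<xsd:element name='article' type='xsd:string'/>"
          + "<xsd:element name='book' type='xsd:string'/>"
          + "<xsd:element name='notice' type='xsd:string'/>"
          + "<xsd:element name='publications'><xsd:complexType><xsd:sequence>"
          + "<xsd:choice minOccurs='0' maxOccurs='unbounded'>"
          + "<xsd:element ref='article'/><xsd:element ref='book'/>"
          + "</xsd:choice>"
          + "<xsd:element ref='notice' minOccurs='0'/>"
          + "</xsd:sequence></xsd:complexType></xsd:element>"
          + "</xsd:schema>";

    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, validation errors are thrown as SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed or not valid against the schema
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(
                "<publications><article>a</article><notice>n</notice></publications>"));
        System.out.println(isValid(
                "<publications><notice>n</notice><article>a</article></publications>"));
    }
}
```

The second document fails because the sequence requires notice to come after all article and book elements, the same constraint the DTD content model expresses.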
Include

To pull in separately maintained parts of a schema, the main schema uses the include element.

    <include schemaLocation="http://www.example.com/schemas/adresse.xsd"/>

[Figure: main schema and included schemas]

Validation

    <?xml version="1.0" encoding="UTF-8"?>
    <Order xmlns:qdt="urn:oasis:names:specification:ubl:schema:xsd:QualifiedDatatypes-2"
           xmlns:ccts="urn:oasis:names:specification:ubl:schema:xsd:CoreComponentParameters-2"
           xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
           xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
           xmlns:udt="urn:un:unece:uncefact:data:draft:UnqualifiedDataTypesSchemaModule:2"
           xmlns="urn:oasis:names:specification:ubl:schema:xsd:Order-2">
      <cbc:UBLVersionID>2.0</cbc:UBLVersionID>
      <cbc:CustomizationID>urn:oasis:names:specification:ubl:xpath:Order-2.0:sbs-1.0-draft</cbc:CustomizationID>
      <cbc:ProfileID>bpid:urn:oasis:names:draft:bpss:ubl-2-sbs-order-with-simple-response-dr
      <cbc:ID>AEG012345</cbc:ID>
      …
    </Order>

Processing XML

SAX vs DOM vs StAX

DOM (Document Object Model)

Builds a tree structure from the elements contained in the XML document.

• Very useful for small documents
• Random access to the structure using objects
• Can read, manipulate, and write XML programmatically
• Write recursive code to explore child nodes of an unknown or evolving schema
• Write hard-coded procedures to handle a static, well-known schema
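The DOM bullet about recursive code for an unknown schema can be sketched as follows, assuming the JDK's built-in DOM implementation. The class DomOutline and its methods are illustrative; the input is a shortened version of the contact document from the first example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: walk the whole DOM tree recursively and print an indented outline.
class DomOutline {
    // Appends one line per element, indented by nesting depth.
    static void walk(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(node.getNodeName()).append('\n');
            depth++;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), depth, out); // text nodes simply have no children
        }
    }

    static String outline(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            walk(doc, 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(outline(
                "<contact><address type='business'>"
              + "<name>Tolle</name><town>Frankfurt</town>"
              + "</address></contact>"));
    }
}
```

Because the whole tree is in memory, the code can visit nodes in any order and as often as needed, which is the flexibility DOM trades memory for.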
SAX (Simple API for XML)

Based on events such as (default handler):
– startDocument ()
– endDocument ()
– startElement (java.lang.String uri, java.lang.String localName, java.lang.String qName, Attributes attributes)
– endElement (java.lang.String uri, java.lang.String localName, java.lang.String qName)
– error (SAXParseException e)
– fatalError (SAXParseException e)

• Uses much less memory than DOM, especially for large documents (but some applications need more than one pass)

DOM vs SAX

                   DOM       SAX
    memory         -         +
    flexibility    +         -
    performance    - (*)     + (*)
    standard       W3C       xml-dev

* Depends on the application: if more than one pass is needed, DOM might be better!

Problems …
• What if neither DOM nor SAX is acceptable? E.g. mobile devices with J2ME
• DOM needs too much memory
• Common streaming APIs like SAX are all push APIs
  – The SAX parser pushes the tokens into the application, which is not easy to handle

    public class Flour extends DefaultHandler {
      …
      public void startElement(String namespaceURI, String localName,
                               String qName, Attributes atts) {
        …
      }
      …
      public static void main(String[] args) {
        Flour f = new Flour();
        SAXParser p = new SAXParser();
        p.setContentHandler(f);
        try {
          p.parse(args[0]);
        } catch (Exception e) {
          e.printStackTrace();
        }
        System.out.println(f.amount);
      }
      …

Alternative: StAX
• StAX (Streaming API for XML)
  – a pull parsing API
  – The application requests the next token itself, e.g. by calling next().
  – JSR 173 (Java Specification Request) http://jcp.org/en/jsr/detail?id=173
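The push model described on the SAX slides above can be tried out with the JAXP factory API that ships with the JDK. This is a minimal sketch, not the Flour class from the slide: the handler simply counts start-tags, and the names ElementCounter and count are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Sketch: the parser drives the application by calling back into the handler.
class ElementCounter extends DefaultHandler {
    int elements = 0;

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        elements++; // invoked by the parser once per start-tag
    }

    static int count(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            ElementCounter handler = new ElementCounter();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.elements;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("<contact><address><name>Tolle</name></address></contact>"));
    }
}
```

Only the counter is kept in memory, never the document tree, which is why SAX scores better on memory in the comparison table. Any state needed across callbacks has to be tracked by the handler itself.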
Example for calling next …

    while (true) {
      int event = parser.next();
      if (event == XMLStreamConstants.END_DOCUMENT) {
        parser.close();
        break;
      }
      if (event == XMLStreamConstants.START_ELEMENT) {
        System.out.println(parser.getLocalName());
      }
    }

Transforming XML

[Figure: file formats shown on the slide: GIF, JPG, NSK-TIFF, JPEG, etc.; AVI, AU, WAV, WMA, MP3, etc.; DOC, HTML, PDF, etc.; MPG, WMV, RM, etc.]

XSL (XSLT 2.0 has been a W3C Recommendation since January 2007)

The Extensible Stylesheet Language (XSL) has three subcomponents:
• XSL-FO – XSL Formatting Objects, an XML vocabulary for specifying formatting semantics.
• XSLT – the transformation language, which lets you transform XML into some other format.
• XPath – an addressing mechanism that lets you specify a path to an element.

• Extensible Stylesheet Language (XSL) Version 1.0 – W3C Recommendation 15 October 2001

XSLT Processing

XSLT (XSL Transformations)
• XSLT is a programming language
• Write scripts containing if statements and for-each loops
• Uses XPath for querying the document, math calculations, and string functions
• Can transform XML into HTML or text
• Useful for transforming XML to XML

So, what is XML
• A meta markup language
• Structured information that complies with a standard structure and syntax
• "The ASCII of the 21st Century"
• Platform-independent information for:
  – Presentation instructions
  – User settings
  – Data repository
  – Data transfer
  – RPC calls
  – ...

What XML is not
• XML is not tied to any human language or character encoding
• XML is not tied to any computing platform or programming language
• XML has no semantics
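The XSLT slides above mention if statements, for-each loops, XPath queries, and transforming XML into text. A minimal sketch with the JDK's javax.xml.transform API shows all four in one small XSLT 1.0 stylesheet. The stylesheet, the class name XsltDemo, and the sample input (modelled on the articles DTD example) are illustrative and not taken from the slides.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Sketch: turn <article title=".." author=".."/> elements into lines of text.
class XsltDemo {
    static final String XSL =
            "<xsl:stylesheet version='1.0' "
          + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='text'/>"
          + "<xsl:template match='/'>"
          + "<xsl:for-each select='articles/article'>"      // loop over an XPath node set
          + "<xsl:value-of select='@title'/>"
          + "<xsl:if test='@author'> by <xsl:value-of select='@author'/></xsl:if>"
          + "<xsl:text>&#10;</xsl:text>"                    // newline after each article
          + "</xsl:for-each>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(transform(
                "<articles>"
              + "<article title='Extensible Markup Language' author='Karsten Tolle'/>"
              + "<article title='Untitled draft'/>"
              + "</articles>"));
    }
}
```

The optional author attribute corresponds to #IMPLIED in the DTD example; the xsl:if keeps the " by …" suffix out of the output when the attribute is absent.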
Literature I
• XML Professionell; Richard Anderson et al.; MITP-Verlag; 2000; ISBN 3-8266-0633-7
• XML Data Management; Akmal B. Chaudhri, Awais Rashid and Roberto Zicari; Addison Wesley; 2003; ISBN 0-201-84452-4

Literature II

Resources of DBIS related to XML (German):
• Einführung in XML & Document Type Definition; Alexander Semino; Seminar SS 2001
• XML-Schemata; Markus Krauße; Seminar SS 2001
• XSL – Dokumente mit Stil; Fabian Wleklinski; Seminar SS 2001
• HTML und XML; Christina Anthes; Proseminar SS 2002

Literature III
• Read the recommendations at W3C: www.w3.org
• … search the Web!