XML Document

Cost of Downtime
Hard to estimate; includes costs for:
• lost business
• recovery
• loss of customer loyalty
• penalties
• inventory errors
• …

Business sector              Estimated cost per hour
Stock trading                $6.45 million
Credit card companies        $2.6 million
Home Shopping Channel        $113,750
Catalog sales center         $90,000
Airline reservations         $89,500
Shipping service             $28,250
ATM service                  $14,500

Figures as of 2002!
Based on: Craig S. Mullins
Database Administration – The Complete Guide to Practices and Procedures
2002 – Addison-Wesley – ISBN 0-201-74129-6
WS2008/2009
DBIS/Dr. Karsten Tolle
Cost of Downtime
Downtime per year

Availability   Minutes   Hours   Cost per year*
99.999%              5    0.08       $8,000
99.99%              53    0.88      $88,000
99.9%              526    8.77     $877,000
99.5%             2628    43.8   $4,380,000
99%               5256    87.6   $8,760,000

* Assuming one hour of downtime costs $100,000
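The figures in the table follow from simple arithmetic: downtime = (1 - availability) × hours per year, and cost = downtime hours × cost per hour. The sketch below recomputes them; the class and method names are made up for illustration, and a year is assumed to have 365 days. The exact results differ slightly from the rounded values on the slide.

```java
final class DowntimeCost {
    // Assumption: 365 days per year, leap years ignored.
    static final double HOURS_PER_YEAR = 365 * 24;

    /** Hours of downtime per year for an availability given in percent. */
    static double downtimeHours(double availabilityPercent) {
        return (1.0 - availabilityPercent / 100.0) * HOURS_PER_YEAR;
    }

    /** Yearly cost for a given cost of one hour of downtime. */
    static double costPerYear(double availabilityPercent, double costPerHour) {
        return downtimeHours(availabilityPercent) * costPerHour;
    }

    public static void main(String[] args) {
        double[] levels = {99.999, 99.99, 99.9, 99.5, 99.0};
        for (double a : levels) {
            System.out.printf("%7.3f%%  %8.1f min  %6.2f h  %,12.0f $%n",
                    a, downtimeHours(a) * 60, downtimeHours(a), costPerYear(a, 100_000));
        }
    }
}
```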
Figures as of 2002!
Based on: Craig S. Mullins
Database Administration – The Complete Guide to Practices and Procedures
2002 – Addison-Wesley – ISBN 0-201-74129-6
eXtensible Markup Language
(XML)
Working process at W3C
XML is maintained by the World Wide Web Consortium (W3C)
→ www.w3.org
Motivation II
• XML is also used for defining O/R mappings
– EJB – Deployment Descriptor
– JDO
– Hibernate
– …
• XML is the basis for the Semantic Web

<entity>
  <ejb-name>Person</ejb-name>
  …
  <persistence-type>Container</persistence-type>
  <cmp-version>2.x</cmp-version>
  …
  <cmp-field>
    <field-name>id</field-name>
  </cmp-field>
  <cmp-field>
    <field-name>name</field-name>
  </cmp-field>
  <primkey-field>id</primkey-field>
  …
</entity>
A first example
<?xml version="1.0" ?>
<contact>
<address type="business">
<name>Tolle</name>
<firstname>Karsten</firstname>
<street>Robert-Mayer-Str.</street>
<town>Frankfurt</town>
</address>
</contact>
XML document classification
XML documents can be classified based on the data they contain:
• Data-centric – capture structured data, e.g. a product catalog
• Document-centric – capture unstructured data, as in articles, books, or e-mails
• Hybrid documents – are both data-centric and document-centric
History of XML
• 1969 Goldfarb, Mosher, Lorie – GML (at IBM)
• 1986 SGML (ISO 8879)
• 1989 HTML – Tim Berners-Lee (CERN); W3C
• 1997 XML 1.0 (W3C: Jon Bosak, James Clark et al.)
• 1999 Namespaces in XML 1.0
• 2000 XML 1.0 (Second Edition)
• …
• 2006 XML 1.0 (Fourth Ed.) | XML 1.1 (Second Ed.)
• 2007 XSLT 2.0
Current status of XML
There are two W3C Recommendations
(both from 16 August 2006):
– Extensible Markup Language (XML) 1.0
(Fourth Edition)
– Extensible Markup Language (XML) 1.1
(Second Edition)
Difference 1.0 and 1.1
The overall philosophy of names has changed
since XML 1.0. Whereas XML 1.0 provided a
rigid definition of names, wherein everything
that was not permitted was forbidden, XML
1.1 names are designed so that everything
that is not forbidden (for a specific reason)
is permitted. Since Unicode will continue to
grow past version 4.0, further changes to
XML can be avoided by allowing almost any
character, including those not yet assigned, in
names.
Additional differences …
• line-end conventions
• XML 1.1 allows the use of character
references to the control characters #x1
through #x1F
• XML 1.1 defines a set of constraints called
"full normalization" on XML documents
… see the XML 1.1 specification for details
Unless stated otherwise, the following slides refer to XML 1.0!
Relation of SGML, HTML and XML
[Diagram: HTML is an application of SGML; XML is a subset of SGML]
Main design goals of XML
• XML shall support a wide variety of applications.
• It shall be easy to write programs which process XML documents.
• XML documents should be human-legible and reasonably clear.
• The design of XML shall be formal and concise.
• XML documents shall be easy to create.
XML Processor
• An XML processor is a software module used to read XML documents and provide access to their content and structure.
→ XML parser ≡ XML processor
• The XML specification describes the required behaviour of an XML processor in terms of how it must read XML data and the information it must provide to the application.
XML Tools
• http://www.garshol.priv.no/download/xmltools/
XML Syntax and Grammar
[1] document ::= prolog element Misc*
...
[3] S ::= (#x20 | #x9 | #xD | #xA)+   /* spaces, tabs, carriage returns, or line feeds */
[4] NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender
[5] Name ::= (Letter | '_' | ':') (NameChar)*
[6] Names ::= Name (#x20 Name)*
[7] Nmtoken ::= (NameChar)+
[8] Nmtokens ::= Nmtoken (#x20 Nmtoken)*
...
[89] Extender ::= #x00B7 | #x02D0 | #x02D1 | #x0387 | #x0640 | #x0E46
| #x0EC6 | #x3005 | [#x3031-#x3035] | [#x309D-#x309E] | [#x30FC-#x30FE]
XML Document
• A data object is an XML document if it is well-formed. A well-formed XML document may in addition be valid if it meets certain further constraints.
• [1] document ::= prolog element Misc*
• Quantifiers
? for 0 or 1
* for 0 or more
+ for 1 or more
well-formed
The most important well-formedness constraints (there are more):
– At least one element.
– There is exactly one document element (root) containing the rest of the document.
– Elements are properly nested → if a start-tag is in the content of an element, its end-tag is in the content of the same element.
– Empty elements without an end-tag must be closed with "/>".
– All attribute values are quoted (single or double quotes).
– No duplicate attributes on the same element.
Documents that are not well-formed are not XML documents!
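A well-formedness check can be sketched with the JAXP DOM parser that ships with the JDK: a violation of the constraints above is reported as a fatal error, which surfaces as a SAXException. The class name WellFormedCheck and the sample strings are illustrative assumptions, not part of the lecture material.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class WellFormedCheck {
    /** Returns true if the parser accepts the text as a well-formed XML document. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps the parser from printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // a well-formedness constraint was violated
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Properly nested, quoted attribute, empty element closed with "/>"
        System.out.println(isWellFormed("<contact><firstname value=\"Karsten\"/></contact>"));
        // Start-tag without matching end-tag inside <contact>
        System.out.println(isWellFormed("<contact><firstname value=\"Karsten\"></contact>"));
        // Unquoted attribute value
        System.out.println(isWellFormed("<contact type=business/>"));
        // Duplicate attribute
        System.out.println(isWellFormed("<contact type='a' type='b'/>"));
    }
}
```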
A first example – well-formed?
<?xml version="1.0" ?>
<contact>
  <address type="business">
    <name>Tolle</name>
    <firstname value="Karsten">
    <street>Robert-Mayer-Str.</street>
    <town>Frankfurt</town>
  </address>
</contact>
A first example – well-formed!
<?xml version="1.0" ?>
<contact>
  <address type="business">
    <name>Tolle</name>
    <firstname value="Karsten"/>
    <street>Robert-Mayer-Str.</street>
    <town>Frankfurt</town>
  </address>
</contact>
Element
An element is the basic structural unit of an XML document. It consists of a start-tag (which may contain attributes), an end-tag, and all of its content (enclosed by the start-tag and end-tag).
[39] element ::= EmptyElemTag | STag content ETag
[WFC: Element Type Match][VC: Element Valid]
[40] STag ::= '<' Name (S Attribute)* S? '>'
[42] ETag ::= '</' Name S? '>'
[44] EmptyElemTag ::= '<' Name (S Attribute)* S? '/>'
Element content
An element may contain:
– nothing (empty element; if there is no end-tag it must be closed with "/>")
– text
– elements
– combination of text and elements (mixed content)
– CDATA sections
– Processing Instructions
– Comments
[43] content ::= CharData? ((element | Reference | CDSect | PI
| Comment) CharData?)*
Element Examples
<data></data> ≡ <data/>                 empty element
<firstName>Karsten</firstName>          element containing text
<phoneNumber>                           element containing elements
  <areaCode>069</areaCode>
  <local>798-28212</local>
</phoneNumber>
Attributes
• Always applied to the start-tag of an element
• Must always have an equals sign followed by
a quoted value
– value may be quoted with single or double quotes,
but not mixed
– value can be empty
– value may contain only text (no elements, no
comments, etc...)
• Order is not significant
[41] Attribute ::= Name Eq AttValue
[10] AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
Attribute Examples
• Examples:
<phoneNumber areaCode="069"
local="798-28434"/>
<phoneNumber local="798-28434"
areaCode="069" />
<data export='true'
expires="2004-01-09"></data>
Naming elements and attributes
[4] NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender
[5] Name ::= (Letter | '_' | ':') (NameChar)*
• Case sensitive
• Must start with a letter, underscore, or colon
• No defined size limit
• Avoid colons (not allowed in some applications, e.g. IE)
– Colons are used in XML Namespaces
http://www.w3.org/TR/REC-xml-names/
XML 1.1
• [4] NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
• [4a] NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
• [5] Name ::= NameStartChar (NameChar)*
Prolog
[1] document ::= prolog element Misc*
[22] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
[23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
[24] VersionInfo ::= S 'version' Eq ("'" VersionNum "'" |
'"'VersionNum'"')
[25] Eq ::= S? '=' S?
[26] VersionNum ::= ([a-zA-Z0-9_.:] | '-')+
[27] Misc ::= Comment | PI | S
• Prolog (all parts optional)
– XML Declaration
• must be at the beginning of the document
– Processing Instructions
– Document Type Declarations
– Comments
Prolog Example
XML Declaration:             <?xml version="1.0"?>
Processing Instruction:      <?xml-stylesheet type="text/css" href="style.css"?>
Document Type Declaration:   <!DOCTYPE demo SYSTEM "demo.dtd">
Comment:                     <!-- This is a demonstration -->
                             <demo/>
XML Declaration
• If it appears, it must be at the very beginning
of the document
• "Attributes" (exact order is important)
– version is required (values: 1.0, 1.1)
– encoding is optional (values: UTF-8, UTF-16,
ISO-8859-1, ISO-8859-2, etc...)
– standalone is optional (values: yes or no)
• Examples:
<?xml version="1.0" encoding="UTF-8" ?>
<?xml version="1.0" standalone="yes" ?>
Sample XML Document in Japanese
[Figure: sample XML document in Japanese]
Processing Instructions (PI)
[16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
[17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
• To allow documents to contain instructions for
applications
• PI has two parts
– Target – valid name (same rules as element name)
– Instructions – any sequence of characters
Examples
<?xml-stylesheet type="text/xsl" href="convert.xsl"?>
<?myTarget This part contains my instructions?>
Document Type Declaration
[28] doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S?
('['(markupdecl | DeclSep)* ']' S?)? '>'
• Used for DTD validation (more later)
• Can be embedded or external
• External can be marked as
– SYSTEM (retrievable using a URL)
– PUBLIC (allows caching/hardcoding)
• Example:
<!DOCTYPE greeting [ <!ELEMENT greeting (#PCDATA)> ]>
<greeting>Hello, world!</greeting>
Comments
[15] Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
• Comments can be used to hide tags and data
<!-- <hiddenTag> data
<notProcessed /> </hiddenTag> -->
• Comments may be stripped out by the processor
• Comments may appear before or after the root
element or inside an element's content
<root>data<!--comment--></root>
<!--comment after the root element-->
References
[67] Reference ::= EntityRef | CharRef
Character reference (decimal or hexadecimal) refers to
ISO/IEC 10646 character set:
[66] CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
e.g.: &#64; ≡ &#x40; ≡ @
Entity reference refers to the content of a named entity,
using ampersand (&) and semicolon (;) as delimiters.
[68] EntityRef ::= '&' Name ';'
Entity References
• Predefined entities
&lt;      <    less than
&gt;      >    greater than
&amp;     &    ampersand
&quot;    "    quote
&apos;    '    apostrophe
Entity Examples
Problematic:
<root>
  <if test='size<5'></if>
  <message title='Don't forget to "backup".'/>
  <body>If the child's age < 12 then give them M&M's.</body>
  <german umlaute=" Ä Ö Ü ä ö ü "/>
</root>

With entity and character references:
<root>
  <if test='size&lt;5'>
    <!-- Using a less than sign in an attribute -->
  </if>
  <message title='Don&apos;t forget to "backup".'/>
  <!-- Using single or double quotes in an attribute -->
  <body>If the child's age &lt; 12 then give them M&amp;M's.</body>
  <!-- Using an ampersand or less than sign as text content of an element -->
  <german umlaute=" &#xc4; &#xd6; &#xdc; &#xe4; &#xf6; &#xfc; "/>
  <!-- Using language specific signs -->
</root>
External entity references
[71] GEDecl ::= '<!ENTITY' S Name S EntityDef S? '>'
[73] EntityDef ::= EntityValue | (ExternalID NDataDecl?)
[75] ExternalID ::= 'SYSTEM' S SystemLiteral
| 'PUBLIC' S PubidLiteral S SystemLiteral
Example:
<!ENTITY open-hatch SYSTEM
"http://www.textuality.com/boilerplate/OpenHatch.xml">
Internal entity references – example
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [
<!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#">
]>
<!-- Universal RDF namespace written by Karsten Tolle,
05.12.2002. -->
<rdf:RDF
xmlns="&rdf;"
xmlns:rdf="&rdf;"
xmlns:rdfs="&rdfs;">
...
What else do we need?
• How to specify the structure of a
document?
– DTD
– XML Schema
• How to mix sets of elements?
– XML Namespaces
DTD
Document Type Definition
Document Type Definition
• A Document Type Definition (DTD) uses a formal grammar, which is part of the XML specification (markup declarations), to define:
– elements,
– the structure in which these elements can appear,
– attributes the elements can/must have,
– possible and default values for attributes,
– ... and more.
• An XML document is valid if it has an associated document type declaration and if the document complies with the constraints expressed in it.
Valid XML
<!DOCTYPE greeting [ <!ELEMENT greeting (#PCDATA)> ]>
<greeting>Hello, world!</greeting>
or
<!DOCTYPE greeting SYSTEM "hello.dtd">
<greeting>Hello, world!</greeting>
... and hello.dtd:
<!ELEMENT greeting (#PCDATA)>
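Validity against a DTD can be checked with a validating parser. The following is a minimal sketch using the JAXP parser bundled with the JDK, with the DTD in the internal subset as in the first variant above; the class name DtdValidation is an illustrative assumption. Validity violations are reported as recoverable errors, so the sketch records them in an error handler instead of relying on an exception.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class DtdValidation {
    /** Returns true if the document is well-formed and valid against its own DTD. */
    static boolean isValid(String xml) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // switch on DTD validation
        final boolean[] valid = {true};
        try {
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    valid[0] = false; // validity constraint violated
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            return false; // not even well-formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        return valid[0];
    }

    public static void main(String[] args) {
        String dtd = "<!DOCTYPE greeting [ <!ELEMENT greeting (#PCDATA)> ]>";
        System.out.println(isValid(dtd + "<greeting>Hello, world!</greeting>"));
        System.out.println(isValid(dtd + "<greeting><b/></greeting>")); // <b> is not declared
    }
}
```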
Element Declaration
[45] elementdecl ::= '<!ELEMENT' S Name S contentspec S? '>'
[46] contentspec ::= 'EMPTY' | 'ANY' | Mixed | children
Validity constraint: Unique Element Type Declaration → no element type may be declared more than once.
Examples of element type declarations:
<!ELEMENT br EMPTY>
<!ELEMENT %name.para; %content.para; >    (parameter-entity references, written with %)
<!ELEMENT container ANY>
Element Content children
[47] children ::= (choice | seq) ('?' | '*' | '+')?
[48] cp ::= (Name | choice | seq) ('?' | '*' | '+')?
[49] choice ::= '(' S? cp ( S? '|' S? cp )+ S? ')'
[50] seq ::= '(' S? cp ( S? ',' S? cp )* S? ')'
• Sequence:
<!ELEMENT article (title, subject, date)>
• Choice (OR)
<!ELEMENT article (title|subject|author)>
• Quantifiers (? for 0 or 1, * for 0 or more, + for 1 or more)
<!ELEMENT article (title,author+)>
<!ELEMENT article (title,subject?,author*)>
Element Content mixed
[51] Mixed ::= '(' S? '#PCDATA' (S? '|' S? Name)* S? ')*' | '(' S? '#PCDATA' S? ')'
Examples:
<!ELEMENT p (#PCDATA|a|ul|b|i|em)*>
<!ELEMENT p (#PCDATA | %font; | %phrase; | %special; | %form;)* >
<!ELEMENT b (#PCDATA)>
Note: The keyword #PCDATA derives historically from the
term "parsed character data."
Attribute Declaration
[52] AttlistDecl ::= '<!ATTLIST' S Name AttDef* S? '>'
[53] AttDef ::= S Name S AttType S DefaultDecl
Attribute Types
[54] AttType ::= StringType | TokenizedType | EnumeratedType
[55] StringType ::= 'CDATA'
[56] TokenizedType ::= 'ID' | 'IDREF' | 'IDREFS' | 'ENTITY' | 'ENTITIES' | 'NMTOKEN' | 'NMTOKENS'
Enumerated Attributes
Definition: Enumerated attributes can take one of a list of values provided in the declaration. There are two kinds of enumerated types: notation types and enumerations.
Enumerated Attribute Types
[57] EnumeratedType ::= NotationType | Enumeration
[58] NotationType ::= 'NOTATION' S '(' S? Name (S? '|' S? Name)* S? ')'
[59] Enumeration ::= '(' S? Nmtoken (S? '|' S? Nmtoken)* S? ')'
Default Declaration
An attribute declaration provides
information on whether the attribute's
presence is required, and if not, how an
XML processor should react if a
declared attribute is absent in a
document.
Attribute Defaults
[60] DefaultDecl ::= '#REQUIRED'
| '#IMPLIED' | (('#FIXED' S)?AttValue)
Default Declaration
Examples
<!ATTLIST termdef
id ID #REQUIRED
name CDATA #IMPLIED>
<!ATTLIST list
type (bullets|ordered|glossary) "ordered">
<!ATTLIST form
method CDATA #FIXED "POST">
[52] AttlistDecl ::= '<!ATTLIST' S Name AttDef* S? '>'
[53] AttDef ::= S Name S AttType S DefaultDecl
Example Element and Attributes
(articles is the root element)
<!DOCTYPE articles [
  <!ELEMENT articles (article*)>
  <!ELEMENT article EMPTY>
  <!ATTLIST article
    title CDATA #REQUIRED
    author CDATA #IMPLIED
  >
]>
<articles>
  <article title="Extensible Markup Language"
           author="Karsten Tolle" />
</articles>
IMPLIED or Default
<!DOCTYPE articles [
...
<!ATTLIST article
title CDATA #REQUIRED
author CDATA #IMPLIED        or:   author CDATA "Karsten Tolle"
>
]>
<articles>
<article title="Extensible Markup Language"/>
</articles>
Note: If a default value is given and the attribute does not appear, the processor supplies it with the default value. If #IMPLIED is used instead, omitted attributes are not supplied.
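The difference can be observed with a DOM parser. The sketch below builds the document from the slide twice, once per variant of the author declaration, and asks for the author attribute that the instance omits. The class name AttributeDefaults and the helper authorOf are made up for illustration; the sketch assumes the JDK's bundled parser, which reads the internal DTD subset even without validation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class AttributeDefaults {
    /**
     * Parses the sample document with the given declaration of the author
     * attribute and returns the attribute value, or null if it is absent.
     */
    static String authorOf(String authorDecl) {
        String xml = "<!DOCTYPE articles [\n"
                + "<!ELEMENT articles (article*)>\n"
                + "<!ELEMENT article EMPTY>\n"
                + "<!ATTLIST article title CDATA #REQUIRED " + authorDecl + ">\n"
                + "]>\n"
                + "<articles><article title=\"Extensible Markup Language\"/></articles>";
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element article = (Element) doc.getElementsByTagName("article").item(0);
            return article.hasAttribute("author") ? article.getAttribute("author") : null;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(authorOf("author CDATA \"Karsten Tolle\"")); // default declared
        System.out.println(authorOf("author CDATA #IMPLIED"));          // no default
    }
}
```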
external vs local
External DTD:
<?xml version="1.0"?>
<!DOCTYPE greeting SYSTEM "hello.dtd">
<greeting>Hello, world!</greeting>
Local DTD:
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE greeting
[ <!ELEMENT greeting (#PCDATA)> ]>
<greeting>Hello, world!</greeting>
TV Schedule DTD
By David Moisan. Copied from his website: http://www.davidmoisan.org/
<!DOCTYPE TVSCHEDULE [
<!ELEMENT TVSCHEDULE (CHANNEL+)>
<!ELEMENT CHANNEL (BANNER,DAY+)>
<!ELEMENT BANNER (#PCDATA)>
<!ELEMENT DAY (DATE,(HOLIDAY|PROGRAMSLOT+)+)>
<!ELEMENT HOLIDAY (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT PROGRAMSLOT (TIME,TITLE,DESCRIPTION?)>
<!ELEMENT TIME (#PCDATA)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT DESCRIPTION (#PCDATA)>
<!ATTLIST TVSCHEDULE NAME CDATA #REQUIRED>
<!ATTLIST CHANNEL CHAN CDATA #REQUIRED>
<!ATTLIST PROGRAMSLOT VTR CDATA #IMPLIED>
<!ATTLIST TITLE RATING CDATA #IMPLIED>
<!ATTLIST TITLE LANGUAGE CDATA #IMPLIED>
]>
DTD – Summary
• An XML file is valid if it conforms to its DTD
• This can be tested with a validating XML parser
• For most applications it is useful to test whether an XML input file is valid with respect to the expected format/interpretation.
[Diagram: DTD and XML → Validating XML Parser → Application]
Drawbacks DTD
• DTD uses cryptic SGML syntax
– difficult to write
– difficult to read
– differs from the XML syntax
• DTD by default provides just a small set
of data types
• Each XML file can only be based on
one DTD!
Namespaces in XML
Namespaces
• To help identify origin or meaning of an element or
attribute
• To allow two sets of elements to be combined even
if there are duplicate element names
Example:
<data xmlns:fruit="http://www.thirdm.com/fruit"
xmlns:corp="http://www.thirdm.com/corporations">
<fruit:apple qty="5" type="Granny Smith"/>
<corp:apple stockticker="AAPL" exchange="NASDAQ"/>
</data>
Namespace URI
• URI = Uniform Resource Identifier
• Used to uniquely identify the namespace
• The URI does not need to point to an existing resource (as far as XML is concerned); for other applications like RDF this might differ!
Example:
<food xmlns:fruit="http://www.thirdm.com/fruit"
xmlns:veg="http://www.thirdm.com/vegetables">
<fruit:apple qty="5"/>
<fruit:pear qty="6"/>
<veg:potato qty="7"/>
</food>
Namespace Prefix
• Used to refer to the namespace
• Typically short, often three letters
Example:
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
<rdfs:Class rdf:ID="Book">
</rdfs:Class>
</rdf:RDF>
Note: The '#' anchor sign is used in RDF to point directly to resources inside a namespace.
Another example …
<?xml version="1.0" encoding="UTF-8" ?>
<Order xmlns:qdt="urn:oasis:names:specification:ubl:schema:xsd:QualifiedDatatypes-2"
xmlns:ccts="urn:oasis:names:specification:ubl:schema:xsd:CoreComponentParameters-2"
xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
xmlns:udt="urn:un:unece:uncefact:data:draft:UnqualifiedDataTypesSchemaModule:2"
xmlns="urn:oasis:names:specification:ubl:schema:xsd:Order-2">
<cbc:UBLVersionID>2.0</cbc:UBLVersionID>
<cbc:CustomizationID>urn:oasis:names:specification:ubl:xpath:Order-2.0:sbs-1.0-draft</cbc:CustomizationID>
<cbc:ProfileID>bpid:urn:oasis:names:draft:bpss:ubl-2-sbs-order-with-simple-response-draft</cbc:ProfileID>
<cbc:ID>AEG012345</cbc:ID>
<cbc:SalesOrderID>CON0095678</cbc:SalesOrderID>
<cbc:CopyIndicator>false</cbc:CopyIndicator>
…
Default Namespace
• Used to identify the namespace for
elements without a prefix
Example:
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns="http://www.w3.org/2000/01/rdf-schema#">
<Class rdf:ID="Book">
</Class>
</rdf:RDF>
Attributes and Namespaces
• Attributes without a prefix are never associated with the default namespace!
• Attributes can have an explicit namespace prefix
Example:
<!-- http://www.w3.org is bound to n1 and is the default -->
<x xmlns:n1="http://www.w3.org" xmlns="http://www.w3.org" >
<good a="1" b="2" />
<good a="1" n1:a="2" />
</x>
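How a namespace-aware parser sees this example can be sketched with the JAXP DOM API. The class name NamespaceDemo is an illustrative assumption; note that namespace awareness has to be switched on explicitly in JAXP.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class NamespaceDemo {
    // The second <good> element of the example, without whitespace between tags.
    static final String XML =
            "<x xmlns:n1=\"http://www.w3.org\" xmlns=\"http://www.w3.org\">"
          + "<good a=\"1\" n1:a=\"2\"/>"
          + "</x>";

    /** Parses the sample with namespace support and returns the <good> element. */
    static Element parseGood() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return (Element) doc.getDocumentElement().getFirstChild();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element good = parseGood();
        // The unprefixed element is in the default namespace ...
        System.out.println("good: " + good.getNamespaceURI());
        // ... but the unprefixed attribute is in no namespace,
        System.out.println("a:    " + good.getAttributeNode("a").getNamespaceURI());
        // so it does not clash with the prefixed attribute of the same local name.
        System.out.println("n1:a: " + good.getAttributeNode("n1:a").getNamespaceURI());
    }
}
```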
Namespace Scope
• Scope is limited to the element the namespace is declared on (including its content)
• May be overridden by a child element
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
<rdfs:Class xmlns:rdfs="http://www.myrdfs.com"
rdf:ID="Book">
</rdfs:Class>
</rdf:RDF>
Notes about Namespaces
• Namespaces are not part of XML 1.0
• An XML parser (processor) may or may
not support XML Namespaces
• Some parsers let you check at runtime whether they support namespaces
Notes about Namespaces
• DTDs and Namespaces are compatible but do not work well together
– For example, the namespace prefix must be static if elements are declared in the DTD
Quote from Namespaces in XML 1.0: Note that DTD-based
validation is not namespace-aware in the following sense: a
DTD constrains the elements and attributes that may appear
in a document by their uninterpreted names, not by
(namespace name, local name) pairs. To validate a document
that uses namespaces against a DTD, the same prefixes must
be used in the DTD as in the instance. A DTD may however
indirectly constrain the namespaces used in a valid document
by providing #FIXED values for attributes that declare
namespaces.
XML Schema
XML Schema – Why?
• DTD uses cryptic SGML syntax
– difficult to write
– difficult to read
– differs from the XML syntax
• DTD provides just a small set of data
types
• Each XML file can only be based on
one DTD!
• DTD and XML Namespaces do not
work well together
Schema Root Element and targetNamespace
<xsd:schema
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.dbis.de">
<!-- element and attribute declarations go here -->
</xsd:schema>
Element Declaration
<xsd:element name="notice"
type="xsd:string"/>
• Compare to: <!ELEMENT notice (#PCDATA)>
• Note that using a prefix inside an attribute value might not work with every XML application. E.g. in RDF it would not be allowed. Entity references should be used instead.
Attribute Declaration
<xsd:element name="article">
<xsd:complexType>
<xsd:attribute name="title"
type="xsd:string" use="required"/>
<xsd:attribute name="author"
type="xsd:string" use="required"/>
</xsd:complexType>
</xsd:element>
Data Types
• XML Schema Part 2: Datatypes Second
Edition
– W3C Recommendation 28 October 2004
– http://www.w3.org/TR/xmlschema-2/
• Built-in datatypes are those which are
defined in this specification, and can be
either primitive or derived;
• User-derived datatypes are those derived
datatypes that are defined by individual
schema designers.
Element Declaration with Children
<xsd:element name="publications">
<xsd:complexType>
<xsd:sequence>
<xsd:choice minOccurs="0"
maxOccurs="unbounded">
<xsd:element ref="article"/>
<xsd:element ref="book"/>
</xsd:choice>
<xsd:element ref="notice" minOccurs="0"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
Compare to DTD: <!ELEMENT publications ((article | book)*, notice?)>
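Validation against such a schema can be sketched with the javax.xml.validation API of the JDK. To make the example self-contained, the schema below adds simple global declarations for article, book and notice (typed as xsd:string, which is an assumption for illustration) and uses no targetNamespace; the class name XsdValidation is likewise made up.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class XsdValidation {
    // Content model of the slide: (article | book)*, notice?
    static final String XSD =
            "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
          + "<xsd:element name=\"article\" type=\"xsd:string\"/>"
          + "<xsd:element name=\"book\" type=\"xsd:string\"/>"
          + "<xsd:element name=\"notice\" type=\"xsd:string\"/>"
          + "<xsd:element name=\"publications\"><xsd:complexType><xsd:sequence>"
          + "<xsd:choice minOccurs=\"0\" maxOccurs=\"unbounded\">"
          + "<xsd:element ref=\"article\"/><xsd:element ref=\"book\"/>"
          + "</xsd:choice>"
          + "<xsd:element ref=\"notice\" minOccurs=\"0\"/>"
          + "</xsd:sequence></xsd:complexType></xsd:element>"
          + "</xsd:schema>";

    /** Returns true if the instance document is valid against the schema above. */
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            // Without a custom error handler, validate() throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(
                "<publications><book>B</book><article>A</article><notice>N</notice></publications>"));
        // notice must come last, so this order violates the sequence
        System.out.println(isValid(
                "<publications><notice>N</notice><book>B</book></publications>"));
    }
}
```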
Separation into logical parts
An XML Schema might get huge. It is therefore useful
to separate the definitions of logical parts, like the
definition for an address from other parts. This
makes it easier to maintain and reuse.
<schema targetNamespace="http://www.example.com/IBEST"
xmlns="http://www.w3.org/2001/XMLSchema"
xmlns:ibest="http://www.example.com/IBEST">
<annotation>
<documentation xml:lang="DE"> Adressen für das internationale
Buchbestellungsschema für Example.com. Copyright 2001 Example.com.
Alle Rechte vorbehalten. </documentation>
</annotation>
Another example …
<?xml version="1.0" encoding="UTF-8"?>
<!--
Document Type: Order
Generated On: Tue Oct 03 2:26:38 P3 2006
-->
<!-- ===== xsd:schema Element With Namespaces Declarations ===== -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
targetNamespace="urn:oasis:names:specification:ubl:schema:xsd:Order-2"
xmlns="urn:oasis:names:specification:ubl:schema:xsd:Order-2"
xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
xmlns:udt="urn:un:unece:uncefact:data:specification:UnqualifiedDataTypesSchemaModule:2"
xmlns:ccts="urn:un:unece:uncefact:documentation:2"
xmlns:ext="urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2"
xmlns:qdt="urn:oasis:names:specification:ubl:schema:xsd:QualifiedDatatypes-2" elementFormDefault="qualifi
<!-- ===== Imports ===== -->
<xsd:import namespace="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" sch
<xsd:import namespace="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" schemaL
<xsd:import namespace="urn:un:unece:uncefact:data:specification:UnqualifiedDataTypesSchemaModule:2" sc
<xsd:import namespace="urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2" sch
<xsd:import namespace="urn:oasis:names:specification:ubl:schema:xsd:QualifiedDatatypes-2" schemaLocation
<!-- ===== Root Element ===== -->
<xsd:element name="Order" type="OrderType">
<xsd:annotation>
Include
To include separate parts of a schema, the main schema uses the include element.
<include schemaLocation="http://www.example.com/schemas/adresse.xsd"/>
[Diagram: Main Schema → Include Schemas]
Validation
<?xml version="1.0" encoding="UTF-8"?>
<Order xmlns:qdt="urn:oasis:names:specification:ubl:schema:xsd:QualifiedDatatypes-2"
xmlns:ccts="urn:oasis:names:specification:ubl:schema:xsd:CoreComponentParameters-2"
xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
xmlns:udt="urn:un:unece:uncefact:data:draft:UnqualifiedDataTypesSchemaModule:2"
xmlns="urn:oasis:names:specification:ubl:schema:xsd:Order-2">
<cbc:UBLVersionID>2.0</cbc:UBLVersionID>
<cbc:CustomizationID>urn:oasis:names:specification:ubl:xpath:Order-2.0:sbs-1.0-draft</cbc:CustomizationID>
<cbc:ProfileID>bpid:urn:oasis:names:draft:bpss:ubl-2-sbs-order-with-simple-response-dr
<cbc:ID>AEG012345</cbc:ID>
…
</Order>
Processing XML
SAX vs DOM vs StAX
DOM (Document Object Model)
Builds a tree structure from the elements contained in the XML document.
DOM (Document Object Model)
• Very useful for small documents
• Random access to structure using objects
• Can read, manipulate, and write XML
programmatically
• Write recursive code to explore child
nodes of unknown or evolving schema
• Write hard-coded procedures to handle
static well-known schema
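The "recursive code to explore child nodes" mentioned above could look like the following sketch, which walks a DOM tree depth-first and collects the names of all elements without knowing the schema. The class name DomWalk and the sample document are illustrative assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class DomWalk {
    /** Depth-first walk that records the name of every element node. */
    static void collect(Node node, List<String> names) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            names.add(node.getNodeName());
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, names);
        }
    }

    /** Parses the text into a DOM tree and returns all element names in document order. */
    static List<String> elementNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> names = new ArrayList<>();
            collect(doc, names);
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(elementNames(
                "<contact><address type=\"business\"><name>Tolle</name>"
              + "<town>Frankfurt</town></address></contact>"));
    }
}
```

Because the whole tree is held in memory, the same nodes can be visited again or modified afterwards, which is the flexibility the slide refers to.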
SAX (Simple API for XML)
Based on events like (default handler):
– startDocument()
– endDocument()
– startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, Attributes attributes)
– endElement(java.lang.String uri, java.lang.String localName, java.lang.String qName)
– error(SAXParseException e)
– fatalError(SAXParseException e)
SAX (Simple API for XML)
• Uses much less memory than DOM, especially for large documents (but for some applications more than one pass is needed)
DOM vs SAX
               DOM      SAX
memory         -        +
flexibility    +        -
performance    - (*)    + (*)
standard       W3C      xml-dev
* Depending on the application; if more than one pass is needed, DOM might be better!
Problems …
• What if DOM and SAX are both not acceptable? E.g. mobile devices with J2ME
• DOM needs too much memory
• Common streaming APIs like SAX are all push APIs
– It is the SAX parser pushing the tokens into the application → not easy to handle
public class Flour extends DefaultHandler {
    …
    public void startElement(String namespaceURI, String localName,
                             String qName, Attributes atts) { … }
    …
    public static void main(String[] args) {
        Flour f = new Flour();
        // SAXParser is presumably org.apache.xerces.parsers.SAXParser here
        // (the JAXP class of the same name has no public constructor).
        SAXParser p = new SAXParser();
        p.setContentHandler(f);   // the parser pushes its events into f
        try {
            p.parse(args[0]);
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println(f.amount);
    }
    …
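A complete push-style handler can be sketched with the JAXP factory classes of the JDK instead of a concrete parser class. The handler below counts start-tags per element name; the class name ElementCounter and the sample input are illustrative assumptions, not the Flour class from the slide.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class ElementCounter extends DefaultHandler {
    // State has to be kept by the handler, because the parser controls the flow.
    final Map<String, Integer> counts = new TreeMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        counts.merge(qName, 1, Integer::sum); // called by the parser for every start-tag
    }

    /** Parses the text once and returns the number of start-tags per element name. */
    static Map<String, Integer> count(String xml) {
        ElementCounter handler = new ElementCounter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.counts;
    }

    public static void main(String[] args) {
        System.out.println(count("<food><apple/><pear/><apple/></food>"));
    }
}
```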
Alternative: StAX
• StAX (Streaming API for XML) – a pull
parsing API
– With e.g. next(), the application itself requests the next token.
– JSR 173 (Java Specification Request)
http://jcp.org/en/jsr/detail?id=173
Example for calling next …
while (true) {
    int event = parser.next();
    if (event == XMLStreamConstants.END_DOCUMENT) {
        parser.close();
        break;
    }
    if (event == XMLStreamConstants.START_ELEMENT) {
        System.out.println(parser.getLocalName());
    }
}
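For completeness, a self-contained variant of this loop including the creation of the parser might look as follows. It uses the javax.xml.stream API that is part of the JDK; the class name StaxNames and the sample input are illustrative assumptions, and hasNext() is used here as the loop condition instead of testing for END_DOCUMENT.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StaxNames {
    /** Pulls events from the parser and returns the local names of all start-tags. */
    static List<String> startElements(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader parser = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (parser.hasNext()) {
                // The application decides when to ask for the next event.
                if (parser.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(parser.getLocalName());
                }
            }
            parser.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(startElements("<food><apple qty=\"5\"/><pear qty=\"6\"/></food>"));
    }
}
```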
Transforming XML
[Figure: target formats – GIF, JPG, NSK-TIFF etc.; AVI, AU, WAV, WMA, MP3 etc.; DOC, HTML, PDF etc.; MPG, WMV, RM etc.; JPEG, GIF etc.]
XSL (XSLT Version 2.0 from Jan 2007)
The Extensible Stylesheet Language (XSL) has three subcomponents:
• XSL-FO
XSL Formatting Objects, an XML vocabulary for specifying formatting semantics.
• XSLT
This is the transformation language, which lets you transform XML into some other format.
• XPath
XPath is an addressing mechanism that lets you specify a path to an element.
• Extensible Stylesheet Language (XSL)
Version 1.0
– W3C Recommendation 15 October 2001
XSLT Processing
XSLT
(XSL Transformations)
• XSLT is a programming language
• Write scripts containing if statements and
for-each loops
• Uses XPath for querying document, math
calculations, and string functions
• Can transform XML into HTML or text
• Useful for transforming XML to XML
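A small transformation can be sketched with the javax.xml.transform API of the JDK, which supports XSLT 1.0. The stylesheet below uses a for-each loop, an if statement and XPath expressions to turn a list of articles into plain text; the stylesheet, the input and the class name XsltDemo are illustrative assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltDemo {
    // For every article: its title, the author if present, then a semicolon.
    static final String XSLT =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output method=\"text\"/>"
          + "<xsl:template match=\"/\">"
          + "<xsl:for-each select=\"articles/article\">"
          + "<xsl:value-of select=\"@title\"/>"
          + "<xsl:if test=\"@author\"> by <xsl:value-of select=\"@author\"/></xsl:if>"
          + "<xsl:text>;</xsl:text>"
          + "</xsl:for-each>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    /** Applies the stylesheet above to the given XML text and returns the result. */
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(
                "<articles><article title=\"XML\" author=\"Tolle\"/><article title=\"DTD\"/></articles>"));
    }
}
```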
So, what is XML
• A meta markup language
• Structured information that complies with a standard structure and syntax
• "The ASCII of the 21st Century"
• Platform-independent information for:
– Presentation instructions
– User settings
– Data repository
– Data transfer
– RPC calls
– ...
What XML is not
• XML is not tied to any human language or
character encoding
• XML is not tied to any computing platform
or programming language
• XML has no semantics
Literature I
• XML Professionell; Richard Anderson u.a.;
MITP-Verlag; 2000; ISBN 3-8266-0633-7
• XML Data Management; Akmal B.
Chaudhri, Awais Rashid and Roberto
Zicari; Addison Wesley; 2003; ISBN
0-201-84452-4
Literature II
Resources of DBIS related to XML (German):
• Einführung in XML & Document Type Definition;
Alexander Semino; Seminar SS 2001
• XML-Schemata; Markus Krauße; Seminar SS
2001
• XSL – Dokumente mit Stil; Fabian Wleklinski;
Seminar SS 2001
• HTML und XML; Christina Anthes; Proseminar
SS2002
Literature III
• Read recommendations at W3C:
www.w3.org
• … search the Web!