What is XML?
Informatica II
Anisa Rula, [email protected]
ITIS Lab, http://www.itis.disco.unimib.it

Agenda
  • XML
  • XML Schema
  • XPath (XML Path Language)
  • XQuery (XML Query Language)

What is XML?
XML (Extensible Markup Language) is a metadata language, i.e. a language for providing data about data.
  • W3C standard since 1998
It looks a bit like HTML, but in XML the tags are user-defined and therefore extensible:
  • HTML marks up logical presentation
  • CSS specifies presentation style
  • XML marks up meaning (semantics)
XML itself has no mechanism for specifying how data is presented to the user.
The valid tags and their grammar can be defined precisely with:
  • Document Type Definitions (DTD)
  • XML Schema Definitions (XSD)
XML is system-independent and vendor-independent.
  • It is a product of the World Wide Web Consortium (W3C), trademarked by MIT.
An XML document typically resides in its own file with a ".xml" extension.

Why XML?
  • Separates content from presentation
  • General: can be applied to anything
  • Adds value to semi-structured data, e.g. a product catalogue
  • Enables an enterprise to mark up all its data
  • Greatly simplifies the encoding of data (cf. ad hoc text representations)
  • Ubiquitous: everybody is using it!

Where does XML fit?
Why not put everything in a relational or OO database?
  • XML is a global standard: it offers better information transfer between different applications and enterprises than proprietary databases.
  • XML is flexible and easily applied. This also presents dangers: data does NOT become more valuable just because it is marked up in XML; the XML structures have to be well designed.

Applications of XML
Data-oriented languages
  • Used in web services
  • Communication between applications
  • Data export from databases
Document-oriented languages
  • To add structure to natural-language text documents, e.g.:
    content for web pages (XHTML), lecture notes, product catalogues
Emerging XML databases such as Xindice (http://xml.apache.org/xindice/) store XML directly, without having to map it to a relational DB.
Protocols and programming languages
  • E.g. XML Schema, XSLT, WSDL

XML Basic Syntax
An XML document consists of a number of declarations followed by a tree of elements.
  • Every document must contain exactly one root element
  • Each element is delimited by a begin (start) tag and an end tag
  • Each element may contain attributes
  • Elements may contain text, other elements, or a mixture of the two
  • Attributes may only contain text

XML Sample
    <?xml version="1.0"?>
    <PUBLICATION>
      <TITLE>Why I am Overworked</TITLE>
      <AUTHOR role="author">
        <FIRSTNAME>Fred</FIRSTNAME>
        <LASTNAME>Smith</LASTNAME>
        <COMPANY>Jones and Associates</COMPANY>
      </AUTHOR>
      <ABSTRACT>This is the abstract</ABSTRACT>
    </PUBLICATION>

XML Element
  • Has a name
  • Has a begin tag: <elementName>
  • Then text and/or child elements
  • Has an end tag: </elementName>
    E.g. <name> Simon </name>
  • Elements can also be empty, e.g. <person name="Simon" />
Tips
  • Do not use white space in element names
  • Element names cannot begin with a digit, although names can contain digits
  • Only certain punctuation is allowed: periods, hyphens, underscores and colons (colons are reserved for namespace prefixes)

Elements or Attributes
  • Information can be stored either in elements or in attributes
  • Structured information is stored in elements
  • Primitive information (i.e. a single atomic value or a list of values) can be stored either in an element or in an attribute
  • It is perhaps better to store primitives in attributes

XML Attributes
  • Element start tags may also contain attributes
  • An attribute consists of an attribute name followed by an attribute value
  • Attributes are only allowed in start tags (or empty-element tags), e.g.:
    <person email="[email protected]">
      <name>Simon</name>
    </person>

Well-Formed and Valid
  • Element tags must be properly nested, e.g.:
      <a> <b> text </b> </a> is OK
      but <a> <b> text </a> </b> is NOT
  • Attribute values must be enclosed in quotes, e.g. <item id="33905">
A document that follows these syntax rules (a single root element, properly nested tags, quoted attribute values) is well-formed.
If a document is well-formed and also obeys the rules of a specified DTD, then it is also valid.

Document Type Definition (DTD)
A Document Type Definition (DTD) allows the developer to create a set of rules that specify legal content and place restrictions on an XML file.
  • If the XML document does not follow the rules contained in the DTD, a validating parser generates an error
  • An XML document that conforms to the rules of a DTD is said to be valid
A DTD:
  • Provides a concise way to specify the syntax of a given document type
  • Declares how elements can include other elements
  • Declares the attributes allowed for each element
  • Uses special operators to specify the order and cardinality of each item (see below)

CDATA and PCDATA
  • CDATA (Character Data): attributes declared as CDATA may contain any text characters
  • PCDATA (Parsed Character Data): elements declared as (#PCDATA) do not contain other elements, i.e. there is no other markup within them. In tree terms, these are LEAF nodes.

Document Type Definition (DTD): Example
    <?xml version="1.0"?>
    <!DOCTYPE note [
      <!ELEMENT note (to,from,heading,body)>
      <!ELEMENT to (#PCDATA)>
      <!ELEMENT from (#PCDATA)>
      <!ELEMENT heading (#PCDATA)>
      <!ELEMENT body (#PCDATA)>
    ]>
    <note>
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>

Why Use a DTD?
  • A single DTD ensures a common format for each XML document that references it
  • An application can use a standard DTD to verify that the data it receives from the outside world is valid
  • A description of legal, valid data further contributes to the interoperability and efficiency of using XML

DTD Elements
  • All element declarations begin with <!ELEMENT and end with >
  • The ELEMENT keyword is case sensitive
  • Every element used in the XML file must be declared
  • Elements declared with the #PCDATA content model cannot have children
  • When describing sequences, the XML document must contain exactly those elements in exactly that order

    Operator        Meaning
    +               One or more times
    *               Zero or more times
    ?               Zero times or once
    |               Or: (a | b)? means either a or b or nothing
    (no operator)   Exactly once: (a , b) means exactly one a followed by exactly one b

Some Example DTD Declarations
Example 1: The Empty Element
    <!--DTD declaration of an empty element and of its attribute-->
    <!ELEMENT Bool EMPTY>
    <!ATTLIST Bool Value CDATA #REQUIRED>
    <!--Usage with an attribute in the XML file-->
    <Bool Value="True"></Bool>

Example 2: Elements with Data
    <!--DTD declaration of an element-->
    <!ELEMENT Month (#PCDATA)>
    <!--Valid usage within XML file-->
    <Month>April</Month>
    <!--Valid usage within XML file-->
    <Month>This is a month</Month>
    <!--Invalid usage within XML file: a #PCDATA element can't have children!-->
    <Month>
      <January>Jan</January>
      <March>March</March>
    </Month>

Example 3: Elements with Children
To specify that an element must have a single child element, put the child element's name inside the parentheses.
    <!--A house has a single address-->
    <!ELEMENT House (Address)>
    <!--Valid usage within XML file-->
    <House>
      <Address>1345 Preston Ave Charlottesville Va 22903</Address>
    </House>
An element can have multiple children. A DTD describes multiple children using a sequence, i.e. a list of elements separated by commas. The XML file must contain one of each element, in the specified order.
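A DTD content model behaves like a regular expression over the sequence of child element names: "," is concatenation and the operators listed above have their usual regex meaning. Python's standard xml.etree.ElementTree does not validate against DTDs, so the following is only a minimal illustrative sketch, not a substitute for a validating parser. It assumes Python 3.8+ and handles element-only models (names, ",", "|", "+", "*", "?" and parentheses); #PCDATA, mixed content, EMPTY and ANY are out of scope. The helper names content_model_to_regex and children_match are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# An XML-style element name as it appears inside a DTD content model.
NAME = re.compile(r"[A-Za-z_][\w.\-]*")


def content_model_to_regex(model: str):
    """Translate an element-only DTD content model such as
    "(to,from,heading,body)" or "(Title, Person*)" into a compiled regex that
    matches a space-terminated list of child element names.
    #PCDATA, mixed content, EMPTY and ANY are not handled in this sketch."""
    compact = re.sub(r"\s+", "", model)
    # Each element name becomes a group matching "name " exactly once;
    # parentheses and the operators + * ? | keep their regex meaning.
    pattern = NAME.sub(lambda m: "(?:" + re.escape(m.group(0)) + " )", compact)
    # In a DTD, "," means "followed by", i.e. plain concatenation in a regex.
    return re.compile(pattern.replace(",", ""))


def children_match(element: ET.Element, model: str) -> bool:
    """True if the element's child elements, in document order, follow the model."""
    children = "".join(child.tag + " " for child in element)
    return content_model_to_regex(model).fullmatch(children) is not None


# The <note> instance from the DTD example above.
note = ET.fromstring(
    "<note><to>Tove</to><from>Jani</from>"
    "<heading>Reminder</heading><body>Don't forget me this weekend!</body></note>"
)
print(children_match(note, "(to,from,heading,body)"))  # True
print(children_match(note, "(to,from,body)"))          # False: <heading> is not in the model
```

Because each name token includes its trailing space, a short name such as a cannot accidentally match the start of a longer name such as ab.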
    <!--DTD declaration of an element-->
    <!ELEMENT address (person,street,city,zip)>
    <!ELEMENT person (#PCDATA)>
    <!ELEMENT street (#PCDATA)>
    <!ELEMENT city (#PCDATA)>
    <!ELEMENT zip (#PCDATA)>
    <!--Valid usage within XML file-->
    <address>
      <person>John Doe</person>
      <street>1234 Preston Ave.</street>
      <city>Charlottesville, Va</city>
      <zip>22903</zip>
    </address>

DTD for Address Book Example
    <!-- DTD for simple address book -->
    <!ELEMENT AddressBook (Title, Person*)>
    <!ELEMENT Title (#PCDATA)>
    <!ELEMENT Person EMPTY>
    <!ATTLIST Person name CDATA #REQUIRED>
    <!ATTLIST Person email CDATA #IMPLIED>
Tip: enter the Address Book DTD and XML as files in IntelliJ, then use the Tools -> Validate command to validate the document. Try modifying the DTD and/or the XML document to make it invalid.

Address Book: XML
    <!DOCTYPE AddressBook SYSTEM "AddressBook.dtd">
    <AddressBook>
      <Title>Simon's address book</Title>
      <Person name="Simon" email="[email protected]" />
      <Person name="Anna" />
    </AddressBook>

Alternative Address Book
What about this version?
    <AddressBook>
      <Simon email="[email protected]" />
      <Anna email="[email protected]" />
    </AddressBook>
  • Is it well-formed?
  • Is it valid (with respect to the previous DTD)?
  • Is it well designed?

XML Schema
An XML Schema defines:
  • elements that can appear in a document
  • attributes that can appear in a document
  • which elements are child elements
  • the order of child elements
  • the number of child elements
  • whether an element is empty or can include text
  • data types for elements and attributes
  • default and fixed values for elements and attributes

Sample XML Schema
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="note">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="to" type="xs:string"/>
            <xs:element name="from" type="xs:string"/>
            <xs:element name="heading" type="xs:string"/>
            <xs:element name="body" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>

Schema vs. DTD
XML Schemas are the successors of DTDs:
  • XML Schemas are extensible to future additions
  • XML Schemas are richer and more useful than DTDs
  • XML Schemas are written in XML
  • XML Schemas support data types
  • XML Schemas support namespaces

XML types
  • Primitive data types (integers, strings, dates, etc.)
  • Value-based constraints (e.g. integers > 100)
  • User-definable structured types
  • Inheritance (extension or restriction)
  • Foreign keys
  • Element-type reference constraints

XML namespaces
Problem: the meaning of a tag depends on its context.
  • Combining elements from different documents may lead to conflicting interpretations
Namespaces give tags a precise context:
  • http://www.w3.org/1999/xhtml defines the HTML tags
The notation {URI}tag fully qualifies a tag:
  • {http://www.w3.org/1999/xhtml}head
Namespace declarations (prefixes) increase readability:
  • <… xmlns:myns="http://www.w3.org/1999/xhtml">
  • <myns:head> … </myns:head>

XML namespaces
This XML carries HTML table information:
    <table>
      <tr>
        <td>Apples</td>
        <td>Bananas</td>
      </tr>
    </table>
This XML carries information about a table (a piece of furniture):
    <table>
      <name>African Coffee Table</name>
      <width>80</width>
      <length>120</length>
    </table>

Solving the Name Conflict Using a Prefix
This XML carries information about both an HTML table and a piece of furniture:
    <h:table>
      <h:tr>
        <h:td>Apples</h:td>
        <h:td>Bananas</h:td>
      </h:tr>
    </h:table>
    <f:table>
      <f:name>African Coffee Table</f:name>
      <f:width>80</f:width>
      <f:length>120</f:length>
    </f:table>

XML Namespaces: The xmlns Attribute
    <root>
      <h:table xmlns:h="http://www.w3.org/TR/html4/">
        <h:tr>
          <h:td>Apples</h:td>
          <h:td>Bananas</h:td>
        </h:tr>
      </h:table>
      <f:table xmlns:f="http://www.w3schools.com/furniture">
        <f:name>African Coffee Table</f:name>
        <f:width>80</f:width>
        <f:length>120</f:length>
      </f:table>
    </root>

XML Namespaces: The xmlns Attribute
    <root xmlns:h="http://www.w3.org/TR/html4/"
          xmlns:f="http://www.w3schools.com/furniture">
      <h:table>
        <h:tr>
          <h:td>Apples</h:td>
          <h:td>Bananas</h:td>
        </h:tr>
      </h:table>
      <f:table>
        <f:name>African Coffee Table</f:name>
        <f:width>80</f:width>
        <f:length>120</f:length>
      </f:table>
    </root>

What Makes XML Portable?
  • XSDs or DTDs associated with a document allow the receiver to validate the document
  • Human-readable/writable
  • Independent of presentation (formatting)

Syntactic vs. Semantic Interoperability
While XML is portable, communicating parties still need to agree on:
  • Document type definitions
  • The meaning of tags
  • "Operations" on data (interfaces)
  • The meaning of those operations
Semantic interoperability is still a problem!

Querying XML
XQuery concepts
A query in XQuery is an expression that:
  • reads a sequence of XML fragments or atomic values
  • returns a sequence of XML fragments or atomic values
The principal forms of XQuery expressions are:
  • path expressions
  • element constructors
  • FLWOR ("flower") expressions (For-Let-Where-Order by-Return)
  • list expressions
  • conditional expressions
  • quantified expressions
  • datatype expressions
Expressions are evaluated relative to a context:
  • namespaces
  • variables
  • functions
  • date and time
  • context item (current node or atomic value)
  • context position (in the sequence being processed)
  • context size (of the sequence being processed)

XML vs. Relational Data
Relation:
    name   phone
    John   3634
    Sue    6343
    Dick   6363
The same relation in XML:
[Figure: the relation drawn as a tree, with one row node per tuple and name and phone children under each row]
    { row: { name: "John", phone: 3634 },
      row: { name: "Sue",  phone: 6343 },
      row: { name: "Dick", phone: 6363 } }

Relational to XML Data
  • A relation instance is basically a tree with:
    – unbounded fanout at level 1 (i.e., any number of rows)
    – fixed fanout at level 2 (i.e., a fixed number of fields)
  • XML data is essentially an arbitrary tree:
    – unbounded fanout at all nodes/levels
    – any number of levels
    – variable number of children at different nodes, variable path lengths

Displaying XML with XSLT
  • XSLT (eXtensible Stylesheet Language Transformations) is the recommended style sheet language for XML.
  • XSLT is far more sophisticated than CSS.
  • With XSLT you can transform an XML document into HTML before it is displayed by a browser.

Example 1
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <note>
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>

Example 2
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <note>
      <to>Tove</to>
      <from>Jani</Ffrom>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>
Example 2 is not well-formed: the start tag <from> is closed by a non-matching end tag </Ffrom>.
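Any conforming XML parser rejects a document that is not well-formed, so the difference between the two examples can be checked programmatically. A minimal sketch, assuming Python 3 and the standard-library xml.etree.ElementTree module (the helper name is_well_formed is hypothetical). ElementTree checks well-formedness only; it does not validate against a DTD or XML Schema.

```python
import xml.etree.ElementTree as ET

# Example 1, as bytes so the parser honours the ISO-8859-1 declaration.
EXAMPLE_1 = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

# Example 2: identical except for the mismatched end tag </Ffrom>.
EXAMPLE_2 = EXAMPLE_1.replace(b"</from>", b"</Ffrom>")


def is_well_formed(document: bytes) -> bool:
    """Return True if the document parses. This checks well-formedness
    only: ElementTree does not validate against a DTD or schema."""
    try:
        ET.fromstring(document)
    except ET.ParseError as err:
        print(f"Not well-formed: {err}")
        return False
    return True


print(is_well_formed(EXAMPLE_1))  # True
print(is_well_formed(EXAMPLE_2))  # prints the parser's error message, then False
```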