Part 1: The XML Language

XML and Its Applications
Main topics
1. XML overview
2. XML syntax
3. Creating and using DTDs
4. DOM, the XML parser
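1. XML overview
1) What is XML
• What is HTML (Example 1-1)
• What is XML (Example 1-2)
– Needs a DTD as the grammar of the markup
– Needs a style sheet for display
– Description of the DTD (Example 1-3)
2) XML application examples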
Example 1-1
(Labels in the sample data: 用户ID = user ID, 公司 = company, 电话 = phone, 地址 = address, 城市 = city, 省份 = province; 张三 and 李四 are placeholder names.)
<UL>
  <LI>张三</LI>
  <UL>
    <LI>用户ID:001</LI>
    <LI>公司:A公司</LI>
    <LI>EMAIL:zhang@aaa.com</LI>
    <LI>电话:(010)62345678</LI>
    <LI>地址:五街1234号</LI>
    <LI>城市:北京市</LI>
    <LI>省份:北京</LI>
  </UL>
  <LI>李四</LI>
  <UL>
    <LI>用户ID:002</LI>
    <LI>公司:B公司</LI>
    <LI>EMAIL:li@bbb.org</LI>
    <LI>电话:(021)87654321</LI>
    <LI>地址:南京路9876号</LI>
    <LI>城市:上海市</LI>
    <LI>省份:上海</LI>
  </UL>
</UL>
Example 1-2
(Element names: 联系人列表 = contact list, 联系人 = contact, 姓名 = name, 公司 = company, 电话 = phone, 地址 = address, 街道 = street, 城市 = city, 省份 = province.)
<联系人列表>
  <联系人>
    <姓名>张三</姓名>
    <ID>001</ID>
    <公司>A公司</公司>
    <EMAIL>zhang@aaa.com</EMAIL>
    <电话>(010)62345678</电话>
    <地址>
      <街道>五街1234号</街道>
      <城市>北京市</城市>
      <省份>北京</省份>
    </地址>
  </联系人>
  <联系人>
    <姓名>李四</姓名>
    <ID>002</ID>
    <公司>B公司</公司>
    <EMAIL>li@bbb.org</EMAIL>
    <电话>(021)87654321</电话>
    <地址>
      <街道>南京路9876号</街道>
      <城市>上海市</城市>
      <省份>上海</省份>
    </地址>
  </联系人>
</联系人列表>
Example 1-3
<!ELEMENT 联系人列表 (联系人)*>
<!ELEMENT 联系人 (姓名,ID,公司,EMAIL,电话,地址)>
<!ELEMENT 地址 (街道,城市,省份)>
<!ELEMENT 姓名 (#PCDATA)>
<!ELEMENT ID (#PCDATA)>
<!ELEMENT 公司 (#PCDATA)>
<!ELEMENT EMAIL (#PCDATA)>
<!ELEMENT 电话 (#PCDATA)>
<!ELEMENT 街道 (#PCDATA)>
<!ELEMENT 城市 (#PCDATA)>
<!ELEMENT 省份 (#PCDATA)>
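2) XML application examples
• Com.dtd: the document type definition (DTD) written for FCLML, the markup language for Company F's customer contact list
• Com.xml: the XML document holding the customer contact information (Example 1-5)
• Com.xsl: a style sheet written for Com.xml (Example 1-6)
• The resulting HTML and how it is displayed (Examples 1-7 and 1-8)
3) Comparing XML and HTML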
Example 1-4
Fclml.dtd
<?xml version="1.0" encoding="GB2312"?>
<!ELEMENT 联系人列表 (联系人)*>
<!ELEMENT 联系人 (姓名,ID,公司,EMAIL,电话,地址)>
<!ELEMENT 地址 (街道,城市,省份)>
<!ELEMENT 姓名 (#PCDATA)>
<!ELEMENT ID (#PCDATA)>
<!ELEMENT 公司 (#PCDATA)>
<!ELEMENT EMAIL (#PCDATA)>
<!ELEMENT 电话 (#PCDATA)>
<!ELEMENT 街道 (#PCDATA)>
<!ELEMENT 城市 (#PCDATA)>
<!ELEMENT 省份 (#PCDATA)>
Example 1-5
Com.xml
<?xml version="1.0" encoding="GB2312" standalone="no"?>
<!DOCTYPE 联系人列表 SYSTEM "com.dtd">
<?xml-stylesheet type="text/xsl" href="mystyle.xsl"?>
<联系人列表>
  <联系人>
    <姓名>张三</姓名>
    <ID>001</ID>
    <公司>A公司</公司>
    <EMAIL>zhang@aaa.com</EMAIL>
    <电话>(010)62345678</电话>
    <地址>
      <街道>五街1234号</街道>
      <城市>北京市</城市>
      <省份>北京</省份>
    </地址>
  </联系人>
  <联系人>
    <姓名>李四</姓名>
    <ID>002</ID>
    <公司>B公司</公司>
    <EMAIL>li@bbb.org</EMAIL>
    <电话>(021)87654321</电话>
    <地址>
      <街道>南京路9876号</街道>
      <城市>上海市</城市>
      <省份>上海</省份>
    </地址>
  </联系人>
</联系人列表>
Example 1-6
MyStyle.xsl
<?xml version="1.0" encoding="GB2312"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl"
    xmlns="http://www.w3.org/TR/REC-html40"
    result-ns="">
  <xsl:template><xsl:apply-templates/></xsl:template>
  <xsl:template match="/">
    <HTML>
      <HEAD>
        <TITLE>F公司的客户联系信息</TITLE>
      </HEAD>
      <BODY>
        <xsl:apply-templates select="联系人列表"/>
      </BODY>
    </HTML>
  </xsl:template>
  <xsl:template match="联系人列表">
    <xsl:for-each select="联系人">
      <UL>
        <LI><xsl:value-of select="姓名"/></LI>
        <UL>
          <LI>用户ID:<xsl:value-of select="ID"/></LI>
          <LI>公司:<xsl:value-of select="公司"/></LI>
          <LI>EMAIL:<xsl:value-of select="EMAIL"/></LI>
          <LI>电话:<xsl:value-of select="电话"/></LI>
          <LI>街道:<xsl:value-of select="地址/街道"/></LI>
          <LI>城市:<xsl:value-of select="地址/城市"/></LI>
          <LI>省份:<xsl:value-of select="地址/省份"/></LI>
        </UL>
      </UL>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
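The style sheet in Example 1-6 is written against an early working-draft XSL namespace. As a rough sketch of the same idea, the following applies an XSLT 1.0 style sheet (standard namespace http://www.w3.org/1999/XSL/Transform) with the JDK's built-in javax.xml.transform API. The class name ContactListTransform, the English element names (contacts, contact, name, id) and the sample data are stand-ins assumed for illustration; they are not the files from the examples.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ContactListTransform {
    // Hypothetical sample data with English stand-in element names.
    static final String XML =
        "<contacts>"
      + "<contact><name>Zhang San</name><id>001</id></contact>"
      + "<contact><name>Li Si</name><id>002</id></contact>"
      + "</contacts>";

    // One nested list per contact, in the spirit of Example 1-6.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/contacts'>"
      + "<UL><xsl:for-each select='contact'>"
      + "<LI><xsl:value-of select='name'/></LI>"
      + "<UL><LI>ID: <xsl:value-of select='id'/></LI></UL>"
      + "</xsl:for-each></UL>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the style sheet text to the XML text and returns the result. */
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String demo() {
        return transform(XML, XSL);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The data stays free of presentation markup; only the style sheet knows that contacts become list items, which is the separation the comparison table later describes.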
Example 1-7
<HTML>
<HEAD>
<TITLE>F公司的客户联系信息</TITLE>
</HEAD>
<BODY>
<UL>
<LI>张三</LI>
<UL>
<LI>用户ID:001</LI>
<LI>公司:A公司</LI>
<LI>EMAIL:zhang@aaa.com</LI>
<LI>电话:(010)62345678</LI>
<LI>地址:五街1234号</LI>
<LI>城市:北京市</LI>
<LI>省份:北京</LI>
</UL>
<LI>李四</LI>
<UL>
<LI>用户ID:002</LI>
<LI>公司:B公司</LI>
<LI>EMAIL:li@bbb.org</LI>
<LI>电话:(021)87654321</LI>
<LI>地址:南京路9876号</LI>
<LI>城市:上海市</LI>
<LI>省份:上海</LI>
</UL>
</UL>
</BODY>
</HTML>
Example 1-8
• 张三
  o 用户ID:001
  o 公司:A公司
  o EMAIL:zhang@aaa.com
  o 电话:(010)62345678
  o 地址:五街1234号
  o 城市:北京市
  o 省份:北京
• 李四
  o 用户ID:002
  o 公司:B公司
  o EMAIL:li@bbb.org
  o 电话:(021)87654321
  o 地址:南京路9876号
  o 城市:上海市
  o 省份:上海
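3) Comparing XML and HTML
Aspect | HTML | XML
------ | ---- | ---
Extensibility | Not extensible | A meta-markup language that can be used to define new markup languages
Focus | How information is presented | How information is described in a structured way
Syntax requirements | Does not require tags to be nested or paired, or to appear in a particular order | Strictly requires nesting and pairing, and follows the tree structure given by the DTD
Readability and maintainability | Hard to read and maintain | Clear structure, easy to read and maintain
Relationship between data and display | Content description and display are bound together | Content description and display are separated
Value retention | Data does not retain its value | Data retains its value
Editing and browsing tools | Many editing and browsing tools already exist | Editing and browsing tools are not yet mature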
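Comparing XML and HTML: markup language family tree
GML (1969), Generalized Markup Language
  SGML (1985), Standard Generalized Markup Language
    HTML (1993), HyperText Markup Language
    XML (1998), Extensible Markup Language
      XHTML, Extensible HyperText Markup Language
      SVG, Scalable Vector Graphics
      SMIL, Synchronized Multimedia Integration Language
      HDML, Handheld Device Markup Language
      OEB, Open eBook Publication Structure
      …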
Sources
• Major Sources:
– http://www.cis.upenn.edu/~cis550/slides/xml.ppt (CIS550 Course Notes, U. Penn, source for many slides)
– http://www.cs.technion.ac.il/~oshmu/ (236804 - Seminar in Computer Science 4: XML - Technology, Systems and Theory)
– http://dom4j.org
Agenda
• Short Introduction to XML
– What is XML
– Structure and Terminology
– JAVA APIs for XML: an Overview
• dom4j
– Parsing an XML document
– Writing to an XML document
• Xpath
– Xpath Queries
– Xpath in dom4j
• References
The Structure of XML
• XML consists of tags and text
• Tags come in pairs <date> ...</date>
• They must be properly nested
<date> <day> ... </day> ... </date> --good
<date> <day> ... </date>... </day> --bad
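The nesting rule can be checked mechanically. Here is a minimal sketch using the parser built into the JDK (JAXP) rather than dom4j; the class name NestingCheck is made up for illustration. A parser rejects the "bad" form with a fatal error, which surfaces as a SAXException.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class NestingCheck {
    /** Returns true if the parser accepts the text as well-formed XML. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Badly nested or unpaired tags end up here.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<date><day>12</day></date>")); // true
        System.out.println(isWellFormed("<date><day>12</date></day>")); // false
    }
}
```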
XML text
• XML has only one “basic” type -- text. It is
bounded by tags e.g.
<title> The Big Sleep </title>
<year> 1935 </year> --- 1935 is still text
• XML text is called PCDATA (for parsed character
data).
• Characters are Unicode; a document is usually stored in the UTF-8 or UTF-16 encoding.
XML structure
• Nesting tags can be used to express various
structures. E.g. A tuple (record):
<person>
<name> Jeff Cohen</name>
<tel> 04-828-1345 </tel>
<tel> 054-470-778 </tel>
<email> [email protected] </email>
</person>
XML structure (cont.)
• We can represent a list by using the same
tag repeatedly:
<addresses>
<person> ... </person>
<person> ... </person>
<person> ... </person>
...
</addresses>
XML structure (cont.)
• Nested tags can be part of a list too:
<addresses>
<person>
<name> Yossi Orr</name>
<tel> 04-828-1345 </tel>
<email> [email protected] </email>
</person>
<person>
<name> Irma Levy</name>
<tel> 03-426-1142 </tel>
<email>[email protected]</email>
</person>
</addresses>
Terminology
• The segment of an XML document between an opening and
a corresponding closing tag is called an element.
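• Metadata about an element can appear in an attribute.
<person type="Friend">
<name>Ortal Derech</name>
<tel>04-8732122</tel>
<tel>054-646888</tel>
<email>[email protected]</email>
</person>
In this example, type="Friend" is an attribute; <person> ... </person> is an element; each <name>, <tel> and <email> is an element, a sub-element of <person>; "Ortal Derech" is text.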
XML is tree-like
person
  name: Malcolm Atchison
  tel: (215) 898 4321
  tel: (215) 898 4321
  email: [email protected]
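To make the tree view concrete, here is a small sketch that parses a document with the JDK's built-in DOM parser and renders it as an indented outline like the one above. The class name TreeOutline and the output format are choices made for this illustration only.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class TreeOutline {
    /** Parses the XML text and returns one indented line per element. */
    static String outline(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
            StringBuilder sb = new StringBuilder();
            walk(root, 0, sb);
            return sb.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    private static void walk(Element element, int depth, StringBuilder sb) {
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        sb.append(element.getTagName());
        // Collect only this element's own text nodes, not its descendants'.
        NodeList children = element.getChildNodes();
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.TEXT_NODE) {
                text.append(children.item(i).getNodeValue());
            }
        }
        String trimmed = text.toString().trim();
        if (!trimmed.isEmpty()) {
            sb.append(": ").append(trimmed);
        }
        sb.append('\n');
        // Child elements become nested lines, one level deeper.
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i) instanceof Element) {
                walk((Element) children.item(i), depth + 1, sb);
            }
        }
    }

    public static void main(String[] args) {
        System.out.print(outline(
            "<person><name>Malcolm Atchison</name>"
          + "<tel>(215) 898 4321</tel></person>"));
    }
}
```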
A Complete XML Document
<?xml version="1.0" encoding="UTF-8"
standalone="no"?>
<!DOCTYPE addresses SYSTEM
"http://www.technion.ac.il/~erant/addresses.dtd">
<!-- standalone tells whether or not this document references an
     external entity or an external data type specification -->
<addresses>
<person>
<name> Jeff Cohen</name>
<tel> 04-828-1345 </tel>
<tel> 054-470-778 </tel>
<email> [email protected] </email>
</person>
</addresses>
XML Structure Definitions
• DTD
– Document Type Definition – defines structure
constraints for XML documents
• XML Schema
– Serves the same purpose as a DTD but is more powerful: it can
specify the data types of elements, and a schema is itself written
in XML.
• Namespaces
– Namespaces are a way of preventing name clashes
among elements from more than one source within the
same XML document.
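As a small sketch of what "structure constraints" means in practice, the following validates documents against a DTD using the JDK's built-in JAXP parser. The DTD is an assumption written for the addresses example (the addresses.dtd file referenced earlier is not shown in the slides), it is embedded as an internal subset so nothing is fetched from the network, and the class name DtdValidation is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class DtdValidation {
    // Assumed DTD: a person has a name, one or more tel, then an email.
    static final String DOCTYPE =
        "<!DOCTYPE addresses [\n"
      + "<!ELEMENT addresses (person)*>\n"
      + "<!ELEMENT person (name, tel+, email)>\n"
      + "<!ELEMENT name (#PCDATA)>\n"
      + "<!ELEMENT tel (#PCDATA)>\n"
      + "<!ELEMENT email (#PCDATA)>\n"
      + "]>\n";

    /** Returns true if the body is valid against the DTD above. */
    static boolean isValid(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Validity errors are recoverable by default; rethrow them.
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(
            "<addresses><person><name>Jeff Cohen</name>"
          + "<tel>04-828-1345</tel><email>jeff@example.org</email>"
          + "</person></addresses>"));
    }
}
```

A document can be well-formed and still invalid: a person without an email parses cleanly but fails this check.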
More Standards
• XPath
– XML Path Language, a language for locating parts of
an XML document.
• XQuery
– A query language for XML documents (like SQL…).
• XSLT
– XSL Transformations, a language for transforming
XML documents into other XML documents.
• RDF
– Resource Description Framework. A formal knowledge
model from the World Wide Web Consortium (W3C).
Why Is XML Important?
• Because it exists, and everybody uses it.
• Plain Text - you can create and edit files
with anything.
• Data Identification - XML tells you what
kind of data you have, not how to display it.
• Separation from style.
• Hierarchical, and easily processed.
An Overview of the APIs
• JAXP: Java API for XML Processing
– It provides a common interface for creating and using
the standard SAX, DOM, and XSLT APIs.
• JAXB: Java Architecture for XML Binding
– defines a mechanism for writing out Java objects as
XML.
• JDOM
– Represents an XML file as a tree of objects
(sophisticated version of DOM)
• dom4j
– Lightweight version of JDOM.
dom4j
• An Open Source XML framework for Java.
• Allows you to read, write, navigate, create
and modify XML documents.
• Integrates with DOM and SAX.
• Full XPath support.
• XSLT Support.
Download and Use
• Go to: http://dom4j.org.
• Go to http://dom4j.org/download.html, and
download the latest release (current = 1.4).
• Unzip.
• Don’t forget the classpath. When working in an
IDE, don’t forget to add the log4j.jar library.
• Javadoc: http://dom4j.org/apidocs/index.html.
• Quick start guide: http://dom4j.org/guide.html.
Opening an XML Document
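import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.io.SAXReader;

public class TestDom4j {
    // SAXReader.read() can read from a file, a URL, an InputStream,
    // or a String (treated as a system ID: a file path or URL).
    public Document parse(String id)
            throws DocumentException {
        SAXReader reader = new SAXReader();
        Document document = reader.read(id);
        return document;
    }
}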
Example XML File
<?xml version="1.0" encoding="UTF-8" ?>
<salesdata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="C:\Documents and Settings\eran\My Documents\Academic\Courses\XML\xpath_ass_schema.xsd">
<year>
<theyear>1997</theyear>
<region><name>central</name><sales unit="millions">34</sales></region>
<region><name>east</name><sales unit="millions">34</sales></region>
<region><name>west</name><sales unit="millions">32</sales></region>
</year>
<year>
<theyear>1998</theyear>
<region><name>east</name><sales unit="millions">35</sales></region>
<region><name>west</name><sales unit="millions">42</sales></region>
</year>
</salesdata>
Accessing XML Elements
// Requires: import org.dom4j.*; import java.util.Iterator;
public void dump(Document document)
        throws DocumentException {
    // Accessing the root element
    Element root = document.getRootElement();
    // Retrieving the child elements
    for (Iterator i = root.elementIterator(); i.hasNext(); ) {
        Element element = (Element) i.next();
        // Retrieving the element name
        System.out.println(element.getQualifiedName());
        // Retrieving the element text
        System.out.println(element.getTextTrim());
        // Retrieving the text of the child element "theyear"
        System.out.println(element.elementText("theyear"));
    }
}
Accessing XML Elements –
cont’d
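• What will be the output of dump()?
year
(empty line)
1997
year
(empty line)
1998
Why? Each year element contains only child elements and whitespace, so getTextTrim() returns an empty string for it.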
Accessing XML Elements
Recursively
public void go(Element element, int depth) {
    // Indent according to the depth in the tree
    for (int d = 0; d < depth; d++) {
        System.out.print("  ");
    }
    // Print the element name followed by its own trimmed text
    System.out.print(element.getQualifiedName());
    System.out.println(" " + element.getTextTrim());
    // Recurse into the child elements
    for (Iterator i = element.elementIterator(); i.hasNext(); ) {
        Element son = (Element) i.next();
        go(son, depth + 1);
    }
}
What will be the output?
Accessing Recursively – cont’d
salesdata
  year
    theyear 1997
    region
      name central
      sales 34
    region
      name east
      sales 34
    region
      name west
      sales 32
  year
    theyear 1998
    region
      name east
      sales 35
    region
      name west
      sales 42
The whole XML tree, element names + values
Creating an XML document
public Document createDocument() {
    Document document = DocumentHelper.createDocument();
    // Creating the root element
    Element root = document.addElement("phonebook");
    // Adding elements
    Element address1 = root.addElement("address")
        .addAttribute("name", "Yuval")
        .addAttribute("category", "family")
        .addText("Ehud 3, Jerusalem");
    Element address2 = root.addElement("address")
        .addAttribute("name", "Ortal")
        .addAttribute("category", "friends")
        .addText("Kibbutz Givaat Haim");
    return document;
}
What will we get when running go()?
Creating an XML document –
cont’d
phonebook
  address Ehud 3, Jerusalem
  address Kibbutz Givaat Haim
XML tree structure of the new document
// Writing the XML document to a file
FileWriter out = new FileWriter("C:\\addresses.xml");
document.write(out);
out.close();
// Retrieving the XML itself as a string
String XML = document.asXML();
Client Program
public static void main(String[] args) {
    // Foo is the class that holds parse(), dump(), go(), xpath() and createDocument()
    Foo foo = new Foo();
    try {
        // Opening the file
        Document doc = foo.parse("C:\\sales.xml");
        // Dumping, and printing recursively
        foo.dump(doc);
        foo.go(doc.getRootElement(), 0);
        foo.xpath(doc);
        // Creating a new document
        Document newDoc = foo.createDocument();
        foo.go(newDoc.getRootElement(), 0);
        FileWriter out = new FileWriter("C:\\addresses.xml");
        newDoc.write(out);
        out.close();
    }
    catch (Exception E) {
        System.out.println(E);
    }
}
XPath - Introduction
• XML Path Language. XPath is a language
for addressing parts of an XML document.
• Locates and retrieves nodes, much like
navigating directories in a file system.
• Limited (but not bad) filtering and querying
abilities.
• Retrieves either the actual PCDATA or node sets.
Xpath – Simple Path Selection
XPath expression:
/salesdata/year/theyear
("/" signifies child-of)
Result:
<theyear>1997</theyear>
<theyear>1998</theyear>
XPath expression:
/salesdata/year[2]/theyear
(Filtering the level: getting only the second year element)
Result:
<theyear>1998</theyear>
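The two path expressions above can be tried out with the XPath engine that ships with the JDK (javax.xml.xpath). This is only a sketch: the class name SimplePaths is hypothetical, and the sales data is inlined as a string (without the schema attributes) instead of being read from a file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SimplePaths {
    // The example sales data, inlined for the sketch.
    static final String SALES =
        "<salesdata>"
      + "<year><theyear>1997</theyear>"
      + "<region><name>central</name><sales unit='millions'>34</sales></region>"
      + "<region><name>east</name><sales unit='millions'>34</sales></region>"
      + "<region><name>west</name><sales unit='millions'>32</sales></region>"
      + "</year>"
      + "<year><theyear>1998</theyear>"
      + "<region><name>east</name><sales unit='millions'>35</sales></region>"
      + "<region><name>west</name><sales unit='millions'>42</sales></region>"
      + "</year>"
      + "</salesdata>";

    /** Returns the text content of every node the expression selects. */
    static List<String> select(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression,
                new InputSource(new StringReader(SALES)),
                XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select("/salesdata/year/theyear"));    // [1997, 1998]
        System.out.println(select("/salesdata/year[2]/theyear")); // [1998]
    }
}
```

Note that positions in XPath predicates start at 1, so year[2] is the second year element.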
Xpath – Conditions
/salesdata/year/region[sales > 34]
(Going down to region, and filtering according to the sales element)
Result:
<region>
<name>east</name>
<sales unit="millions">35</sales>
</region>
<region>
<name>west</name>
<sales unit="millions">42</sales>
</region>
/salesdata/year/region[sales > 34]/name
Result: ?
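A predicate such as [sales > 34] compares the sales child as a number. The sketch below, again using the JDK's javax.xml.xpath rather than dom4j, counts the matching regions and lists their names; ConditionPaths, count() and texts() are names invented for this example, and the data is inlined.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ConditionPaths {
    static final String SALES =
        "<salesdata>"
      + "<year><theyear>1997</theyear>"
      + "<region><name>central</name><sales unit='millions'>34</sales></region>"
      + "<region><name>east</name><sales unit='millions'>34</sales></region>"
      + "<region><name>west</name><sales unit='millions'>32</sales></region>"
      + "</year>"
      + "<year><theyear>1998</theyear>"
      + "<region><name>east</name><sales unit='millions'>35</sales></region>"
      + "<region><name>west</name><sales unit='millions'>42</sales></region>"
      + "</year>"
      + "</salesdata>";

    private static Object eval(String expression, QName type) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(
                expression, new InputSource(new StringReader(SALES)), type);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Number of nodes selected, computed by XPath's own count(). */
    static int count(String expression) {
        Double n = (Double) eval("count(" + expression + ")",
                                 XPathConstants.NUMBER);
        return n.intValue();
    }

    /** Text content of each selected node. */
    static List<String> texts(String expression) {
        NodeList nodes = (NodeList) eval(expression, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(count("/salesdata/year/region[sales > 34]"));
        System.out.println(texts("/salesdata/year/region[sales > 34]/name"));
    }
}
```

Appending /name after the predicate keeps the filter but steps one level further down, so only the name children of the matching regions are returned.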
Xpath – Traveling Up the Tree
/salesdata/year/region[sales > 34]/parent::year/theyear
(Going up the XML tree, and then down again)
Result:
<theyear>1998</theyear>
Xpath – Traveling Down Fast
/descendant::sales
(Going all the way down, until the sales element)
Result:
<sales unit="millions">34</sales>
<sales unit="millions">34</sales>
<sales unit="millions">32</sales>
<sales unit="millions">35</sales>
<sales unit="millions">42</sales>
//sales
(Abbreviated form, same result)
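The parent, descendant and ancestor axes can be exercised the same way. A minimal sketch with the JDK's javax.xml.xpath follows; the class name AxisPaths is hypothetical and the sales data is inlined. Because an XPath result is a node set, the year reached through two matching regions is returned once, not twice.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AxisPaths {
    static final String SALES =
        "<salesdata>"
      + "<year><theyear>1997</theyear>"
      + "<region><name>central</name><sales unit='millions'>34</sales></region>"
      + "<region><name>east</name><sales unit='millions'>34</sales></region>"
      + "<region><name>west</name><sales unit='millions'>32</sales></region>"
      + "</year>"
      + "<year><theyear>1998</theyear>"
      + "<region><name>east</name><sales unit='millions'>35</sales></region>"
      + "<region><name>west</name><sales unit='millions'>42</sales></region>"
      + "</year>"
      + "</salesdata>";

    /** Text content of each node selected by the expression. */
    static List<String> texts(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression,
                          new InputSource(new StringReader(SALES)),
                          XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // descendant axis and its abbreviated form
        System.out.println(texts("/descendant::sales"));
        System.out.println(texts("//sales"));
        // up to the parent year, then down to theyear
        System.out.println(texts(
            "/salesdata/year/region[sales > 34]/parent::year/theyear"));
        // ancestor axis combined with an attribute test
        System.out.println(texts(
            "//sales[@unit='millions']/ancestor::year/theyear"));
    }
}
```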
Xpath – Advanced Queries
• The years (text nodes) for which sales data exists:
//region[name="west" and sales > 32]/sales[@unit='millions']/ancestor::year/theyear
– "and" is a logical operator
– @unit accesses an attribute
– ancestor is the same as parent but goes all the way up to year
Result:
<theyear>1998</theyear>
Xpath – Advanced Queries
(cont’d)
• The years (text nodes) in which the west region
sales were higher than the east region sales; sales
may be expressed in thousands or in millions:
year[region[name="west"]/sales[@unit='millions'*1000 or @unit='thousands']
> region[name="east"]/sales[@unit='millions'*1000 or @unit='thousands']]
/theyear/text()
Xpath in dom4j
• XPath queries can be used in dom4j:
// Requires: import org.dom4j.*; import java.util.Iterator; import java.util.List;
public void xpath(Document document) {
    // The XPath expression is fed to the xpathSelector
    XPath xpathSelector =
        DocumentHelper.createXPath("/salesdata/year/theyear");
    // The nodes are selected from the document, according to the XPath query
    List results = xpathSelector.selectNodes(document);
    for (Iterator iter = results.iterator(); iter.hasNext(); ) {
        Element element = (Element) iter.next();
        System.out.println(element.asXML());
    }
}
References - XML
• XML tutorial:
– http://www.w3schools.com/xml/default.asp
• XML Specification from w3c:
– http://www.w3.org/XML/
• The Java/XML Tutorial:
– http://java.sun.com/xml/tutorial_intro.html
• DTD Tutorial:
– http://www.xmlfiles.com/dtd/
• XML Schema Tutorial:
– http://www.w3schools.com/schema/default.asp
• XML Schema Resource Page:
– http://www.w3.org/XML/Schema
References - dom4j
• Web site:
– http://dom4j.org/
• Javadocs:
– http://dom4j.org/apidocs/index.html
• Quick Start:
– http://dom4j.org/guide.html
• Cookbook (main functionality):
– http://dom4j.org/cookbook.html
References - XPath
• Xpath specification:
– http://www.w3.org/TR/xpath
• Xpath tutorial:
– http://www.w3schools.com/xpath/default.asp
• Xpath tutorial (extended):
– http://www.zvon.org/xxl/XPathTutorial/General/examples.html
• Xpath reference:
– http://www.vbxml.com/xsl/XPathRef.asp
