The automaton approach to XML schema languages: from practice to theory
Frank Neven
Theoretical Computer Science Group
Hasselt University
Agoralaan, 3590 Diepenbeek, Belgium
27 February 2006
Frank Neven (Hasselt University)
Automata and XML schema languages
27 February 2006
1 / 109
Outline
1. Introduction to XML
2. Document Type Definitions
3. Unranked Tree Automata
4. Extended Document Type Definitions
   Definition
   XML Schema
   Properties of single-type EDTDs
   Single-type EDTDs in practice
   1-pass preorder typing
   Relax NG
5. Decision problems for XML schema languages
6. Conclusion
Introduction to XML
XML is a data exchange format
W3C standard
[Figure: XML as the exchange format on the Internet between a user and heterogeneous sources: a geographical db, an OODB, a Rel DB, a car retailer, and car reviews]
A self-describing data format
Example
<store>
<dvd>
<title> Fabuleux destin d’Amelie </title>
<price> 17 </price>
</dvd>
<dvd>
<title> Goodbye Lenin </title>
<price> 20 </price>
<discount> 4 </discount>
</dvd>
</store>
start tag: <title>
end tag: </title>
element: <title>...</title>
XML as a hierarchical structure
Example
store
  dvd
    title: "Amélie"
    price: 17
  dvd
    title: "Good bye, Lenin!"
    price: 20
    discount: 4
Attributes
Example
<store name="DVDPlanet">
<dvd category="romance">
<title> Fabuleux ... d’Amelie </title>
<price> 17 </price>
</dvd>
<dvd category="drama">
<title> Goodbye Lenin </title>
<price> 20 </price>
<discount> 4 </discount>
</dvd>
</store>
XML as a hierarchical structure
Example
store[name="DVDPlanet"]
  dvd[category="romance"]
    title: "Amélie"
    price: 17
  dvd[category="drama"]
    title: "Good bye, Lenin!"
    price: 20
    discount: 4
Trees as conceptual abstraction of XML documents
XML documents are ordered unranked trees over a
finite alphabet Σ of tag names.
We assume an infinite set of data values D for attribute and leaf
values.
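The abstraction can be made concrete with a minimal sketch. It assumes Python's standard xml.etree.ElementTree parser; the helper names skeleton and data_values are hypothetical and not part of the talk. The first function keeps only the ordered unranked tree over Σ, the second collects the data values from D (attribute and leaf values).

```python
import xml.etree.ElementTree as ET

DOC = """<store name="DVDPlanet">
  <dvd category="romance"><title>Amelie</title><price>17</price></dvd>
  <dvd category="drama"><title>Goodbye Lenin</title><price>20</price><discount>4</discount></dvd>
</store>"""

def skeleton(elem):
    """Ordered unranked tree over tag names: (label, [child trees])."""
    return (elem.tag, [skeleton(child) for child in elem])

def data_values(elem):
    """Attribute values and leaf text, in document order."""
    values = list(elem.attrib.values())
    if len(elem) == 0 and elem.text and elem.text.strip():
        values.append(elem.text.strip())
    for child in elem:
        values.extend(data_values(child))
    return values

root = ET.fromstring(DOC)
tree = skeleton(root)
```

A node may have any number of children (unranked), and their order matters (ordered); a schema only constrains the skeleton.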
Flexibility of XML
Representation of the relational model
Relation R
A    B
a1   b1
a2   b2
XML encoding
<R>
<tuple>
<A> a1 </A>
<B> b1 </B>
</tuple>
<tuple>
<A> a2 </A>
<B> b2 </B>
</tuple>
</R>
XML Tree
R
  tuple
    A: a1
    B: b1
  tuple
    A: a2
    B: b2
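The relational encoding can be sketched in a few lines. This is only an illustration, assuming Python's xml.etree.ElementTree; the function name relation_to_xml and its parameters are hypothetical.

```python
import xml.etree.ElementTree as ET

def relation_to_xml(name, attributes, rows):
    """Encode a relation as <name><tuple><A>..</A><B>..</B></tuple>...</name>."""
    root = ET.Element(name)
    for row in rows:
        tup = ET.SubElement(root, "tuple")
        for attr, value in zip(attributes, row):
            ET.SubElement(tup, attr).text = value
    return root

r = relation_to_xml("R", ["A", "B"], [("a1", "b1"), ("a2", "b2")])
encoded = ET.tostring(r, encoding="unicode")
```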
XML schema languages
Schema
A schema defines the set of allowable tags and the way they can be
structured.
Advantages
automatic validation
automatic integration of data
automatic translation
query optimization
provides a user with a concrete semantics of the document
aids in the specification of meaningful queries over XML data
Example
DTDs (W3C)
XML Schema (W3C)
Relax NG (Clark, Murata)
several dozen others (DSD, Schematron, ...)
In formal language theoretic terms
A schema defines a tree language.
Overview of XML Theory
Cross-fertilization between XML, automata, and logic
Different sorts of automata: grammars, tree automata, tree-walking automata, register automata, transducers, ...
Automata serve as
an algorithmic toolbox
an abstract formal model of schema languages, query and pattern
languages
Summary slide
What to remember?
XML is an international standard
XML documents or XML data are simply ordered unranked
labeled trees with data values
a schema defines a tree language (no data values)
Focus of this talk
Automata as a formal model for schema languages
Document Type Definitions
Document Type Definitions (DTDs)
Example
<!DOCTYPE store [
<!ELEMENT store (dvd,dvd*)>
<!ELEMENT dvd (title,price,discount?)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT discount (#PCDATA)>
]>
Corresponding grammar (start symbol store)
store → dvd dvd∗
dvd → title price (discount + ε)
title → DATA
price → DATA
discount → DATA
XML Document
store
  dvd
    title: "Amélie"
    price: 17
  dvd
    title: "Good bye, Lenin!"
    price: 20
No data values
XML Document
store
  dvd
    title
    price
  dvd
    title
    price
Corresponding grammar (start symbol store)
store → dvd dvd∗
dvd → title price (discount + ε)
Extended context-free grammars as a formal abstraction
Definition
A DTD is a pair (d, s_d) where
s_d ∈ Σ is the start symbol
d maps every Σ-symbol to a regular expression over Σ
Definition
A tree t satisfies d (is valid) iff
the root of t is labeled s_d
for every vertex v labeled a, the string formed by the labels of the children of v belongs to d(a).
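The validity definition translates almost literally into a validator. The sketch below is an illustration, not the validator referred to in the talk. It assumes trees are pairs (label, [children]) and, as a hypothetical encoding, writes each d(a) as a Python regex over space-terminated child labels.

```python
import re

# Hypothetical encoding of d: "dvd (dvd )*" stands for dvd dvd*,
# "(discount )?" for (discount + ε); "" only allows no children.
DTD = {
    "store": r"dvd (dvd )*",
    "dvd": r"title price (discount )?",
    "title": r"",
    "price": r"",
    "discount": r"",
}
START = "store"

def children_word(node):
    """String formed by the labels of the children of a node."""
    _, children = node
    return "".join(child[0] + " " for child in children)

def locally_valid(node, d):
    """Every vertex labeled a has a children string in d(a)."""
    label, children = node
    if label not in d:
        return False
    if re.fullmatch(d[label], children_word(node)) is None:
        return False
    return all(locally_valid(child, d) for child in children)

def is_valid(tree, d, start):
    """Root labeled with the start symbol, and all vertices locally valid."""
    return tree[0] == start and locally_valid(tree, d)

good = ("store", [
    ("dvd", [("title", []), ("price", [])]),
    ("dvd", [("title", []), ("price", []), ("discount", [])]),
])
bad = ("store", [("dvd", [("price", []), ("title", [])])])
```

Each vertex is checked against the rule of its own label only, which is why DTD validation is a purely local test.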
Optimization questions
Schema containment (⊆)
Given: schemas d1, d2
Question: Is L(d1) ⊆ L(d2)?
DTD containment reduces to containment of regular expressions:
L(d1) ⊆ L(d2) iff d1(a) ⊆ d2(a) for all a ∈ Σ (when d1 and d2 are trimmed).
Theorem (Meyer, Stockmeyer, 1973)
Containment of regular expressions is PSPACE-complete.
Corollary
DTD containment is PSPACE-complete.
Regular Expressions in DTDs Should Be Deterministic
How accurate is our abstraction?
The XML specification requires regular expressions to be deterministic: for every symbol in the input string we can uniquely determine which symbol in the regular expression it should match, without looking ahead in the input string.
Example
The expression (a + b)∗ a is not deterministic.
Counterexample: baa. After reading b, the first a can match either occurrence of a in the expression.
The expression b∗ a(b∗ a)∗ is deterministic.
Why this restriction? Backward compatibility with SGML.
Relevant questions
1. How do we recognize deterministic regular expressions? (DTD validator)
2. Can every regular language be denoted by a deterministic regular expression?
3. Are deterministic regular languages a robust class?
4. If a regular expression is not deterministic, can you find an equivalent one that is? (smart DTD validator)
Formalization by Brüggemann-Klein and Wood [1998]
Definition
A marking r′ of a regular expression r is an assignment of distinct numbers (written as subscripts) to the symbol occurrences in r.
For w ∈ L(r′), we denote by w# the corresponding unmarked string in L(r).
Example
(a₁ + b₂)∗ a₃ is a marking of (a + b)∗ a
For w = b₂ a₁ a₃, w# = baa
Definition
A regular expression r is deterministic (one-unambiguous) iff there are no strings uxv, uyw ∈ L(r′) with
|x| = |y| = 1,
x ≠ y (x and y are different marked symbols),
x# = y# (their unmarking is the same).
Example
(a + b)∗ a is not deterministic: take uxv = b₂ a₁ a₃ (u = b₂, x = a₁, v = a₃) and uyw = b₂ a₃ (u = b₂, y = a₃, w = ε).
Tool
Glushkov construction preserves one-step unambiguity.
Glushkov automaton for b∗ a(b∗ a)∗
Marked expression: b₁∗ a₂ (b₃∗ a₄)∗
[Figure: step-by-step construction of the Glushkov automaton with states b₁, a₂, b₃, a₄; in the final step the marks are dropped and the labels read b, a, b, a]
Glushkov automaton construction
[Figure: the Glushkov automata for b₁∗ a₂ (b₃∗ a₄)∗ (states b₁, a₂, b₃, a₄) and for (a₁ + b₂)∗ a₃ (states a₁, b₂, a₃)]
Recognition of deterministic regular expressions
Theorem (Book et al 1971, Brüggemann-Klein, Wood, 1998)
A regular expression is deterministic (one-unambiguous) iff its
Glushkov automaton is deterministic.
It is decidable in quadratic time whether a regular expression is
deterministic.
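The theorem suggests a direct test: mark the expression, compute the first and follow sets of the Glushkov construction, and check that no set contains two different positions carrying the same symbol. The sketch below is a straightforward illustration of this idea and is not tuned to meet the quadratic bound. The tuple-based syntax tree and the helpers sym, cat, alt, star are assumptions made for the example.

```python
from itertools import count

def sym(a): return ("sym", a)
def cat(r, s): return ("cat", r, s)
def alt(r, s): return ("alt", r, s)
def star(r): return ("star", r)

def glushkov(regex):
    """Return (first, follow, label) for the marked version of regex."""
    counter = count(1)
    label = {}    # position (mark) -> unmarked symbol
    follow = {}   # position -> positions that may come next

    def walk(node):
        # Returns (nullable, first positions, last positions).
        kind = node[0]
        if kind == "sym":
            p = next(counter)
            label[p] = node[1]
            follow[p] = set()
            return False, {p}, {p}
        if kind == "star":
            _, first, last = walk(node[1])
            for p in last:
                follow[p] |= first
            return True, first, last
        n1, f1, l1 = walk(node[1])
        n2, f2, l2 = walk(node[2])
        if kind == "alt":
            return n1 or n2, f1 | f2, l1 | l2
        # kind == "cat"
        for p in l1:
            follow[p] |= f2
        first = f1 | f2 if n1 else f1
        last = l1 | l2 if n2 else l2
        return n1 and n2, first, last

    _, first, _ = walk(regex)
    return first, follow, label

def is_deterministic(regex):
    """True iff the Glushkov automaton of regex is deterministic."""
    first, follow, label = glushkov(regex)

    def clash(positions):
        symbols = [label[p] for p in positions]
        return len(symbols) != len(set(symbols))

    return not clash(first) and not any(clash(s) for s in follow.values())

# (a + b)* a  and  b* a (b* a)*
r1 = cat(star(alt(sym("a"), sym("b"))), sym("a"))
r2 = cat(cat(star(sym("b")), sym("a")),
         star(cat(star(sym("b")), sym("a"))))
```

For r1 the first set contains both a₁ and a₃, which is exactly the clash witnessed by the counterexample baa.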
Properties of deterministic regular languages
Theorem (Brüggemann-Klein, Wood, 1998)
Not every regular language can be denoted by a deterministic regular expression. E.g., (a + b)∗ a(a + b).
Deterministic regular languages are not closed under union, concatenation, or Kleene-star. (No syntax for deterministic regular languages.)
It can be decided in PTIME whether a DFA denotes a deterministic regular language. (Orbit property.)
If it exists, an equivalent deterministic regular expression can be constructed in exponential time.
Results provide formal machinery for dealing with DTDs.
Complexity of basic decision problems revisited
Schema containment (⊆)
Given: schemas d1, d2
Question: Is L(d1) ⊆ L(d2)?
DTD containment reduces to containment of regular expressions:
L(d1) ⊆ L(d2) iff d1(a) ⊆ d2(a) for all a ∈ Σ (when d1 and d2 are trimmed).
Theorem
Containment of DTDs with deterministic regular expressions is in PTIME.
Summary slide
What to remember?
XML DTDs are context-free grammars with deterministic regular
expressions
deterministic regular expressions are a semantical notion: no
easy syntax – non-transparent to users
advantage: optimization problems are tractable
Question
What is the largest robust class of regular expressions that can be
translated to DFAs in PTIME?
Unranked Tree Automata
Deterministic Tree Automata over Binary Trees
Definition
Formally, M = (Q, Σ, δ, F) with Q = {f, t}, Σ = {0, 1, ∧, ∨}, F = {t}, and
δ(0) = f
δ(1) = t
δ(f, f, ∧) = f    δ(f, f, ∨) = f
δ(t, f, ∧) = f    δ(t, f, ∨) = t
δ(f, t, ∧) = f    δ(f, t, ∨) = t
δ(t, t, ∧) = t    δ(t, t, ∨) = t
Example run (each node is followed by the state assigned to it, computed bottom-up)
∧ : t
  ∨ : t
    0 : f
    1 : t
  ∧ : t
    1 : t
    1 : t
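The bottom-up run of this automaton can be sketched directly from the transition table. The representation is an assumption made for the example: a leaf is the string "0" or "1", an inner node is a triple (label, left, right).

```python
# Transition function of the Boolean-circuit automaton M, F = {t}.
LEAF = {"0": "f", "1": "t"}
DELTA = {
    ("f", "f", "∧"): "f", ("f", "f", "∨"): "f",
    ("t", "f", "∧"): "f", ("t", "f", "∨"): "t",
    ("f", "t", "∧"): "f", ("f", "t", "∨"): "t",
    ("t", "t", "∧"): "t", ("t", "t", "∨"): "t",
}
FINAL = {"t"}

def run(tree):
    """State assigned to the root, computed bottom-up."""
    if isinstance(tree, str):
        return LEAF[tree]
    label, left, right = tree
    return DELTA[(run(left), run(right), label)]

def accepts(tree):
    return run(tree) in FINAL

example = ("∧", ("∨", "0", "1"), ("∧", "1", "1"))
```

Because δ is a function, every tree has exactly one run, so a single bottom-up pass decides acceptance.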
Tree Automata over Binary Trees
Definition
A set of binary trees is regular iff it is accepted by a tree automaton.
Deterministic versus non-deterministic
Det: δ : Q × Q × Σ → Q
Non-Det: δ : Q × Q × Σ → 2^Q
Semantics: a tree is accepted if there is a labeling of its nodes with states that is consistent with the transition function and labels the root with an accepting state
top-down: δ : Q × Σ → 2^(Q×Q)
Tree Automata over Binary Trees
Robust class
det. bottom-up TA = non-det. bottom-up TA (subset construction)
non-det. top-down TA = non-det. bottom-up TA
Closed under Boolean operations:
Union, intersection: product construction
Complement: complete automaton, determinize, swap final and non-final states
Many equivalent notions: alternating, two-way, tree-walking + restricted pushdown, MSO, ...
Decision problems: containment is EXPTIME-complete for non-det. TA [Seidl 1990], PTIME-complete for det. TA.
PTIME minimization for det. TA, unique minimal TA
Bottom-up Tree Automata over Unranked Trees
Binary versus unranked
binary tree: δ : Q × Q × Σ → Q
unranked tree: δ : ⋃_{i=0}^{∞} Q^i × Σ → Q
specify transition functions by regular string languages over states: δ(q, a) ⊆ Q∗ is a regular language
[Figure: a node labeled a gets state q when its children carry states q1 q2 q3 and q1 q2 q3 ∈ δ(q, a)]
Bottom-up Tree Automata over Unranked Trees
[Figure: an unranked tree over {0, 1, ∧, ∨}, a Boolean circuit with unbounded fan-in, in which every node is annotated step by step with its state f or t]
Transition function, F = {t}
δ(f, 0) = {ε}; δ(f, 1) = ∅
δ(t, 1) = {ε}; δ(t, 0) = ∅
δ(f, ∧) = (f + t)∗ f (f + t)∗
δ(t, ∧) = t∗
δ(f, ∨) = f∗
δ(t, ∨) = (f + t)∗ t (f + t)∗
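A run of this unranked automaton can be sketched as follows. The encoding is an assumption for illustration: trees are pairs (label, [children]), each δ(q, a) is written as a Python regex over the one-letter states f and t, and None stands for ∅. The sketch enumerates all combinations of child states, which is exponential in general and only meant to mirror the definition.

```python
import re
from itertools import product

DELTA = {
    ("f", "0"): r"", ("f", "1"): None,
    ("t", "1"): r"", ("t", "0"): None,
    ("f", "∧"): r"[ft]*f[ft]*",
    ("t", "∧"): r"t*",
    ("f", "∨"): r"f*",
    ("t", "∨"): r"[ft]*t[ft]*",
}
STATES = ("f", "t")
FINAL = {"t"}

def possible_states(node):
    """All states q that some run can assign to this node."""
    label, children = node
    child_sets = [possible_states(child) for child in children]
    result = set()
    for q in STATES:
        language = DELTA.get((q, label))
        if language is None:
            continue
        # q is possible if some word q1...qn of child states is in δ(q, label).
        for word in product(*child_sets):
            if re.fullmatch(language, "".join(word)):
                result.add(q)
                break
    return result

def accepts(tree):
    return bool(possible_states(tree) & FINAL)

circuit = ("∨", [
    ("0", []),
    ("∧", [("1", []), ("1", []), ("1", [])]),
    ("0", []),
])
```

A leaf has the empty children string ε, so it receives state q exactly when ε ∈ δ(q, a).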
Definition
A non-deterministic tree automaton (NTA) is a tuple B = (Q, Σ, δ, F), where
Q is a finite set of states,
F ⊆ Q is the set of final states,
δ is a function Q × Σ → 2^(Q∗) such that δ(q, a) is a regular string language over Q for every a ∈ Σ and q ∈ Q.
History
Resurrected by Brüggemann-Klein, Murata, Wood [1995-2001] in the context of XML
Originally: Pair and Quere [1968], Takahashi [1975], Thatcher [1967], ...
Unranked versus Binary
Trading width for depth: first-child next-sibling encoding
[Figure: an unranked tree over {a, b} and its binary encoding, related by enc (unranked to binary) and dec (binary to unranked); missing children in the binary tree are filled with #]
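The first-child next-sibling encoding can be sketched as a pair of mutually inverse functions. This is one common variant and may differ in detail from the figure: unranked trees are pairs (label, [children]), binary trees are triples (label, first child, next sibling), and # marks a missing child.

```python
HASH = "#"

def enc_forest(trees):
    """Encode an ordered list of sibling trees as one binary tree."""
    if not trees:
        return HASH
    (label, children), rest = trees[0], trees[1:]
    # Left child: the first child; right child: the next sibling.
    return (label, enc_forest(children), enc_forest(rest))

def enc(tree):
    return enc_forest([tree])

def dec_forest(binary):
    """Decode a binary tree back into an ordered list of sibling trees."""
    if binary == HASH:
        return []
    label, first_child, next_sibling = binary
    return [(label, dec_forest(first_child))] + dec_forest(next_sibling)

def dec(binary):
    forest = dec_forest(binary)
    if len(forest) != 1:
        raise ValueError("not the encoding of a single tree")
    return forest[0]

t = ("b", [("a", []), ("b", [("a", [])]), ("a", [])])
```

The width of t becomes depth in enc(t): the three children of the root end up on one right-going path.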
Binary Regular ≡ Unranked Regular
Theorem [Folklore]
For every unranked NTA B there is a binary TA A such that
L(A) = {enc(t) | t ∈ L(B)}.
For every binary TA A there is an unranked NTA B such that
L(B) = {dec(t) | t ∈ L(A)}.
Encoding preserving properties
closure properties (e.g., Boolean closure)
equivalent characterizations (e.g., MSO definability),
decidability (e.g., containment)
not everything carries over
Encoding does not preserve complexity
Representation
NTA(S) is the class of NTAs where the transition functions are
represented by elements from S.
E.g., NTA(NFA), NTA(REG), NTA(2AFA), . . .
Emptiness
Given: automaton A
Question: Is L(A) = ∅?
Theorem
Emptiness of NTA(2AFA) is PSPACE-complete. [Martens, Neven 2003]
Emptiness of two-way alternating tree automata is EXPTIME-complete. [Vardi 1998, Kupferman, Piterman, Vardi 2002]
Deterministic unranked tree automata are not so deterministic
Definition
An NTA(DFA) is bottom-up deterministic iff δ(q, a) ∩ δ(q′, a) = ∅ for all q, q′ ∈ Q with q ≠ q′ and all a ∈ Σ.
Equivalence of deterministic tree automata
Equivalence
Given: DTA A and B
Question: Is L(A) = L(B)?
Equivalence of deterministic unranked tree automata
Compute complement ¬A and ¬B (in PTIME):
Make automaton complete: add δ(q, q′, a) = q_trash for every undefined triple
Exchange final and non-final states.
Test whether the symmetric difference is empty (in PTIME):
(A ∩ ¬B) ∪ (B ∩ ¬A) = ∅
Equivalence of deterministic unranked tree automata
Completing unranked deterministic automata is problematic
δ(q_trash, a) = Q∗ − ⋃_{q∈Q} δ(q, a) is exponentially bigger.
Solution
The binary encoding of a DTA is unambiguous.
Testing equivalence of unambiguous binary TAs is in PTIME. [Seidl 1990]
Unranked bottom-up DTA(DFA)s are exponentially more succinct than binary bottom-up DTAs [Martens, Niehren 2005]
Minimization
Theorem (Martens, Niehren 2005)
Minimization of DTA(DFA) is NP-complete.
There does not always exist a unique minimal DTA(DFA).
Crux
Minimizing DTA(DFA)s is related to minimizing disjoint unions of DFAs:
δ(q1, a) ∪ ··· ∪ δ(qn, a).
Other models
Stepwise tree automata [Carme, Niehren, Tommasi 2004]
Instead of n automata representing δ(q1, a), ..., δ(qn, a), use one
automaton N_a with an output function [Cristau, Löding, Thomas
2005]
Summary slide
What to remember?
Tree automata are a very robust class (much like string automata).
Many properties for unranked automata carry over from the
ranked case through the encoding ... but not all.
A DTA is not 100% deterministic.
XML Schema is usually abstracted by unranked tree automata
... but this is not entirely accurate (as we will explain next).
Questions
Given a DTA A, can you compute ¬A in PTIME?
What is the right notion of deterministic unranked TA?
Extended Document Type Definitions
Definition
Extended DTDs
Grammar based approach to unranked regular tree languages
Definition (Papakonstantinou, Vianu, 2000)
Let Σ^ℕ := {σ^n | σ ∈ Σ, n ∈ ℕ} be the alphabet of types.
An extended DTD (EDTD) is a tuple D = (Σ, d, s_d), where (d, s_d) is a
(finite) DTD over Σ ∪ Σ^ℕ.
A tree t is valid w.r.t. D if there is an assignment of types such that the
typed tree is a derivation tree of d.
Example
store → (dvd1 + dvd2 )∗ dvd2 (dvd1 + dvd2 )∗
dvd1 → title price
dvd2 → title price discount
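Validity w.r.t. this EDTD can be tested bottom-up by computing, for every node, the set of types from which its subtree can be derived. The following is a minimal sketch under stated assumptions: trees are `(label, children)` tuples, content models are Python regular expressions over space-terminated type names, leaf elements are given empty content, and the search over type assignments is brute force (exponential, for illustration only).

```python
import re
from itertools import product

RULES = {
    "store": r"(dvd1 |dvd2 )*dvd2 (dvd1 |dvd2 )*",
    "dvd1": r"title price ",
    "dvd2": r"title price discount ",
    "title": r"", "price": r"", "discount": r"",  # assumed: leaves have no element children
}

def mu(typ):
    """Strip the type index: mu('dvd2') == 'dvd'."""
    return re.sub(r"\d+$", "", typ)

def possible_types(tree):
    """Set of types from which the subtree rooted at `tree` can be derived."""
    label, children = tree
    child_types = [possible_types(c) for c in children]
    result = set()
    for typ, model in RULES.items():
        if mu(typ) != label:
            continue
        # Try every way of picking one candidate type per child.
        for choice in product(*child_types):
            if re.fullmatch(model, "".join(t + " " for t in choice)):
                result.add(typ)
                break
    return result

def is_valid(tree, start="store"):
    return start in possible_types(tree)

plain = ("dvd", [("title", []), ("price", [])])
reduced = ("dvd", [("title", []), ("price", []), ("discount", [])])
print(is_valid(("store", [plain, reduced])))
print(is_valid(("store", [plain, plain])))
```

A store with at least one discounted dvd is accepted; a store without one is rejected, because no assignment of types places a dvd2 among the children.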
Tree t
store
  dvd (title "Amélie", price 17)
  dvd (title "Good bye, Lenin!", price 20)
  dvd (title "Gothika", price 15, discount 4)
  dvd (title "Pulp Fiction", price 11, discount 6)
Typed tree t′
store
  dvd1 (title "Amélie", price 17)
  dvd1 (title "Good bye, Lenin!", price 20)
  dvd2 (title "Gothika", price 15, discount 4)
  dvd2 (title "Pulp Fiction", price 11, discount 6)
EDTDs versus Tree Automata
Theorem (Papakonstantinou, Vianu, 2000)
NTAs and EDTDs define precisely the class of (homogeneous) regular
unranked tree languages.
Example
EDTD
0^0 → ε, 1^1 → ε
∧^0 → .∗ (0^0 + ∨^0 + ∧^0) .∗
∧^1 → (1^1 + ∨^1 + ∧^1)∗
∨^1 → .∗ (1^1 + ∨^1 + ∧^1) .∗
∨^0 → (0^0 + ∨^0 + ∧^0)∗
NTA
δ(f, 0) = {ε}; δ(t, 1) = {ε};
δ(f, ∧) = .∗ f .∗
δ(t, ∧) = t∗
δ(t, ∨) = .∗ t .∗
δ(f, ∨) = f∗
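A bottom-up run of this NTA can be sketched in a few lines. The encoding is an assumption made for illustration: the labels `and` and `or` stand for ∧ and ∨, each δ(q, a) is a Python regular expression over the one-letter states t and f, and trees are `(label, children)` tuples.

```python
import re
from itertools import product

DELTA = {
    ("f", "0"): r"", ("t", "1"): r"",
    ("f", "and"): r"[tf]*f[tf]*", ("t", "and"): r"t*",
    ("t", "or"): r"[tf]*t[tf]*", ("f", "or"): r"f*",
}

def run(tree):
    """Return the set of states the automaton can assign to the root of `tree`."""
    label, children = tree
    child_states = [run(c) for c in children]
    return {
        q for (q, a), lang in DELTA.items()
        if a == label
        and any(re.fullmatch(lang, "".join(w)) for w in product(*child_states))
    }

one, zero = ("1", []), ("0", [])
circuit = ("and", [one, ("or", [zero, one])])
print(run(circuit))
```

For this automaton every node receives exactly one state, which is the truth value of the subcircuit below it.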
XML Schema
<xs:element name="store">
  <xs:complexType>
    <xs:sequence>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="dvd" type="1"/>
        <xs:element name="dvd" type="2"/>
      </xs:choice>
      <xs:element name="dvd" type="2"/>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="dvd" type="1"/>
        <xs:element name="dvd" type="2"/>
      </xs:choice>
    </xs:sequence>
  </xs:complexType>
</xs:element>
Rejected by the XML Schema validator: violates the Element Declarations Consistent constraint.
A formalization of XML Schema: single-type EDTDs
XML Schema 1: Element Declarations Consistent constraint
(Section 3.8.6)
It is illegal to have two elements of the same name [. . . ] but different
types in a content model [. . . ].
Definition (Murata, Lee, Mani, 2001)
A single-type EDTD is an EDTD in which no regular expression contains two
types b^i and b^j with i ≠ j.
Not single-type
store → (dvd1 + dvd2 )∗ dvd2 (dvd1 + dvd2 )∗
dvd1 → title price
dvd2 → title price discount
Example
store → regulars discounts
regulars → (dvd1)∗
discounts → dvd2 (dvd2)∗
dvd1 → title price
dvd2 → title price discount
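Recognizing single-type EDTDs is a purely syntactic test on the content models. A minimal sketch, assuming content models are written as strings of space-terminated type names and that a trailing digit is the type index (both conventions are chosen here for illustration):

```python
import re

def mu(typ):
    """Strip the type index: mu('dvd2') == 'dvd'."""
    return re.sub(r"\d+$", "", typ)

def is_single_type(rules):
    """True iff no content model mentions two different types of one element name."""
    for model in rules.values():
        chosen = {}
        for typ in re.findall(r"[a-z]+\d*", model):
            if chosen.setdefault(mu(typ), typ) != typ:
                return False
    return True

NOT_SINGLE = {
    "store": "(dvd1 |dvd2 )*dvd2 (dvd1 |dvd2 )*",
    "dvd1": "title price ",
    "dvd2": "title price discount ",
}
SINGLE = {
    "store": "regulars discounts ",
    "regulars": "(dvd1 )*",
    "discounts": "dvd2 (dvd2 )*",
    "dvd1": "title price ",
    "dvd2": "title price discount ",
}
print(is_single_type(NOT_SINGLE), is_single_type(SINGLE))
```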
Formal abstraction
XML Schema ≈ single-type EDTDs
Immediate Questions
Can you recognize single-type EDTDs? (Trivial: XML Schema validator.)
What kind of languages can be defined by single-type EDTDs?
Is it decidable whether an EDTD is equivalent to a single-type
EDTD? (Smart XML Schema validator.)
Properties of single-type EDTDs
Validation and typing
Given a tree t and an EDTD D = (Σ, d, a^0):
validation: is t ∈ L(D), i.e., does there exist a typed tree
t′ ∈ L(d)?
typing: compute for every b-labeled node its type b^i in t′
Single-type EDTDs: simple top-down typing
Algorithm to validate and type a tree
[Murata et al., 2001]
Given: tree t and single-type EDTD D = (Σ, d, a^0)
1. Check that the root of t is labeled with a; assign it type a^0.
2. For every interior node u with type b^i, test whether the children of u
match µ(d(b^i)). If so, assign the unique type to every child. Else fail.
Here µ erases the type indices, e.g. µ(a^1 + b^1 c^2) = a + bc.
Corollary
Single-typedness implies unique top-down typing.
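The two steps of the algorithm can be sketched as follows for the regulars/discounts example. The encoding is an assumption for illustration: trees are `(label, children)` tuples, content models are regular expressions over space-terminated type names, and leaf elements are given empty content.

```python
import re

RULES = {
    "store": "regulars discounts ",
    "regulars": "(dvd1 )*",
    "discounts": "dvd2 (dvd2 )*",
    "dvd1": "title price ",
    "dvd2": "title price discount ",
    "title": "", "price": "", "discount": "",  # assumed: leaves have no element children
}

def mu(expr):
    """Erase type indices, e.g. mu('dvd2 (dvd2 )*') == 'dvd (dvd )*'."""
    return re.sub(r"([a-z]+)\d+", r"\1", expr)

def type_tree(tree, typ="store"):
    """Return the typed copy of `tree`, or None if the tree is invalid."""
    label, children = tree
    if mu(typ) != label:
        return None
    model = RULES[typ]
    word = "".join(child[0] + " " for child in children)
    if not re.fullmatch(mu(model), word):
        return None
    # Single-typedness: every element name has exactly one type in `model`.
    type_of = {mu(t): t for t in re.findall(r"[a-z]+\d*", model)}
    typed_children = []
    for child in children:
        typed = type_tree(child, type_of[child[0]])
        if typed is None:
            return None
        typed_children.append(typed)
    return (typ, typed_children)

def dvd(*parts):
    return ("dvd", [(p, []) for p in parts])

doc = ("store", [("regulars", [dvd("title", "price")]),
                 ("discounts", [dvd("title", "price", "discount")])])
print(type_tree(doc))
```

Each node is visited once and its children are matched against a single projected content model, so no backtracking over type assignments is needed.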
Two-pass and ambiguous typing
Example (two-pass typing)
a → b1 + b2, b1 → c, b2 → d
Tree: a with a single child b, which has a single child c. Is b typed b1 or b2? The type is only determined once its child c has been seen.
Example (ambiguous typing)
a → b1 + b2, b1 → c∗, b2 → d∗
Tree: a with a single child b, which has no children. Is b typed b1 or b2? Both typings are possible.
Towards a characterization of single-type EDTDs
The Ancestor-String
[Figure: a tree with root a, illustrating the ancestor-string of a node.]
Notation
anc-str_t(u) = the ancestor-string of a tree t at node u, i.e., the string of labels on the path from the root to u
Single-type EDTDs: simple top-down typing
Definition
An EDTD D = (Σ, d, s_d) has ancestor-based types if there is a function
f : Σ∗ → Σ^ℕ such that, for each tree t ∈ L(D),
t has exactly one witness t′ ∈ L(d), and
t′ results from t by assigning to each node v the type
f(anc-str_t(v)).
Intuition:
The type of a node depends on its ancestor-string, and on nothing else
Proposition
If a tree language T is definable by a single-type EDTD, then it
has ancestor-based types.
Proof
Let T be defined by the single-type EDTD D = (Σ, d, a^0). Then define
f inductively as follows:
f(a) = a^0
for any string w · a · b with w ∈ Σ∗ and a, b ∈ Σ,
f(w · a · b) = b^j
where b^j occurs in d(a^i) and f(w · a) = a^i.
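The inductive definition of f translates directly into a loop over the ancestor-string. The sketch below uses the regulars/discounts example; the string encoding of content models and the list encoding of ancestor-strings are assumptions made for illustration.

```python
import re

RULES = {
    "store": "regulars discounts ",
    "regulars": "(dvd1 )*",
    "discounts": "dvd2 (dvd2 )*",
    "dvd1": "title price ",
    "dvd2": "title price discount ",
}

def mu(typ):
    """Strip the type index: mu('dvd2') == 'dvd'."""
    return re.sub(r"\d+$", "", typ)

def f(anc_str, start="store"):
    """Type of a node, computed from its ancestor-string (labels, root first)."""
    if anc_str[0] != mu(start):
        raise ValueError("unexpected root label " + anc_str[0])
    typ = start
    for label in anc_str[1:]:
        # The unique type of `label` that occurs in the content model of `typ`.
        by_name = {mu(t): t for t in re.findall(r"[a-z]+\d*", RULES[typ])}
        typ = by_name[label]  # KeyError if `label` cannot occur below `typ`
    return typ

print(f(["store", "regulars", "dvd"]), f(["store", "discounts", "dvd"]))
```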
An exchange property for single-type EDTDs
Ancestor-Guarded Subtree Exchange
T is a regular tree language.
[Figure: for t1, t2 ∈ T and nodes u1 in t1, u2 in t2 with anc-str_t1(u1) = anc-str_t2(u2), replacing the subtree of t1 at u1 by the subtree of t2 at u2 again yields a tree in T.]
Theorem (Martens, Nev., Schw., 2005)
A regular tree language is definable by a single-type EDTD iff it is
closed under ancestor-guarded subtree exchange.
Proof
⇒ single-type EDTD has ancestor-based types.
⇐ Compute single-type closure D′ of given EDTD D:
E.g., a^1 → b^1 b^2 and a^2 → b^3 becomes
a^{1} → b^{1} b^{2}
a^{2} → b^{3}
a^{1,2} → b^{1,2,3} b^{1,2,3} + b^{1,2,3}
Obviously, L(D) ⊆ L(D′). Now, L(D) ⊇ L(D′) iff L(D) is closed under
ancestor-guarded subtree exchange.
Tool for proving inexpressibility
Evaluation of Boolean circuits is not single-type
[Figure: two store trees from the running example, each containing one dvd with a discount and one dvd without; exchanging dvd subtrees, which share the ancestor-string store · dvd, yields a store tree in which no dvd has a discount.]
Single-type EDTDs are not closed under union
Example
D1: a → b, b → c
D2: a → bb, b → d
a(b(c)) ∈ L(D1) and a(b(d), b(d)) ∈ L(D2), but a(b(d)) ∉ L(D1) ∪ L(D2).
The last tree arises from the first two by ancestor-guarded subtree exchange.
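The counterexample can be replayed mechanically. In this sketch the two DTDs are encoded as dictionaries of regular expressions over space-terminated labels and trees as `(label, children)` tuples; the encoding and the empty content assumed for c and d are illustrative choices.

```python
import re

D1 = {"a": "b ", "b": "c ", "c": "", "d": ""}
D2 = {"a": "b b ", "b": "d ", "c": "", "d": ""}

def valid(dtd, tree, root="a"):
    """Check that every node's children match the content model of its label."""
    def ok(node):
        label, children = node
        word = "".join(child[0] + " " for child in children)
        return re.fullmatch(dtd[label], word) is not None and all(ok(c) for c in children)
    return tree[0] == root and ok(tree)

def in_union(tree):
    return valid(D1, tree) or valid(D2, tree)

t1 = ("a", [("b", [("c", [])])])
t2 = ("a", [("b", [("d", [])]), ("b", [("d", [])])])
# All b-nodes have ancestor-string a·b, so this exchange is ancestor-guarded:
# the b-subtree of t1 is replaced by a b-subtree of t2.
exchanged = ("a", [t2[1][0]])

print(in_union(t1), in_union(t2), in_union(exchanged))
```

Since the union is not closed under the exchange, the characterization theorem implies that no single-type EDTD defines L(D1) ∪ L(D2).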
Characterization of DTDs
DTDs define precisely the local tree languages
Theorem (Papakonstantinou, Vianu, 2000)
A regular tree language is definable by a DTD iff it is closed under
subtree exchange.
Smart validator
Theorem (Martens, Nev., Schw., 2005)
Deciding whether an EDTD is equivalent to a single-type EDTD or a
DTD is EXPTIME-complete.
Upper bound
Compute the single-type closure D′ of the given EDTD D:
E.g., a^1 → b^1 b^2 and a^2 → b^3 becomes
a^{1} → b^{1} b^{2}
a^{2} → b^{3}
a^{1,2} → b^{1,2,3} b^{1,2,3} + b^{1,2,3}
L(D′) = L(D) iff L(D) is single-type.
We know that L(D) ⊆ L(D′).
So we only need to test L(D′) ⊆ L(D): D′ ∩ ¬D = ∅.
Lower bound
For r and s arbitrary regular expressions over Σ − {b}, the EDTD
a → r · b1 + s · b2
b1 → c
b2 → d
is equivalent to a single-type EDTD iff L(r ) = L(s) (a PSPACE-hard
problem). The equivalent DTD is a → r · b, b → c + d.
Single-type EDTDs in practice
A practical study of XSDs
XML Schema: successor of DTDs
data types, referencing mechanism, modularity, XML Syntax, more
expressive power
Corpus
819 XSDs from the Cover pages.
726 XSDs through Google’s web services.
Only 225 are syntactically correct.
Practical XSDs are local
85% of the XSDs are structurally equivalent to a DTD: at most one
type is associated to every element name.
One example used types: a1 → b and a2 → b.
What do the 15% non-local XSDs look like?
In 90% of the cases, types only depend on the parent context:
store → regulars discounts
regulars → (dvd1)∗
discounts → dvd2 dvd2 (dvd2)∗
dvd1 → title price
dvd2 → title price discount
The remaining 10% are of the following form:
a → b + c
b → e d1 f
c → e d2 f
d1 → g h1 i
d2 → g h2 i
h1 → j1
h2 → j2
j1 → k l
j2 → m n
Why isn’t the expressiveness of XSDs used to its full
extent?
Two possible reasons
1. Extra non-local expressiveness is simply not needed in practice.
2. Users are not aware of the possibilities of XSDs: provide a simple
formalism that makes types dependent on ancestors.
Making dependencies explicit
Definition
An ancestor-based DTD A is a set of rules r → s where r and s are
regular expressions over Σ.
Definition
A tree t is valid w.r.t. A iff for every vertex v there is some r → s such
that anc-str_t(v) ∈ L(r) and the children of v match s.
Theorem
Ancestor-based DTDs and single-type EDTDs define the same class
of tree languages.
Ancestor-guarded DTDs can be used as a light-weight front-end for
XML Schema
single-type EDTD
store → regulars discounts
regulars → (dvd1)∗
discounts → dvd2 dvd2 (dvd2)∗
dvd1 → title price
dvd2 → title price discount
Ancestor-guarded DTD
store → regulars discounts
regulars → dvd∗
discounts → dvd dvd dvd∗
∗ · regulars · dvd ⇒ title price
∗ · discounts · dvd ⇒ title price discount
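Validation against an ancestor-guarded DTD follows the definition literally: for every node, look for a rule whose guard matches the ancestor-string and whose right-hand side matches the children. The sketch below makes several assumptions for illustration: guards and content models are Python regular expressions over space-terminated labels, an unguarded rule `x → s` is read as `∗ · x ⇒ s`, and an explicit rule with empty content is added for the leaf elements.

```python
import re

RULES = [  # (guard on the ancestor-string, content model of the children)
    (r"(\w+ )*store ", r"regulars discounts "),
    (r"(\w+ )*regulars ", r"(dvd )*"),
    (r"(\w+ )*discounts ", r"dvd dvd (dvd )*"),
    (r"(\w+ )*regulars dvd ", r"title price "),
    (r"(\w+ )*discounts dvd ", r"title price discount "),
    (r"(\w+ )*(title|price|discount) ", r""),  # assumed leaf rule
]

def valid(tree, anc=""):
    """Check every node against some rule r -> s, top-down."""
    label, children = tree
    anc += label + " "
    word = "".join(child[0] + " " for child in children)
    matched = any(re.fullmatch(r, anc) and re.fullmatch(s, word) for r, s in RULES)
    return matched and all(valid(child, anc) for child in children)

def dvd(*parts):
    return ("dvd", [(p, []) for p in parts])

good = ("store", [("regulars", [dvd("title", "price")]),
                  ("discounts", [dvd("title", "price", "discount"),
                                 dvd("title", "price", "discount")])])
bad = ("store", [("regulars", [dvd("title", "price", "discount")]),
                 ("discounts", [dvd("title", "price", "discount"),
                                dvd("title", "price", "discount")])])
print(valid(good), valid(bad))
```

No type names appear anywhere in the rules; the guards carry the context that the types dvd1 and dvd2 encode in the single-type EDTD.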
single-type EDTD
a → b + c
b → e d1 f
c → e d2 f
d1 → g h1 i
d2 → g h2 i
h1 → j1
h2 → j2
j1 → k l
j2 → m n
Ancestor-guarded DTD
a → b + c
b → e d f
c → e d f
d → g h i
h → j
∗ · b · ∗ · j ⇒ k l
∗ · c · ∗ · j ⇒ m n
1-Pass preorder typing
<store><regulars><dvd>
<title>Amelie</title>
<price>17</price>
</dvd></regulars>
<discounts>...
Streaming
XML as an unparsed sequence of start and stop tags (SAX).
[Figure: a pipeline of operators connected by XML streams: typing, validation, XPath routing.]
Typing as the first operator in a pipeline
1-Pass Preorder Typing versus single-type EDTDs
Observations
Streaming (preorder) typing is not possible for every EDTD:
a → b1 + b2, b1 → c, b2 → d
In the tree a(b(c)), the type of b is only known after its child c has been read.
Every single-type EDTD is preorder typable: the type of a child depends
only on the type of its parent.
Single-type EDTDs are not the largest class which is preorder
typable:
a → b1 b2, b1 → c, b2 → d
In the tree a(b(c), b(d)), the first b gets type b1 and the second b gets type b2.
Restrained Competition EDTDs: left-to-right unique typing
Definition (Murata, Lee, Mani, 2001)
A regular expression r restrains competition if there are no strings
w a^i v and w a^j v′ in L(r) with i ≠ j.
An EDTD is restrained competition iff all regular expressions occurring
in rules restrain competition.
Not restrained-competition
store → (dvd1 + dvd2)∗ dvd2 (dvd1 + dvd2)∗
dvd1 → title price
dvd2 → title price discount
Witness: dvd1 dvd2 dvd2 and dvd2 dvd2 dvd2 both match the content model of store and differ in the type of the first dvd.
Restrained-competition
store → (dvd1)∗ discounts dvd2 dvd2 (dvd2)∗
discounts → ε
dvd1 → title price
dvd2 → title price discount
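A violation of the restrained-competition condition can be searched for by brute force. The sketch below enumerates words up to a fixed length, so it can only find a witness, never prove that none exists; an exact test would work on an automaton for the content model. The string encoding of content models and the bound `max_len` are assumptions for illustration.

```python
import re
from itertools import product

def mu(typ):
    """Strip the type index: mu('dvd2') == 'dvd'."""
    return re.sub(r"\d+$", "", typ)

def competition_witness(model, types, max_len=4):
    """Search L(model) up to max_len for words w a^i v and w a^j v' with i != j.

    Returns the pair (w, a) of a violation, or None if none was found.
    """
    first_seen = {}
    for n in range(max_len + 1):
        for word in product(types, repeat=n):
            if not re.fullmatch(model, "".join(t + " " for t in word)):
                continue
            for i, typ in enumerate(word):
                key = (word[:i], mu(typ))
                if first_seen.setdefault(key, typ) != typ:
                    return key
    return None

print(competition_witness(r"(dvd1 |dvd2 )*dvd2 (dvd1 |dvd2 )*", ["dvd1", "dvd2"]))
print(competition_witness(r"(dvd1 )*discounts dvd2 dvd2 (dvd2 )*",
                          ["dvd1", "dvd2", "discounts"]))
```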
Towards characterizations of 1-pass preorder typing
The ancestor-sibling string
[Figure: a tree with root a, illustrating the ancestor-sibling string of a node: the labels on the path from the root to the node, together with the left siblings of each node on that path.]
Theorem (Martens, Nev., Schw., 2005)
For a regular tree language T , the following are equivalent
T is 1-pass preorder typable
T is definable by a restrained-competition EDTD
T is closed under ancestor-sibling-guarded subtree exchange
T is definable by an ancestor-sibling-based DTD
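For a restrained-competition EDTD, the type of an element can be emitted as soon as its start tag is read. The sketch below does this for the restrained-competition store example, using one hand-written DFA per type and a stack of (type, DFA state) pairs. The event encoding, the DFA tables, and the helper names are assumptions for illustration; text content is ignored.

```python
import re

# One DFA per type: (start state, accepting states, transitions on child types).
DFAS = {
    "store": (0, {3}, {(0, "dvd1"): 0, (0, "discounts"): 1,
                       (1, "dvd2"): 2, (2, "dvd2"): 3, (3, "dvd2"): 3}),
    "discounts": (0, {0}, {}),
    "dvd1": (0, {2}, {(0, "title"): 1, (1, "price"): 2}),
    "dvd2": (0, {3}, {(0, "title"): 1, (1, "price"): 2, (2, "discount"): 3}),
    "title": (0, {0}, {}), "price": (0, {0}, {}), "discount": (0, {0}, {}),
}

def mu(typ):
    return re.sub(r"\d+$", "", typ)

def type_stream(events, root="store"):
    """Return the types of all elements in document order, or raise ValueError."""
    stack, types = [], []  # stack holds (type, DFA state) of the open elements
    for kind, name in events:
        if kind == "start":
            if not stack:
                if name != mu(root):
                    raise ValueError("unexpected root " + name)
                typ = root
            else:
                parent, state = stack[-1]
                moves = [(t, nxt) for (s, t), nxt in DFAS[parent][2].items()
                         if s == state and mu(t) == name]
                if len(moves) != 1:  # restrained competition: at most one candidate
                    raise ValueError("cannot type " + name)
                typ, nxt = moves[0]
                stack[-1] = (parent, nxt)
            stack.append((typ, DFAS[typ][0]))
            types.append(typ)
        else:
            typ, state = stack.pop()
            if mu(typ) != name or state not in DFAS[typ][1]:
                raise ValueError("content of " + name + " is incomplete")
    return types

def events_of(tree):
    """Flatten a (label, children) tree into start/end events."""
    label, children = tree
    yield ("start", label)
    for child in children:
        yield from events_of(child)
    yield ("end", label)

def dvd(*parts):
    return ("dvd", [(p, []) for p in parts])

doc = ("store", [dvd("title", "price"), ("discounts", []),
                 dvd("title", "price", "discount"), dvd("title", "price", "discount")])
print(type_stream(events_of(doc)))
```

The memory used is proportional to the depth of the document, since only the open elements are kept on the stack.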
Summary slide
What to remember?
DTD ≈ extended context-free grammars
XML Schema ≈ single-type EDTDs
XML Schema is much closer to DTDs than to tree automata
single-typedness is not the most liberal restriction to get unique
top-down (1-pass) typing: restrained-competition EDTDs.
actually, determinism constraint alone already implies 1-pass
typing
Relax NG
James Clark and Makoto Murata [2001]
based on RELAX (Regular Language description for XML) and
TREX (Tree Regular Expressions for XML)
Clean specification: 40 pages, XML Schema: 170 pages
O’Reilly book by Eric Van der Vlist
Motivated by unranked regular tree languages. Very similar to
extended DTDs.
Closed under Boolean operations.
Relax NG: abbreviated syntax
store = element store
{ (dvd1 | dvd2)*, dvd2, (dvd1 | dvd2)* }
dvd1 =
element dvd {
element title { xsd:NCName },
element price { xsd:integer } }
dvd2 =
element dvd {
element title { xsd:NCName },
element price { xsd:integer },
element discount { xsd:integer } }
EDTD
store → (dvd1 + dvd2 )∗ dvd2 (dvd1 + dvd2 )∗
dvd1 → title price
dvd2 → title price discount
Relax NG: XML syntax
<define name="store">
  <element name="store">
    <zeroOrMore>
      <choice>
        <ref name="dvd1"/>
        <ref name="dvd2"/>
      </choice>
    </zeroOrMore>
    <ref name="dvd2"/>
    <zeroOrMore>
      <choice>
        <ref name="dvd1"/>
        <ref name="dvd2"/>
      </choice>
    </zeroOrMore>
  </element>
</define>
Decision problems for XML schema languages
Complexity of basic decision problems
Schema CONTAINMENT (⊆)
Given: Schemas d1, d2
Question: Is L(d1) ⊆ L(d2)?
Schema EQUIVALENCE (=)
Given: Schemas d1, d2
Question: Is L(d1) = L(d2)?
Schema INTERSECTION (∩)
Given: Schemas d1, ..., dn
Question: Is L(d1) ∩ ··· ∩ L(dn) = ∅?
Decision problems for XML schema languages
Complexity of basic decision problems
Theorem (Seidl 1990, 1994)
CONTAINMENT, EQUIVALENCE, and INTERSECTION are EXPTIME-complete for EDTDs and NTA(NFA)s.
Proposition
Let R be a class of regular expressions and C a complexity class. Then the following are equivalent:
CONTAINMENT for R is in C;
CONTAINMENT for DTD(R) is in C;
CONTAINMENT for single-type EDTD(R) is in C;
CONTAINMENT for restrained-competition EDTD(R) is in C.
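One way to see why containment for DTDs is no harder than containment for their content models: if d1 is reduced (every symbol of d1 occurs in some tree of L(d1)), then L(d1) ⊆ L(d2) holds exactly when both DTDs have the same start symbol and, for each symbol of d1, its content model in d1 is contained in its content model in d2. The Python sketch below illustrates this standard argument for content models given as partial DFAs. It is not necessarily the proof behind the proposition, and the class DFA, the helper names, and the example DTDs are hypothetical.

```python
from collections import deque


class DFA:
    """Partial DFA over element names; a missing transition rejects."""

    def __init__(self, start, accepting, delta):
        self.start = start
        self.accepting = set(accepting)
        self.delta = delta  # dict: (state, symbol) -> state

    def step(self, state, symbol):
        # None plays the role of a rejecting sink state.
        if state is None:
            return None
        return self.delta.get((state, symbol))


def dfa_contained(a, b, alphabet):
    """Return True iff L(a) is a subset of L(b), via the product automaton."""
    seen = {(a.start, b.start)}
    queue = deque(seen)
    while queue:
        p, q = queue.popleft()
        if p in a.accepting and q not in b.accepting:
            return False  # some word is accepted by a but not by b
        for sym in alphabet:
            p2 = a.step(p, sym)
            if p2 is None:
                continue  # a cannot continue, so no counterexample this way
            pair = (p2, b.step(q, sym))
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return True


def dtd_contained(d1, d2):
    """Containment of DTDs (rules, start), ASSUMING d1 is reduced."""
    rules1, start1 = d1
    rules2, start2 = d2
    if start1 != start2:
        return False
    alphabet = set(rules1) | set(rules2)
    return all(
        sym in rules2 and dfa_contained(rules1[sym], rules2[sym], alphabet)
        for sym in rules1
    )


EPS = DFA(0, {0}, {})  # content model accepting only the empty word

# d1: store -> dvd dvd*, dvd -> title price
d1 = ({"store": DFA(0, {1}, {(0, "dvd"): 1, (1, "dvd"): 1}),
       "dvd": DFA(0, {2}, {(0, "title"): 1, (1, "price"): 2}),
       "title": EPS, "price": EPS}, "store")

# d2: store -> dvd*, dvd -> title price discount?
d2 = ({"store": DFA(0, {0}, {(0, "dvd"): 0}),
       "dvd": DFA(0, {2, 3}, {(0, "title"): 1, (1, "price"): 2,
                              (2, "discount"): 3}),
       "title": EPS, "price": EPS, "discount": EPS}, "store")
```

In this sketch dtd_contained(d1, d2) holds while dtd_contained(d2, d1) does not, since d2 allows an empty store and an optional discount. The whole check is a loop over symbols around a string-language containment test, so its cost is dominated by that test.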
Decision problems for XML schema languages
Complexity of basic decision problems
Proposition
Let R be a class of regular expressions and C a complexity class. Then the following are equivalent:
INTERSECTION for R is in C;
INTERSECTION for DTD(R) is in C.
Theorem (Martens, Nev., Schw. 2005)
There is a class of regular expressions X such that
INTERSECTION for X is NP-complete;
INTERSECTION for single-type EDTD(X ) is EXPTIME-complete.
Decision problems for XML schema languages
Complexity of regular expressions
Basic decision problems of regular expressions carry over to schema languages
These problems have been studied in depth (Hunt III et al., Kozen, Meyer and Stockmeyer, ...)
More than ninety percent of the regular expressions occurring in practical DTDs and XSDs are Chain Regular Expressions (CHAREs) (Bex et al. 2004)
Decision problems for XML schema languages
Complexity of regular expressions
Definition
A base symbol is a regular expression s, s∗, s+, or s?, where s is a non-empty string;
a factor is of the form e, e∗, e+, or e?, where e is a disjunction of base symbols.
A chain regular expression (CHARE) is ∅, ε, or a sequence of factors.
Example
((abc)∗ + b∗)(a + b)?(ab)+(ac + b)∗ is a CHARE
(a + b) + (a∗ b∗) is not a CHARE.
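The three-level definition (base symbols, factors, sequences of factors) maps directly onto a small constructor API. The Python sketch below builds a CHARE bottom-up and compiles it to a Python regular expression so that membership can be tested. The function names are hypothetical and the special case ∅ is not represented; this is an illustration of the definition, not a tool from the talk.

```python
import re

MODIFIERS = ("", "*", "+", "?")


def base_symbol(s, mod=""):
    """Base symbol: a non-empty string s with modifier '', '*', '+', or '?'."""
    if not s or mod not in MODIFIERS:
        raise ValueError("need a non-empty string and a valid modifier")
    return "(?:" + re.escape(s) + ")" + mod


def factor(bases, mod=""):
    """Factor: a disjunction of base symbols with an outer modifier."""
    if not bases or mod not in MODIFIERS:
        raise ValueError("need at least one base symbol and a valid modifier")
    return "(?:" + "|".join(bases) + ")" + mod


def chare(factors):
    """CHARE: a sequence of factors (the empty sequence denotes ε)."""
    return re.compile("".join(factors))


# The CHARE ((abc)* + b*)(a + b)?(ab)+(ac + b)* from the example.
example = chare([
    factor([base_symbol("abc", "*"), base_symbol("b", "*")]),
    factor([base_symbol("a"), base_symbol("b")], "?"),
    factor([base_symbol("ab")], "+"),
    factor([base_symbol("ac"), base_symbol("b")], "*"),
])
```

Because the only way to combine factors is concatenation, an expression such as (a + b) + (a∗ b∗), whose top-level operator is a disjunction over a concatenation, cannot be built with these constructors. That matches its status as a non-CHARE.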
Decision problems for XML schema languages
Chain Regular Expressions (CHAREs)
Abbreviations
Factor                  Abbr.
(a1 + ··· + an)         (+a)
(a1 + ··· + an)∗        (+a)∗
(a1 + ··· + an)+        (+a)+
(a1 + ··· + an)?        (+a)?
(a1∗ + ··· + an∗)       (+a∗)
(a1+ + ··· + an+)       (+a+)
(w1 + ··· + wn)         (+w)
(w1 + ··· + wn)∗        (+w)∗
(w1 + ··· + wn)+        (+w)+
(w1 + ··· + wn)?        (+w)?
(w1∗ + ··· + wn∗)       (+w∗)
(w1+ + ··· + wn+)       (+w+)
Decision problems for XML schema languages
Complexity of CHAREs
Known results
CONTAINMENT for RE(a?, (+a)∗) is in PTIME [Abdulla, Bouajjani, Jonsson 1998]
CONTAINMENT for RE(a, Σ, Σ∗ ) is in PTIME [Milo, Suciu 1999]
INTERSECTION for RE((+w)∗ ) is PSPACE-hard [Bala 2002]
Decision problems for XML schema languages
Complexity of CHAREs [Martens, Nev., Schw. 2004]
RE-fragment                              Inclusion         Equivalence   Intersection
a, a+                                    in PTIME (DFA!)   in PTIME      in PTIME
a, a∗                                    coNP              in PTIME      NP
a, a?                                    coNP              in PTIME      NP
CHAREs − {(+a)∗, (+w)∗, (+a)+, (+w)+}    coNP              in coNP       NP
a, (+a)∗                                 PSPACE            in PSPACE     NP
CHAREs − {(+w)∗, (+w)+}                  PSPACE            in PSPACE     NP
a, (+w)∗                                 PSPACE            in PSPACE     PSPACE
CHAREs                                   PSPACE            in PSPACE     PSPACE
RE≤k (k ≥ 3)                             in PTIME          in PTIME      PSPACE
deterministic                            in PTIME          in PTIME      PSPACE
Decision problems for XML schema languages
Equivalence of a, a∗ is in PTIME
Put the expression in sequence normal form.
E.g., aaa∗ bb∗ cccc∗ becomes a^{≥2} b^{≥1} c^{≥3}.
There are equivalent expressions with a different sequence normal form:
a^{≥i} b∗ a∗ b^{≥1} a^{≥j} = a^{≥i} b^{≥1} a∗ b∗ a^{≥j}
Good news: this is the only exception. Non-trivial proof.
Conjecture: equivalence is tractable for much larger fragments
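The normalisation step can be sketched in a few lines of Python: adjacent occurrences of the same symbol are merged into one block that records the number of unstarred occurrences and whether a starred occurrence was present. The representation as (symbol, minimum, unbounded) triples and both function names are assumptions of this sketch. The second function is only a bounded brute-force comparison for sanity-checking small cases, not the PTIME equivalence test and not a proof.

```python
import re
from itertools import product


def sequence_normal_form(expr):
    """Normalise an RE(a, a*) expression such as 'aaa*bb*cccc*'.

    Returns a list of (symbol, minimum, unbounded) triples, so that
    ('a', 2, True) stands for "at least two a's".
    """
    blocks = []
    i = 0
    while i < len(expr):
        sym = expr[i]
        if sym == "*":
            raise ValueError("'*' must follow a symbol")
        starred = i + 1 < len(expr) and expr[i + 1] == "*"
        i += 2 if starred else 1
        if not blocks or blocks[-1][0] != sym:
            blocks.append([sym, 0, False])
        if starred:
            blocks[-1][2] = True   # a starred occurrence removes the upper bound
        else:
            blocks[-1][1] += 1     # an unstarred occurrence raises the minimum
    return [tuple(block) for block in blocks]


def agree_up_to(expr1, expr2, alphabet, max_len):
    """Compare two expressions on all words up to max_len (bounded check only)."""
    r1, r2 = re.compile(expr1), re.compile(expr2)
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            if bool(r1.fullmatch(word)) != bool(r2.fullmatch(word)):
                return False
    return True


# The exceptional pair from the slide, instantiated with i = j = 1.
left = "aa*b*a*bb*aa*"
right = "aa*bb*a*b*aa*"
```

For i = j = 1 the two sides have different normal forms, yet the bounded check finds no word that separates them, consistent with the exception stated above.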
Decision problems for XML schema languages
Summary slide
What to remember?
Decision problems for XML Schema translate to decision problems for regular expressions.
Question
What is the largest class of regular expressions for which equivalence is in PTIME?
Conclusion
Conclusion
DTDs are almost extended context-free grammars
Unranked tree automata are a robust class – questions remain
XML Schema is closer to DTDs than to tree automata
XML (schema) research is a good excuse to do theory