The automaton approach to XML schema languages: from practice to theory

Frank Neven
Theoretical Computer Science Group, Hasselt University
Agoralaan, 3590 Diepenbeek, Belgium
27 February 2006

Outline
1 Introduction to XML
2 Document Type Definitions
3 Unranked Tree Automata
4 Extended Document Type Definitions
    Definition
    XML Schema
    Properties of single-type EDTDs
    Single-type EDTDs in practice
    1-pass preorder typing
    Relax NG
5 Decision problems for XML schema languages
6 Conclusion

Introduction to XML

XML is a data exchange format
W3C standard
[Figure: data sources (geographical db, OODB, Rel DB, car retailer, car reviews) and a user, exchanging XML over the Internet]

A self-describing data format
Example
    <store>
      <dvd>
        <title> Fabuleux destin d’Amelie </title>
        <price> 17 </price>
      </dvd>
      <dvd>
        <title> Goodbye Lenin </title>
        <price> 20 </price>
        <discount> 4 </discount>
      </dvd>
    </store>
start tag: <title>
end tag: </title>
element: <title>...</title>

XML as a hierarchical structure
Example
    store
      dvd
        title     "Amélie"
        price     17
      dvd
        title     "Good bye, Lenin!"
        price     20
        discount  4

Attributes
Example
    <store name="DVDPlanet">
      <dvd category="romance">
        <title> Fabuleux ... d’Amelie </title>
        <price> 17 </price>
      </dvd>
      <dvd category="drama">
        <title> Goodbye Lenin </title>
        <price> 20 </price>
        <discount> 4 </discount>
      </dvd>
    </store>

XML as a hierarchical structure
Example
    store[name="DVDPlanet"]
      dvd[category="romance"]
        title     "Amélie"
        price     17
      dvd[category="drama"]
        title     "Good bye, Lenin!"
        price     20
        discount  4

Trees as conceptual abstraction of XML documents
XML documents are ordered unranked trees over a finite alphabet Σ of tag names.
We assume an infinite set of data values D for attribute and leaf values.

Flexibility of XML
Representation of the relational model
Relation R
    A   B
    a1  b1
    a2  b2
XML encoding
    <R>
      <tuple> <A> a1 </A> <B> b1 </B> </tuple>
      <tuple> <A> a2 </A> <B> b2 </B> </tuple>
    </R>
XML tree
    R
      tuple
        A  a1
        B  b1
      tuple
        A  a2
        B  b2

XML schema languages
Schema
A schema defines the set of allowable tags and the way they can be structured.
Advantages
    automatic validation
    automatic integration of data
    automatic translation
    query optimization
    provides a user with a concrete semantics of the document
    aids in the specification of meaningful queries over XML data

XML schema languages
Example
    DTDs (W3C)
    XML Schema (W3C)
    Relax NG (Clark, Murata)
    several dozen others (DSD, Schematron, ...)
In formal language theoretic terms
A schema defines a tree language.

Overview of XML Theory
Cross fertilization: XML, Automata, Logic
Different sorts of automata: grammars, tree automata, tree-walking automata, register automata, transducers, ...
Automata serve as
    an algorithmic toolbox
    an abstract formal model of schema languages, query and pattern languages

Summary slide
What to remember?
    XML is an international standard
    XML documents or XML data are simply ordered unranked labeled trees with data values
    a schema defines a tree language (no data values)
Focus of this talk
Automata as a formal model for schema languages

Document Type Definitions

Document Type Definitions (DTDs)
Example
    <!DOCTYPE store [
      <!ELEMENT store    (dvd,dvd*)>
      <!ELEMENT dvd      (title,price,discount?)>
      <!ELEMENT title    (#PCDATA)>
      <!ELEMENT price    (#PCDATA)>
      <!ELEMENT discount (#PCDATA)>
    ]>
Corresponding grammar (start symbol store)
    store    → dvd dvd*
    dvd      → title price (discount + ε)
    title    → DATA
    price    → DATA
    discount → DATA
XML Document
    store
      dvd
        title  "Amélie"
        price  17
      dvd
        title  "Good bye, Lenin!"
        price  20

Document Type Definitions (DTDs): no data values
XML Document
    store
      dvd
        title
        price
      dvd
        title
        price
Corresponding grammar (start symbol store)
    store → dvd dvd*
    dvd   → title price (discount + ε)

Extended context-free grammars as a formal abstraction
Definition
A DTD is a pair (d, s_d) where
    s_d ∈ Σ is the start symbol
    d maps every Σ-symbol to a regular expression over Σ
Definition
A tree t satisfies d (is valid) iff
    the root of t is labeled s_d
    for every vertex v labeled a, the string formed by the labels of the children of v belongs to d(a).
(→ DTD validator)

Optimization questions
Schema containment (⊆)
Given: schemas d1, d2
Question: Is L(d1) ⊆ L(d2)?
DTD containment reduces to containment of regular expressions:
d1 ⊆ d2 iff d1(a) ⊆ d2(a) for all a ∈ Σ (when d1 and d2 are trimmed).
Theorem (Meyer, Stockmeyer, 1973)
Containment of regular expressions is PSPACE-complete.
Corollary
DTD containment is PSPACE-complete.

Regular Expressions in DTDs Should Be Deterministic
How accurate is our abstraction?
The XML specification requires regular expressions to be deterministic: for every symbol in the input string we can uniquely determine which symbol in the regular expression it should match, without looking ahead in the input string.
Example
The expression (a + b)* a is not deterministic. Counterexample: baa.
The expression b* a (b* a)* is deterministic.
Why this restriction? Backward compatibility with SGML.

Regular Expressions in DTDs Should Be Deterministic
Relevant questions
1 How do we recognize deterministic regular expressions? (→ DTD validator)
2 Can every regular language be denoted by a deterministic regular expression?
3 Are deterministic regular languages a robust class?
4 If a regular expression is not deterministic, can you find an equivalent one that is? (→ smart DTD validator)

Formalization by Brüggemann-Klein and Wood [1998]
Definition
A marking r′ of a regular expression r is an assignment of numbers to every symbol in r.
For w ∈ L(r′), we denote by w# the corresponding unmarked string in L(r).
Example
(a₁ + b₂)* a₃ is a marking of (a + b)* a.
For w = b₂ a₁ a₃, w# = baa.

Formalization by Brüggemann-Klein and Wood [1998]
Definition
A regular expression r is deterministic (one-unambiguous) iff there are no strings uxv, uyw ∈ L(r′) with
    |x| = |y| = 1,
    x ≠ y (x and y are different marked symbols),
    x# = y# (their unmarking is the same).
Example
(a + b)* a is not deterministic: take
    u = b₂, x = a₁, v = a₃   and
    u = b₂, y = a₃, w = ε.
Tool
The Glushkov construction preserves one-step unambiguity.
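The definition above can be tested mechanically on the marked positions. What follows is a minimal sketch, not the algorithm from the cited papers: it computes the Glushkov first and follow sets and reports an expression as non-deterministic as soon as two distinct positions with the same unmarked symbol compete. The tuple encoding of expressions and the names `sym`, `alt`, `cat`, `star`, `glushkov`, and `is_deterministic` are assumptions made for this illustration.

```python
from itertools import count

# Hypothetical expression encoding: nested tuples built by these helpers.
def sym(a): return ("sym", a)
def alt(r, s): return ("alt", r, s)
def cat(r, s): return ("cat", r, s)
def star(r): return ("star", r)

def glushkov(regex):
    """Return (first, follow, symbol_of) over the marked positions of regex."""
    counter = count(1)
    symbol_of = {}  # position -> unmarked symbol
    follow = {}     # position -> positions that may come right after it

    def walk(node):
        # Returns (nullable, first, last) for the sub-expression.
        kind = node[0]
        if kind == "sym":
            pos = next(counter)
            symbol_of[pos] = node[1]
            follow[pos] = set()
            return False, {pos}, {pos}
        if kind == "alt":
            n1, f1, l1 = walk(node[1])
            n2, f2, l2 = walk(node[2])
            return n1 or n2, f1 | f2, l1 | l2
        if kind == "cat":
            n1, f1, l1 = walk(node[1])
            n2, f2, l2 = walk(node[2])
            for p in l1:
                follow[p] |= f2
            first = (f1 | f2) if n1 else f1
            last = (l1 | l2) if n2 else l2
            return n1 and n2, first, last
        if kind == "star":
            _, f, l = walk(node[1])
            for p in l:
                follow[p] |= f
            return True, f, l
        raise ValueError(f"unknown node kind: {kind}")

    _, first, _ = walk(regex)
    return first, follow, symbol_of

def is_deterministic(regex):
    """True iff no first/follow set holds two positions with the same symbol."""
    first, follow, symbol_of = glushkov(regex)
    for positions in [first, *follow.values()]:
        symbols = [symbol_of[p] for p in positions]
        if len(symbols) != len(set(symbols)):
            return False
    return True

# The two running examples: (a + b)* a  and  b* a (b* a)*
A_OR_B_STAR_A = cat(star(alt(sym("a"), sym("b"))), sym("a"))
B_STAR_A = cat(cat(star(sym("b")), sym("a")),
               star(cat(star(sym("b")), sym("a"))))

print(is_deterministic(A_OR_B_STAR_A))  # False
print(is_deterministic(B_STAR_A))       # True
```

For (a + b)* a the first set already contains both a₁ and a₃, which is exactly the conflict exhibited by the strings b₂ a₁ a₃ and b₂ a₃.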
Glushkov automaton for b* a (b* a)*
Marked expression: b₁* a₂ (b₃* a₄)*
[Figure: Glushkov automaton built step by step, with one state per marked position b₁, a₂, b₃, a₄; the final step shows the states with their unmarked labels b, a, b, a]

Glushkov automaton construction
[Figure: Glushkov automata for b₁* a₂ (b₃* a₄)* and for (a₁ + b₂)* a₃]

Recognition of deterministic regular expressions
Theorem (Book et al 1971, Brüggemann-Klein, Wood, 1998)
A regular expression is deterministic (one-unambiguous) iff its Glushkov automaton is deterministic.
It is decidable in quadratic time whether a regular expression is deterministic.

Properties of deterministic regular languages
Theorem (Brüggemann-Klein, Wood, 1998)
Not every regular language can be denoted by a deterministic regular expression. E.g., (a + b)* a (a + b).
Deterministic regular languages are not closed under union, concatenation, or Kleene-star. (No syntax for deterministic regular languages.)
It can be decided in PTIME whether a DFA denotes a deterministic regular language. (Orbit property.)
If it exists, an equivalent deterministic regular expression can be constructed in exponential time.
Results provide formal machinery for dealing with DTDs.

Complexity of basic decision problems revisited
Schema containment (⊆)
Given: schemas d1, d2
Question: Is L(d1) ⊆ L(d2)?
DTD containment reduces to containment of regular expressions:
d1 ⊆ d2 iff d1(a) ⊆ d2(a) for all a ∈ Σ (when d1 and d2 are trimmed).
Theorem
Containment of DTDs with deterministic regular expressions is in PTIME.

Summary slide
What to remember?
    XML DTDs are context-free grammars with deterministic regular expressions
    deterministic regular expressions are a semantic notion: no easy syntax, non-transparent to users
    advantage: optimization problems are tractable
Question
What is the largest robust class of regular expressions that can be translated to DFAs in PTIME?
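The validity definition of this section (root labeled with the start symbol, and the child string of every node in d(a)) translates almost literally into code. The sketch below is an illustration under assumptions, not a real DTD validator: trees are hypothetical `(label, children)` pairs, data values are dropped as in the "no data values" abstraction, and each rule d(a) is written as a Python regular expression over child labels that are each followed by one space.

```python
import re

# The store DTD from this section, as a map from label to a regex over
# child labels. Each label is followed by a single space so that labels
# cannot run into each other. Leaves (title, price, discount) carry only
# data, which this abstraction ignores, so their child string is empty.
START = "store"
RULES = {
    "store": r"(dvd )(dvd )*",
    "dvd": r"title price (discount )?",
    "title": r"",
    "price": r"",
    "discount": r"",
}

def check_node(tree, rules):
    """Check the rule of this node, then recurse into its children."""
    label, children = tree
    if label not in rules:
        return False
    child_string = "".join(child[0] + " " for child in children)
    if re.fullmatch(rules[label], child_string) is None:
        return False
    return all(check_node(child, rules) for child in children)

def is_valid(tree, rules=RULES, start=START):
    """A tree is valid iff its root is the start symbol and all rules hold."""
    return tree[0] == start and check_node(tree, rules)

leaf = lambda label: (label, [])
good = ("store", [
    ("dvd", [leaf("title"), leaf("price")]),
    ("dvd", [leaf("title"), leaf("price"), leaf("discount")]),
])
bad = ("store", [("dvd", [leaf("price"), leaf("title")])])

print(is_valid(good))  # True
print(is_valid(bad))   # False
```

Python's `re` engine backtracks, so this sketch does not depend on the content models being deterministic; a validator that exploits determinism would instead run the Glushkov automaton of each rule in a single left-to-right pass.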
Unranked Tree Automata

Deterministic Tree Automata over Binary Trees
Definition
Formally, M = (Q, Σ, δ, F) with Q = {f, t}, Σ = {0, 1, ∧, ∨}, F = {t}, and
    δ(0) = f            δ(1) = t
    δ(f, f, ∧) = f      δ(f, f, ∨) = f
    δ(t, f, ∧) = f      δ(t, f, ∨) = t
    δ(f, t, ∧) = f      δ(f, t, ∨) = t
    δ(t, t, ∧) = t      δ(t, t, ∨) = t
[Figure: a binary tree with root ∧, inner nodes ∨ and ∧, and leaves 0, 1, 1, 1, labeled bottom-up with states; the root receives t]

Tree Automata over Binary Trees
Definition
A set of binary trees is regular iff it is accepted by a tree automaton.
Deterministic versus non-deterministic
    Det: δ : Q × Q × Σ → Q
    Non-Det: δ : Q × Q × Σ → 2^Q
    Semantics: a tree is accepted if there is a labeling of states consistent with the transition function, and the root is labeled with an accepting state
    top-down: δ : Q × Σ → 2^(Q×Q)

Tree Automata over Binary Trees
Robust class
    det. bottom-up TA = non-det. bottom-up TA (subset construction)
    non-det. top-down TA = non-det. bottom-up TA
    Closed under Boolean operations:
        Union, intersection: product construction
        Complement: complete automaton, determinize, swap final and non-final states
    Many equivalent notions: alternating, two-way, tree-walking + restricted pushdown, MSO, ...
    Decision problems: containment is EXPTIME-complete for non-det TA [Seidl 1990], PTIME-complete for det TA.
    PTIME minimization for det TA, unique minimal TA

Bottom-up Tree Automata over Unranked Trees
Binary versus unranked
    binary tree: δ : Q × Q × Σ → Q
    unranked tree: δ : ⋃_{i≥0} Q^i × Σ → Q
    specify transition functions by regular string languages over states: δ(q, a) ⊆ Q* is a regular language
[Figure: a node labeled a receives state q when its children carry states q1, q2, q3 with q1 q2 q3 ∈ δ(q, a)]

Bottom-up Tree Automata over Unranked Trees
Transition function, F = {t}
    δ(f, 0) = {ε};  δ(f, 1) = ∅
    δ(t, 1) = {ε};  δ(t, 0) = ∅
    δ(f, ∧) = (f + t)* f (f + t)*
    δ(t, ∧) = t*
    δ(f, ∨) = f*
    δ(t, ∨) = (f + t)* t (f + t)*
[Figure: an unranked tree over {0, 1, ∧, ∨}, then the same tree annotated bottom-up with the states f and t]

Bottom-up Tree Automata over Unranked Trees
Definition
A non-deterministic tree automaton (NTA) is a tuple B = (Q, Σ, δ, F), where
    Q is a finite set of states,
    F ⊆ Q is the set of final states,
    δ is a function Q × Σ → 2^(Q*) such that δ(q, a) is a regular string language over Q for every a ∈ Σ and q ∈ Q.
History
    Resurrected by Brüggemann-Klein, Murata, Wood [1995-2001] in the context of XML
    Originally: Pair and Quere [1968], Takahashi [1975], Thatcher [1967], ...

Unranked versus Binary
Trading width for depth: first-child next-sibling encoding
[Figure: an unranked tree over {a, b} and its binary first-child next-sibling encoding padded with #, related by enc (unranked to binary) and dec (binary to unranked)]

Binary Regular ≡ Unranked Regular
Theorem [Folklore]
For every unranked NTA B there is a binary TA A such that L(A) = {enc(t) | t ∈ L(B)}.
For every binary TA A there is an unranked NTA B such that L(B) = {dec(t) | t ∈ L(A)}.
Encoding preserving properties
    closure properties (e.g., Boolean closure)
    equivalent characterizations (e.g., MSO definability)
    decidability (e.g., containment)
    not everything carries over

Encoding does not preserve complexity
Representation
NTA(S) is the class of NTAs where the transition functions are represented by elements from S.
E.g., NTA(NFA), NTA(REG), NTA(2AFA), ...
Emptiness
Given: automaton A
Question: Is L(A) = ∅?
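For the simplest representation, NTA(NFA), the emptiness question can be answered by a reachability fixpoint: a state q is reachable as soon as some δ(q, a) contains a word made only of states that are already reachable. The sketch below is an illustration under assumptions and does not cover the harder NTA(2AFA) case; the NFA encoding as a `(start, finals, delta)` triple and all function names are hypothetical.

```python
def nfa_nonempty_over(nfa, allowed):
    """True iff the NFA accepts some word using only symbols in `allowed`."""
    start, finals, delta = nfa  # delta: (state, symbol) -> set of states
    seen, stack = {start}, [start]
    while stack:
        s = stack.pop()
        if s in finals:
            return True
        for (src, symbol), targets in delta.items():
            if src == s and symbol in allowed:
                for t in targets - seen:
                    seen.add(t)
                    stack.append(t)
    return False

def reachable_states(transitions):
    """transitions: (q, a) -> NFA over Q describing delta(q, a)."""
    reachable = set()
    changed = True
    while changed:
        changed = False
        for (q, _label), nfa in transitions.items():
            if q not in reachable and nfa_nonempty_over(nfa, reachable):
                reachable.add(q)
                changed = True
    return reachable

def is_empty(transitions, finals):
    """L(A) is empty iff no final state is reachable."""
    return not (reachable_states(transitions) & set(finals))

# Part of the Boolean-circuit automaton: {eps}, t*, and (f+t)* f (f+t)*.
EPS = (0, {0}, {})
T_STAR = (0, {0}, {(0, "t"): {0}})
CONTAINS_F = (0, {1}, {(0, "f"): {0, 1}, (0, "t"): {0},
                       (1, "f"): {1}, (1, "t"): {1}})
CIRCUITS = {("f", "0"): EPS, ("t", "1"): EPS,
            ("t", "∧"): T_STAR, ("f", "∧"): CONTAINS_F}

# A hypothetical automaton whose only rule needs a child already in state q.
STUCK = {("q", "a"): (0, {1}, {(0, "q"): {1}, (1, "q"): {1}})}

print(is_empty(CIRCUITS, {"t"}))  # False
print(is_empty(STUCK, {"q"}))     # True
```

Each round of the fixpoint either adds a state or stops, and every NFA test is a graph search, so the sketch runs in time polynomial in the size of the automaton.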
Unranked Tree Automata

Encoding does not preserve complexity

Representation
NTA(S) is the class of NTAs where the transition functions are represented by elements from S. E.g., NTA(NFA), NTA(REG), NTA(2AFA), ...

Emptiness
Given: automaton A
Question: Is L(A) = ∅?

Theorem
Emptiness of NTA(2AFA) is PSPACE-complete. [Martens, Neven 2003]
Emptiness of two-way alternating tree automata is EXPTIME-complete. [Vardi 1998; Kupferman, Piterman, Vardi 2002]

Deterministic unranked tree automata are not so deterministic

Definition
An NTA(DFA) is bottom-up deterministic iff δ(q, a) ∩ δ(q′, a) = ∅ for all q, q′ ∈ Q with q ≠ q′ and all a ∈ Σ.

[Figure: a node labeled a is assigned state q when its children carry states q1 q2 q3 with q1 q2 q3 ∈ δ(q, a).]

Equivalence of deterministic tree automata

Equivalence
Given: DTA A and B
Question: Is L(A) = L(B)?

Equivalence of deterministic unranked tree automata
Compute complement ¬A and ¬B (in PTIME):
    Make automaton complete: add δ(q, q′, a) = q_trash for every undefined triple.
    Exchange final and non-final states.
Test whether the symmetric difference is empty (in PTIME): (A ∩ ¬B) ∪ (B ∩ ¬A) = ∅

Completing unranked deterministic automata is problematic
δ(q_trash, a) = Q* − ⋃_{q ∈ Q} δ(q, a) is exponentially bigger.

Solution
The binary encoding of a DTA is unambiguous.
Testing equivalence of unambiguous binary TAs is in PTIME. [Seidl 1990]
Unranked bottom-up DTA(DFA)s are exponentially more succinct than binary bottom-up DTAs. [Martens, Niehren 2005]

Minimization

Theorem (Martens, Niehren 2005)
Minimization of DTA(DFA) is NP-complete.
There does not always exist a unique minimal DTA(DFA).

Crux
Minimizing DTA(DFA)s is related to minimizing disjoint unions of DFAs: δ(q1, a) ∪ ··· ∪ δ(qn, a).

Other models
Stepwise tree automata [Carme, Niehren, Tommasi 2004]
Instead of n automata representing δ(q1, a), ..., δ(qn, a), use one automaton N_a with an output function [Cristau, Löding, Thomas 2005]

Summary slide

What to remember?
Tree automata are a very robust class (much like string automata).
Many properties for unranked automata carry over from the ranked case through the encoding, ... but not all.
A DTA is not 100 % deterministic.
XML Schema is usually abstracted by unranked tree automata ... but this is not entirely accurate (as we will explain next).

Questions
Given a DTA A, can you compute ¬A in PTIME?
What is the right notion of deterministic unranked TA?
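To make the notion of a run concrete, here is a minimal sketch of bottom-up evaluation for an NTA whose transition languages δ(q, a) are given as regular expressions. The encoding is an assumption made for illustration: states are the single characters t and f, each δ(q, a) is a Python regex over state characters, and trees are (label, children) pairs. The example automaton evaluates Boolean circuits. The search over all combinations of child states is exponential and only meant to show the semantics.

```python
import re
from itertools import product

# Hypothetical encoding: (state, label) -> regex over the state characters "t" and "f".
DELTA = {
    ("f", "0"): r"",
    ("t", "1"): r"",
    ("f", "and"): r"[tf]*f[tf]*",   # some child is false
    ("t", "and"): r"t*",            # all children are true
    ("t", "or"): r"[tf]*t[tf]*",    # some child is true
    ("f", "or"): r"f*",             # all children are false
}
FINAL = {"t"}


def states(tree):
    """Return the set of states the automaton can assign to the root of tree."""
    label, children = tree
    child_sets = [states(child) for child in children]
    result = set()
    for (state, symbol), language in DELTA.items():
        if symbol != label:
            continue
        # Try every way of picking one possible state per child.
        for choice in product(*child_sets):
            if re.fullmatch(language, "".join(choice)):
                result.add(state)
                break
    return result


def accepts(tree):
    """A tree is accepted if some run assigns a final state to the root."""
    return bool(states(tree) & FINAL)


circuit = ("and", [("or", [("0", []), ("1", [])]), ("1", [])])
print(accepts(circuit))
```

Even for a bottom-up deterministic automaton, this sketch finds the state of a node by testing the children's state string against each language δ(q, a) in turn.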
Extended Document Type Definitions

Definition

Extended DTDs
Grammar based approach to unranked regular tree languages

Definition (Papakonstantinou, Vianu, 2000)
Let Σ^N := {σ^n | σ ∈ Σ, n ∈ N} be the alphabet of types.
An extended DTD (EDTD) is a tuple D = (Σ, d, s_d), where (d, s_d) is a (finite) DTD over Σ ∪ Σ^N.
A tree t is valid w.r.t. D if there is an assignment of types such that the typed tree is a derivation tree of d.

Example
    store → (dvd^1 + dvd^2)* dvd^2 (dvd^1 + dvd^2)*
    dvd^1 → title price
    dvd^2 → title price discount

Tree t
    store
        dvd: title "Amélie", price 17
        dvd: title "Good bye, Lenin!", price 20
        dvd: title "Gothika", price 15, discount 4
        dvd: title "Pulp Fiction", price 11, discount 6

Typed tree t′
    store
        dvd^1: title "Amélie", price 17
        dvd^1: title "Good bye, Lenin!", price 20
        dvd^2: title "Gothika", price 15, discount 4
        dvd^2: title "Pulp Fiction", price 11, discount 6

EDTDs versus Tree Automata

Theorem (Papakonstantinou, Vianu, 2000)
NTAs and EDTDs define precisely the class of (homogeneous) regular unranked tree languages.

Example
    EDTD                                   NTA
    0^0 → ε,  1^1 → ε                      δ(f, 0) = {ε};  δ(t, 1) = {ε}
    ∧^0 → .* (0^0 + ∨^0 + ∧^0) .*          δ(f, ∧) = .* f .*
    ∧^1 → (1^1 + ∨^1 + ∧^1)*               δ(t, ∧) = t*
    ∨^1 → .* (1^1 + ∨^1 + ∧^1) .*          δ(t, ∨) = .* t .*
    ∨^0 → (0^0 + ∨^0 + ∧^0)*               δ(f, ∨) = f*

XML Schema

XML Schema
    <xs:element name="store">
      <xs:complexType>
        <xs:sequence>
          <xs:choice minOccurs="0" maxOccurs="unbounded">
            <xs:element name="dvd" type="1"/>
            <xs:element name="dvd" type="2"/>
          </xs:choice>
          <xs:element name="dvd" type="2"/>
          <xs:choice minOccurs="0" maxOccurs="unbounded">
            <xs:element name="dvd" type="1"/>
            <xs:element name="dvd" type="2"/>
          </xs:choice>
        </xs:sequence>
      </xs:complexType>
    </xs:element>

Rejected by XML Schema validator: violates the Element Declarations Consistent constraint.

A formalization of XML Schema: single-type EDTDs

XML Schema 1: Element Declarations Consistent constraint (Section 3.8.6)
It is illegal to have two elements of the same name [...] but different types in a content model [...].
Definition (Murata, Lee, Mani, 2001)
A single-type EDTD is an EDTD in which no regular expression contains two types b^i and b^j with i ≠ j.

Not single-type
    store → (dvd^1 + dvd^2)* dvd^2 (dvd^1 + dvd^2)*
    dvd^1 → title price
    dvd^2 → title price discount
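Recognizing single-type EDTDs is a purely syntactic test, which can be sketched as follows. The encoding is an assumption made for this illustration: an EDTD is a dictionary from types to content-model strings, and a type is written as an element name followed by digits (dvd1, dvd2), while names without digits have a single implicit type.

```python
import re
from collections import defaultdict

# Assumed notation: element name followed by an optional numeric type index.
TYPED_NAME = re.compile(r"([A-Za-z]+)(\d*)")


def single_type_violations(edtd):
    """Map each offending left-hand side to the element names used with two or more types."""
    violations = {}
    for lhs, content_model in edtd.items():
        types_by_name = defaultdict(set)
        for name, index in TYPED_NAME.findall(content_model):
            types_by_name[name].add(name + index)
        clashes = {name: types for name, types in types_by_name.items() if len(types) > 1}
        if clashes:
            violations[lhs] = clashes
    return violations


NOT_SINGLE_TYPE = {
    "store": "(dvd1 + dvd2)* dvd2 (dvd1 + dvd2)*",
    "dvd1": "title price",
    "dvd2": "title price discount",
}
SINGLE_TYPE = {
    "store": "regulars discounts",
    "regulars": "(dvd1)*",
    "discounts": "dvd2 (dvd2)*",
    "dvd1": "title price",
    "dvd2": "title price discount",
}

print(single_type_violations(NOT_SINGLE_TYPE))
print(single_type_violations(SINGLE_TYPE))
```

An empty result means that no content model mixes two types of the same element name, which is the condition in the definition above.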
Example
    store → regulars discounts
    regulars → (dvd^1)*
    discounts → dvd^2 (dvd^2)*
    dvd^1 → title price
    dvd^2 → title price discount

A formalization of XML Schema: single-type EDTDs

Formal abstraction
XML Schema ≈ single-type EDTDs

Immediate Questions
Can you recognize single-type EDTDs? Trivial. (XML Schema validator)
What kind of languages can be defined by single-type EDTDs?
Is it decidable whether an EDTD is equivalent to a single-type EDTD? (smart XML Schema validator)

Properties of single-type EDTDs

Validation and typing
Given a tree t and an EDTD D = (Σ, d, a^0):
validation: is t ∈ L(D), i.e., does there exist a typed tree t′ ∈ L(d)?
typing: compute for every b-labeled node its type b^i in t′.

Single-type EDTDs: simple top-down typing

Algorithm to validate and type a tree [Murata et al., 2001]
Given: tree t and single-type EDTD D = (Σ, d, a^0)
1 Check if the root of t is labeled with a; assign type a^0.
2 For every interior node u with type b^i, test whether the children of u match µ(d(b^i)). If so, assign the unique type to every child. Else fail.
Here µ removes the types: µ(a^1 + b^1 c^2) = a + bc.

Corollary
Single-typedness implies unique top-down typing.

Two-pass and ambiguous typing

Example
    a → b^1 + b^2,  b^1 → c,  b^2 → d
    Tree: a(b(c)). Is b of type b^1 or b^2? The type is only known after the child c has been read (two passes).

Example
    a → b^1 + b^2,  b^1 → c*,  b^2 → d*
    Tree: a(b). Is b of type b^1 or b^2? Both types fit (ambiguous).

Towards a characterization of single-type EDTDs

The Ancestor-String
[Figure: the path from the root to a node.]
Notation: anc-str_t(u) = the ancestor-string of a tree t at node u (the labels on the path from the root to u).

Definition
An EDTD D = (Σ, d, s_d) has ancestor-based types if there is a function f : Σ* → Σ^N such that, for each tree t ∈ L(D),
t has exactly one witness t′ ∈ L(d), and
t′ results from t by assigning to each node v the type f(anc-str_t(v)).
Intuition: the type of a node depends on its ancestor-string, and on nothing else.

Proposition
When a tree language T is definable by a single-type EDTD, then it has ancestor-based types.

Proof
Let T be defined by the single-type EDTD D = (Σ, d, a^0). Then define f inductively as follows:
f(a) = a^0;
for any string w · a · b with w ∈ Σ* and a, b ∈ Σ, f(w · a · b) = b^j where b^j occurs in d(a^i) and f(w · a) = a^i.
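The top-down algorithm and the function f from the proof can be sketched in a few lines. The representation below is an assumption chosen for brevity: every type maps to a pair consisting of µ(d(type)), written as a Python regex over child names each terminated by ";", and a dictionary giving the unique type of each child name. Data values are ignored and trees are (label, children) pairs.

```python
import re

# Hypothetical encoding of the store example as a single-type EDTD.
EDTD = {
    "store": (r"regulars;discounts;", {"regulars": "regulars", "discounts": "discounts"}),
    "regulars": (r"(dvd;)*", {"dvd": "dvd1"}),
    "discounts": (r"dvd;(dvd;)*", {"dvd": "dvd2"}),
    "dvd1": (r"title;price;", {"title": "title", "price": "price"}),
    "dvd2": (r"title;price;discount;",
             {"title": "title", "price": "price", "discount": "discount"}),
    "title": (r"", {}),
    "price": (r"", {}),
    "discount": (r"", {}),
}
ROOT_LABEL, ROOT_TYPE = "store", "store"


def type_node(tree, node_type):
    """Return the typed version of tree, or None if validation fails."""
    _, children = tree
    content_model, child_type = EDTD[node_type]
    word = "".join(child[0] + ";" for child in children)
    if re.fullmatch(content_model, word) is None:
        return None
    typed_children = []
    for child in children:
        # Single-typedness: the parent's type fixes the child's type.
        typed = type_node(child, child_type[child[0]])
        if typed is None:
            return None
        typed_children.append(typed)
    return (node_type, typed_children)


def validate_and_type(tree):
    if tree[0] != ROOT_LABEL:
        return None
    return type_node(tree, ROOT_TYPE)


dvd_plain = ("dvd", [("title", []), ("price", [])])
dvd_discount = ("dvd", [("title", []), ("price", []), ("discount", [])])
shop = ("store", [("regulars", [dvd_plain]), ("discounts", [dvd_discount])])
print(validate_and_type(shop))
```

Each node is visited once and its type is determined before its children are inspected, which is what makes the typing unique and top-down.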
An exchange property for single-type EDTDs

Ancestor-Guarded Subtree Exchange
T is a regular tree language.
[Figure: ancestor-guarded subtree exchange. For trees t1, t2 ∈ T and nodes u1, u2 with the same ancestor-string, replacing the subtree at u1 in t1 by the subtree at u2 in t2 yields a tree in T.]

Theorem (Martens, Neven, Schwentick, 2005)
A regular tree language is definable by a single-type EDTD iff it is closed under ancestor-guarded subtree exchange.

Proof
⇒ A single-type EDTD has ancestor-based types.
⇐ Compute the single-type closure D′ of the given EDTD D. E.g., a^1 → b^1 b^2 and a^2 → b^3 becomes
    a^{1} → b^{1} b^{2}
    a^{2} → b^{3}
    a^{1,2} → b^{1,2,3} b^{1,2,3} + b^{1,2,3}
Obviously, L(D) ⊆ L(D′). Now, L(D) ⊇ L(D′) iff L(D) is closed under ancestor-guarded subtree exchange.

Tool for proving inexpressibility

Evaluation of Boolean circuits is not single-type
[Figure: two store trees, each with two dvd children of which one has a discount, and a third tree obtained by subtree exchange: store(dvd(title price), dvd(title price)), in which no dvd has a discount.]

Single-type EDTDs are not closed under union

Example
    D1: a → b, b → c
    D2: a → bb, b → d
    a(b(c)) ∈ L(D1) and a(b(d), b(d)) ∈ L(D2), but a(b(d)) ∉ L(D1) ∪ L(D2).

Characterization of DTDs

DTDs define precisely the local tree languages.

Theorem (Papakonstantinou, Vianu, 2000)
A regular tree language is definable by a DTD iff it is closed under subtree exchange.

Smart validator

Theorem (Martens, Neven, Schwentick, 2005)
Deciding whether an EDTD is equivalent to a single-type EDTD or a DTD is EXPTIME-complete.

Upper bound
Compute the single-type closure D′ of the given EDTD D. E.g., a^1 → b^1 b^2 and a^2 → b^3 becomes
    a^{1} → b^{1} b^{2}
    a^{2} → b^{3}
    a^{1,2} → b^{1,2,3} b^{1,2,3} + b^{1,2,3}
L(D′) = L(D) iff L(D) is single-type.
We know that L(D) ⊆ L(D′). So we only need to test L(D′) ⊆ L(D): D′ ∩ ¬D = ∅.
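The union counterexample with D1 and D2 can be replayed mechanically using ancestor-guarded subtree exchange. The sketch below assumes a simple encoding that is not part of the slides: a DTD is a dictionary from labels to Python regexes over child labels terminated by ";", trees are (label, children) pairs, and nodes are addressed by a path of child indices.

```python
import re


def dtd_valid(tree, rules, root):
    """Check tree against a DTD given as {label: regex over ';'-terminated child labels}."""
    def ok(node):
        label, children = node
        word = "".join(child[0] + ";" for child in children)
        return (label in rules
                and re.fullmatch(rules[label], word) is not None
                and all(ok(child) for child in children))
    return tree[0] == root and ok(tree)


D1 = {"a": r"b;", "b": r"c;", "c": r""}
D2 = {"a": r"b;b;", "b": r"d;", "d": r""}


def in_union(tree):
    return dtd_valid(tree, D1, "a") or dtd_valid(tree, D2, "a")


def subtree(tree, path):
    for index in path:
        tree = tree[1][index]
    return tree


def anc_str(tree, path):
    """Labels on the path from the root to the node addressed by path."""
    labels = [tree[0]]
    for index in path:
        tree = tree[1][index]
        labels.append(tree[0])
    return labels


def replace(tree, path, new_subtree):
    if not path:
        return new_subtree
    label, children = tree
    i = path[0]
    return (label, children[:i] + [replace(children[i], path[1:], new_subtree)] + children[i + 1:])


t1 = ("a", [("b", [("c", [])])])                     # in L(D1)
t2 = ("a", [("b", [("d", [])]), ("b", [("d", [])])])  # in L(D2)

# Both b nodes have ancestor-string "a b", so the exchange is ancestor-guarded.
exchanged = replace(t1, [0], subtree(t2, [0]))
print(in_union(t1), in_union(t2), in_union(exchanged))
```

Since the exchanged tree a(b(d)) falls outside L(D1) ∪ L(D2), the union is not closed under ancestor-guarded subtree exchange, and by the theorem it is not definable by a single-type EDTD.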
Smart validator

Theorem (Martens, Neven, Schwentick, 2005)
Deciding whether an EDTD is equivalent to a single-type EDTD or a DTD is EXPTIME-complete.

Lower bound
For r and s arbitrary regular expressions over Σ − {b}, the EDTD
    a → r · b^1 + s · b^2
    b^1 → c
    b^2 → d
is equivalent to a single-type EDTD iff L(r) = L(s) (a PSPACE-hard problem). The equivalent DTD is a → r · b, b → c + d.

Single-type EDTDs in practice

A practical study of XSDs

XML Schema: successor of DTDs
data types, referencing mechanism, modularity, XML syntax, more expressive power

Corpus
819 XSDs from the Cover pages.
726 XSDs through Google's web services.
Only 225 are syntactically correct.

Practical XSDs are local
85% of the XSDs are structurally equivalent to a DTD: at most one type is associated to every element name.
One example used types: a^1 → b and a^2 → b.

What do the 15% non-local XSDs look like?

In 90% of the cases, types only depend on the parent context:
    store → regulars discounts
    regulars → (dvd^1)*
    discounts → dvd^2 dvd^2 (dvd^2)*
    dvd^1 → title price
    dvd^2 → title price discount

The remaining 10% are of the following form:
    a → b + c
    b → e d^1 f
    c → e d^2 f
    d^1 → g h^1 i
    d^2 → g h^2 i
    h^1 → j^1
    h^2 → j^2
    j^1 → k l
    j^2 → m n

Why isn't the expressiveness of XSDs used to its full extent?

Two possible reasons
1 Extra non-local expressiveness is simply not needed in practice.
2 Users are not aware of the possibilities of XSDs: provide a simple formalism that makes types dependent on ancestors.
Making dependencies explicit

Definition
An ancestor-based DTD A is a set of rules r → s where r and s are regular expressions over Σ.

Definition
A tree t is valid w.r.t. A iff for every vertex v there is some r → s such that anc-str_t(v) ∈ L(r) and the children of v match s.

Theorem
Ancestor-based DTDs and single-type EDTDs define the same class of tree languages.

Ancestor-guarded DTDs can be used as a light-weight front-end for XML Schema.

single-type EDTD
    store → regulars discounts
    regulars → (dvd^1)*
    discounts → dvd^2 dvd^2 (dvd^2)*
    dvd^1 → title price
    dvd^2 → title price discount

Ancestor-guarded DTD
    store → regulars discounts
    regulars → dvd*
    discounts → dvd dvd dvd*
    * · regulars · dvd ⇒ title price
    * · discounts · dvd ⇒ title price discount

single-type EDTD
    a → b + c
    b → e d^1 f
    c → e d^2 f
    d^1 → g h^1 i
    d^2 → g h^2 i
    h^1 → j^1
    h^2 → j^2
    j^1 → k l
    j^2 → m n

Ancestor-guarded DTD
    a → b + c
    b → e d f
    c → e d f
    d → g h i
    h → j
    * · b · * · j ⇒ k l
    * · c · * · j ⇒ m n

1-pass preorder typing

1-Pass preorder typing
    <store><regulars><dvd>
    <title>Amelie</title>
    <price>17</price>
    </dvd></regulars>
    <discounts>...
Streaming XML as an unparsed sequence of start and stop tags (SAX).
[Figure: operators on an XML stream: validation, typing, XPath routing.]
Typing as the first operator in a pipeline.

1-Pass Preorder Typing versus single-type EDTDs

Observations
Streaming (preorder) typing is not possible for every EDTD:
    a → b^1 + b^2,  b^1 → c,  b^2 → d;  tree a(b(c))
Every single-type EDTD is preorder typable: the type of a child depends only on the type of its parent.
Single-type EDTDs are not the largest class which is preorder typable:
    a → b^1 b^2,  b^1 → c,  b^2 → d;  tree a(b(c), b(d))

Restrained Competition EDTDs: left-to-right unique typing

Definition (Murata, Lee, Mani, 2001)
A regular expression r restrains competition if there are no strings w a^i v and w a^j v′ in L(r) with i ≠ j.
An EDTD is restrained competition iff all regular expressions occurring in rules restrain competition.
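The definition can be explored with a bounded brute-force search for a competing pair w a^i v and w a^j v′. This is only a sketch under stated assumptions, not a decision procedure: content models are Python regexes over types terminated by ";", a type is an element name followed by digits, and only words up to a chosen length are enumerated, so a result of None does not prove that an expression restrains competition. The two content models are illustrative.

```python
import re
from itertools import product


def name_of(type_name):
    """Strip the numeric type index: 'dvd2' -> 'dvd' (assumed notation)."""
    return type_name.rstrip("0123456789")


def competition_witness(content_model, types, max_len):
    """Search words of the language up to max_len for w a^i v and w a^j v' with i != j."""
    seen = {}  # (typed prefix w, element name a) -> type that followed w
    for length in range(max_len + 1):
        for word in product(types, repeat=length):
            text = "".join(t + ";" for t in word)
            if re.fullmatch(content_model, text) is None:
                continue
            for k, current in enumerate(word):
                key = (word[:k], name_of(current))
                earlier = seen.setdefault(key, current)
                if earlier != current:
                    return word[:k], earlier, current
    return None


NOT_RESTRAINED = r"(dvd1;|dvd2;)*dvd2;(dvd1;|dvd2;)*"
RESTRAINED_CANDIDATE = r"(dvd1;)*discounts;(dvd2;)*"

print(competition_witness(NOT_RESTRAINED, ["dvd1", "dvd2"], 3))
print(competition_witness(RESTRAINED_CANDIDATE, ["dvd1", "dvd2", "discounts"], 4))
```

A returned triple (w, a^i, a^j) shows that after reading the typed prefix w, the next element could receive two different types, which is exactly what blocks left-to-right unique typing.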
Not restrained-competition
    store → (dvd^1 + dvd^2)* dvd^2 (dvd^1 + dvd^2)* dvd^2 (dvd^1 + dvd^2)*
    dvd^1 → title price
    dvd^2 → title price discount
Both dvd^1 dvd^2 dvd^2 and dvd^2 dvd^2 dvd^2 match the content model of store.

Restrained-competition
    store → (dvd^1)* discounts dvd^2 dvd^2 (dvd^2)*
    discounts → ε
    dvd^1 → title price
    dvd^2 → title price discount

Towards characterizations of 1-pass preorder typing

The ancestor-sibling string
[Figure: the path from the root to a node, together with the left siblings along that path.]

Theorem (Martens, Neven, Schwentick, 2005)
For a regular tree language T, the following are equivalent:
T is 1-pass preorder typable;
T is definable by a restrained-competition EDTD;
T is closed under ancestor-sibling-guarded subtree exchange;
T is definable by an ancestor-sibling-based DTD.

Summary slide

What to remember?
DTD ≈ extended context-free grammars
XML Schema ≈ single-type EDTDs
XML Schema is much closer to DTDs than to tree automata.
Single-typedness is not the most liberal restriction to get unique top-down (1-pass) typing: restrained-competition EDTDs.
Actually, the determinism constraint alone already implies 1-pass typing.

Relax NG

Relax NG
James Clark and Makoto Murata [2001]
Based on RELAX (Regular Language description for XML) and TREX (Tree Regular Expressions for XML)
Clean specification: 40 pages, XML Schema: 170 pages
O'Reilly book by Eric Van der Vlist
Motivated by unranked regular tree languages.
Very similar to extended DTDs.
Closed under Boolean operations.

Relax NG: abbreviated syntax
    store = element store { (dvd1 | dvd2)*, dvd2, (dvd1 | dvd2)* }
    dvd1 = element dvd {
      element title { xsd:NCName },
      element price { xsd:integer } }
    dvd2 = element dvd {
      element title { xsd:NCName },
      element price { xsd:integer },
      element discount { xsd:integer } }

EDTD
    store → (dvd^1 + dvd^2)* dvd^2 (dvd^1 + dvd^2)*
    dvd^1 → title price
    dvd^2 → title price discount

Relax NG: XML syntax
    <define name="store">
      <element name="store">
        <zeroOrMore>
          <choice>
            <ref name="dvd1"/>
            <ref name="dvd2"/>
          </choice>
        </zeroOrMore>
        <ref name="dvd2"/>
        <zeroOrMore>
          <choice>
            <ref name="dvd1"/>
            <ref name="dvd2"/>
          </choice>
        </zeroOrMore>
      </element>
    </define>

Decision problems for XML schema languages

Complexity of basic decision problems

Schema CONTAINMENT (⊆)
Given: schemas d1, d2
Question: Is L(d1) ⊆ L(d2)?

Schema EQUIVALENCE (=)
Given: schemas d1, d2
Question: Is L(d1) = L(d2)?

Schema INTERSECTION (∩)
Given: schemas d1, ..., dn
Question: Is ⋂_{i=1}^{n} L(di) = ∅?
Complexity of basic decision problems

Theorem (Seidl 1990, 1994)
CONTAINMENT, EQUIVALENCE, and INTERSECTION are EXPTIME-complete for EDTDs and NTA(NFA)s.

Proposition
Let R be a class of regular expressions and C a complexity class. Then the following are equivalent:
CONTAINMENT for R is in C;
CONTAINMENT for DTD(R) is in C;
CONTAINMENT for single-type EDTD(R) is in C;
CONTAINMENT for restrained-competition EDTD(R) is in C.

Proposition
Let R be a class of regular expressions and C a complexity class. Then the following are equivalent:
INTERSECTION for R is in C;
INTERSECTION for DTD(R) is in C.

Theorem (Martens, Neven, Schwentick 2005)
There is a class of regular expressions X such that
INTERSECTION for X is NP-complete;
INTERSECTION for single-type EDTD(X) is EXPTIME-complete.
Complexity of regular expressions

  Basic decision problems of regular expressions carry over to schema languages.
  The problem has been studied in depth (Hunt III et al., Kozen, Meyer and Stockmeyer, . . . ).
  More than ninety percent of the regular expressions occurring in practical DTDs and XSDs are Chain Regular Expressions (CHAREs) (Bex et al. 2004).

Definition
A base symbol is a regular expression s, s∗, s+, or s?, where s is a non-empty string; a factor is of the form e, e∗, e+, or e?, where e is a disjunction of base symbols. A chain regular expression (CHARE) is ∅, ε, or a sequence of factors.

Example
  ((abc)∗ + b∗)(a + b)?(ab)+(ac + b)∗ is a CHARE.
  (a + b) + (a∗b∗) is not a CHARE.

Chain Regular Expressions (CHAREs): abbreviations

  Factor                    Abbr.       Factor                    Abbr.
  (a1 + · · · + an)         (+a)        (w1 + · · · + wn)         (+w)
  (a1 + · · · + an)∗        (+a)∗       (w1 + · · · + wn)∗        (+w)∗
  (a1 + · · · + an)+        (+a)+       (w1 + · · · + wn)+        (+w)+
  (a1 + · · · + an)?        (+a)?       (w1 + · · · + wn)?        (+w)?
  (a1∗ + · · · + an∗)       (+a∗)       (w1∗ + · · · + wn∗)       (+w∗)
  (a1+ + · · · + an+)       (+a+)       (w1+ + · · · + wn+)       (+w+)

Complexity of CHAREs: known results

  CONTAINMENT for RE(a?, (+a)∗) is in PTIME [Abdulla, Bouajjani, Jonsson 1998]
  CONTAINMENT for RE(a, Σ, Σ∗) is in PTIME [Milo, Suciu 1999]
  INTERSECTION for RE((+w)∗) is PSPACE-hard [Bala 2002]

Complexity of CHAREs [Martens, Neven, Schwentick 2004]

  RE-fragment                                  Inclusion          Equivalence   Intersection
  a, a+                                        in PTIME (DFA!)    in PTIME      in PTIME
  a, a∗                                        coNP               in PTIME      NP
  a, a?                                        coNP               in PTIME      NP
  CHAREs − {(+a)∗, (+w)∗, (+a)+, (+w)+}        coNP               in coNP       NP
  a, (+a)∗                                     PSPACE             in PSPACE     NP
  CHAREs − {(+w)∗, (+w)+}                      PSPACE             in PSPACE     NP
  a, (+w)∗                                     PSPACE             in PSPACE     PSPACE
  CHAREs                                       PSPACE             in PSPACE     PSPACE
  RE≤k (k ≥ 3)                                 in PTIME           in PTIME      PSPACE
  deterministic                                in PTIME           in PTIME      PSPACE

Equivalence for RE(a, a∗) is in PTIME

Put the expression in sequence normal form. E.g., aaa∗bb∗cccc∗ becomes a^{≥2} b^{≥1} c^{≥3}.
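The normalization step can be sketched in a few lines of Python. This is a minimal illustration, assuming expressions are written as lowercase letters each optionally followed by `*`; the function names `sequence_normal_form` and `show` are hypothetical, and the `=` notation for runs without a star is an assumption of this sketch (the slide only shows the `≥` case).

```python
import re

def sequence_normal_form(expr):
    """Merge maximal runs of the same letter into (letter, min_count, starred).

    `expr` is an RE(a, a*) expression such as 'aaa*bb*cccc*' (assumed syntax).
    """
    if not re.fullmatch(r"(?:[a-z]\*?)*", expr):
        raise ValueError("expected letters, each optionally followed by *")
    blocks = []  # each block is [letter, minimum count, has a starred factor]
    for letter, star in re.findall(r"([a-z])(\*?)", expr):
        if blocks and blocks[-1][0] == letter:
            block = blocks[-1]          # extend the current run
        else:
            block = [letter, 0, False]  # start a new run
            blocks.append(block)
        if star:
            block[2] = True             # a* makes the run unbounded
        else:
            block[1] += 1               # a plain letter raises the minimum
    return [tuple(block) for block in blocks]

def show(snf):
    """Render blocks as a^{≥k} (unbounded run) or a^{=k} (fixed-length run)."""
    parts = []
    for letter, count, starred in snf:
        op = "≥" if starred else "="
        parts.append(letter + "^{" + op + str(count) + "}")
    return " ".join(parts)

print(show(sequence_normal_form("aaa*bb*cccc*")))  # a^{≥2} b^{≥1} c^{≥3}
```

Comparing two such normal forms is a linear scan, which is where the polynomial bound comes from once the normal form is known to be (almost) canonical.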
There are equivalent expressions with a different sequence normal form:

    a^{≥i} b∗ a∗ b^{≥1} a^{≥j} = a^{≥i} b^{≥1} a∗ b∗ a^{≥j}

Good news: this is the only exception. The proof is non-trivial.
Conjecture: equivalence is tractable for much larger fragments.

Summary: what to remember?

Decision problems for XML Schema translate to decision problems for regular expressions.

Question
What is the largest class of regular expressions for which equivalence is in PTIME?
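The exceptional pair of sequence normal forms from the "Equivalence is in PTIME" slide can be sanity-checked by brute force. The sketch below is an assumption-laden illustration, not the polynomial-time algorithm: it translates both sides into Python `re` syntax for concrete i and j and compares them on all strings up to a bounded length, so it gives evidence for equivalence rather than a proof. The names `first_difference` and `exception_pair` are hypothetical.

```python
import re
from itertools import product

def first_difference(r1, r2, alphabet, max_len):
    """Return the first string (by length, then alphabet order) on which the
    two regexes disagree, or None if they agree on all strings up to max_len."""
    p1, p2 = re.compile(r1), re.compile(r2)
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            if bool(p1.fullmatch(word)) != bool(p2.fullmatch(word)):
                return word
    return None

def exception_pair(i, j):
    """Both sides of the exceptional identity, in Python regex syntax."""
    left = "a{%d,}b*a*b+a{%d,}" % (i, j)   # a^{>=i} b* a* b^{>=1} a^{>=j}
    right = "a{%d,}b+a*b*a{%d,}" % (i, j)  # a^{>=i} b^{>=1} a* b* a^{>=j}
    return left, right

left, right = exception_pair(1, 2)
print(first_difference(left, right, "ab", 10))    # None: no difference found
print(first_difference("a*b", "ab*", "ab", 3))    # a string separating the two
```

A `None` result only says that no counterexample exists up to the chosen length; the second call shows what a genuine difference looks like for two inequivalent expressions.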
Conclusion

  DTDs are almost extended context-free grammars.
  Unranked tree automata are a robust class, but questions remain.
  XML Schema is closer to DTDs than to tree automata.
  XML (schema) research is a good excuse to do theory.