Compilerbau mit Java
Java User Group, 22.11.07
Jens Bendisposto

JUG Düsseldorf, 22.11. - Compilerbau mit Java - Jens Bendisposto

References
• Étienne Gagnon, SableCC, An Object-Oriented Compiler Framework, 1998, Master's thesis, McGill University, Montreal, Canada. Available from the SableCC website: http://www.sablecc.org
• Nat Pryce, Concrete to Abstract Syntax Transformations with SableCC, 2005. Available at: http://nat.truemesh.com/archives/000531.html
• Aho, Lam, Sethi, Ullman, Compiler - Prinzipien, Techniken und Werkzeuge, 17.1.2008, ISBN: 978-3-8273-7097-6 (German)
  – The German edition contains an additional chapter on SableCC

Why should I care about compilers?
"Most programmers have written a compiler, and most of them didn't notice."

Motivating example: command-line calculator
Possible inputs: 3+1, 2*9, 4+3*8, 2*(2+3), ...
    $ java calc 3*(2+3)
    15
    $
How can we implement it?
Think about difficult expressions: (2)*(3+((7*(9)))+(15))
This task is quite easy using compiler techniques!
We will write the calculator during this talk.

Other examples
• Markup language parsers (XML, HTML, RSS, …)
• URI parsers
• Logfile parsers
• Text crawlers / text indexing / search engines
• (Nested) configuration files
• Domain-specific languages (scripting languages)
• Programming language compilers
Compiler technologies are part of the programmer's toolbox!

Anatomy of a compiler
[Figure: source language → Front-End (Analysis: Lexical, Syntax, Semantic) → (attributed) Abstract Syntax Tree → Back-End (Synthesis: Code Generator) → machine language]

Anatomy of a compiler II
[Figure: the same pipeline with the labels "Focus of this talk" and "Typechecking, …"]

What is SableCC?
• Generator for parser frameworks
• Takes a language description
• Generates a set of Java classes for the language:
  – Lexer
  – Parser
  – Typed syntax tree
  – Basic analysis tools / tree walker classes
• Uses OO principles
  – Good design is more important than performance
  – Compilers should be maintainable
  – Analysis tools keep their own data
• Bottom-up parsing
  – Grammar is intuitive (left recursive)
  – Efficient parsers (LALR)

Lexical Analysis
Converting a sequence of characters into a sequence of tokens

Lexing
• Given source code, we use a lexer to generate a token stream
• Using tokens instead of character streams is useful:
  – We want to deal with numbers instead of digits
  – We want to deal with identifiers instead of x, y and z
  – Whitespace and comments can appear anywhere in the code; we do not want to deal with them during parsing
Input:
    Thisissometextwithoutspacesandpunctuationmarkswhichisthereforequitedifficulttoreadbyhumanslexicalanalysiswillbreakthistextupintowords
Lexer output:
    This is some text without spaces and punctuation marks which is therefore quite difficult to read by humans lexical analysis will break this text up into words

Real world lexing
Input:
    if (n == 0) then
      return 0
    else //recursive case
      return n*f(n-1)
Lexer output:
    if lpar id equals num rpar then return num else comment return id mult id lpar id minus num rpar
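
The "Real world lexing" slide can be tried out with a few lines of plain Java. What follows is only a minimal sketch, not what SableCC generates: the class name TinyLexer and the regular expressions are assumptions made for illustration, and the sketch takes the first matching rule instead of building a combined DFA with longest-match semantics. Only the token names come from the slide.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TinyLexer {
    // Token name -> regular expression. The names follow the slide,
    // the expressions are assumptions. Insertion order is the rule priority.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();

    private static void rule(String name, String regex) {
        RULES.put(name, Pattern.compile(regex));
    }

    static {
        rule("comment", "//[^\\n]*");
        rule("blank", "\\s+");            // recognized, but never emitted
        rule("if", "if\\b");              // keywords come before id
        rule("then", "then\\b");
        rule("else", "else\\b");
        rule("return", "return\\b");
        rule("num", "[0-9]+");
        rule("id", "[A-Za-z_][A-Za-z0-9_]*");
        rule("equals", "==");
        rule("lpar", "\\(");
        rule("rpar", "\\)");
        rule("mult", "\\*");
        rule("minus", "-");
    }

    /** Turns the input into a list of token names, dropping whitespace. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            boolean matched = false;
            for (Map.Entry<String, Pattern> r : RULES.entrySet()) {
                Matcher m = r.getValue().matcher(input);
                m.region(pos, input.length());
                if (m.lookingAt()) {      // match must start exactly at pos
                    if (!r.getKey().equals("blank")) {
                        tokens.add(r.getKey());
                    }
                    pos = m.end();
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                throw new IllegalArgumentException("Unexpected character at position " + pos);
            }
        }
        return tokens;
    }
}
```

Calling TinyLexer.tokenize with the four-line input from the slide yields the token names in the same order as the slide's lexer output. The comment token is kept in the stream here; a parser would typically ignore it.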

Lexing for Dummies
• Lexers use regular expressions
• A regular expression can be translated into a finite automaton (DFA)
• DFAs can be combined
• The combined DFA is used to recognize tokens
• The very good news:
  – The algorithm is linear in the length of the input
  – We do not have to write the automata by hand

Writing a Lexer with SableCC
Sections in the input file
1. Helpers
  – Character classes
        small_letters = ['a' .. 'z'];
        hex_digits_zero = [['a' .. 'f'] + ['1' .. '9']];
        hex_digits = ['0' + hex_digits_zero];
        some_capitals = [['A' .. 'Z'] - ['I' .. 'K']];
        linebreak = 10;
    also possible: linebreak = 0x000a or linebreak = '\n'
  – Regular expressions
        my_string = 'keyword';
        hexnumber = ('0x' | '0X')? (hex_digits_zero hex_digits*);
        identifier = some_capitals+;
  – The right-hand side can use previously defined helpers
  – Helpers cannot be used in the parser
2. Tokens
  – Regular expressions
  – The right-hand side may use helpers, but not other tokens
  – Tokens are used in the parser

Translation into Java-Types
• One class for each token
• The name of the class is derived from the token's name:
  – Prepend a T
  – Capitalize the first letter
  – Capitalize each letter after an underscore
  – Remove each underscore
• Exception: EOF (implicitly defined token)
Example: small_letters becomes TSmallLetters

Lexer with SableCC
SableCC generates a lexer for us
    Helpers
      first_digit = ['1' .. '9'];
      digit = ['0' + first_digit];
    Tokens
      white_space = (' ' | '\n' | '\t')+;
      number = '0' | first_digit digit*;
      add = '+';
      mult = '*';
      l_par = '(';
      r_par = ')';
DEMO Lexer_Example
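
The naming rule from the "Translation into Java-Types" slide is easy to check mechanically. The sketch below simply applies the four listed steps to a token name; the class and method names (TokenClassNames, tokenClassName) are made up for this illustration and are not part of SableCC.

```java
class TokenClassNames {
    /** Applies the four naming steps from the slide to a token name. */
    static String tokenClassName(String tokenName) {
        StringBuilder sb = new StringBuilder("T");   // prepend a T
        boolean upperNext = true;                    // capitalize the first letter
        for (char c : tokenName.toCharArray()) {
            if (c == '_') {
                upperNext = true;                    // drop the underscore, capitalize what follows
            } else if (upperNext) {
                sb.append(Character.toUpperCase(c));
                upperNext = false;
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

For the tokens of the calculator grammar this gives, for example, TNumber for number and TLPar for l_par.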

Lexer States
• Combine the lexer with an additional DFA
• Recognize the same string as a different token in a different context
    States
      normal, comment;
    Tokens
      {normal->comment} comment_start = '**';
      {comment->normal} comment_end = '**';
      {normal} keyword = 'do';
      {comment} string = [0 .. 0xffff]*;
      {normal} white_space = ' ' | '\n' | '\t';
• Input: do ** do **
• Token stream: TKeyword TWhiteSpace TCommentStart TString TCommentEnd
DEMO LexerState_Example, LexerStateBug_Example

Customized Lexers
• SableCC generates a Lexer class
• Use inheritance for customization
  – The lexer calls the method filter() when recognizing a token
  – Override this method to modify the token stream
• Can be used for nested comments
  – Comment start token → increment counter
  – Comment end token → decrement counter
  – If counter > 0: state = comment
  – Otherwise: state = normal
• Could screw up your compiler; handle with care

Syntactical Analysis
Analyzing a sequence of tokens to determine its grammatical structure

Theoretical background
Sorry folks, we have to deal with theory! But we'll try to keep it nice and simple! Absolutely no proofs!
1. What is an alphabet?
   A finite set of symbols, for instance {0, 1, 2, id, for, do, …}
   The alphabet is the set of tokens recognized by the lexer.
2. What is a (computer) language?
   A set of sequences over an alphabet.
3. What is a grammar?
   • The alphabet (terminal symbols)
   • A disjoint set (nonterminal symbols)
   • Replacement rules: A → BC means we can replace A by BC
A sequence of terminal symbols belongs to a language iff we can create it by iteratively applying replacement rules, starting from a specific start symbol.
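
The last sentence can be read as an algorithm: start with the start symbol and keep applying replacement rules until the target string appears or no candidate is left. The sketch below does this by brute force for one small grammar that is assumed here for illustration, S -> ab | aSb over the alphabet {a, b}. The class name Derivation is hypothetical, and the length-based pruning only works because no rule of this grammar shrinks a string.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

class Derivation {
    // Assumed example grammar: start symbol S, rules S -> ab | aSb.
    static final String[] RULES_FOR_S = {"ab", "aSb"};

    /** Returns true if target can be derived from the start symbol S. */
    static boolean derives(String target) {
        Deque<String> work = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        work.add("S");
        while (!work.isEmpty()) {
            String form = work.poll();
            if (form.equals(target)) {
                return true;
            }
            int i = form.indexOf('S');
            // Dead end: no nonterminal left, or already longer than the target.
            if (i < 0 || form.length() > target.length()) {
                continue;
            }
            for (String rhs : RULES_FOR_S) {
                // Apply one replacement rule to the nonterminal S.
                String next = form.substring(0, i) + rhs + form.substring(i + 1);
                if (seen.add(next)) {
                    work.add(next);
                }
            }
        }
        return false;
    }
}
```

With this grammar, derives("aabb") succeeds via S -> aSb -> aabb, while derives("aab") fails because every candidate becomes longer than the target before matching it. Real parsers avoid this kind of search; it is shown only to make the definition concrete.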

Example
• Alphabet: {a, b}
• Language: all strings consisting of one or more a's followed by the same number of b's (e.g. ab, aabb, aaabbb, …)
• Nonterminals: {S}
• Start symbol: S
• Rules to produce all strings of the language: S → ab | aSb
• Let's check if this is correct:
  – S → ab
  – S → aSb → aabb
  – S → aSb → aaSbb → aaabbb
  – S → aSb → aaSbb → aaaSbbb → …
• It seems to be correct (and this is good enough for us!)

The Chomsky Hierarchy
Hierarchical classification of languages
    Type     Class                     Rule form        Complexity
    Type 0   Recursively enumerable    α → β            undecidable
    Type 1   Context sensitive         αAβ → αγβ        exponential
    Type 2   Context free              A → γ            O(n^3)
    Type 3   Regular                   A → a, A → aB    O(n)
α, β, γ: sequences of terminal and nonterminal symbols
a: terminal symbol
A, B: nonterminal symbols
Note: The language from the previous example cannot be expressed using a type 3 grammar. Regular expressions are not expressive enough for compilers.

The real world
• Lexing: regular expressions
• Parsing: context-free grammar
• Real programming languages are usually context sensitive
• We over-approximate using a context-free grammar
• And do some semantic checks after parsing (e.g. type checking)
    int i := 9;                       Parser: NO!
    Integer i = new File("peng");     Parser: YES!   Typechecker: NO!

Parsing
• The output of a parser is a syntax tree
• Properties of a syntax tree:
  – The root node represents the start symbol of the grammar
  – The leaves represent the tokens
  – Each symbol on the right-hand side of a production rule is represented by a child node
• Example
  Grammar: S → ab | aSb
  Input: aaabbb

Parsing
How do we obtain a syntax tree from a token stream?
1. Top-Down Parsing
   • Start with the start symbol
   • Try to derive the string
2. Bottom-Up Parsing
   • Start from the string
   • Apply productions in reverse order
   • Try to reach the start symbol
Sounds simple?

Top-Down Parsing
• Usually LL(k): left to right, leftmost derivation, k tokens of lookahead
• Many tools available (javacc, ANTLR, coco/R)
• Either table based
• Or recursive descent (javacc, coco/R, ANTLR, hand-written)
• Hand-written parser (one method per nonterminal)
      S → if E then S else S
      S → begin S L
      S → print E
• Easy to understand
• No left recursion
• Left factoring often needed
      X → aB
      X → aC
  becomes, after left factoring,
      X → aX'
      X' → B
      X' → C
• LL(k) grammars do not feel natural

Bottom-Up Parsing
• LALR: lookahead, left to right, rightmost derivation (in reverse)
• Left recursion is allowed
• No need for left factoring
• LR grammars feel more natural
• Ambiguous grammars are even more natural
• It is almost impossible to hand-write an LR parser
• It is even hard to understand how the LR parsing table is built
• But if we accept a little bit of magic (SableCC), LR parsing is easy to understand

Bottom-Up Parsing - Example
• A simple grammar
      E → E ∧ B | E ∨ B | B
      B → false | true
• Some strings of the language: true, true∧false, true∧true∨false
• Using some "magic" we create an LR parsing table:

            Action                                                      Goto
    State   false       true        ∧           ∨           $           E   B
    0       1           2                                               3   4
    1       B → false   B → false   B → false   B → false   B → false
    2       B → true    B → true    B → true    B → true    B → true
    3                               5           6           Accept!
    4       E → B       E → B       E → B       E → B       E → B
    5       1           2                                                   7
    6       1           2                                                   8
    7       E → E ∧ B   E → E ∧ B   E → E ∧ B   E → E ∧ B   E → E ∧ B
    8       E → E ∨ B   E → E ∨ B   E → E ∨ B   E → E ∨ B   E → E ∨ B

  In the Action part, a number means "shift and go to that state"; a production means "reduce by that production".
• Now we try to parse: true ∨ true
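
A table like the one above is all an LR parser needs at run time. As a rough sketch (not SableCC's generated parser), the driver loop can be written in a few lines of Java. Everything named here is an assumption made for illustration: the class TinyLrParser, the string encoding of the cells ("s2" = shift to state 2, "r5" = reduce by rule 5), the rule numbering 1 to 5, and the ASCII token names "and" and "or" standing in for ∧ and ∨. Unlike the slides, the sketch uses the textbook formulation in which a reduction consults the Goto table directly instead of pushing the nonterminal back onto the input.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class TinyLrParser {
    // Terminal columns 0..3; column 4 is reserved for the end marker $.
    static final List<String> TERMINALS = Arrays.asList("false", "true", "and", "or");
    static final int END_COLUMN = 4;

    // ACTION[state][column]: "sN" = shift to state N, "rK" = reduce by rule K,
    // "acc" = accept, null = syntax error.
    static final String[][] ACTION = {
        {"s1", "s2", null, null, null},    // state 0
        {"r4", "r4", "r4", "r4", "r4"},    // state 1: B -> false
        {"r5", "r5", "r5", "r5", "r5"},    // state 2: B -> true
        {null, null, "s5", "s6", "acc"},   // state 3
        {"r3", "r3", "r3", "r3", "r3"},    // state 4: E -> B
        {"s1", "s2", null, null, null},    // state 5
        {"s1", "s2", null, null, null},    // state 6
        {"r1", "r1", "r1", "r1", "r1"},    // state 7: E -> E and B
        {"r2", "r2", "r2", "r2", "r2"},    // state 8: E -> E or B
    };

    // GOTO[state] = {next state after E, next state after B}; -1 = no entry.
    static final int[][] GOTO = {
        {3, 4}, {-1, -1}, {-1, -1}, {-1, -1}, {-1, -1},
        {-1, 7}, {-1, 8}, {-1, -1}, {-1, -1}
    };

    // Rules 1..5 (index 0 unused): left-hand side (0 = E, 1 = B) and length
    // of the right-hand side.
    static final int[] LHS = {-1, 0, 0, 0, 1, 1};
    static final int[] RHS_LENGTH = {-1, 3, 3, 1, 1, 1};

    /** Returns true if the token sequence is a sentence of the grammar. */
    static boolean parse(List<String> tokens) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(0);
        int pos = 0;
        while (true) {
            int column = pos < tokens.size() ? TERMINALS.indexOf(tokens.get(pos)) : END_COLUMN;
            if (column < 0) {
                return false;                          // unknown token
            }
            String action = ACTION[stack.peek()][column];
            if (action == null) {
                return false;                          // empty cell: syntax error
            }
            if (action.equals("acc")) {
                return true;
            }
            int n = Integer.parseInt(action.substring(1));
            if (action.charAt(0) == 's') {
                stack.push(n);                         // shift: consume the token
                pos++;
            } else {
                for (int i = 0; i < RHS_LENGTH[n]; i++) {
                    stack.pop();                       // reduce: pop one state per symbol
                }
                int next = GOTO[stack.peek()][LHS[n]];
                if (next < 0) {
                    return false;
                }
                stack.push(next);
            }
        }
    }
}
```

Calling TinyLrParser.parse(Arrays.asList("true", "or", "true")) goes through the states 0, 2, 4, 3, 6, 2, 8, 3 and accepts; an incomplete input such as "true or" hits an empty cell in state 6 and is rejected.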
Bottom-Up Parsing - Example: parsing true ∨ true step by step
After a reduction, the resulting nonterminal is put in front of the remaining input and read like a token.

    Input            Stack      Step
    true ∨ true $    0          Read true in state 0. Action: Shift 2
    true ∨ true $    0,2        Read ∨ in state 2. Action: Reduce (B → true)
    B ∨ true $       0          Read B in state 0. Action: Shift 4
    B ∨ true $       0,4        Read ∨ in state 4. Action: Reduce (E → B)
    E ∨ true $       0          Read E in state 0. Action: Shift 3
    E ∨ true $       0,3        Read ∨ in state 3. Action: Shift 6
    E ∨ true $       0,3,6      Read true in state 6. Action: Shift 2
    E ∨ true $       0,3,6,2    Read $ in state 2. Action: Reduce (B → true)
    E ∨ B $          0,3,6      Read B in state 6. Action: Shift 8
    E ∨ B $          0,3,6,8    Read $ in state 8. Action: Reduce (!!!) (E → E ∨ B, which pops three states)
    E $              0          Read E in state 0. Action: Shift 3
    E $              0,3        Read $ in state 3. Action: Accept!

Parser with SableCC
Sections in the input file
1. Ignored Tokens
2. Productions
3. Abstract Syntax Tree

Ignored Tokens
  – Remove certain tokens while processing the token stream
  – The lexer produces the tokens
  – The parser ignores them
  – Extremely useful for whitespace and comments

Productions in SableCC
• One rule per nonterminal
• Alternatives have to be named (at most one unnamed alternative)
      expr = {add} expr add term | term;
• If we use the same symbol multiple times, we need to name the occurrences (at most one unnamed occurrence)
      tuple = [first]:value comma [second]:value;
• Operators *, + and ? can be used
      coordinates = [coordinates]:coordinate*;
      coordinate = {two_dimensional} l_par [first]:number comma [second]:number r_par
                 | {three_dimensional} l_par [first]:number [first_comma]:comma [second]:number [second_comma]:comma [third]:number r_par;

Translation into Java-Types
• All nodes inherit from Node
• For each nonterminal symbol, one abstract supertype
• An implicit production rule S → P <EOF>
• For each alternative, one syntax tree node
• Syntax tree nodes have methods to get child nodes
• If we use * or +, SableCC will use lists
• ACoordinates has the method List<PCoordinate> getCoordinates()

SableCC-Example
• We need to take care of operator precedence
• A possible grammar for the expressions:
      expr → expr + term | term
      term → term * factor | factor
      factor → (expr) | number
• The translation to SableCC is straightforward
DEMO Expression_parser1

CST vs. AST
Concrete grammar:
      E → E + T | T
      T → T * F | F
      F → (E) | digit
Input: (2+2)*5
Abstraction (abstract grammar):
      E → E + E | E * E | digit
• The abstract syntax tree contains the same information
• But the abstract grammar is ambiguous
• It cannot be used for efficient parsers
• SableCC allows us to parse with the concrete grammar while creating an AST
• Unfortunately the syntax is quite ugly (will probably change in the next release)

Abstract Syntax Tree
• We need a second (abstract) grammar
• And the productions need some transformation annotations
• Annotations look like { -> … }
• Left-hand side: mapping of CST node to AST node
• Right-hand side:
  – Creation of new nodes for the AST
        { -> New ast_type.ast_alternative(concrete_symbol.abstract_type, …) }
  – No creation of an AST node, just forward a symbol
        { -> concrete_symbol.abstract_type }
DEMO expression_parser2

List transformation
• Transforming lists is a little bit tricky
• Type (LHS) is expr*
• List construction on RHS: [a] / [a, L]
Concrete grammar:
      E → d | f ( L )
      L → E | E , L
Abstraction (abstract grammar):
      E → d | E*
DEMO list_transformation

Analysis Tools
• Now we have an abstract syntax tree, what's next?
• All AST classes contain apply methods
      public void apply(Switch sw) {
          ((Analysis) sw).caseStart(this);
      }
• Analysis is an interface generated by SableCC
• Using Switch and casting to Analysis is IMHO bullshit
• AnalysisAdapter implements Analysis (empty defaults)
• DepthFirstAdapter and ReversedDepthFirstAdapter are very useful
  – Tree walk in a defined order
  – In and out methods for each type of node
DEMO Expression_parser3

Conflicts
Sometimes even SableCC causes pain

LR Parsing (Shift-Reduce)
Grammar: E → Num | (E) | E + E | E * E
Input: Num + Num * Num
    Stack          Input              Action
                   Num + Num * Num    Shift
    Num            + Num * Num        Reduce
    E              + Num * Num        Shift
    E +            Num * Num          Shift
    E + Num        * Num              Reduce
    E + E          * Num              Shift (ambiguous)
    E + E *        Num                Shift
    E + E * Num                       Reduce
    E + E * E                         Reduce
    E + E                             Reduce
    E                                 Accept

The magic of LALR Parsing
• SableCC does magic for us
• Unless our grammar produces conflicts
  – Shift/Reduce
  – Reduce/Reduce
• We need to fix the grammar
DEMO conflicts
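
To round off the calculator, the apply/case mechanism from the "Analysis Tools" slide can be imitated in plain Java. The sketch below is hand-written and deliberately tiny; it is not SableCC output. The classes Node, Num, Add, Mul and Evaluator, and the Analysis interface with its three case methods, are hypothetical stand-ins for what a generator would produce from the abstract grammar E → E + E | E * E | digit. The evaluator visits children first (depth first) and keeps its own data, a value stack, as the OO principles slide suggests.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a generated analysis interface.
interface Analysis {
    void caseNum(Num node);
    void caseAdd(Add node);
    void caseMul(Mul node);
}

abstract class Node {
    // Double dispatch: each node calls the case method for its own type.
    abstract void apply(Analysis sw);
}

class Num extends Node {
    final int value;
    Num(int value) { this.value = value; }
    void apply(Analysis sw) { sw.caseNum(this); }
}

class Add extends Node {
    final Node left, right;
    Add(Node left, Node right) { this.left = left; this.right = right; }
    void apply(Analysis sw) { sw.caseAdd(this); }
}

class Mul extends Node {
    final Node left, right;
    Mul(Node left, Node right) { this.left = left; this.right = right; }
    void apply(Analysis sw) { sw.caseMul(this); }
}

/** Evaluates an expression tree; intermediate results live on a stack. */
class Evaluator implements Analysis {
    private final Deque<Integer> values = new ArrayDeque<>();

    public void caseNum(Num node) {
        values.push(node.value);
    }

    public void caseAdd(Add node) {
        node.left.apply(this);      // visit both children first
        node.right.apply(this);
        values.push(values.pop() + values.pop());
    }

    public void caseMul(Mul node) {
        node.left.apply(this);
        node.right.apply(this);
        values.push(values.pop() * values.pop());
    }

    static int evaluate(Node root) {
        Evaluator evaluator = new Evaluator();
        root.apply(evaluator);
        return evaluator.values.pop();
    }
}
```

For the input (2+2)*5 from the CST vs. AST slide, the tree is new Mul(new Add(new Num(2), new Num(2)), new Num(5)), and Evaluator.evaluate returns 20. With SableCC, the same logic would typically go into the out methods of a DepthFirstAdapter subclass, so the explicit child visits would not be needed.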