Compilerbau mit Java

Java User Group, 22.11.07
Jens Bendisposto
JUG Düsseldorf, 22.11. - Compilerbau mit Java - Jens Bendisposto
References
• Étienne Gagnon, SableCC, an Object-Oriented Compiler Framework, 1998, Master's thesis, McGill University, Montreal, Canada.
Available from the SableCC website: http://www.sablecc.org
• Nat Pryce, Concrete to Abstract Syntax Transformations with SableCC, 2005.
Available at: http://nat.truemesh.com/archives/000531.html
• Aho, Lam, Sethi, Ullman, Compiler - Prinzipien, Techniken und Werkzeuge
– 17.1.2008, ISBN: 978-3-8273-7097-6 (German)
– The German edition contains an additional chapter on SableCC
Why should I care about compilers?
“Most programmers have written a compiler, and
most of them didn’t notice.”
Motivating example: command-line calculator
Possible inputs: 3+1, 2*9, 4+3*8, 2*(2+3), ...
$ java calc 3*(2+3)
15
$
How can we implement it?
Think about difficult expressions: (2)*(3+((7*(9)))+(15))
This task is quite easy using compiler techniques!
We will write the calculator during this talk.
Other examples
• Markup language parser (XML, HTML, RSS, …)
• URI parser
• Logfile parser
• Text crawler / text indexing / search engines
• (Nested) configuration files
• Domain-specific languages (scripting languages)
• Programming language compilers
Compiler technologies are part of the programmer's toolbox!
Anatomy of a compiler
[Diagram: source language → Front-End (Analysis): Lexical, Syntax, Semantic → (attributed) Abstract Syntax Tree → Back-End (Synthesis): Code Generator → machine language]
Anatomy of a compiler II
[Diagram annotations: Focus of this talk; Typechecking, …]
What is SableCC?
• Generator for parser frameworks
• Takes a language description
• Generates a set of Java classes for the language:
– Lexer
– Parser
– Typed Syntax Tree
– Basic Analysis tools / Tree Walker Classes
• Uses OO Principles
– Good design is more important than performance
– Compilers should be maintainable
– Analysis tools keep their own data
• Bottom-Up Parsing
– Grammar is intuitive (left recursive)
– Efficient Parsers (LALR)
Lexical Analysis
Converting a sequence of characters
into a sequence of tokens
Lexing
• Given source code, we use a lexer to generate a token stream
• Using tokens instead of character streams is useful:
– We want to deal with numbers instead of digits
– We want to deal with identifiers instead of x, y and z
– Whitespace and comments can appear anywhere in the code; we do not want to deal with them during parsing
Input: Thisissometextwithoutspacesandpunctuationmarkswhichisthereforequitedifficulttoreadbyhumanslexicalanalysiswillbreakthistextupintowords
↓ Lexer
Output: This is some text without spaces and punctuation marks which is therefore quite difficult to read by humans lexical analysis will break this text up into words
Real world lexing
if (n == 0) then return 0
else //recursive case
return n*f(n-1)
Lexer
if lpar id equals num rpar then return num
else comment return id mult id lpar id minus
num rpar
Lexing for Dummies
• Lexers use regular expressions
• A regular expression can be translated into a finite
automaton (DFA)
• DFAs can be combined
• The combined DFA is used to recognize tokens
• The very good news:
– The algorithm is linear in the length of the input
– We do not have to write the automata by hand
Writing a Lexer with SableCC
Sections in the input file
1. Helpers
– Character classes
small_letters = ['a' .. 'z'];
hex_digits_zero = [['a' .. 'f'] + ['1' .. '9']];
hex_digits = [['0'] + hex_digits_zero];
some_capitals = [['A' .. 'Z'] - ['I' .. 'K']];
linebreak = 10;
also possible: linebreak = 0x000a or linebreak = '\n'
– Regular expressions
my_string = 'keyword';
hexnumber = ('0x' | '0X')? (hex_digits_zero hex_digits*);
identifier = some_capitals+;
– Right-hand side can use previously defined Helpers
– Cannot be used in the parser
2. Tokens
– Regular expressions
– Right-hand side may use Helpers, but not other Tokens
– Used in the parser
Translation into Java-Types
• One class for each token
• Name of the class derived from the token’s name:
– Prepend a T
– Capitalize the first letter
– Capitalize each letter after an underscore
– Remove each underscore
• Exception: EOF (implicitly defined token)
Example: small_letters becomes TSmallLetters
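The naming rules for token classes can be sketched in a few lines of Java. This is only an illustration of the rules listed on the slide, not code taken from SableCC; the class name TokenNames and the method toClassName are made up for this example.

```java
final class TokenNames {
    /** Derives a class name from a token name, e.g. small_letters -> TSmallLetters. */
    static String toClassName(String tokenName) {
        StringBuilder sb = new StringBuilder("T");   // prepend a T
        boolean upperNext = true;                    // capitalize the first letter
        for (char c : tokenName.toCharArray()) {
            if (c == '_') {
                upperNext = true;                    // drop the underscore, capitalize what follows
            } else if (upperNext) {
                sb.append(Character.toUpperCase(c));
                upperNext = false;
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toClassName("small_letters")); // TSmallLetters
        System.out.println(toClassName("l_par"));         // TLPar
    }
}
```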
Lexer with SableCC
SableCC generates a Lexer for us
Helpers
first_digit = ['1' .. '9'] ;
digit = ['0' + first_digit ] ;
Tokens
white_space = (' ' | '\n' | '\t')+;
number = '0' | first_digit digit*;
add = '+';
mult = '*';
l_par = '(';
r_par = ')';
DEMO
Lexer_Example
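To get a feeling for what the generated lexer does with these token definitions, here is a hand-written approximation using java.util.regex. It is a sketch under assumptions: the class CalcLexer, the method tokenize and the "kind:text" output format are invented for this example, and whitespace is required to be non-empty. Java regex alternation tries alternatives in order rather than picking the longest match as a DFA-based lexer would; for this token set both strategies give the same result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CalcLexer {
    // One named group per token kind (Java group names may not contain underscores).
    private static final String[] KINDS = {"whitespace", "number", "add", "mult", "lpar", "rpar"};

    private static final Pattern TOKEN = Pattern.compile(
          "(?<whitespace>[ \\n\\t]+)"
        + "|(?<number>0|[1-9][0-9]*)"
        + "|(?<add>\\+)"
        + "|(?<mult>\\*)"
        + "|(?<lpar>\\()"
        + "|(?<rpar>\\))");

    /** Splits the input into tokens of the form kind:text and drops whitespace. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unexpected character at position " + pos);
            }
            for (String kind : KINDS) {
                if (m.group(kind) != null) {          // this alternative matched
                    if (!kind.equals("whitespace")) {
                        tokens.add(kind + ":" + m.group());
                    }
                    break;
                }
            }
            pos = m.end();                            // every token consumes at least one character
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("3*(2+3)"));
    }
}
```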
Lexer States
• Combine the Lexer with an additional DFA
• Recognize the same string as different token in different context
States
normal, comment;
Tokens
{normal->comment} comment_start = '**';
{comment->normal} comment_end = '**';
{normal} keyword = 'do';
{comment} string = [0 .. 0xffff]*;
{normal} white_space = ' ' | '\n' | '\t';
• Input: do ** do **
• Tokens: TKeyword TWhiteSpace TCommentStart TString TCommentEnd
DEMO
LexerState_Example
LexerStateBug_Example
Customized Lexers
• SableCC generates a Lexer-Class
• Use Inheritance for Customization
– The lexer calls the method filter() when it recognizes a token
– Override this method to modify the token stream
• Can be used for nested comments
– Comment Start Token → Increment counter
– Comment End Token → Decrement counter
– if counter > 0: state = comment
– otherwise: state = normal
• Could screw up your compiler, should be handled with care
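The counter logic for nested comments can be tried out in isolation. The sketch below is a stand-alone simulation, not a subclass of the generated lexer: NestedCommentFilter, its State enum and the string token kinds are hypothetical stand-ins, and ignoring a comment end outside of any comment is a choice made for this example. In a real customized lexer the same bookkeeping would live in the overridden filter() method.

```java
final class NestedCommentFilter {
    enum State { NORMAL, COMMENT }

    private int depth = 0;                 // number of comments currently open
    private State state = State.NORMAL;

    /** Call once per recognized token; returns the lexer state to continue in. */
    State filter(String tokenKind) {
        if (tokenKind.equals("comment_start")) {
            depth++;                       // comment start token: increment counter
        } else if (tokenKind.equals("comment_end") && depth > 0) {
            depth--;                       // comment end token: decrement counter
        }
        state = depth > 0 ? State.COMMENT : State.NORMAL;
        return state;
    }

    public static void main(String[] args) {
        NestedCommentFilter f = new NestedCommentFilter();
        for (String kind : new String[] {"comment_start", "comment_start", "comment_end", "comment_end"}) {
            System.out.println(kind + " -> " + f.filter(kind));
        }
    }
}
```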
Syntactical Analysis
Analyzing a sequence of tokens to determine its
grammatical structure
Theoretical background
Sorry folks, we have to deal with theory!
But we’ll try to keep it nice and simple!
Absolutely no proofs!
1. What is an alphabet?
A finite set of symbols. For instance {0,1,2,id, for, do … }
The alphabet is the set of tokens recognized by the lexer.
2. What is a (computer) language?
A set of sequences over an alphabet.
3. What is a grammar?
• The alphabet (terminal symbols)
• A disjoint set (nonterminal symbols)
• Replacement rules
A → BC means: we can replace A by BC
A sequence of terminal symbols belongs to a language iff we can create the sequence by iteratively applying replacement rules, starting from a specific start symbol.
Example
• Alphabet: {a,b}
• Language: all strings starting with a’s (at least one) followed
by the same number of b’s (e.g. ab,aabb,aaabbb,…)
• Nonterminals: {S}
• Start symbol: S
• Rules to produce all strings of the language: S → ab | aSb
• Let’s check if this is correct:
– S → ab
– S → aSb → aabb
– S → aSb → aaSbb → aaabbb
– S → aSb → aaSbb → aaaSbbb → …
• It seems to be correct (and this is good enough for us!)
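The two rules S → ab and S → aSb translate almost literally into a recursive check. The following is a minimal sketch for this one grammar (the class AnBn and the method derivable are names chosen for the example); it is not a general parsing technique.

```java
final class AnBn {
    /** Returns true iff s can be derived from S using S -> ab | aSb. */
    static boolean derivable(String s) {
        if (s.equals("ab")) {
            return true;                                    // rule S -> ab
        }
        return s.length() >= 4
            && s.charAt(0) == 'a'
            && s.charAt(s.length() - 1) == 'b'
            && derivable(s.substring(1, s.length() - 1));   // rule S -> aSb
    }

    public static void main(String[] args) {
        System.out.println(derivable("aaabbb")); // true
        System.out.println(derivable("aab"));    // false
    }
}
```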
The Chomsky Hierarchy
Hierarchical Classification of languages
Type   | Class                  | Rule form      | Cost
Type 0 | Recursively enumerable | α → β          | undecidable
Type 1 | Context sensitive      | αAβ → αγβ      | exponential
Type 2 | Context free           | A → γ          | O(n³)
Type 3 | Regular                | A → a, A → aB  | O(n)
α, β, γ: sequences of terminal and nonterminal symbols
a: terminal symbol
A, B: nonterminal symbols
Note: The language from the previous example cannot be expressed using a type 3 grammar.
Regular expressions are not expressive enough for compilers.
The real world
• Lexing: regular expressions
• Parsing: context-free grammar
• Real programming languages are usually context sensitive
• We over-approximate using a context-free grammar
• And do some semantic checks after parsing (e.g. type checking)
int i := 9;
Parser: NO!
Integer i = new File("peng");
Parser: YES!
Typechecker: NO!
Parsing
• The output of a parser is a syntax tree
• Properties of a syntax tree:
– The root node represents the start symbol of the grammar
– The leaves represent the tokens
– Each symbol on the right-hand side of a production rule is represented by a child node
• Example
Grammar: S → ab | aSb
Input: aaabbb
Syntax tree: S(a, S(a, S(a, b), b), b)
Parsing
How do we obtain a syntax tree from a token stream?
1. Top-Down Parsing
• Start with the “start symbol”
• Try to derive the string
2. Bottom-Up Parsing
• Start from the string
• Apply productions in reverse order
• Try to reach the “start symbol”
Sounds simple?
Top-Down Parsing
• Usually LL(k): left to right, leftmost derivation, k tokens of lookahead
• Many tools available (JavaCC, ANTLR, Coco/R)
• Either table-driven
• Or recursive descent (JavaCC, ANTLR, Coco/R, hand-written)
• Hand-written parser (one method per nonterminal)
S → if E then S else S
S → begin S L
S → print E
• Easy to understand
• No left recursion
• Left factoring often needed
X → aB
X → aC
becomes, after left factoring:
X → aX’
X’ → B
X’ → C
• LL(k) grammars do not feel natural
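A hand-written recursive descent parser for the three S rules might look like the sketch below, with one method per nonterminal. The slide does not show the rules for L and E, so this example assumes L → end | ; S L and E → num; those two rules, the string-based tokens and all class and method names are assumptions made only for illustration.

```java
import java.util.Arrays;
import java.util.List;

final class RecursiveDescent {
    private final List<String> tokens;
    private int pos = 0;

    private RecursiveDescent(List<String> tokens) { this.tokens = tokens; }

    /** Returns true iff the whole token sequence can be derived from S. */
    static boolean accepts(String... tokens) {
        RecursiveDescent p = new RecursiveDescent(Arrays.asList(tokens));
        try {
            p.parseS();
            return p.pos == p.tokens.size();
        } catch (IllegalStateException e) {
            return false;
        }
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "<EOF>"; }

    private void eat(String expected) {
        if (!peek().equals(expected)) {
            throw new IllegalStateException("expected " + expected + " but found " + peek());
        }
        pos++;
    }

    // S -> if E then S else S | begin S L | print E
    private void parseS() {
        switch (peek()) {
            case "if":    eat("if"); parseE(); eat("then"); parseS(); eat("else"); parseS(); break;
            case "begin": eat("begin"); parseS(); parseL(); break;
            case "print": eat("print"); parseE(); break;
            default: throw new IllegalStateException("unexpected " + peek());
        }
    }

    // Assumed rule: L -> end | ; S L
    private void parseL() {
        switch (peek()) {
            case "end": eat("end"); break;
            case ";":   eat(";"); parseS(); parseL(); break;
            default: throw new IllegalStateException("unexpected " + peek());
        }
    }

    // Assumed rule: E -> num
    private void parseE() { eat("num"); }

    public static void main(String[] args) {
        System.out.println(accepts("begin", "print", "num", ";", "print", "num", "end"));
    }
}
```

One token of lookahead is enough here because every alternative starts with a different keyword, which is exactly the situation left factoring tries to establish.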
Bottom-Up Parsing
• LALR: lookahead, left to right, rightmost derivation (in reverse)
• Left recursion is allowed
• No need for left factoring
• LR grammars feel more natural
• Ambiguous grammars are even more natural
• It is almost impossible to hand-write an LR parser
• It is even hard to understand how the LR parsing table is built
• But if we accept a little bit of magic (SableCC), LR parsing is easy to understand
Bottom-Up Parsing - Example
• A simple grammar
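E → E ∧ B | E ∨ B | B
B → false | true
• Some strings of the language: true, true∧false, true∧true∨false
• Using some “magic” we create an LR parsing table:
State | Action: ∧ | ∨         | false     | true      | $         | Goto: E | B
0     |           |           | shift 1   | shift 2   |           | 3       | 4
1     | B → false | B → false | B → false | B → false | B → false |         |
2     | B → true  | B → true  | B → true  | B → true  | B → true  |         |
3     | shift 5   | shift 6   |           |           | Accept!   |         |
4     | E → B     | E → B     | E → B     | E → B     | E → B     |         |
5     |           |           | shift 1   | shift 2   |           |         | 7
6     |           |           | shift 1   | shift 2   |           |         | 8
7     | E → E ∧ B | E → E ∧ B | E → E ∧ B | E → E ∧ B | E → E ∧ B |         |
8     | E → E ∨ B | E → E ∨ B | E → E ∨ B | E → E ∨ B | E → E ∨ B |         |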
• Now we try to parse: true ∨ true
Bottom-Up Parsing - Example
Stack   | Input         | Read            | Action
0       | true ∨ true $ | true in state 0 | Shift 2
0,2     | true ∨ true $ | ∨ in state 2    | Reduce (B → true)
0       | B ∨ true $    | B in state 0    | Shift 4
0,4     | B ∨ true $    | ∨ in state 4    | Reduce (E → B)
0       | E ∨ true $    | E in state 0    | Shift 3
0,3     | E ∨ true $    | ∨ in state 3    | Shift 6
0,3,6   | E ∨ true $    | true in state 6 | Shift 2
0,3,6,2 | E ∨ true $    | $ in state 2    | Reduce (B → true)
0,3,6   | E ∨ B $       | B in state 6    | Shift 8
0,3,6,8 | E ∨ B $       | $ in state 8    | Reduce (!!!) (E → E ∨ B)
0       | E $           | E in state 0    | Shift 3
0,3     | E $           | $ in state 3    | Accept!
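The walkthrough can be replayed with a small table-driven parser. This is a minimal sketch, assuming the action and goto entries encoded in the arrays below: shift and goto targets follow the walkthrough, and reductions are numbered 1 to 5 in grammar order (1: E → E ∧ B, 2: E → E ∨ B, 3: E → B, 4: B → false, 5: B → true). The numbering, the string encoding of actions and the names BoolLrParser and accepts are assumptions for illustration. Unlike the slides, which show the reduced nonterminal being read as a separate step, the code performs the goto as part of the reduce.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class BoolLrParser {
    // Terminal columns: ∧ (\u2227), ∨ (\u2228), false, true, $ (end of input)
    private static final List<String> TERMINALS =
        Arrays.asList("\u2227", "\u2228", "false", "true", "$");

    // "sN" = shift and go to state N, "rN" = reduce by rule N, "acc" = accept, null = error
    private static final String[][] ACTION = {
        {null, null, "s1", "s2", null},      // state 0
        {"r4", "r4", "r4", "r4", "r4"},      // state 1: B -> false
        {"r5", "r5", "r5", "r5", "r5"},      // state 2: B -> true
        {"s5", "s6", null, null, "acc"},     // state 3
        {"r3", "r3", "r3", "r3", "r3"},      // state 4: E -> B
        {null, null, "s1", "s2", null},      // state 5
        {null, null, "s1", "s2", null},      // state 6
        {"r1", "r1", "r1", "r1", "r1"},      // state 7: E -> E ∧ B
        {"r2", "r2", "r2", "r2", "r2"},      // state 8: E -> E ∨ B
    };

    // GOTO[state] = {target for E, target for B}; -1 = no entry
    private static final int[][] GOTO = {
        {3, 4}, {-1, -1}, {-1, -1}, {-1, -1}, {-1, -1}, {-1, 7}, {-1, 8}, {-1, -1}, {-1, -1}
    };

    // Indexed by rule number 1..5 (index 0 unused)
    private static final int[] RHS_LENGTH = {0, 3, 3, 1, 1, 1};
    private static final int[] LHS = {0, 0, 0, 0, 1, 1};   // 0 = E, 1 = B

    /** Returns true iff the token sequence belongs to the language. */
    static boolean accepts(String... input) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(0);
        int pos = 0;
        while (true) {
            String symbol = pos < input.length ? input[pos] : "$";
            int column = TERMINALS.indexOf(symbol);
            if (column < 0) return false;                   // unknown token
            String action = ACTION[stack.peek()][column];
            if (action == null) return false;               // empty table cell: syntax error
            if (action.equals("acc")) return true;
            int n = Integer.parseInt(action.substring(1));
            if (action.charAt(0) == 's') {
                stack.push(n);                              // shift: push state, consume token
                pos++;
            } else {
                for (int i = 0; i < RHS_LENGTH[n]; i++) {
                    stack.pop();                            // reduce: pop one state per RHS symbol
                }
                stack.push(GOTO[stack.peek()][LHS[n]]);     // then follow the goto entry
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("true", "\u2228", "true"));
    }
}
```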
Parser with SableCC
Sections in the input file
1. Ignored Tokens
2. Productions
3. Abstract Syntax Tree
Ignored Tokens
– Remove certain tokens while processing the token stream
– The lexer produces the tokens
– The parser ignores them
– Extremely useful for whitespace and comments
Productions in SableCC
• One rule per nonterminal
• Alternatives have to be named (max. one unnamed alternative)
expr = {add} expr add term | term;
• If we use the same symbol multiple times, we need to name the occurrences (max. one unnamed occurrence)
tuple = [first]:value comma [second]:value;
• Operators *, + and ? can be used
coordinates = [coordinates]:coordinate*;
coordinate =
{two_dimensional} l_par [first]:number comma
[second]:number r_par |
{three_dimensional} l_par [first]:number [first_comma]:comma
[second]:number [second_comma]:comma
[third]:number r_par;
Translation into Java-Types
• All nodes inherit from Node
• For each nonterminal symbol one abstract supertype
• An implicit production rule S → P <EOF>
• For each alternative one syntax tree node
• Syntax tree nodes have methods to get child nodes
• If we use * or +, SableCC will use Lists
• ACoordinates has the method List<PCoordinate> getCoordinates()
SableCC-Example
• We need to take care of operator precedence
• A possible grammar for the expressions:
expr → expr + term | term
term → term * factor | factor
factor → (expr) | number
• The translation to SableCC is straightforward
DEMO
Expression_parser1
CST vs. AST
Concrete grammar:
E → E + T | T
T → T * F | F
F → (E) | digit
Input: (2+2)*5
Abstraction:
E → E + E | E * E | digit
• The abstract syntax tree contains the same information
• But the abstract grammar is ambiguous
• It cannot be used for efficient parsers
• SableCC allows us to parse the concrete grammar while creating an AST
• Unfortunately the syntax is quite ugly (will probably change in the next release)
Abstract Syntax Tree
• We need a second (abstract) grammar
• And the productions need some transformation annotations
• Annotations look like { -> … }
• Left-hand side: mapping of CST node to AST node
• Right-hand side:
– Creation of new nodes for the AST
{ -> New ast_type.ast_alternative(concrete_symbol.abstract_type, …) }
– No creation of an AST node, just forward a symbol
{ -> concrete_symbol.abstract_type }
DEMO
expression_parser2
List transformation
• Transforming lists is a little bit tricky
• Type (LHS) is expr*
• List construction on RHS: [a] / [a,L]
Concrete grammar:
E → d | f ( L )
L → E | E , L
Abstraction:
E → d | E*
DEMO
list_transformation
Analysis Tools
• Now we have an abstract syntax tree, what’s next?
• All AST classes contain apply methods
public void apply(Switch sw) {
((Analysis) sw).caseStart(this);
}
• Analysis is an interface generated by SableCC
• Using Switch and casting to Analysis is IMHO bullshit
• AnalysisAdapter implements Analysis (empty defaults)
• DepthFirstAdapter, ReversedDepthFirstAdapter are very useful
• Tree walk in defined order
• In and out methods for each type of node
DEMO
Expression_parser3
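The apply/case mechanism is double dispatch: each node calls the case method that matches its own type. The sketch below shows the idea with hand-written stand-ins for the generated classes and evaluates the calculator input 3*(2+3) from the beginning of the talk. Visitor, NumberNode, AddNode, MultNode and Evaluator are hypothetical names chosen for this example; the real generated classes differ (they are named after the grammar and use Switch and Analysis as shown above).

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Plays the role of the generated Analysis interface.
interface Visitor {
    void caseNumber(NumberNode node);
    void caseAdd(AddNode node);
    void caseMult(MultNode node);
}

abstract class Node {
    abstract void apply(Visitor v);        // each node type dispatches to its own case method
}

final class NumberNode extends Node {
    final int value;
    NumberNode(int value) { this.value = value; }
    @Override void apply(Visitor v) { v.caseNumber(this); }
}

final class AddNode extends Node {
    final Node left, right;
    AddNode(Node left, Node right) { this.left = left; this.right = right; }
    @Override void apply(Visitor v) { v.caseAdd(this); }
}

final class MultNode extends Node {
    final Node left, right;
    MultNode(Node left, Node right) { this.left = left; this.right = right; }
    @Override void apply(Visitor v) { v.caseMult(this); }
}

// Depth-first walk: visit the children first, then combine their results.
// The analysis keeps its own data (the value stack) outside the tree.
final class Evaluator implements Visitor {
    private final Deque<Integer> values = new ArrayDeque<>();

    @Override public void caseNumber(NumberNode node) { values.push(node.value); }

    @Override public void caseAdd(AddNode node) {
        node.left.apply(this);
        node.right.apply(this);
        int right = values.pop();
        int left = values.pop();
        values.push(left + right);
    }

    @Override public void caseMult(MultNode node) {
        node.left.apply(this);
        node.right.apply(this);
        int right = values.pop();
        int left = values.pop();
        values.push(left * right);
    }

    static int evaluate(Node root) {
        Evaluator evaluator = new Evaluator();
        root.apply(evaluator);
        return evaluator.values.pop();
    }

    public static void main(String[] args) {
        // AST for 3*(2+3)
        Node ast = new MultNode(new NumberNode(3), new AddNode(new NumberNode(2), new NumberNode(3)));
        System.out.println(evaluate(ast)); // 15
    }
}
```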
Conflicts
Sometimes even SableCC causes pain
LR Parsing (Shift-Reduce)
Grammar: E → Num | (E) | E + E | E * E
Input: Num + Num * Num
Stack       | Input           | Action
            | Num + Num * Num | Shift
Num         | + Num * Num     | Reduce
E           | + Num * Num     | Shift
E +         | Num * Num       | Shift
E + Num     | * Num           | Reduce
E + E       | * Num           | Shift (ambiguous)
E + E *     | Num             | Shift
E + E * Num |                 | Reduce
E + E * E   |                 | Reduce
E + E       |                 | Reduce
E           |                 | Accept
The magic of LALR Parsing
• SableCC does magic for us
• Unless our grammar produces conflicts
– Shift/Reduce
– Reduce/Reduce
• We need to fix the grammar
DEMO
conflicts