Context-Free Languages
Wen-Guey Tzeng
Department of Computer Science
National Chiao Tung University
Context-Free Grammars
• Some languages are not regular.
– E.g., $L = \{a^n b^n : n \ge 0\}$
• A grammar $G = (V, T, S, P)$ is context-free if all
productions are of the form
$A \to x$, where $A \in V$ and $x \in (V \cup T)^*$
– The left side has only one variable.
• A language L is context-free if and only if
there is a context-free grammar G such that
L=L(G).
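The definition can be checked mechanically. The following is a minimal sketch, assuming every symbol is a single character and the empty right side `""` stands for the empty string; the names `Grammar` and `is_context_free` are illustrative and not from the slides.

```python
from typing import NamedTuple

class Grammar(NamedTuple):
    variables: frozenset   # V
    terminals: frozenset   # T
    start: str             # S
    productions: tuple     # P, as (left, right) pairs of strings

def is_context_free(g: Grammar) -> bool:
    """True if every production has the form A -> x, A in V, x in (V u T)*."""
    symbols = g.variables | g.terminals
    return all(
        left in g.variables and all(ch in symbols for ch in right)
        for left, right in g.productions
    )

# S -> aSb | "" generates a^n b^n (assumed encoding: "" is the empty string)
anbn = Grammar(frozenset("S"), frozenset("ab"), "S", (("S", "aSb"), ("S", "")))
# A production with two symbols on the left side is not context-free
not_cf = Grammar(frozenset("SA"), frozenset("ab"), "S", (("SA", "ab"),))

print(is_context_free(anbn))    # True
print(is_context_free(not_cf))  # False
```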
Examples
• $G = (\{S\}, \{a, b\}, S, P)$, with $P = \{S \to aSa \mid bSb \mid \lambda\}$
– Derivation:
$S \Rightarrow aSa \Rightarrow aaSaa \Rightarrow aabSbaa \Rightarrow aab\lambda baa = aabbaa$
– $L(G) = \{ww^R : w \in \{a, b\}^*\}$
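One way to build confidence in the claim about L(G) is to enumerate short strings of the grammar and check that each is an even-length palindrome. This is only a sketch: the helper `generate` is hypothetical, it assumes single-character symbols, and it always rewrites the leftmost variable, pruning forms that already contain more than `max_len` terminals.

```python
from collections import deque

def generate(productions, start, max_len):
    """All terminal strings of length <= max_len derivable from start."""
    results, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, if any
        pos = next((i for i, ch in enumerate(form) if ch in productions), None)
        if pos is None:
            results.add(form)      # no variables left: a string of L(G)
            continue
        for rhs in productions[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            terminals = sum(ch not in productions for ch in new)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return results

P = {"S": ["aSa", "bSb", ""]}      # "" encodes the empty string
words = generate(P, "S", 6)
print(sorted(words, key=lambda w: (len(w), w))[:5])
# ['', 'aa', 'bb', 'aaaa', 'abba']
print(all(w == w[::-1] and len(w) % 2 == 0 for w in words))  # True
```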
• $S \to abB$, $A \to aaBb \mid \lambda$, $B \to bbAa$
– $L(G) = \{ab(bbaa)^n bba(ba)^n : n \ge 0\}$ ?
Design cfg’s
• Give a cfg for $L = \{a^n b^m : n > m\}$
Design cfg’s
• Give a cfg for $L = \{a^n b^m : n \ne m;\ n, m \ge 0\}$
– Idea 1: split $L$ into two cases (not necessarily
disjoint), $L = L_1 \cup L_2$ with $L_1 = \{a^n b^m : n > m\}$ and $L_2 = \{a^n b^m : n < m\}$. Then
construct productions for $L_1$ and $L_2$ separately.
• Give a cfg for $L = \{a^n b^m : n \ne m;\ n, m \ge 0\}$
– Idea 2: for $L_1 = \{a^n b^m : n > m\}$, produce equal numbers of a’s and
b’s, then the extra a’s
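The two ideas can be combined into one candidate grammar and tested by brute force. The grammar below is one possible answer, not necessarily the one intended on the slide; the variable names E (equal part), A (extra a's) and B (extra b's) are assumptions, as is the helper `generate`, which enumerates all strings up to a length bound by leftmost rewriting.

```python
from collections import deque

def generate(productions, start, max_len):
    """All terminal strings of length <= max_len derivable from start."""
    results, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        pos = next((i for i, ch in enumerate(form) if ch in productions), None)
        if pos is None:
            results.add(form)
            continue
        for rhs in productions[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            # Terminals never disappear, so longer forms can be discarded
            if sum(ch not in productions for ch in new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return results

# Candidate grammar (assumed): S -> AE | EB, E -> aEb | "", A -> aA | a, B -> bB | b
P = {
    "S": ["AE", "EB"],   # extra a's in front, or extra b's at the end
    "E": ["aEb", ""],    # equal numbers of a's and b's
    "A": ["aA", "a"],    # at least one extra a
    "B": ["bB", "b"],    # at least one extra b
}

expected = {"a" * n + "b" * m
            for n in range(7) for m in range(7) if n != m and n + m <= 6}
print(generate(P, "S", 6) == expected)  # True
```

Agreement up to length 6 is evidence, not a proof; a proof would argue by induction on the derivations of E, A and B.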
• Give a cfg for $L = \{a^n b^m c^k : m = n + k\}$
• Give a cfg for $L = \{a^n b^m c^k : |n - m| = k\}$
• Give a cfg for $L = \{a^n b^m c^k : m > n + k\}$
• Give a cfg for $L = \{a^n b^m c^k : m \ne n + k\}$
• Give a cfg for $L = \{w \in \{a, b\}^* : n_a(w) = n_b(w)\}$
• Give a cfg for $L = \{w \in \{a, b\}^* : n_a(w) > n_b(w)\}$
Leftmost and rightmost derivation
• $G = (\{A, B, S\}, \{a, b\}, S, P)$, where $P$ contains
$S \to AB$, $A \to aaA$, $A \to \lambda$, $B \to Bb$, $B \to \lambda$
– $L(G) = \{a^{2n} b^m : n, m \ge 0\}$
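• For string aab
– Rightmost derivation: $S \Rightarrow AB \Rightarrow ABb \Rightarrow Ab \Rightarrow aaAb \Rightarrow aab$
– Leftmost derivation: $S \Rightarrow AB \Rightarrow aaAB \Rightarrow aaB \Rightarrow aaBb \Rightarrow aab$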
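Both derivations of aab can be found by a small search that always rewrites the leftmost (or rightmost) variable. This is a sketch under the assumption that symbols are single characters and `""` encodes the empty string; the function name `derive` is illustrative.

```python
from collections import deque

# S -> AB, A -> aaA | "", B -> Bb | ""
P = {"S": ["AB"], "A": ["aaA", ""], "B": ["Bb", ""]}

def derive(target, start="S", leftmost=True):
    """Breadth-first search for a derivation of target that always rewrites
    the leftmost (or rightmost) variable. Returns the sentential forms."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        form = path[-1]
        if form == target:
            return path
        positions = [i for i, ch in enumerate(form) if ch in P]
        if not positions:
            continue                       # terminal string other than target
        pos = positions[0] if leftmost else positions[-1]
        for rhs in P[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            # Discard forms that already hold more terminals than the target
            if sum(ch not in P for ch in new) <= len(target) and new not in seen:
                seen.add(new)
                queue.append(path + [new])
    return None

print(" => ".join(derive("aab", leftmost=True)))
# S => AB => aaAB => aaB => aaBb => aab
print(" => ".join(derive("aab", leftmost=False)))
# S => AB => ABb => Ab => aaAb => aab
```

Both derivations use the same five productions and differ only in the order of application.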
Derivation (parse) tree
• $A \to abABc$
• $S \to aAB$, $A \to bBb$, $B \to A \mid \lambda$
Some comments
• A derivation tree does not record the order in which
productions are applied
• Leftmost and rightmost derivations correspond to
depth-first traversals of the tree (left-to-right and right-to-left, respectively)
• Derivation trees and derivation order are central to
programming languages and compiler design
Grammar for C
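#include <stdio.h>

/* Assumed definition: the slide calls add() without showing it */
int add(int a, int b) { return a + b; }

int main(void)
{
    int i = 1;
    printf("i starts out life as %d.", i);
    i = add(1, 1); /* Function call */
    printf(" And becomes %d after function is executed.\n", i);
    return 0;
}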
Parsing and ambiguity
• Parsing of $w \in L(G)$: find a sequence of
productions by which $w$ is derived.
• Questions: given $G$ and $w$
– Is $w \in L(G)$? (membership problem)
– Is there an efficient way to determine whether $w \in L(G)$?
– How is $w \in L(G)$ parsed? (build the parse tree)
– Is the parse unique?
Exhaustive search/top down parsing
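• $S \to SS \mid aSb \mid bSa \mid \lambda$
• Determine whether $aabb \in L(G)$
– 1st round: (1) $S \Rightarrow SS$; (2) $S \Rightarrow aSb$; (3) $S \Rightarrow bSa$; (4) $S \Rightarrow \lambda$
– 2nd round:
• From (1), $S \Rightarrow SS \Rightarrow SSS$, $S \Rightarrow SS \Rightarrow aSbS$, $S \Rightarrow SS \Rightarrow bSaS$, $S \Rightarrow SS \Rightarrow S$
• From (2), $S \Rightarrow aSb \Rightarrow aSSb$, $S \Rightarrow aSb \Rightarrow aaSbb$, $S \Rightarrow aSb \Rightarrow abSab$,
$S \Rightarrow aSb \Rightarrow ab$
– 3rd round: …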
• Drawback: inefficiency
• Other ways ?
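The round-by-round search can be sketched in a few lines. The function below is an illustration under stated assumptions, not the slides' implementation: symbols are single characters, `""` encodes the empty string, only the leftmost variable is rewritten in each form, and a hypothetical `max_rounds` cap is needed because this grammar (with S -> SS and an empty production) lets sentential forms grow without bound.

```python
def exhaustive_parse(productions, start, w, max_rounds):
    """Return the round in which w is first derived, or None if it is not
    found within max_rounds rounds of leftmost rewriting."""
    forms = {start}
    for round_no in range(1, max_rounds + 1):
        next_forms = set()
        for form in forms:
            pos = next((i for i, ch in enumerate(form) if ch in productions), None)
            if pos is None:
                continue                   # all terminals, nothing to rewrite
            for rhs in productions[form[pos]]:
                next_forms.add(form[:pos] + rhs + form[pos + 1:])
        if w in next_forms:
            return round_no
        forms = next_forms
    return None

P = {"S": ["SS", "aSb", "bSa", ""]}
print(exhaustive_parse(P, "S", "aabb", max_rounds=5))  # 3
print(exhaustive_parse(P, "S", "aab", max_rounds=6))   # None
```

The number of forms can grow by a factor of up to |P| per round, which is the inefficiency noted above; a `None` result only means "not found within the cap", not a proof of non-membership.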
• If there are no productions of the form $A \to \lambda$ or $A \to B$, the
exhaustive search for $w \in L(G)$ can be done in
$|P| + |P|^2 + \dots + |P|^{2|w|} = O(|P|^{2|w|+1})$ steps
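The bound follows from two observations. Without empty or unit productions, every derivation step increases either the length of the sentential form or the number of terminals in it, and neither can exceed |w|, so at most 2|w| rounds are needed. Round i produces at most |P|^i sentential forms when only the leftmost variable is rewritten. Summing the geometric series (assuming |P| is at least 2) gives:

```latex
\sum_{i=1}^{2|w|} |P|^{i}
  = \frac{|P|^{2|w|+1} - |P|}{|P| - 1}
  = O\!\left(|P|^{2|w|+1}\right), \qquad |P| \ge 2 .
```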
Bottom up parsing
• To reduce a string w to the start variable S
• $S \to aSb \mid \lambda$
– $w = aabb \Leftarrow aaSbb \Leftarrow aSb \Leftarrow S$
• Efficiency: $O(|w|^3)$
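The slide does not say which algorithm achieves the cubic bound. One well-known bottom-up method with this running time is the CYK algorithm, sketched below as an illustration. It requires Chomsky normal form, so the example uses an assumed CNF version of the grammar for nonempty strings: S -> AT | AB, T -> SB, A -> a, B -> b (the helper variables T, A, B are not from the slides).

```python
def cyk(w, unit, binary, start="S"):
    """CYK membership test for a grammar in Chomsky normal form.

    unit:   dict mapping a terminal to the set of variables X with X -> terminal
    binary: list of (X, Y, Z) triples for productions X -> YZ
    """
    n = len(w)
    if n == 0:
        return False                         # empty string handled separately
    # table[i][l] = set of variables deriving the substring w[i : i + l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(w):
        table[i][1] = set(unit.get(ch, ()))
    for length in range(2, n + 1):           # substring length
        for i in range(n - length + 1):      # start position
            for split in range(1, length):   # size of the left part
                left = table[i][split]
                right = table[i + split][length - split]
                for x, y, z in binary:
                    if y in left and z in right:
                        table[i][length].add(x)
    return start in table[0][n]

unit = {"a": {"A"}, "b": {"B"}}
binary = [("S", "A", "T"), ("S", "A", "B"), ("T", "S", "B")]

print(cyk("aabb", unit, binary))  # True
print(cyk("aab", unit, binary))   # False
```

The three nested loops over length, start position and split point give the cubic dependence on |w|.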
Linear-time parsing
• Simple grammar (s-grammar)
– All productions are of the form
$A \to ax$, where $A \in V$, $a \in T$ and $x \in (V \cup T)^*$
– Any pair $(A, a)$ occurs at most once in $P$.
• Example: $S \to aS \mid bSS \mid c$
– Parsing for ababccc
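Because each pair (A, a) selects at most one production, a parser can process the input in a single left-to-right pass with a stack. The following is a minimal sketch assuming single-character symbols; the function name `s_parse` and the rule encoding are illustrative choices.

```python
def s_parse(rules, start, w):
    """Parse w with an s-grammar in one pass.

    rules maps (variable, terminal) to x for the production variable -> terminal x.
    Returns the list of productions used, or None if w is not in the language.
    """
    stack = [start]                 # symbols still to be matched, top at the end
    steps = []
    for ch in w:
        if not stack:
            return None             # input left over
        top = stack.pop()
        if (top, ch) in rules:      # the unique production for this pair
            x = rules[(top, ch)]
            steps.append(f"{top} -> {ch}{x}")
            stack.extend(reversed(x))
        elif top != ch:             # no rule, and not a matching terminal
            return None
    return steps if not stack else None

# S -> aS | bSS | c
rules = {("S", "a"): "S", ("S", "b"): "SS", ("S", "c"): ""}
for step in s_parse(rules, "S", "ababccc"):
    print(step)
```

Each input symbol causes one dictionary lookup and one stack update whose size is bounded by the longest right side, so the running time is linear in |w|.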
Ambiguous grammars
• $G$ is ambiguous if some $w \in L(G)$ has at least two
distinct derivation trees.
• Example: $S \to aSb \mid SS \mid \lambda$
Example from programming languages
• C-like grammar for arithmetic expressions.
$G = (\{E, I\}, \{a, b, c, +, x, (, )\}, E, P)$, where $P$ contains
$E \to I$
$E \to E + E$
$E \to E x E$
$E \to (E)$
$I \to a \mid b \mid c$
• $w = a + b x c$ has two derivation trees
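The claim can be checked by counting derivation trees directly. The sketch below is specific to this expression grammar and is not a general ambiguity test; the function name `count_trees` is illustrative, and the input is assumed to be written without spaces, with the letter x as the multiplication terminal.

```python
from functools import lru_cache

def count_trees(w):
    """Number of derivation trees of w from E under
    E -> I | E+E | ExE | (E),  I -> a | b | c."""
    @lru_cache(maxsize=None)
    def count(i, j):                          # trees deriving w[i:j] from E
        if i >= j:
            return 0
        total = 0
        if j - i == 1 and w[i] in "abc":      # E -> I -> a | b | c
            total += 1
        if w[i] == "(" and w[j - 1] == ")":   # E -> (E)
            total += count(i + 1, j - 1)
        for k in range(i + 1, j - 1):         # E -> E+E or E -> ExE, operator at k
            if w[k] in "+x":
                total += count(i, k) * count(k + 1, j)
        return total
    return count(0, len(w))

print(count_trees("a+bxc"))    # 2
print(count_trees("(a+b)xc"))  # 1
```

The two trees for a+bxc correspond to grouping as a+(bxc) and (a+b)xc; adding parentheses to the input removes the choice.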
Ambiguous languages
• A cfl $L$ is inherently ambiguous if every cfg $G$
with $L(G) = L$ is ambiguous. Otherwise, it is
unambiguous.
• Note: an unambiguous language may still have an
ambiguous grammar.
• Example: $L = \{a^n b^n c^m\} \cup \{a^n b^m c^m\}$ is inherently
ambiguous.
– Hard to prove.
CFG and Programming Languages
• Programming language: syntax + semantics
• Syntax is defined by a grammar G
– <expression> ::= <term> | <expression> + <term>
<term> ::= <factor> | <term> * <factor>
– <while_statement> ::= while <expression><statement>
• Syntax checking in compilers is done by a parser
– Is a program $p$ syntactically correct?
– Is $p \in L(G)$?
– We need efficient parsers.
Restricted grammars for Programming
Languages
• Goal:
– The expressive power is sufficient.
– No ambiguity. For example, “if then if then else” can be read in two ways:
• if then “if then else”
• if then “if then” else
– There exist efficient parsers.
• C -- LR(1)
• PASCAL -- LL(1)
• Hierarchy of classes of context-free languages
– As language classes: LL(1) $\subset$ LR(1) = DCFL $\subset$ CFL, and LR(0) $\subset$ LR(1); LR($k$) = LR(1) for every $k \ge 1$
Syntactic Correctness
• Lexical analyzer produces a stream of tokens
x = y + 2.1 $\to$ <id> <op> <id> <op> <real>
• Parser (syntactic analyzer) verifies that this
token stream is syntactically correct by
constructing a valid parse tree for the entire
program
– Unique parse tree for each language construct
– Program = collection of parse trees rooted at the top
by a special start symbol
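The lexical-analysis step can be illustrated with a small regular-expression tokenizer. This is a sketch only: the token names follow the slide's example, while the patterns in `TOKEN_SPEC` (including the extra `int` token) are assumptions about what such a lexer might accept.

```python
import re

# Order matters: "real" must be tried before "int"
TOKEN_SPEC = [
    ("real", r"\d+\.\d+"),
    ("int", r"\d+"),
    ("id", r"[A-Za-z_]\w*"),
    ("op", r"[=+\-*/]"),
    ("skip", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (kind, lexeme) pairs, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "skip":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(" ".join(f"<{kind}>" for kind, _ in tokenize("x = y +2.1")))
# <id> <op> <id> <op> <real>
```

The parser then works on the token kinds rather than on individual characters.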
CFG For Floating Point Numbers
::= stands for production rule; <…> are non-terminals;
| represents alternatives for the right-hand side of a production rule
Sample parse tree:
CFG For Balanced Parentheses
Could this language be described by a
regular expression or a DFA? Why?
Sample derivation:
<balanced> $\Rightarrow$ ( <balanced> )
$\Rightarrow$ (( <balanced> ))
$\Rightarrow$ (( <empty> ))
$\Rightarrow$ (( ))
CFG For Decimal Numbers (Redux)
This grammar is right-recursive
Sample top-down leftmost derivation:
<num> $\Rightarrow$ <digit> <num>
$\Rightarrow$ 7 <num>
$\Rightarrow$ 7 <digit> <num>
$\Rightarrow$ 7 8 <num>
$\Rightarrow$ 7 8 <digit>
$\Rightarrow$ 7 8 9
Compiler-compiler
• A compiler-compiler is a program that
generates a compiler from a defined grammar
• Parser can be built automatically from the BNF
description of the language’s CFG
• Tools: yacc, Bison
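[Diagram: a grammar $G = (V, T, S, P)$ is fed to the compiler-compiler, which produces a compiler (parser + code generator); the compiler translates a program into execution code, which maps input data to a result.]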