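Context-Free Languages
Wen-Guey Tzeng
Department of Computer Science, National Chiao Tung University

Context-Free Grammars
• Some languages are not regular.
  – E.g., $L = \{a^n b^n : n \ge 0\}$
• A grammar $G = (V, T, S, P)$ is context-free if all productions are of the form $A \to x$, where $A \in V$ and $x \in (V \cup T)^*$.
  – The left side has only one variable.
• A language $L$ is context-free if and only if there is a context-free grammar $G$ such that $L = L(G)$.

Examples
• $G = (\{S\}, \{a, b\}, S, P)$, with $P = \{S \to aSa \mid bSb \mid \lambda\}$
  – Derivation: $S \Rightarrow aSa \Rightarrow aaSaa \Rightarrow aabSbaa \Rightarrow aabbaa$
  – $L(G) = \{ww^R : w \in \{a, b\}^*\}$
• $S \to abB$, $A \to aaBb \mid \lambda$, $B \to bbAa$
  – Is $L(G) = \{ab(bbaa)^n bba(ba)^n : n \ge 0\}$?

Design cfg's
• Give a cfg for $L = \{a^n b^m : n > m\}$.

Design cfg's
• Give a cfg for $L = \{a^n b^m : n \ne m\}$.
  – Idea 1: split $L$ into two cases (not necessarily disjoint), $L_1 = \{a^n b^m : n > m\}$ and $L_2 = \{a^n b^m : n < m\}$, then construct productions for $L_1$ and $L_2$ separately.
  – Idea 2: for $L_1$, produce equal numbers of a's and b's, then produce the extra a's.
• Give a cfg for $L = \{a^n b^m c^k : m = n + k\}$.
• Give a cfg for $L = \{a^n b^m c^k : |n - m| = k\}$.
• Give a cfg for $L = \{a^n b^m c^k : m > n + k\}$.
• Give a cfg for $L = \{a^n b^m c^k : m \ne n + k\}$.
• Give a cfg for $L = \{w \in \{a, b\}^* : n_a(w) = n_b(w)\}$, where $n_a(w)$ is the number of a's in $w$.
• Give a cfg for $L = \{w \in \{a, b\}^* : n_a(w) > n_b(w)\}$.

Leftmost and rightmost derivations
• $G = (\{A, B, S\}, \{a, b\}, S, P)$, where $P$ contains $S \to AB$, $A \to aaA$, $A \to \lambda$, $B \to Bb$, $B \to \lambda$.
  – $L(G) = \{a^{2n} b^m : n, m \ge 0\}$
• For the string $aab$:
  – Rightmost derivation: $S \Rightarrow AB \Rightarrow ABb \Rightarrow Ab \Rightarrow aaAb \Rightarrow aab$
  – Leftmost derivation: $S \Rightarrow AB \Rightarrow aaAB \Rightarrow aaB \Rightarrow aaBb \Rightarrow aab$

Derivation (parse) tree
• Tree for the production $A \to abABc$ [figure]
• $S \to aAB$, $A \to bBb$, $B \to A \mid \lambda$ [figure: a derivation tree for this grammar]

Some comments
• A derivation tree does not record the order in which productions were applied.
• Leftmost and rightmost derivations correspond to depth-first (preorder) traversals of the tree, visiting children left to right or right to left, respectively.
• Derivation trees and derivation order are central to programming-language design and compiler design.

Grammar for C
[Figure: excerpt of the C grammar]

Sample C program (the slide calls add() without showing it; the definition below is assumed so that the program compiles):

```c
#include <stdio.h>

/* add() is not defined on the slide; this body is assumed for illustration. */
int add(int a, int b) { return a + b; }

int main(void) {
    int i = 1;
    printf("i starts out life as %d.", i);
    i = add(1, 1); /* Function call */
    printf(" And becomes %d after function is executed.\n", i);
    return 0;
}
```

Parsing and ambiguity
• Parsing $w \in L(G)$: find a sequence of productions by which $w$ is derived.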
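• Questions, given $G$ and $w$:
  – Is $w \in L(G)$? (membership problem)
  – Is there an efficient way to determine whether $w \in L(G)$?
  – How is $w$ parsed? (build the parse tree)
  – Is the parse unique?

Exhaustive search / top-down parsing
• $S \to SS \mid aSb \mid bSa \mid \lambda$
• Determine whether $aabb \in L(G)$.
  – 1st round: (1) $S \Rightarrow SS$; (2) $S \Rightarrow aSb$; (3) $S \Rightarrow bSa$; (4) $S \Rightarrow \lambda$
  – 2nd round:
    • From (1): $S \Rightarrow SS \Rightarrow SSS$, $S \Rightarrow SS \Rightarrow aSbS$, $S \Rightarrow SS \Rightarrow bSaS$, $S \Rightarrow SS \Rightarrow S$
    • From (2): $S \Rightarrow aSb \Rightarrow aSSb$, $S \Rightarrow aSb \Rightarrow aaSbb$, $S \Rightarrow aSb \Rightarrow abSab$, $S \Rightarrow aSb \Rightarrow ab$
  – 3rd round: …
• Drawback: inefficiency. Other ways?

• If $G$ has no productions of the form $A \to \lambda$ or $A \to B$, each derivation step increases either the length of the sentential form or its number of terminals. A derivation of $w$ therefore takes at most $2|w|$ steps, and exhaustive search decides $w \in L(G)$ after examining at most
  $$|P| + |P|^2 + \cdots + |P|^{2|w|} = O\left(|P|^{2|w|+1}\right)$$
  sentential forms.

Bottom-up parsing
• Reduce the string $w$ to the start variable $S$.
• $S \to aSb \mid \lambda$
  – $w = aabb \to aaSbb \to aSb \to S$ (each step reverses one production)
• Efficiency: $O(|w|^3)$ for general CFGs (e.g., the CYK algorithm).

Linear-time parsing
• Simple grammar (s-grammar):
  – All productions are of the form $A \to ax$, where $A \in V$, $a \in T$, $x \in V^*$.
  – Any pair $(A, a)$ occurs at most once in $P$.
• Example: $S \to aS \mid bSS \mid c$
  – Parsing for $ababccc$

Ambiguous grammars
• $G$ is ambiguous if some $w \in L(G)$ has two distinct derivation trees.
• Example: $S \to aSb \mid SS \mid \lambda$

Example from programming languages
• C-like grammar for arithmetic expressions: $G = (\{E, I\}, \{a, b, c, +, \times, (, )\}, E, P)$, where $P$ contains
  $E \to I$, $E \to E + E$, $E \to E \times E$, $E \to (E)$, $I \to a \mid b \mid c$
• $w = a + b \times c$ has two derivation trees [figures]: one groups it as $a + (b \times c)$, the other as $(a + b) \times c$.

Ambiguous languages
• A CFL $L$ is inherently ambiguous if every CFG $G$ with $L(G) = L$ is ambiguous. Otherwise, it is unambiguous.
• Note: an unambiguous language may still have an ambiguous grammar.
• Example: $L = \{a^n b^n c^m\} \cup \{a^n b^m c^m\}$ is inherently ambiguous.
  – Hard to prove.

CFG and Programming Languages
• Programming language = syntax + semantics
• Syntax is defined by a grammar $G$:
  – <expression> ::= <term> | <expression> + <term>
    <term> ::= <factor> | <term> * <factor>
  – <while_statement> ::= while <expression> <statement>
• Syntax checking in compilers is done by a parser:
  – Is a program $p$ correct?
  – Is $p \in L(G)$?
  – We need efficient parsers.

Restricted grammars for Programming Languages
• Goals:
  – The expressive power is sufficient.
  – No ambiguity.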
    (the dangling else: "if then if then else" can be read as if then "if then else" or as if then "if then" else)
  – Efficient parsers exist.

• C: LR(1)
• Pascal: LL(1)
• Hierarchy of classes of context-free languages:
  – $\mathrm{LL}(1) \subsetneq \mathrm{LR}(1) = \mathrm{DCFL} \subsetneq \mathrm{CFL}$
  – $\mathrm{LR}(0) \subsetneq \mathrm{DCFL}$ (the LR(0) languages are exactly the DCFLs with the prefix property)
  – For every $k \ge 1$, the LR(k) languages are exactly the LR(1) languages. The chain LR(1) ⊊ LR(2) ⊊ … is strict only as classes of grammars.

Syntactic Correctness
• The lexical analyzer produces a stream of tokens:
    x = y + 2.1  →  <id> <op> <id> <op> <real>
• The parser (syntactic analyzer) verifies that this token stream is syntactically correct by constructing a valid parse tree for the entire program.
  – Each language construct has a unique parse tree.
  – Program = collection of parse trees rooted at the top by a special start symbol.

CFG for Floating-Point Numbers
• ::= stands for a production rule; <…> are nonterminals; | separates alternative right-hand sides of a production rule.
[Grammar and sample parse tree shown as figures]

CFG for Balanced Parentheses
• Could we write this grammar using regular expressions or a DFA? Why?
• Sample derivation: <balanced> ⇒ ( <balanced> ) ⇒ (( <balanced> )) ⇒ (( <empty> )) ⇒ (( ))

CFG for Decimal Numbers (Redux)
• This grammar is right-recursive.
• Sample top-down leftmost derivation: <num> ⇒ <digit> <num> ⇒ 7 <num> ⇒ 7 <digit> <num> ⇒ 7 8 <num> ⇒ 7 8 <digit> ⇒ 789

Compiler-compiler
• A compiler-compiler is a program that generates a compiler from a grammar definition.
• A parser can be built automatically from the BNF description of the language's CFG.
• Tools: yacc, Bison
[Figure: G = (V, T, S, P) → compiler-compiler → compiler (parser + code generator); program → compiler → execution code; input data → execution code → result]
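For the "Examples" and "Design cfg's" slides, enumerating the short strings of $L(G)$ is a quick way to sanity-check a proposed grammar against the set it should describe. The sketch below is not from the slides. It assumes several conventions:
• variables are single uppercase letters;
• "" stands for $\lambda$;
• the names `generate`, `PAL` and `NEQ` are hypothetical.
`NEQ` is one possible answer for $\{a^n b^m : n \ne m\}$. It follows Idea 1 (split the language into two cases) and Idea 2 (equal a's and b's plus the extras).

```python
# Sketch: enumerate the short strings of L(G) to sanity-check a grammar.
# Assumptions: variables are single uppercase letters, terminals are any other
# characters, and "" on a right-hand side stands for lambda.

def generate(grammar, start, max_len):
    """Return every terminal string of length <= max_len derivable from start.

    Always rewrites the leftmost variable and discards sentential forms that
    already contain more than max_len terminals. This terminates only when the
    grammar has no cycle of productions that adds no terminals (S -> SS, for
    example, would make it loop), which holds for both grammars below.
    """
    result, seen, stack = set(), {start}, [start]
    while stack:
        form = stack.pop()
        i = next((k for k, ch in enumerate(form) if ch in grammar), None)
        if i is None:              # no variables left: a sentence of L(G)
            result.add(form)
            continue
        for rhs in grammar[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            terminals = sum(1 for ch in new if ch not in grammar)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                stack.append(new)
    return result

# S -> aSa | bSb | lambda from the "Examples" slide: L(G) = {w w^R}.
PAL = {"S": ["aSa", "bSb", ""]}

# Hypothetical answer for {a^n b^m : n != m}: C makes equal a's and b's,
# A adds at least one extra a, B adds at least one extra b.
NEQ = {"S": ["AC", "CB"], "A": ["aA", "a"], "B": ["bB", "b"], "C": ["aCb", ""]}
```

Comparing `generate(NEQ, "S", 6)` with the strings $a^n b^m$, $n \ne m$, $n + m \le 6$ checks the answer on every short input. This does not prove the grammar correct, but it catches most mistakes.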
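The "Some comments" slide says leftmost and rightmost derivations are depth-first walks of one derivation tree. The sketch below makes that concrete for the $aab$ example from the "Leftmost and rightmost derivations" slide. It expands nodes of a single tree either leftmost-first or rightmost-first. The tuple encoding of trees and the names `derivation`, `render`, `leaf` and `TREE` are assumptions of this sketch.

```python
# Sketch: read the leftmost and rightmost derivations off one parse tree.
# Assumed encoding (not from the slides): a node is (symbol, children);
# terminal leaves have children None, and a variable rewritten by a
# lambda-production has children [].

def render(form):
    """Concatenate the symbols of a sentential form."""
    return "".join(symbol for symbol, _ in form)

def derivation(tree, leftmost=True):
    """Sentential forms obtained by expanding tree nodes leftmost-first
    (left-to-right preorder) or rightmost-first (right-to-left preorder)."""
    form = [tree]
    steps = [render(form)]
    while True:
        variables = [k for k, (_, kids) in enumerate(form) if kids is not None]
        if not variables:
            return steps
        k = variables[0] if leftmost else variables[-1]
        form = form[:k] + list(form[k][1]) + form[k + 1:]
        steps.append(render(form))

def leaf(t):
    return (t, None)

# Parse tree of aab for S -> AB, A -> aaA | lambda, B -> Bb | lambda.
TREE = ("S", [("A", [leaf("a"), leaf("a"), ("A", [])]),
              ("B", [("B", []), leaf("b")])])
```

Both calls walk the same tree, which is why the tree itself records no derivation order.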
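The "Exhaustive search / top-down parsing" slide can be sketched as a round-by-round search over leftmost derivations. The sketch also adds two things that are not on the slide:
• a pruning rule: terminals to the left of the leftmost variable are final, so they must be a prefix of $w$;
• a hypothetical round limit `max_rounds`.

The limit is needed because $S \to \lambda$ and $S \to SS$ let the search run forever on strings outside $L(G)$. All names are illustrative.

```python
from collections import deque

# S -> SS | aSb | bSa | lambda from the slide; "" stands for lambda and
# variables are single uppercase letters (an encoding assumed for this sketch).
GRAMMAR = {"S": ["SS", "aSb", "bSa", ""]}

def leftmost_var(form, grammar):
    """Index of the leftmost variable in form, or len(form) if there is none."""
    return next((k for k, ch in enumerate(form) if ch in grammar), len(form))

def exhaustive_search(grammar, start, w, max_rounds):
    """Round-by-round top-down search over leftmost derivations.

    Returns the sentential forms of a derivation of w, or None if none is found
    within max_rounds rounds. The round limit is needed because, with S -> lambda
    and S -> SS, the search would never stop for strings outside L(G).
    """
    queue = deque([[start]])
    for _ in range(max_rounds):
        next_round = deque()
        while queue:
            path = queue.popleft()
            form = path[-1]
            i = leftmost_var(form, grammar)
            if i == len(form):
                continue                  # all terminals but not w: dead end
            for rhs in grammar[form[i]]:
                new = form[:i] + rhs + form[i + 1:]
                # Prune: terminals before the leftmost variable are final,
                # so they must be a prefix of w.
                if not w.startswith(new[:leftmost_var(new, grammar)]):
                    continue
                if new == w:
                    return path + [new]
                next_round.append(path + [new])
        queue = next_round
    return None
```

For $aabb$ the search succeeds in the 3rd round with $S \Rightarrow aSb \Rightarrow aaSbb \Rightarrow aabb$. The number of forms per round still grows exponentially, which is the inefficiency the slide points out.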
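The $O(|w|^3)$ bound on the "Bottom-up parsing" slide is attained, for example, by the CYK algorithm. CYK is a bottom-up dynamic program that requires the grammar in Chomsky normal form. The sketch swaps in that algorithm, which is not spelled out on the slide.

The grammar `CNF` is a hypothetical rewrite of $S \to aSb$ that generates $\{a^n b^n : n \ge 1\}$. The empty string is left out because a CNF grammar cannot derive $\lambda$ without a special start rule.

```python
# Chomsky-normal-form grammar for {a^n b^n : n >= 1}, a hypothetical rewrite of
# S -> aSb | ab:  S -> AX | AB,  X -> SB,  A -> a,  B -> b.
CNF = {
    "S": [("A", "X"), ("A", "B")],
    "X": [("S", "B")],
    "A": ["a"],
    "B": ["b"],
}

def cyk(grammar, start, w):
    """Bottom-up CYK membership test; O(|w|^3 * |P|) time."""
    n = len(w)
    if n == 0:
        return False                      # this CNF grammar derives no lambda
    # table[i][l] holds the variables that derive the substring w[i:i+l+1].
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(w):
        for var, rhss in grammar.items():
            if ch in rhss:                # terminal production var -> ch
                table[i][0].add(var)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for var, rhss in grammar.items():
                    for rhs in rhss:
                        if isinstance(rhs, tuple) and rhs[0] in left and rhs[1] in right:
                            table[i][length - 1].add(var)
    return start in table[0][n - 1]
```

Each table cell combines two shorter substrings, so the work is reductions from the terminals upward, mirroring $aabb \to \dots \to S$ on the slide.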
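For the "Linear-time parsing" slide, the s-grammar condition means the pair (current leftmost variable, next input symbol) selects at most one production. Parsing therefore needs no backtracking and applies exactly $|w|$ productions.

The sketch below parses the slide's example $ababccc$ with a stack of pending variables. The table encoding and the names `S_GRAMMAR` and `s_parse` are assumptions of this sketch.

```python
# The slide's s-grammar S -> aS | bSS | c, stored as a table keyed by
# (variable, terminal); the s-grammar condition makes each key unique.
S_GRAMMAR = {("S", "a"): "S", ("S", "b"): "SS", ("S", "c"): ""}

def s_parse(table, start, w):
    """Return the leftmost derivation of w, or None if w is not in L(G).

    Every production A -> ax consumes exactly one input symbol a, and the pair
    (A, a) selects at most one production, so exactly |w| productions are
    applied. (Recording every sentential form, as done here for display,
    costs extra.)
    """
    stack = [start]      # pending variables; the top (end) is the leftmost
    matched = ""         # terminals derived so far
    steps = [start]
    for ch in w:
        if not stack:
            return None  # input remains but no variable is left to derive it
        rhs = table.get((stack.pop(), ch))
        if rhs is None:
            return None  # no production for this (variable, terminal) pair
        matched += ch
        stack.extend(reversed(rhs))
        steps.append(matched + "".join(reversed(stack)))
    return steps if not stack else None
```

For $ababccc$ this yields $S \Rightarrow aS \Rightarrow abSS \Rightarrow abaSS \Rightarrow ababSSS \Rightarrow ababcSS \Rightarrow ababccS \Rightarrow ababccc$.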
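For the "Example from programming languages" slide, ambiguity can be checked mechanically by counting derivation trees. The sketch uses an interval dynamic program, a technique chosen here rather than taken from the slides. `count_trees` is a hypothetical name, and "*" stands for the slide's × operator.

A count greater than 1 for some $w$ shows that the grammar is ambiguous. A count of 1 for a few inputs proves nothing in general.

```python
from functools import lru_cache

def count_trees(tokens):
    """Count derivation trees rooted at E that yield the token sequence, for
    E -> I | E+E | E*E | (E),  I -> a | b | c   ("*" stands for the slide's x).
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees(i, j):
        """Number of derivation trees of E for tokens[i:j]."""
        n = 0
        if j - i == 1 and tokens[i] in ("a", "b", "c"):
            n += 1                                   # E -> I -> a | b | c
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            n += trees(i + 1, j - 1)                 # E -> (E)
        for k in range(i + 1, j - 1):                # E -> E op E, op at k
            if tokens[k] in ("+", "*"):
                n += trees(i, k) * trees(k + 1, j)
        return n

    return trees(0, len(tokens))
```

$a + b \times c$ gets 2 trees, one per grouping, while the parenthesized $(a + b) \times c$ gets exactly 1.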