LL parsers

An LL(k) parser is a top-down parser, that is, it decides which production to use by looking at the next 1-k tokens. This means that an LL(k) grammar must not have a point where the alternative productions have the same prefix, and that a production cannot be left-recursive - the left hand side must always resolve to a terminal. In addition, an LL(k) grammar must have a fixed value of k. Most LL(k) parsers are actually LL(1) parsers that use lookahead trees where it's necessary to disambiguate rules.

An LL parser is much easier to understand than an LALR parser, easier to write and debug and they have better error recovery semantics. A lot of non-LL grammars can be easily munged into LL(k)-form, in a lot of cases, where k=1. LL parser generators tend to produce smaller tables than LALR parsers.


Algorithm

The LL(1) algorithm requires a stack (the 'prediction stack'), and a lookahead token. It works as follows (in the pseudocode and examples, the rightmost entry in the stack is the 'top' element):

Initialise the prediction stack:
  (eof, start) [where start is the start symbol]
Initialise the lookahead token to be empty

Repeat
    If lookahead is empty, read the next token into the lookahead token
    Pop the top symbol off of the predicition stack ('current symbol')
    If the current symbol is a terminal and matches the lookahead token
       Make the lookahead token empty
    If the current symbol is a terminal and doesn't match the lookahead token
       An error has occured
    If the current symbol is a non-terminal
       Choose the production from the non-terminal whose first
       terminal symbol matches the lookahead symbol (or the empty
       token if none is found), and push it onto the stack, rightmost
       symbol first.
Until stack is empty or error occurs

You may also find a semantics stack useful (push the terminals onto it as they are matched on the prediction stack), and it is possible to create 'action' tokens, which have code associated with them that is executed when they are popped from the stack, but are otherwise treated as empty symbols.


LL(1) vs LL(k)

These grammars differ in the number of look-ahead tokens required to determine the production to push on to the stack. A grammar is not LL(1) if there exists a nonterminal X defined in the following form:

X ::= empty | X1 | X2 | X3 | ... | Xn

(Where X1..Xn are productions), and X is used in a context where the next terminal symbol to be resolved is one of the first in X1-Xn. For example, the following is a LL(k) grammar, but not a LL(1) grammar:

X ::= Y | Y Z
Y ::= empty | 't' Y
Z ::= 't' 'z'

(Note that 't' 'z' might be an empty (Y) followed by a 't' followed by a 'z' or a 't' (Y) followed by a 'z' (error)). MooCow will resolve these ambiguities by forking the parse process into two (or more) parallel parses (that is, the GLR algorithm adapted to LL parsers). The example above would be non-LL(1) without the recursion, but trivial to fix so that it was LL(1)...


Example

The following is a LL(1) grammar:

X ::= '(' X ') |
     empty

And for the string '(((())))' the parser will procede as follows:

Prediction stack Semantics stack Lookahead
eof X empty
eof ')' X '(' '('
eof ')' X '(' empty
eof ')' ')' X '(' '(' '('
eof ')' ')' X '(' '(' empty
eof ')' ')' ')' X '(' '(' '(' '('
eof ')' ')' ')' X '(' '(' '(' empty
eof ')' ')' ')' ')' X '(' '(' '(' '(' '('
eof ')' ')' ')' ')' X '(' '(' '(' '(' empty
eof ')' ')' ')' ')' '(' '(' '(' '(' ')'
eof ')' ')' ')' '(' '(' '(' '(' ')' empty
eof ')' ')' ')' '(' '(' '(' '(' ')' ')'
eof ')' ')' '(' '(' '(' '(' ')' ')' empty
eof ')' ')' '(' '(' '(' '(' ')' ')' ')'
eof ')' '(' '(' '(' '(' ')' ')' ')' empty
eof ')' '(' '(' '(' '(' ')' ')' ')' ')'
eof '(' '(' '(' '(' ')' ')' ')' ')' empty
eof '(' '(' '(' '(' ')' ')' ')' ')' eof
empty '(' '(' '(' '(' ')' ')' ')' ')' eof empty

Andrew Hunter
Last modified: Sat Feb 19 19:16:39 GMT 2000