LR parsing

A LR parser is a type of shift/reduce parser that does not use backtracking (it is the most general type of non-backtracking S/R parser currently known). Its main drawback is the difficulty of constructing suitable parse tables. LR parsers also tend to produce larger tables than LL parsers.[1] The LR parser was first described by Knuth in 1965[2]. All LR parsers use the same algorithm; they differ in the way the table is generated.

A LR parser consists of a stack and a state table. It takes input in the form of a series of symbols. The state table lists actions for each terminal symbol in each state and also lists goto states for nonterminal symbols. The stack contains value/state pairs, and initially contains just the state 0.

The action table for each state may contain one of four values:

Shift, and move to state n
Reduce using rule number n
Error
Accept

The algorithm proceeds by reading the current state from the top of the stack, and the next symbol from the input. It looks up the action for that state and symbol in the table. A shift action causes the parser to push the read symbol and the new state onto the stack. A reduce action causes the parser to pop a suitable number of symbols off the stack, make the state the one now on top of the stack, and push the nonterminal representing rule n onto the stack, followed by the state specified for that nonterminal in the goto part of the state table. Error causes the parser to enter error handling mode, and accept causes the parser to accept the grammar.

The number of symbols popped by a reduce operation is determined by the rule used for reduction. That is, if we're reducing using a rule Y -> '(' ')', we pop 4 symbols (2 states and 2 symbols) off the stack, choose the goto state for Y using the state that is now on top of the stack, and push Y and the goto state onto the stack. Whew!

A simple example

Given the grammar

Y := '(' ')' |
     '(' Y ')'

We might derive the following table:

(1)	Y -> '(' ')'
(2)	Y -> '(' Y ')'

State	action			goto
State	(	)	$	Y

0	shift 1			5
1	shift 1	shift 2		3
2		reduce 1	reduce 1
3		shift 4
4		reduce 2	reduce 2
5			accept

The parse will proceed as follows:

Stack	Input	Action
0	((()))	Shift '(', state 1
0, '(', 1	(()))	Shift '(', state 1
0, '(', 1, '(', 1	()))	Shift '(', state 1
0, '(', 1, '(', 1, '(', 1	)))	Shift ')', state 2
0, '(', 1, '(', 1, '(', 1, ')', 2	))	Reduce using rule 1
0, '(', 1, '(', 1, Y, 3	))	Shift ')', state 4
0, '(', 1, '(', 1, Y, 3, ')', 4	)	Reduce using rule 2
0, '(', 1, Y, 3	)	Shift ')', state 4
0, '(', 1, Y, 3 ')', 4	$	Reduce using rule 2
0, Y, 5	$	Accept

GLR parsing[3]

A GLR parser is an extension of an LR parser that allows the parse table to contain multiple actions. That is, it deals with shift/reduce and reduce/reduce conflicts by evaluating *both* possibilities. It does this with the aid of a graph-structured stack, a stack which can branch to allow for concurrent evaluation. It should be noted that a GLR parser generator won't notice ambiguities, and the parser itself can produce multiple parse trees for an ambiguious grammar. GLR parsing techniques are popular in natural language processing, as natural languages tend to have an ambiguous nature. These parsers also allow scannerless parsers to be created.

[1] Although some experience with the algorithms described by Thomas Christopher seems to indicate that LL parse tables can grow large if derived from non-LL grammars.

[2] "On the translation of languages from left to right", Information and Control 8:6, 607-639

[3] Masaru Tomita and See-Kiong Ng. The generalized LR parsing algorithm. Generalized LR Parsing, Kluwer Academic Publishers, 1991.

Andrew Hunter

Last modified: Sat Jun 3 16:07:39 GMT 2000