LR parser — LLMpedia

LR parser
Name	LR parser
Class	Bottom-up parser
Author	Donald Knuth
Year	1965

Contents

Overview
Algorithm and operation
Variants and classification
Construction of parsing tables
Comparison with other parsers
Practical applications and examples

LR parser. An LR parser is a type of bottom-up parser used extensively in compiler construction for analyzing the syntax of programming languages. It reads input from left to right and produces a rightmost derivation in reverse, making it highly efficient for deterministic context-free grammars. The method was first introduced by Donald Knuth in his seminal 1965 paper, establishing a formal foundation for syntax analysis.

Overview

The core principle of an LR parser involves using a pushdown automaton with a stack and an input buffer to process terminal and nonterminal symbols. It operates by constructing a deterministic finite automaton from the grammar's LR items, which guide shift and reduce actions. This approach is fundamental to many compiler-compiler tools, such as Yacc and its successor GNU Bison, which generate parsing code for languages like C (programming language) and Java (programming language). The parser's ability to handle a broad class of context-free grammars with lookahead makes it a cornerstone of modern computer science.

Algorithm and operation

The algorithm employs a parsing table derived from the LR(0) or LR(1) sets of items, dictating actions for each state. During operation, the parser performs shift-reduce parsing by pushing symbols onto the stack or reducing them based on production rules. Key components include the ACTION and GOTO tables, which are consulted at each step to decide between shifting input, reducing a handle, accepting the string, or reporting a syntax error. This process is central to implementations in systems like ANTLR and the Java Compiler Compiler.

Variants and classification

Several variants exist, categorized by their lookahead capabilities and table construction methods. The SLR parser simplifies table building but is less powerful, while the canonical LR parser offers greater precision at the cost of larger tables. The LALR parser, used in Yacc and Bison (parser generator), strikes a practical balance by merging states from the canonical LR construction. Other notable variants include the GLR parser for ambiguous grammars and the LR(k) parser for generalized lookahead, influencing tools like Eclipse (IDE) and the GNU Compiler Collection.

Construction of parsing tables

Table construction begins by computing the canonical collection of LR(0) or LR(1) items from the grammar's augmented grammar. The closure and goto operations generate states for a deterministic finite automaton, which are then used to populate the ACTION and GOTO tables. This method, formalized by Donald Knuth, is implemented in parser generators like Bison (parser generator) and CUP (parser generator), enabling support for languages such as Python (programming language) and Ruby (programming language).

Comparison with other parsers

Compared to top-down parsers like the LL parser, LR parsers can handle a wider range of context-free grammars, including left-recursive rules, but require more complex table generation. While recursive descent parser methods, used in Pascal (programming language) compilers, are simpler to implement, LR parsers offer superior performance for deterministic languages. The CYK algorithm and Earley parser support all context-free grammars but are less efficient, making LR parsing the preferred choice in production compilers for C++ and Fortran.

Practical applications and examples

LR parsers are integral to the front-end of many compiler and interpreter (computing) systems. They are employed in GNU Bison for generating parsers in GNU Project tools, in Java Compiler Compiler for Java (programming language) development, and in Roslyn (compiler) for C Sharp (programming language). Notable examples include their use in the GNU Compiler Collection for Ada (programming language) and in SQL query processing within Oracle Database and Microsoft SQL Server.

Category:Parsing algorithms Category:Compiler construction Category:Formal languages