Friday 11 October 2013

LR parser

In computer science, an LR parser is a parser that reads input from Left to right (as it would appear if visually displayed) and produces a Rightmost derivation. The term LR(k) parser is also used; where the k refers to the number of unconsumed "look ahead" input symbols that are used in making parsing decisions. Usually k is 1 and the term LR parser is often intended to refer to this case.
The syntax of many programming languages can be defined by a grammar that is LR(1), or close to being so, and for this reason LR parsers are often used by compilers to perform syntax analysis of source code.
An LR parser can be created from a context-free grammar by a program called a parser generator or hand written by a programmer. A context-free grammar is classified as LR(k) if there exists an LR(k) parser for it, as determined by the parser generator.
An LR parser is said to perform bottom-up parsing because it attempts to deduce the top level grammar productions by building up from the leaves.
A deterministic context-free language is a language for which some LR(k) grammar exists. Every LR(k) grammar for k > 1 can be mechanically transformed into an LR(1) grammar for the same language, while an LR(0) grammar for the same language may not exist; the LR(0) languages are a proper subset of the deterministic ones.
An LR parser is based on an algorithm which is driven by a parser table, a data structure which contains the syntax of the computer language being parsed. So the term LR parser actually refers to a class of parser that can process almost any programming language, as long as the parser table for a programming language is supplied. The parser table is created by a program called a parser generator.
LR parsing can be generalized as arbitrary context-free language parsing without a performance penalty, even for LR(k) grammars. This is because most programming languages can be expressed with LR(k) grammars, where k is a small constant (usually 1). Note that parsing of non-LR(k) grammars is an order of magnitude slower (cubic instead of quadratic in relation to the input length).
LR parsing can handle a larger range of languages than LL parsing, and is also better at error reporting, i.e. it detects syntactic errors when the input does not conform to the grammar as soon as possible. This is in contrast to an LL(k) (or even worse, an LL(*) parser) which may defer error detection to a different branch of the grammar due to backtracking, often making errors harder to localize across disjunctions with long common prefixes.
LR parsers are difficult to produce by hand and they are usually constructed by a parser generator or a compiler-compiler. Depending on how the parsing table is generated, these parsers can be called simple LR parsers (SLR), look-ahead LR parsers (LALR), or canonical LR parsers. LALR parsers have more language recognition power than SLR parsers. Canonical LR parsers have more recognition power than LALR parsers.

Contents


No comments:

Post a Comment