| Aho–Ullman algorithm | |
|---|---|
| Name | Aho–Ullman algorithm |
| Class | Parsing |
| Data structure | Directed acyclic graph |
| Time | O(n³) |
| Space | O(n²) |
| Authors | Alfred Aho, Jeffrey Ullman |
| Year | 1969 |
The Aho–Ullman algorithm is a fundamental dynamic programming method for context-free grammar recognition and parsing, developed in the late 1960s. It operates by systematically constructing a parse table to determine whether a given input string can be generated by a specified grammar. The algorithm is historically significant as one of the early general parsing algorithms, establishing key techniques that influenced later developments in compiler construction and formal language theory.
The algorithm was introduced by Alfred Aho and Jeffrey Ullman in their seminal 1969 paper, which contributed to the theoretical foundations of parsing. It is designed to handle any context-free grammar without restrictions, unlike more efficient but less general methods like the LL parser or the LR parser. The core idea involves representing possible parse tree structures for substrings of the input using a Boolean matrix, a technique that draws from the Cocke–Younger–Kasami algorithm. This approach is part of the broader study of formal languages within computer science, connecting to the work of Noam Chomsky on grammar hierarchies. The Aho–Ullman algorithm is primarily of theoretical importance, demonstrating that context-free language recognition is computationally tractable, albeit with relatively high time complexity.
The algorithm takes as input a context-free grammar in Chomsky normal form and a string of terminal symbols. It constructs a three-dimensional Boolean array or table, where each entry indicates whether a specific nonterminal symbol can derive a particular substring. The procedure iterates over all possible substring lengths, starting with single symbols, and applies the grammar production rules to combine smaller, already recognized substrings. This bottom-up method resembles the dynamic programming strategy used in the Cocke–Younger–Kasami algorithm and shares conceptual ground with Earley parser techniques for handling ambiguity. The final result is determined by checking the table entry corresponding to the start symbol and the entire input string. The algorithm's structure ensures that it explores all possible derivations, making it a general but computationally intensive solution to the membership problem for context-free grammars.
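The table-filling procedure described above can be sketched in Python. This is a minimal illustration of the CKY-style dynamic programming scheme the paragraph outlines, not the authors' original formulation; the function name `recognize`, the grammar encoding, and the example grammar are all illustrative choices.

```python
def recognize(grammar, start, s):
    """CKY-style bottom-up recognition for a grammar in Chomsky normal
    form: each nonterminal maps to a list of bodies, where a body is
    either a single terminal character or a pair of nonterminals."""
    n = len(s)
    if n == 0:
        return False  # strict CNF as encoded here cannot derive the empty string
    # table[i][l] = set of nonterminals that derive the substring s[i:i+l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    # Base case: substrings of length 1, via productions A -> a
    for i, ch in enumerate(s):
        for head, bodies in grammar.items():
            if ch in bodies:
                table[i][1].add(head)
    # Combine two smaller recognized spans via productions A -> B C
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                for head, bodies in grammar.items():
                    for body in bodies:
                        if (isinstance(body, tuple)
                                and body[0] in table[i][split]
                                and body[1] in table[i + split][length - split]):
                            table[i][length].add(head)
    # Accept iff the start symbol derives the entire input
    return start in table[0][n]

# Illustrative CNF grammar for the language { a^n b^n : n >= 1 }
grammar = {
    "S": [("A", "B"), ("A", "T")],
    "T": [("S", "B")],
    "A": ["a"],
    "B": ["b"],
}
```

The triple-nested loop over substring lengths, start positions, and split points is the source of the cubic time bound discussed below; the two-dimensional table of nonterminal sets accounts for the quadratic space usage.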
In its standard formulation, the Aho–Ullman algorithm has a time complexity of O(n³) and a space complexity of O(n²), where n is the length of the input string. This cubic time bound arises from the triple-nested loops over substring indices and split points, combined with iteration over the nonterminal symbols, a characteristic shared with the Cocke–Younger–Kasami algorithm. While this was a significant result proving the polynomial-time decidability of context-free language membership, it is less efficient than practical parsers like those generated by Yacc or ANTLR. The analysis of the algorithm contributed to the field of computational complexity theory, particularly in classifying problems within the complexity class P. Subsequent research by Leslie Valiant and others has shown that the general context-free grammar recognition problem can be reduced to Boolean matrix multiplication.
Though rarely used directly in modern compiler implementations due to its performance, the Aho–Ullman algorithm has important theoretical and educational applications. It serves as a key case study in courses on formal languages and compiler construction at institutions like Stanford University and the Massachusetts Institute of Technology. The algorithm's principles underpin the design of certain syntax-directed translation schemes and are referenced in foundational textbooks by Alfred Aho and Monica Lam on compilers. Its methodology for table construction also informs techniques in natural language processing, particularly for chart parsing in systems like the PLNLP toolkit. Furthermore, the algorithm's approach to handling ambiguous grammars has influenced the development of generalized parsing frameworks used in integrated development environments for language implementation.
The Aho–Ullman algorithm is closely related to several other parsing strategies developed in the same era. The Cocke–Younger–Kasami algorithm is essentially contemporaneous and shares the same cubic time complexity for Chomsky normal form grammars. The Earley parser, invented by Jay Earley, offers similar generality but can be more efficient for certain classes of grammars. More practical, linear-time algorithms include the LL parser family, the LR parser family (including the LALR parser variants used by Yacc), and the recursive descent parser. The theoretical lineage also connects to Valiant's algorithm, which reduced the problem to matrix multiplication. The work of Donald Knuth on LR grammars and the development of tools like Bison and JavaCC for parser generation represent the practical evolution from these early general methods.
Category:Parsing algorithms Category:Dynamic programming Category:Formal languages