LLMpedia: The first transparent, open encyclopedia generated by LLMs

Lempel–Ziv–Welch

Generated by DeepSeek V3.2
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Article Genealogy
Parent: information theory Hop 4
Expansion Funnel: Raw 42 → Dedup 0 → NER 0 → Enqueued 0
1. Extracted: 42
2. After dedup: 0 (None)
3. After NER: 0
4. Enqueued: 0
Lempel–Ziv–Welch
Name: Lempel–Ziv–Welch
Caption: A conceptual diagram of the LZW compression process.
Class: Lossless data compression
Data structure: Dictionary
Time: O(n)
Space: O(n)
Authors: Terry Welch
Year: 1984
Based on: Lempel–Ziv 78

Lempel–Ziv–Welch (LZW) is a universal lossless data compression algorithm created by Terry Welch in 1984. It is an improved implementation of the Lempel–Ziv 78 (LZ78) algorithm, designed for high-speed hardware implementation and widespread software use. The algorithm builds a dictionary of strings from the input data and outputs codes representing sequences of symbols, achieving compression without any loss of information.

Overview

The algorithm was developed at Sperry Corporation as a refinement of earlier work by Abraham Lempel and Jacob Ziv. Its primary advantage over its predecessor, LZ78, was a simpler, more efficient dictionary management strategy that used fixed-size codes. This design made it exceptionally well suited to hardware implementation, such as in disk controller chips, and led to its rapid adoption. The format became particularly famous through its use in the Graphics Interchange Format (GIF) standard, developed by CompuServe.

Algorithm

The core process begins with a dictionary initialized to contain all possible single-character strings. The algorithm then parses the input sequence, finding the longest string already present in the dictionary. It outputs the dictionary code for that string and adds a new dictionary entry consisting of that string plus the following input symbol. This incremental parsing and dictionary building allows the compressor to adapt to the data's statistical properties dynamically. Decompression reconstructs an identical dictionary using the stream of codes, requiring no prior knowledge of the data's structure.
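The parsing and dictionary-building steps described above can be sketched in Python. This is an illustrative byte-oriented implementation, not the canonical one; it emits integer codes rather than a packed bit stream:

```python
def lzw_compress(data: bytes) -> list[int]:
    """Compress bytes into a list of LZW dictionary codes."""
    # Initialize the dictionary with all 256 single-byte strings.
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    result = []
    current = b""
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate                # keep extending the longest match
        else:
            result.append(dictionary[current])  # emit code for longest match found
            dictionary[candidate] = next_code   # new entry: match + following symbol
            next_code += 1
            current = bytes([byte])
    if current:
        result.append(dictionary[current])      # flush the final match
    return result


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the original bytes, reconstructing the same dictionary."""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    prev = dictionary[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:
            # Special case: the code was just defined by the compressor and
            # is not yet in our dictionary; it must be prev + prev's first byte.
            entry = prev + prev[:1]
        out.append(entry)
        dictionary[next_code] = prev + entry[:1]
        next_code += 1
        prev = entry
    return b"".join(out)
```

The `else` branch in the decompressor handles the one situation where a code arrives before the decoder has defined it, which occurs for inputs of the form cScSc; every other code is always already present, which is why no side-channel dictionary transmission is needed.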

Implementation details

A critical implementation detail is the management of the dictionary's size, typically using 12-bit codes allowing for 4096 entries. When the dictionary fills, implementations must choose a strategy; common approaches include freezing the dictionary, clearing it entirely, or using a least-recently-used scheme to discard old entries. Efficient dictionary search is often achieved using a hash table or a trie data structure. The algorithm's simplicity is demonstrated by its canonical representation in the C programming language, as published in Dr. Dobb's Journal.
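The clear-on-full strategy, one of the policies named above, can be sketched as follows. This is a simplified illustration: the 4096-entry cap corresponds to 12-bit codes, and a matching decoder would have to apply the identical reset rule to stay synchronized (real formats such as GIF signal the reset with an explicit clear code instead):

```python
MAX_ENTRIES = 4096  # 12-bit codes allow at most 4096 dictionary entries


def fresh_dictionary() -> dict[bytes, int]:
    """The initial dictionary: all 256 single-byte strings."""
    return {bytes([i]): i for i in range(256)}


def lzw_compress_bounded(data: bytes) -> list[int]:
    """LZW compression that clears the dictionary when it fills."""
    dictionary = fresh_dictionary()
    result, current = [], b""
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate
        else:
            result.append(dictionary[current])
            if len(dictionary) < MAX_ENTRIES:
                dictionary[candidate] = len(dictionary)  # still room: grow
            else:
                dictionary = fresh_dictionary()          # full: clear and restart
            current = bytes([byte])
    if current:
        result.append(dictionary[current])
    return result
```

Freezing would simply skip the reset (stop adding entries), trading adaptivity for simplicity; clearing lets the dictionary re-adapt when the data's statistics change mid-stream.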

Applications

Its most iconic application was in the Graphics Interchange Format, where it became a *de facto* standard for simple image compression on early online services like CompuServe and America Online. The algorithm was also widely used in the UNIX `compress` utility and embedded in hardware, including disk controllers and modems implementing the ITU-T V.42bis compression standard. Other notable uses included the Adobe PostScript language and early versions of the Portable Document Format.

Patents and history

The algorithm was patented by Sperry Corporation, which later merged into Unisys. This patent, U.S. Patent 4,558,302, became the subject of significant controversy in the early 1990s when Unisys began enforcing licensing fees for its use in Graphics Interchange Format software, an event often called the GIF controversy. This enforcement spurred the development of alternative formats like the Portable Network Graphics standard. The patent expired in the United States in June 2003, and its counterparts in other jurisdictions, including Europe and Japan, expired in 2004.

Variants and derivatives

Several related algorithms in the Lempel–Ziv family were developed to address specific limitations or patent issues, though most descend from LZ77 rather than from LZW itself. The Lempel–Ziv–Markov chain algorithm (LZMA), used in the 7-Zip archiver, combines an LZ77-style dictionary coder with range coding driven by context modeling. The Lempel–Ziv–Oberhumer (LZO) algorithm is optimized for extremely fast decompression. Other relatives include Lempel–Ziv–Stac and Lempel–Ziv–Ross Williams, each with different dictionary management and encoding strategies. LZ77-based methods also form the core of the DEFLATE algorithm used in Gzip and the ZIP file format.

Category:Lossless compression algorithms Category:1984 in computing