| Viterbi algorithm | |
|---|---|
| Name | Viterbi algorithm |
| Class | Dynamic programming |
| Data | Hidden Markov model |
| Time | O(T·\|S\|²) |
| Space | O(T·\|S\|) |
| Discovered-by | Andrew Viterbi |
| Year | 1967 |
The Viterbi algorithm is a dynamic programming algorithm for finding the most likely sequence of hidden states—called the Viterbi path—that results in a sequence of observed events, especially in the context of Markov information sources and hidden Markov models. It was proposed by Andrew Viterbi in 1967 as a method for decoding convolutional codes in digital communication. The algorithm has become fundamental in fields ranging from digital communications and speech recognition to computational linguistics and bioinformatics.
The algorithm addresses the problem of state estimation for processes that can be modeled as a hidden Markov model, where the system being studied is assumed to be a Markov process with unobservable states. It finds the single best path through the model's state space that explains the observed sequence, solving the decoding problem. This is distinct from the forward algorithm, which computes the total probability of an observation sequence. The core idea relies on the principle of optimality: the best path ending in any given state must extend the best path ending in some preceding state by a single transition. Its development was a breakthrough for digital signal processing and for error-correcting code systems such as those used in the Voyager program.
Given a hidden Markov model defined by a set of states, an observation sequence, state transition probabilities, and emission probabilities, the algorithm proceeds in two main phases: a forward pass and a backtracking pass. The forward pass computes, for each time step and each state, the maximum probability of reaching that state given the observations up to that point, storing a pointer to the previous state that achieved this maximum. This step is often visualized using a trellis diagram, a structure common in the analysis of convolutional codes. The backtracking pass, initiated from the most probable state at the final time step, follows these pointers backward to reconstruct the optimal state sequence. This process is mathematically equivalent to finding the shortest path through a directed acyclic graph.
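The two phases described above can be sketched in Python. This is a minimal illustration, not a reference implementation; the fair/loaded-coin HMM at the bottom, and all of its probabilities, are invented for the demonstration and do not come from the text:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence and its probability.

    Forward pass: for each time step and state, store the maximum
    probability of any path ending in that state, plus a back-pointer.
    Backtracking pass: follow the pointers from the best final state.
    (A production version would work in log space to avoid underflow.)
    """
    # Forward pass: V[t][s] = best path probability ending in s at time t.
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    ptr = [{}]
    for t in range(1, len(obs)):
        V.append({})
        ptr.append({})
        for s in states:
            prev, p = max(((r, V[t - 1][r] * trans_p[r][s]) for r in states),
                          key=lambda x: x[1])
            V[t][s] = p * emit_p[s][obs[t]]
            ptr[t][s] = prev
    # Backtracking pass: reconstruct the Viterbi path from the pointers.
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(ptr[t][path[-1]])
    return path[::-1], V[-1][last]

# Toy fair/loaded-coin HMM (all numbers are illustrative assumptions).
states = ["Fair", "Loaded"]
start = {"Fair": 0.5, "Loaded": 0.5}
trans = {"Fair": {"Fair": 0.9, "Loaded": 0.1},
         "Loaded": {"Fair": 0.1, "Loaded": 0.9}}
emit = {"Fair": {"H": 0.5, "T": 0.5},
        "Loaded": {"H": 0.9, "T": 0.1}}
path, p = viterbi(["H", "H", "H", "T"], states, start, trans, emit)
print(path)  # the decoded hidden sequence
```

The inner `max` is where the trellis structure appears: each node at time *t* considers every node at time *t−1*, which is the source of the quadratic dependence on the number of states.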
A classic illustrative example involves determining the weather (hidden states) based on observed seaweed humidity levels, a scenario often used in teaching hidden Markov models. If the hidden states are "Sunny" and "Rainy," with known transition and emission probabilities, and the observed sequence is "Dry," "Damp," "Soggy," the algorithm calculates the most likely daily weather sequence. For each day, it computes the maximum probability of each weather state given the observations so far, discarding lower-probability paths into that state—a process known as path pruning. The final reconstructed path, for instance, might be "Sunny, Rainy, Rainy." This example mirrors applications in part-of-speech tagging, where words are observations and grammatical tags are hidden states.
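The weather walkthrough can be reproduced numerically. The text does not specify the model's probabilities, so the numbers below are invented for illustration; with these particular assumptions the decoded path does come out as "Sunny, Rainy, Rainy":

```python
# Weather HMM for the seaweed example; all probabilities are
# illustrative assumptions, since the text does not specify them.
states = ["Sunny", "Rainy"]
start = {"Sunny": 0.6, "Rainy": 0.4}
trans = {"Sunny": {"Sunny": 0.7, "Rainy": 0.3},
         "Rainy": {"Sunny": 0.4, "Rainy": 0.6}}
emit = {"Sunny": {"Dry": 0.6, "Damp": 0.3, "Soggy": 0.1},
        "Rainy": {"Dry": 0.1, "Damp": 0.4, "Soggy": 0.5}}
obs = ["Dry", "Damp", "Soggy"]

# Day 1: initialise with start-times-emission probabilities.
V = [{s: start[s] * emit[s][obs[0]] for s in states}]
ptr = [{}]
# Days 2..T: keep only the best incoming path per state (path pruning).
for t in range(1, len(obs)):
    V.append({})
    ptr.append({})
    for s in states:
        best = max(states, key=lambda r: V[t - 1][r] * trans[r][s])
        V[t][s] = V[t - 1][best] * trans[best][s] * emit[s][obs[t]]
        ptr[t][s] = best
# Backtrack from the most probable final state.
path = [max(V[-1], key=V[-1].get)]
for t in range(len(obs) - 1, 0, -1):
    path.append(ptr[t][path[-1]])
path.reverse()
print(path)  # ['Sunny', 'Rainy', 'Rainy'] under these assumed numbers
```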
The algorithm's primary and original application was in decoding convolutional codes and trellis-coded modulation in digital communication systems, such as those used in GSM cellular networks, deep space network communications, and satellite television standards like DVB-S. In speech recognition, it is used within systems like IBM's early Tangora speech recognition system to find the most likely word sequence from acoustic features. In computational biology, it is employed for tasks like gene finding within sequences from the Human Genome Project and modeling protein families in the Pfam database. Other uses include optical character recognition in systems developed by companies like Adobe Systems, and natural language processing for tasks like named-entity recognition.
Several important extensions have been developed to address the algorithm's limitations or adapt it to new domains. The list Viterbi algorithm produces a ranked list of the top *N* most likely sequences rather than just the single best path. For continuous state spaces, the Kalman filter serves a similar estimation purpose, while the particle filter provides a sequential Monte Carlo approximation. The maximum a posteriori estimation framework generalizes the decoding objective. In machine learning, the structured perceptron and conditional random field models often use a Viterbi-like decoding step during inference. The Bahl–Cocke–Jelinek–Raviv algorithm, used in turbo code decoding, is a related forward-backward algorithm for computing posterior marginals.
The time complexity of the algorithm is O(T·|S|²), where *T* is the length of the observation sequence and *|S|* is the number of hidden states, due to the need to consider all possible transitions between states at each time step. Its space complexity is O(T·|S|) for storing the probability and pointer matrices. This makes it efficient for models with a manageable number of states but can become prohibitive for very large state spaces, prompting the use of approximate methods like beam search, which prunes low-probability paths at each step. The algorithm is a cornerstone in the design of digital signal processors and application-specific integrated circuits for real-time decoding in devices from Qualcomm modems to NASA spacecraft.
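The beam-search pruning mentioned above can be sketched as a small variant of Viterbi that retains only the top-*k* states per time step. This is an illustrative sketch, not a standard library routine; the weather HMM numbers are again invented assumptions. With a beam width of at least |S| the result is exact; with a narrower beam it is an approximation that can miss the true Viterbi path:

```python
import heapq

def beam_viterbi(obs, states, start_p, trans_p, emit_p, width):
    """Viterbi with beam pruning: keep only the `width` most probable
    states at each time step, cutting the per-step work from |S|^2
    transitions to width*|S|. Exact when width >= |S|."""
    V = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    beam = set(heapq.nlargest(width, V, key=V.get))
    ptrs = []
    for t in range(1, len(obs)):
        next_V, ptr = {}, {}
        for s in states:
            # Only predecessors that survived pruning are considered.
            prev = max(beam, key=lambda r: V[r] * trans_p[r][s])
            next_V[s] = V[prev] * trans_p[prev][s] * emit_p[s][obs[t]]
            ptr[s] = prev
        ptrs.append(ptr)
        V = next_V
        beam = set(heapq.nlargest(width, V, key=V.get))
    last = max(beam, key=V.get)
    path = [last]
    for ptr in reversed(ptrs):
        path.append(ptr[path[-1]])
    return path[::-1]

# Assumed weather HMM from the earlier example (illustrative numbers).
states = ["Sunny", "Rainy"]
start = {"Sunny": 0.6, "Rainy": 0.4}
trans = {"Sunny": {"Sunny": 0.7, "Rainy": 0.3},
         "Rainy": {"Sunny": 0.4, "Rainy": 0.6}}
emit = {"Sunny": {"Dry": 0.6, "Damp": 0.3, "Soggy": 0.1},
        "Rainy": {"Dry": 0.1, "Damp": 0.4, "Soggy": 0.5}}
print(beam_viterbi(["Dry", "Damp", "Soggy"], states, start, trans, emit,
                   width=2))  # exact here, since width equals |S|
```

With `width=1` on the same model, the search degenerates to a greedy decoder and returns a different (suboptimal) path, which is the usual trade-off beam search makes for speed.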
Category:Dynamic programming Category:Algorithms Category:Digital signal processing