LLMpedia: the first transparent, open encyclopedia generated by LLMs


Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Article Genealogy
Parent: CMA CGM Hop 5
Expansion Funnel: Raw 57 → Dedup 0 → NER 0 → Enqueued 0
HMM
Name: Hidden Markov model (HMM)
Type: Statistical model
Introduced: 1960s
Developers: Leonard E. Baum and colleagues (including Ted Petrie and Lloyd Welch), building on Andrey Markov's chain theory
Related: Kalman filter, Hidden semi-Markov model, Conditional random field

HMM

Hidden Markov models (HMMs) are statistical models for sequences in which an unobserved stochastic process with Markovian dynamics generates observed data through an emission process. Formalized in the 1960s by Leonard E. Baum and colleagues and applied to speech signal analysis in the 1970s, HMMs provide a compact framework for modeling temporal structure, latent states, and noisy observations. They bridge ideas from Andrey Markov's chains, Norbert Wiener's filtering traditions, and techniques used in Automatic speech recognition and in bioinformatics tasks such as gene prediction.

Introduction

HMMs describe systems in which a sequence of hidden discrete states evolves according to a Markov chain and each state stochastically emits an observable symbol or vector. Early practical deployments tied HMMs to Bell Labs research on speech, and the approach was later adopted by the teams behind the Hidden Markov Model Toolkit (HTK) and by researchers at institutions such as the Massachusetts Institute of Technology and Carnegie Mellon University. HMMs connect to foundational work by Leonard E. Baum and collaborators, who derived the expectation-maximization-style Baum–Welch algorithm, and to applications influenced by figures such as Lawrence Rabiner and organizations like IBM.

Mathematical Formulation

A standard discrete HMM comprises a finite state set S = {s_1, ..., s_N}, an initial state distribution π, a state-transition matrix A = [a_ij], and emission probabilities B = {b_j(o)} over an observation alphabet O. The joint probability of a state path s_{1:T} and observation sequence o_{1:T} factorizes as π_{s_1} b_{s_1}(o_1) ∏_{t=2}^{T} a_{s_{t-1} s_t} b_{s_t}(o_t), mirroring constructions from Markov chain theory and the discrete-time stochastic processes studied by Andrey Markov and later by Joseph L. Doob. Continuous-emission HMMs replace b_j with conditional densities (often Gaussian mixtures), connecting to mixture models used by groups at the University of Cambridge and Stanford University. Parameter estimation optimizes likelihood functions that are often nonconvex, requiring iterative schemes introduced in the statistical literature on the Expectation–maximization algorithm.
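The factorization above can be evaluated directly. The following sketch uses an assumed two-state, three-symbol toy model; the parameter values are illustrative, not drawn from any reference.

```python
# Joint probability of a state path and observation sequence under a
# discrete HMM, following the factorization
#   pi_{s1} * b_{s1}(o1) * prod_{t=2..T} a_{s_{t-1} s_t} * b_{s_t}(o_t).
# The two-state toy parameters below are illustrative assumptions.

pi = [0.6, 0.4]          # initial state distribution
A = [[0.7, 0.3],         # A[i][j] = P(next state j | current state i)
     [0.4, 0.6]]
B = [[0.5, 0.4, 0.1],    # B[j][o] = P(observation o | state j)
     [0.1, 0.3, 0.6]]

def joint_probability(states, obs):
    """P(states, obs) for aligned state and observation sequences."""
    p = pi[states[0]] * B[states[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= A[states[t - 1]][states[t]] * B[states[t]][obs[t]]
    return p

print(joint_probability([0, 0, 1], [0, 1, 2]))  # 0.6*0.5 * 0.7*0.4 * 0.3*0.6
```

Summing this quantity over all N^T state paths gives the observation likelihood, which motivates the dynamic-programming recursions discussed below.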

Algorithms and Inference

Key algorithmic primitives include the forward and backward algorithms for computing marginal likelihoods, the Viterbi algorithm for decoding the most probable state path, and the Baum–Welch algorithm for maximum-likelihood parameter estimation. The Viterbi algorithm, introduced by Andrew Viterbi for decoding convolutional codes, is an instance of the dynamic programming pioneered by Richard Bellman. Filtering, smoothing, and state-prediction operations are analogous to the recursions of the Kalman filter for linear Gaussian systems. Complexity scales as O(N^2 T) for naive implementations; sparse or structured transition matrices reduce costs, as exploited in datasets studied at the European Molecular Biology Laboratory and in projects at Google's speech teams.
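The forward and Viterbi recursions can be sketched in a few lines; both are O(N^2 T) dynamic programs, differing only in whether state contributions are summed (likelihood) or maximized (decoding). The two-state toy parameters are illustrative assumptions.

```python
# Minimal forward and Viterbi recursions for a discrete HMM.
# Toy parameters (assumed for illustration, not from any reference):
pi = [0.6, 0.4]
A = [[0.7, 0.3],
     [0.4, 0.6]]
B = [[0.5, 0.4, 0.1],
     [0.1, 0.3, 0.6]]
N = len(pi)

def forward(obs):
    """Total likelihood P(obs), summing over all state paths."""
    alpha = [pi[j] * B[j][obs[0]] for j in range(N)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][o]
                 for j in range(N)]
    return sum(alpha)

def viterbi(obs):
    """Most probable state path via max-product dynamic programming."""
    delta = [pi[j] * B[j][obs[0]] for j in range(N)]
    back = []                      # argmax pointers, one list per step
    for o in obs[1:]:
        step, new_delta = [], []
        for j in range(N):
            best_i = max(range(N), key=lambda i: delta[i] * A[i][j])
            step.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][o])
        back.append(step)
        delta = new_delta
    # Trace the best path backwards through the stored pointers.
    path = [max(range(N), key=lambda j: delta[j])]
    for step in reversed(back):
        path.append(step[path[-1]])
    return list(reversed(path))

print(forward([0, 1, 2]))
print(viterbi([0, 1, 2]))
```

Replacing `sum` with `max` (and keeping backpointers) is the only structural difference between the two recursions, which is why toolkits typically implement them over a shared trellis data structure.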

Applications

HMMs have seen broad adoption across domains. In speech processing they power phoneme recognition systems developed at Bell Labs and operationalized in products by AT&T and Microsoft Research. Bioinformatics uses HMMs for sequence alignment, gene finding, and protein family modeling in tools from the European Bioinformatics Institute and in projects such as Pfam and HMMER. Natural language processing employs HMMs for part-of-speech tagging and shallow parsing on corpora curated by the University of Pennsylvania and the Linguistic Data Consortium. Other applications include financial time-series modeling in studies at Goldman Sachs and in Bank of England-affiliated research, gesture recognition in robotics labs at the MIT Media Lab, and reliability analysis in aerospace projects at NASA.

Variants and Extensions

Numerous extensions address further modeling needs: hidden semi-Markov models add explicit state-duration distributions, used in speech and bioinformatics research from the University of California, Berkeley; factorial HMMs model multiple interacting state chains, as used in machine learning groups at Toronto institutions; input–output HMMs condition emissions on exogenous inputs, similar in spirit to mechanisms in Conditional random field research at the University of Massachusetts Amherst. Continuous-time HMMs relate to point processes studied by groups at Columbia University; hierarchical HMMs enable multi-scale structure, as explored in work from SRI International. Bayesian nonparametric variants such as the hierarchical Dirichlet process HMM (HDP-HMM) were developed by researchers including Yee Whye Teh, Michael I. Jordan, and their collaborators.

Implementation and Software

Many toolkits implement HMM functionality. HMMER and Bio++ support biological sequence HMMs, maintained by teams affiliated with EMBL-EBI and research groups in France. HTK (Hidden Markov Model Toolkit) originated at the University of Cambridge for speech tasks; Kaldi, maintained by a collaboration including researchers from Johns Hopkins University, provides modern speech recognition primitives combining HMMs with deep learning. Libraries for general-purpose use include hmmlearn (Python, originally part of scikit-learn and influenced by work at INRIA), R packages hosted on The Comprehensive R Archive Network, and MATLAB toolboxes developed at MathWorks.

Limitations and Challenges

HMMs assume conditional independence of observations given states, and the geometric (memoryless) state-duration distribution inherent in first-order transitions; both limitations are highlighted in critiques from research at the University of Edinburgh and University College London. They can struggle with long-range dependencies, which are better captured by recurrent neural networks developed by groups at Google DeepMind and Facebook AI Research, and parameter estimation can overfit with limited data, prompting Bayesian regularization approaches from teams at the University of Washington. Scalability and numerical stability also require careful engineering; techniques such as scaling in the forward algorithm and sparse approximations were advanced by practitioners at Microsoft Research and Amazon Web Services.
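The numerical-stability issue is concrete: path probabilities shrink geometrically with sequence length, so a plain probability-space forward pass underflows to zero after a few hundred steps. A standard remedy, sketched below under the same assumed toy parameters used for illustration, is to run the recursion on log-probabilities with the log-sum-exp trick (per-step rescaling of the forward variables is the other common approach).

```python
# Log-space forward pass: one standard fix for underflow on long
# sequences. Toy parameters are illustrative assumptions.
import math

pi = [0.6, 0.4]
A = [[0.7, 0.3],
     [0.4, 0.6]]
B = [[0.5, 0.4, 0.1],
     [0.1, 0.3, 0.6]]
N = len(pi)

def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_forward(obs):
    """log P(obs); remains finite even for very long sequences."""
    la = [math.log(pi[j] * B[j][obs[0]]) for j in range(N)]
    for o in obs[1:]:
        la = [logsumexp([la[i] + math.log(A[i][j]) for i in range(N)])
              + math.log(B[j][o]) for j in range(N)]
    return logsumexp(la)

# A 3000-step sequence would underflow in probability space, but the
# log-space recursion returns a finite (large negative) log-likelihood.
print(log_forward([0, 1, 2] * 1000))
```

Production toolkits apply the same idea throughout Baum–Welch and Viterbi, which is why their internals work almost entirely in log space or with per-step scaling factors.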

Category:Statistical models