| Hamming distance | |
|---|---|
| Name | Hamming distance |
| Notation | d_H |
| Field | Information theory |
| Introduced | 1950 |
| Introduced by | Richard Hamming |
The Hamming distance is a measure of difference between two equal-length strings, introduced by Richard Hamming in the context of error-detecting and error-correcting codes during his work at Bell Telephone Laboratories; his paper on the subject was published in 1950. It underpins classical results in Claude Shannon's information theory and in coding theory, and it appears in applications ranging from NASA telemetry and AT&T switching systems to data storage and bioinformatics.
The Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols differ. It is a nonnegative integer and satisfies the defining properties of a metric: identity of indiscernibles, symmetry, and the triangle inequality. For binary strings, the distance equals the number of bit flips needed to transform one string into the other; for linear codes, as studied by Elwyn Berlekamp and Marcel Golay, the minimum Hamming distance determines error-detecting and error-correcting capability. The metric is invariant under any fixed permutation of coordinates applied to both strings.
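Stated explicitly for strings x, y, z in A^n, the metric properties named above read as follows (a standard formulation, not specific to any one source):

```latex
% Metric axioms satisfied by the Hamming distance d_H on A^n
\begin{align*}
d_H(x,y) &\ge 0, \quad d_H(x,y) = 0 \iff x = y && \text{(identity of indiscernibles)} \\
d_H(x,y) &= d_H(y,x) && \text{(symmetry)} \\
d_H(x,z) &\le d_H(x,y) + d_H(y,z) && \text{(triangle inequality)}
\end{align*}
```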
For binary strings of length n, simple examples include the distance between 0000 and 1111 (distance 4) and between 0101 and 1100 (distance 2), illustrating the basic counting argument found in standard texts such as Cover and Thomas's Elements of Information Theory. In block codes like the Hamming code family developed by Richard Hamming, a single-bit error produces a word at distance 1 from the transmitted codeword, the property such codes exploit in practical systems to detect and correct it. For nonbinary alphabets, symbol-wise comparison yields analogous distances used in sequence analysis, for example in bioinformatics pipelines at the European Bioinformatics Institute.
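A minimal Python sketch that checks these counts (the function name hamming_distance is ours, not taken from any particular library):

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which equal-length strings a and b differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))

assert hamming_distance("0000", "1111") == 4
assert hamming_distance("0101", "1100") == 2
assert hamming_distance("karolin", "kathrin") == 3  # any alphabet works
```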
Formally, for vectors x, y in A^n over an alphabet A, the Hamming distance is d_H(x, y) = |{i : x_i ≠ y_i}|. When A is the finite field GF(q), named for Évariste Galois, the distance interacts with linearity: for a linear code, the minimum distance equals the minimum Hamming weight of a nonzero codeword. Variants include the Hamming weight (the distance to the zero vector) and the normalized Hamming distance used in asymptotic analyses such as Robert Gallager's. Weighted Hamming distances assign coordinate-specific costs, while edit distances such as Vladimir Levenshtein's contrast insertion and deletion operations with pure substitution counts.
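In symbols, with δ and d_w as our labels for the normalized and weighted variants (the w_i ≥ 0 being per-coordinate costs):

```latex
% Hamming distance, weight, normalized distance, and a weighted variant
d_H(x,y) = \bigl|\{\, i : x_i \neq y_i \,\}\bigr|, \qquad
\operatorname{wt}(x) = d_H(x,\mathbf{0}), \qquad
\delta(x,y) = \frac{d_H(x,y)}{n}, \qquad
d_w(x,y) = \sum_{i :\, x_i \neq y_i} w_i
```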
Hamming distance is central to the block-code parameters [n, k, d]: a code with minimum distance d can detect up to d − 1 symbol errors and correct up to ⌊(d − 1)/2⌋, capabilities explored by Richard Hamming, Elwyn Berlekamp, and Andrew Viterbi. In channel coding theory as developed by Claude Shannon and Robert Gallager, bounds like the Hamming (sphere-packing) bound and the Gilbert–Varshamov bound relate packings of Hamming space to achievable rates, a line of work continued by David Forney. Practical applications include satellite telemetry and deep-space communication at NASA, wireless standards developed by Qualcomm and Ericsson, and storage technologies advanced at Seagate and Western Digital, where error-correcting codes rely on Hamming-distance-based decoding algorithms.
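The sphere-packing bound mentioned above has a compact statement: a q-ary code C of length n with minimum distance d satisfies

```latex
% Hamming (sphere-packing) bound
|C| \;\le\; \frac{q^n}{\sum_{i=0}^{t} \binom{n}{i}\,(q-1)^i},
\qquad t = \left\lfloor \frac{d-1}{2} \right\rfloor
```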
Computing Hamming distances efficiently is essential in practice: for binary words, a bitwise XOR followed by a population count yields the distance, and processors from Intel and AMD expose a dedicated POPCNT instruction for the counting step. Nearest-neighbor search in Hamming space uses locality-sensitive hashing, following work by Piotr Indyk and Andrei Broder, and indexing structures for approximate matching appear in bioinformatics pipelines at the European Bioinformatics Institute and the Broad Institute. Complexity-theoretic aspects connect to reductions studied by Richard Karp and Leslie Valiant, while parallel and GPU implementations have been explored by researchers at NVIDIA and Lawrence Berkeley National Laboratory.
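A sketch of the XOR-plus-popcount idiom in Python, treating fixed-width binary words as ints (int.bit_count() is available from Python 3.10; CPython implements it with an efficient bit-counting routine):

```python
def hamming_distance_words(x: int, y: int) -> int:
    """Hamming distance between two binary words represented as ints."""
    # XOR sets a 1 bit exactly at the positions where the words differ;
    # counting those 1 bits gives the distance.
    return (x ^ y).bit_count()

assert hamming_distance_words(0b0000, 0b1111) == 4
assert hamming_distance_words(0b0101, 0b1100) == 2
```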
Generalizations include the Levenshtein distance, the Damerau–Levenshtein distance, and metrics on permutation spaces studied by Donald Knuth and Persi Diaconis; these extend Hamming's substitution-only model to insertions, deletions, and transpositions relevant in computational biology and in formal language models, as in the dynamic-programming sketch below. In coding theory, the rank metric and the Lee metric, the latter introduced by C. Y. Lee, provide alternative error models for applications in network coding explored by Rudolf Ahlswede and Ralf Koetter. Connections to combinatorial designs and to sphere packing in the Hamming cube further illustrate the metric's role across discrete mathematics and engineering.
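A minimal sketch of the standard Wagner–Fischer dynamic program for Levenshtein distance, included to contrast the insertion/deletion model with pure substitution counting:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    turning a into b (Wagner-Fischer dynamic programming)."""
    prev = list(range(len(b) + 1))           # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                           # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                 # delete ca
                curr[j - 1] + 1,             # insert cb
                prev[j - 1] + (ca != cb),    # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]

# On equal-length strings, Levenshtein distance never exceeds Hamming distance,
# since substitutions alone already achieve the Hamming count.
assert levenshtein("0101", "1100") <= 2
assert levenshtein("kitten", "sitting") == 3
```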
Category:Metrics (mathematics) Category:Coding theory Category:Information theory