
Entropy (information theory)

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.

Name: Entropy (information theory)
Field: Information theory, Mathematics
Introduced: 1948, by Claude Shannon
Related: Thermodynamics, Probability theory, Coding theory

Entropy in information theory, introduced by Claude Shannon in 1948, quantifies the average uncertainty or information content associated with the outcomes of a random variable. The concept connects foundational figures and institutions such as Norbert Wiener, Bell Labs, MIT, Harvard University and Princeton University with mathematical developments like the Shannon–Hartley theorem, the Nyquist rate and Kolmogorov complexity, and with thermodynamic quantities such as Gibbs free energy; it has also influenced applied fields including telecommunications, cryptography, computer science, statistical mechanics and signal processing.

Definition and intuitive interpretation

Entropy measures expected surprise: for a discrete source modeled by a random variable X taking values x_i with probabilities p_i, the entropy quantifies how many binary decisions are needed on average to specify an outcome. Shannon framed this in terms of communication through channels studied at Bell Labs, with limits later formalized by the noisy-channel coding theorem and the source coding theorem. Intuitively, distributions concentrated on a single outcome (degenerate measures) have low entropy, while uniform distributions, such as equiprobable lottery draws, have maximal entropy. Correspondence between Claude Shannon and contemporaries at AT&T, together with exchanges with John von Neumann, helped cement the interpretation; von Neumann famously suggested the term "entropy" because of its established role in statistical mechanics and the second law of thermodynamics.
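As a minimal sketch of this "expected surprise" reading (the distributions below are illustrative choices, not examples from Shannon's paper), the following Python snippet computes the entropy of a few simple distributions, confirming that concentration gives low entropy and uniformity gives the maximum:

```python
import math

def entropy(probs, base=2):
    """Shannon entropy H(X) = -sum_i p_i * log(p_i), in bits by default.

    Terms with p_i = 0 contribute nothing (p * log p -> 0 as p -> 0)."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# A degenerate distribution carries no surprise on average.
print(entropy([1.0, 0.0, 0.0]))     # 0.0 bits
# A fair coin needs exactly one binary decision per outcome.
print(entropy([0.5, 0.5]))          # 1.0 bit
# A uniform distribution over 8 outcomes is maximal: log2(8) = 3 bits.
print(entropy([1/8] * 8))           # 3.0 bits
# A skewed distribution sits strictly between the extremes.
print(entropy([0.9, 0.05, 0.05]))   # ~0.57 bits
```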

Mathematical formulation

For a discrete variable X with alphabet {x_1, ..., x_n} and probability mass function p(x_i) = p_i, the entropy H(X) is defined as H(X) = -∑_{i=1}^n p_i log p_i, where the base of the logarithm determines the units (bits for base 2, nats for base e). In continuous settings one uses the differential entropy h(X) = -∫ f(x) log f(x) dx for a density f(x), a quantity that appears in central-limit-theorem analyses and in the Fourier analysis of Wiener process paths. The definition generalizes to the joint entropy H(X,Y), the conditional entropy H(X|Y) and the mutual information I(X;Y) = H(X) + H(Y) - H(X,Y), quantities central to the Shannon–Hartley theorem and treated in standard textbooks from Cambridge University Press and Oxford University Press. Entropy also connects formally to the Kullback–Leibler divergence D_{KL}(P||Q) through identities widely used in learning theory at institutions such as Carnegie Mellon University and Stanford University.
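These identities can be checked numerically. The sketch below uses a hypothetical 2×2 joint distribution (chosen only for illustration) to compute H(X,Y), H(Y|X) and I(X;Y), and verifies that the mutual information equals the Kullback–Leibler divergence of the joint from the product of the marginals:

```python
import math

def H(probs):
    """Entropy in bits; zero-probability terms are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint distribution p(x, y) over a 2x2 alphabet.
pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

px = [sum(v for (x, _), v in pxy.items() if x == i) for i in (0, 1)]
py = [sum(v for (_, y), v in pxy.items() if y == j) for j in (0, 1)]

H_xy, H_x, H_y = H(pxy.values()), H(px), H(py)

# Mutual information via I(X;Y) = H(X) + H(Y) - H(X,Y).
I_xy = H_x + H_y - H_xy
# Conditional entropy via the chain rule H(X,Y) = H(X) + H(Y|X).
H_y_given_x = H_xy - H_x

print(f"H(X)={H_x:.3f}  H(Y)={H_y:.3f}  H(X,Y)={H_xy:.3f}")
print(f"I(X;Y)={I_xy:.3f}  H(Y|X)={H_y_given_x:.3f}")

def kl(p, q):
    """D_KL(P||Q) in bits; assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# I(X;Y) equals D_KL of the joint from the product of the marginals.
prod = [px[x] * py[y] for (x, y) in pxy]
print(f"D_KL(joint || product) = {kl(list(pxy.values()), prod):.3f}")
```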

Properties and inequalities

Entropy satisfies several key properties: nonnegativity H(X) ≥ 0, maximality for uniform distributions, additivity for independent variables and, in general, subadditivity H(X,Y) ≤ H(X) + H(Y). The chain rule H(X,Y) = H(X) + H(Y|X) underpins proofs of the data processing inequality and bounds such as Fano's inequality, used in analyses by researchers affiliated with Bell Labs and IBM Research. Shannon's coding theorems rely on the asymptotic equipartition property (AEP), which is linked to the law of large numbers, and on inequalities such as Gibbs' inequality and the concavity of entropy as a function on the probability simplex; these proofs often cite methods developed at Princeton University and the University of Cambridge. Strong subadditivity and monotonicity properties find application in quantum extensions explored at the Institute for Advanced Study and in mathematical physics groups at Harvard University.
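A quick numerical spot-check of subadditivity and the chain rule on random joint distributions (illustrative only, not a proof; it assumes nothing beyond the definitions above):

```python
import math, random

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

random.seed(0)

for _ in range(5):
    w = [random.random() for _ in range(4)]   # random joint over a 2x2 alphabet
    s = sum(w)
    p = [v / s for v in w]                    # p(0,0), p(0,1), p(1,0), p(1,1)
    px = [p[0] + p[1], p[2] + p[3]]
    py = [p[0] + p[2], p[1] + p[3]]
    H_xy, H_x, H_y = H(p), H(px), H(py)
    # H(Y|X) = sum_x p(x) * H(Y | X = x), from the conditional distributions.
    H_y_given_x = sum(
        px[i] * H([p[2 * i] / px[i], p[2 * i + 1] / px[i]]) for i in (0, 1)
    )
    assert H_xy <= H_x + H_y + 1e-12                # subadditivity
    assert abs(H_xy - (H_x + H_y_given_x)) < 1e-12  # chain rule
print("subadditivity and chain rule hold on all samples")
```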

Relationship to probability and coding

Entropy bounds the average length of optimal codes via the source coding theorem: for a memoryless source with entropy H, no lossless code can achieve an average length below H bits per symbol, while codes such as Huffman codes and arithmetic coding approach H; these results were developed and disseminated through Bell Labs, MIT Press publications and industry implementations at AT&T and Microsoft Research. The entropy rate extends these ideas to stochastic processes, including Markov chains treated in Cambridge University Press texts and the ergodic theory studied at the University of Chicago. Mutual information I(X;Y) quantifies the reduction in uncertainty about X given observations of Y and appears in channel capacity formulas such as the Shannon–Hartley theorem for Gaussian channels and the capacity of the binary symmetric channel; these principles guided engineers at NASA and standards bodies such as the IEEE.
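To illustrate the source coding theorem, the sketch below builds a binary Huffman code using a standard heap-based construction (one textbook variant among several; the distribution is an arbitrary illustrative choice) and compares the resulting average codeword length with H. For an optimal symbol code the average length satisfies H ≤ L < H + 1:

```python
import heapq, math

def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code for the given distribution.

    Repeatedly merges the two least probable groups; every merge adds one
    bit to the codeword of each symbol in the merged groups."""
    heap = [(p, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, ids1 = heapq.heappop(heap)
        p2, ids2 = heapq.heappop(heap)
        for i in ids1 + ids2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, ids1 + ids2))
    return lengths

probs = [0.4, 0.2, 0.2, 0.1, 0.1]
lengths = huffman_lengths(probs)
H = -sum(p * math.log2(p) for p in probs)
avg = sum(p * l for p, l in zip(probs, lengths))
print(f"H = {H:.3f} bits, Huffman average length = {avg:.3f} bits")
# Here H ~ 2.122 and the Huffman code averages 2.2 bits per symbol,
# inside the [H, H + 1) window guaranteed by the source coding theorem.
```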

Extensions and generalizations

Generalizations include the Rényi entropies H_α and the Tsallis entropy S_q, introduced by Alfréd Rényi and Constantino Tsallis respectively, which parametrize sensitivity to distribution tails and are applied in contexts ranging from fractal analysis studied at the Max Planck Institute to nonextensive statistical mechanics explored at Los Alamos National Laboratory. The quantum analogue, the von Neumann entropy S(ρ) = -Tr(ρ log ρ), in fact predates Shannon's work and was later taken up in quantum information theory by researchers at Bell Labs and Caltech; it ties to quantum channel capacities and entanglement measures used by groups at the Perimeter Institute and CERN. Relative entropy and f-divergences link to decision theory and hypothesis testing, with methods refined at Columbia University and Yale University and applied in machine learning research hubs such as Google DeepMind.
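Both generalizations are straightforward to compute. The sketch below evaluates the Rényi entropy H_α = log(∑_i p_i^α) / (1 - α) for several α (recovering Shannon entropy at α = 1) and the von Neumann entropy of a hypothetical qubit density matrix; the distribution and matrix are illustrative choices, not standard examples:

```python
import math
import numpy as np

def renyi(probs, alpha):
    """Rényi entropy H_alpha in bits; alpha = 1 recovers Shannon entropy."""
    if alpha == 1:
        return -sum(p * math.log2(p) for p in probs if p > 0)
    return math.log2(sum(p ** alpha for p in probs)) / (1 - alpha)

p = [0.7, 0.2, 0.1]
for a in (0.5, 1, 2):
    print(f"H_{a}(p) = {renyi(p, a):.3f} bits")   # non-increasing in alpha

# von Neumann entropy S(rho) = -Tr(rho log rho), from the eigenvalues of rho.
# Hypothetical density matrix of a partly mixed qubit state.
rho = np.array([[0.75, 0.25],
                [0.25, 0.25]])
eigvals = np.linalg.eigvalsh(rho)
S = -sum(lam * math.log2(lam) for lam in eigvals if lam > 1e-12)
print(f"S(rho) = {S:.3f} bits")                   # ~0.601 bits
```

Diagonalizing ρ reduces the quantum case to the classical formula applied to the eigenvalue spectrum, which is why the same helper works for both.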

Applications and examples

Entropy plays central roles across disciplines: in coding and compression (Huffman coding, arithmetic coding) used by companies such as Apple Inc. and Google LLC; in cryptography, where unpredictability is analyzed in standards from NIST and the IETF; in statistical mechanics, connecting to the Boltzmann equation and the Gibbs measure; in neuroscience, for neural coding research at the Max Planck Institute for Brain Research and the Salk Institute; in linguistics, for entropy rates of natural language studied at Stanford University and the University of Edinburgh; and in ecology, for diversity indices linked to Rényi entropy in studies published in Nature and Science. Concrete examples include the entropy of a fair coin (1 bit) and of English text, estimated through experiments by Claude Shannon and later by researchers at Harvard University and MIT. Entropy-based metrics underpin modern machine learning objectives, regularization terms in variational inference developed at UC Berkeley and DeepMind, and information bottleneck methods researched at Google Brain.
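As a toy version of such estimates, the plug-in (maximum-likelihood) estimator below computes the empirical unigram entropy of a short text sample. The sample string is arbitrary, and a short sample underestimates the source; Shannon's experiments put English closer to about 1 bit per character once longer-range dependencies are accounted for, versus roughly 4 bits per character from single-letter frequencies alone:

```python
import math
from collections import Counter

def empirical_entropy(text):
    """Plug-in entropy estimate of a character source, in bits per character."""
    counts = Counter(text)
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

sample = "information theory quantifies the average uncertainty of a source"
print(f"unigram estimate: {empirical_entropy(sample):.2f} bits/char")
print(f"fair coin: {empirical_entropy('ht' * 1000):.2f} bits/flip")  # 1.00
```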

Category:Information theory