| Shannon entropy | |
|---|---|
| Name | Shannon entropy |
| Field | Information theory |
| Introduced | 1948 |
| Introduced by | Claude Shannon |
Shannon entropy is a foundational measure in information theory, introduced by Claude Shannon in his 1948 paper "A Mathematical Theory of Communication", written at Bell Labs. It quantifies the average uncertainty, or information content, of a discrete probability distribution, and it underpins results in communication theory, cryptography, statistical mechanics, and computer science.
For a discrete random variable X taking values with probabilities p_1, ..., p_n, Shannon entropy is defined as H(X) = -∑_i p_i log p_i, with the convention that 0 log 0 = 0. The choice of logarithm base sets the unit: base 2 gives bits, base e gives nats, and base 10 gives hartleys. Equivalently, H(X) is the expected value of the surprisal -log p(x), the amount of information conveyed by observing an outcome of probability p(x).
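As a quick illustration of the definition, here is a minimal Python sketch; the function name and the example distributions are illustrative choices, not from Shannon's paper.

```python
import math

def shannon_entropy(probs, base=2):
    """H = -sum(p * log p) over outcomes with p > 0 (convention: 0 log 0 = 0).

    base=2 gives bits, base=math.e gives nats, base=10 gives hartleys.
    """
    return -sum(p * math.log(p, base) for p in probs if p > 0)

print(shannon_entropy([0.25] * 4))       # 2.0 bits: uniform on 4 outcomes
print(shannon_entropy([0.7, 0.2, 0.1]))  # ≈ 1.157 bits: skew lowers entropy
```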
Entropy is nonnegative, and among distributions on n outcomes it is maximized by the uniform distribution, where H = log n. It is a concave function of the probability vector, a property exploited in optimization and in proofs of coding theorems. Entropy satisfies a chain rule, H(X, Y) = H(X) + H(Y | X), and conditioning never increases it: H(Y | X) ≤ H(Y). The difference I(X; Y) = H(Y) - H(Y | X) is the mutual information between X and Y. Entropy can be interpreted as expected surprisal, as a lower bound on the average length of lossless codes, and, up to Boltzmann's constant, as the Gibbs entropy of statistical mechanics.
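The chain rule can be checked numerically on a small joint distribution. A minimal sketch, where the 2x2 joint table is an arbitrary illustrative example:

```python
import math

def H(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Illustrative joint distribution p(x, y) on a 2x2 alphabet.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

# Marginal p(x), obtained by summing the joint over y.
px = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (0, 1)}

H_joint = H(joint.values())
H_x = H(px.values())
# Conditional entropy H(Y|X) = -sum p(x,y) log2 p(y|x), with p(y|x) = p(x,y)/p(x).
H_y_given_x = -sum(p * math.log2(p / px[x]) for (x, _), p in joint.items())

assert abs(H_joint - (H_x + H_y_given_x)) < 1e-12  # chain rule holds
print(H_joint, H_x, H_y_given_x)
```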
Common examples include a fair coin (uniform two-outcome distribution) with entropy 1 bit, a fair six-sided die with entropy log2(6) ≈ 2.585 bits, and biased distributions, whose entropy is strictly below the uniform value and approaches 0 as the bias becomes extreme. The continuous analogue is differential entropy, h(X) = -∫ f(x) log f(x) dx, which, unlike discrete entropy, can be negative. Among densities with a fixed variance, the Gaussian maximizes differential entropy, a result central to maximum-entropy formulations.
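The worked examples above, plus the Gaussian maximum-entropy value, in the same sketch style; the helper names are ours, and the Gaussian formula 0.5 log2(2πeσ²) is the standard closed form:

```python
import math

def shannon_entropy_bits(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy_bits([0.5, 0.5]))  # fair coin: 1.0 bit
print(shannon_entropy_bits([1/6] * 6))   # fair die: log2(6) ≈ 2.585 bits
print(shannon_entropy_bits([0.9, 0.1]))  # biased coin: ≈ 0.469 bits

def gaussian_diff_entropy_bits(sigma):
    """Differential entropy of N(mu, sigma^2): 0.5 * log2(2*pi*e*sigma^2)."""
    return 0.5 * math.log2(2 * math.pi * math.e * sigma ** 2)

print(gaussian_diff_entropy_bits(1.0))   # ≈ 2.047 bits at unit variance
```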
Entropy sets the fundamental limit in Shannon's source coding theorem: no lossless code can achieve an expected length below H bits per symbol, and codes exist whose expected length is within 1 bit of H. Huffman coding attains the optimum among symbol-by-symbol prefix codes, and arithmetic coding approaches the entropy rate even more closely; both appear in compression standards from ISO and ITU. Shannon's channel coding theorem characterizes channel capacity as the maximum mutual information between channel input and output, and rate-distortion theory extends these ideas to lossy compression.
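To make the source coding bound concrete, here is a minimal Huffman coding sketch; the heap-based construction and all names are illustrative. For the dyadic distribution below, the average code length meets the entropy exactly (1.75 bits):

```python
import heapq
import math

def huffman_code(probs):
    """Build a Huffman prefix code (symbol -> bitstring) for a distribution."""
    # Heap entries: (probability, unique tiebreak, tuple of symbols in subtree).
    heap = [(p, i, (sym,)) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    codes = {sym: "" for sym in probs}
    counter = len(heap)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)  # two least probable subtrees
        p2, _, syms2 = heapq.heappop(heap)
        for s in syms1:                     # prepend a bit as we merge upward
            codes[s] = "0" + codes[s]
        for s in syms2:
            codes[s] = "1" + codes[s]
        heapq.heappush(heap, (p1 + p2, counter, syms1 + syms2))
        counter += 1
    return codes

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
codes = huffman_code(probs)
avg_len = sum(p * len(codes[s]) for s, p in probs.items())
H = -sum(p * math.log2(p) for p in probs.values())
print(codes, avg_len, H)  # dyadic probabilities: avg_len == H == 1.75 bits
```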
Generalizations include the Rényi entropies H_α = (1/(1-α)) log ∑_i p_i^α, introduced by Alfréd Rényi, which recover Shannon entropy in the limit α → 1; the α → ∞ limit is the min-entropy H_∞ = -log max_i p_i, the standard measure of guessing difficulty in cryptographic randomness extraction. Tsallis entropy, proposed by Constantino Tsallis, provides a nonextensive generalization applied in statistical physics. The quantum analogue is von Neumann entropy, S(ρ) = -Tr(ρ log ρ), central to quantum information theory. Conditional, joint, and relative entropy (the Kullback–Leibler divergence, D(P‖Q) = ∑_i p_i log(p_i/q_i)) are the workhorses of statistical inference.
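A sketch of these generalizations for discrete distributions; the function names and the example distribution are illustrative:

```python
import math

def renyi_entropy(probs, alpha, base=2):
    """Rényi entropy H_alpha = log(sum p_i^alpha) / (1 - alpha) for alpha != 1.

    alpha -> 1 recovers Shannon entropy; alpha -> inf gives min-entropy.
    """
    if alpha == 1:
        return -sum(p * math.log(p, base) for p in probs if p > 0)
    return math.log(sum(p ** alpha for p in probs), base) / (1 - alpha)

def min_entropy(probs, base=2):
    """Min-entropy H_inf = -log(max p_i), bounding an attacker's best guess."""
    return -math.log(max(probs), base)

def kl_divergence(p, q, base=2):
    """Relative entropy D(P||Q) = sum p_i log(p_i/q_i); needs q_i > 0 where p_i > 0."""
    return sum(pi * math.log(pi / qi, base) for pi, qi in zip(p, q) if pi > 0)

probs = [0.7, 0.2, 0.1]
print(renyi_entropy(probs, 2))              # collision entropy, ≈ 0.889 bits
print(min_entropy(probs))                   # ≈ 0.515 bits
print(kl_divergence(probs, [1/3] * 3))      # distance from uniform, ≈ 0.428 bits
```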
Practically, entropy guides the design and analysis of source coding and channel utilization in telecommunications, and the assessment of randomness in cryptography, where the entropy of a key or seed bounds an attacker's guessing advantage. In machine learning, entropy-based impurity criteria drive decision tree induction, for example in ID3 and C4.5, which choose splits to maximize information gain. Entropy coding underlies image and audio compression standards from MPEG and ISO, and the formal identity between Shannon entropy and Gibbs entropy keeps the concept central to statistical mechanics.
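As an example of the decision-tree use, here is a minimal information-gain computation on a hypothetical binary split; the labels and the split are made up for illustration:

```python
import math
from collections import Counter

def H(labels):
    """Empirical entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """Entropy reduction from splitting `labels` into the given subgroups."""
    n = len(labels)
    return H(labels) - sum(len(g) / n * H(g) for g in groups)

# Hypothetical split: a feature separates a mixed node into purer children.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"]
right = ["yes"] + ["no"] * 4
print(information_gain(parent, [left, right]))  # ≈ 0.278 bits gained
```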