| Fisher information | |
|---|---|
| Name | Fisher information |
| Field | Statistics, Information theory |
| Introduced | 1920s |
| Introduced by | Ronald A. Fisher |
Fisher information quantifies the amount of information that an observable random variable carries about an unknown parameter in a statistical model. Developed in the 1920s by Ronald A. Fisher, it plays a central role in estimation theory, asymptotic analysis, and hypothesis testing, connecting to concepts developed by Andrey Kolmogorov and Norbert Wiener and appearing in applications from Alan Turing's wartime work to modern research at institutions such as Bell Laboratories and INRIA.
Fisher information measures the sensitivity of the likelihood to changes in a parameter, capturing how quickly the log-likelihood changes as the parameter varies; it underpins notions used by Karl Pearson, Jerzy Neyman, Egon Pearson, and Harald Cramér. Intuitively, high Fisher information corresponds to data that sharply distinguish nearby parameter values, a principle exploited by researchers at Princeton University, the University of Cambridge, Harvard University, the Massachusetts Institute of Technology, and the University of Oxford. Connections run to the work of John von Neumann, Émile Borel, Andrey Kolmogorov, and Alfred Tarski, and to methods discussed at Royal Society meetings, shaping developments at the Institute for Advanced Study and at laboratories such as Los Alamos National Laboratory and Sandia National Laboratories.
For a parametric family of probability densities p(x; θ), Fisher information is defined via the score function, the derivative of the log-likelihood with respect to θ. This formalism was refined in texts by C. R. Rao, Harald Cramér, Jerzy Neyman, and Egon Pearson, and elaborated in monographs from Princeton University Press and Cambridge University Press. The Fisher information I(θ) equals the variance of the score or, equivalently under standard regularity conditions, the negative expected second derivative of the log-likelihood; these expressions appear in treatments by S. N. Bernstein, Andrey Kolmogorov, Norbert Wiener, Alfréd Rényi, and Claude Shannon. In multivariate settings the Fisher information becomes a matrix, a notion used by researchers at Stanford University, the University of California, Berkeley, Columbia University, and Yale University for inference in complex models.
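In the notation above, the usual definitions can be sketched as follows; X denotes a single observation and all expectations are taken at the true parameter value.

```latex
% Score: derivative of the log-likelihood with respect to the parameter
\[ s(\theta; x) = \frac{\partial}{\partial\theta} \log p(x;\theta) \]

% Fisher information: variance of the score, equal (under regularity conditions)
% to the negative expected second derivative of the log-likelihood
\[ I(\theta) = \mathrm{E}_\theta\!\left[\left(\frac{\partial}{\partial\theta}\log p(X;\theta)\right)^{2}\right]
            = -\,\mathrm{E}_\theta\!\left[\frac{\partial^{2}}{\partial\theta^{2}}\log p(X;\theta)\right] \]

% Multiparameter case: the Fisher information matrix
\[ [I(\theta)]_{ij} = \mathrm{E}_\theta\!\left[\frac{\partial\log p(X;\theta)}{\partial\theta_i}\,
                                               \frac{\partial\log p(X;\theta)}{\partial\theta_j}\right] \]
```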
Fisher information obeys additivity for independent samples, a property highlighted in the work of Ronald A. Fisher and developed further by C. R. Rao, Harald Cramér, and Jerzy Neyman. Under reparameterization it transforms according to the chain rule, an identity used in derivations by Élie Cartan-inspired geometers and by authors at Oxford University Press and Wiley. The information inequality links Fisher information to variance bounds derived by Harald Cramér and Jerzy Neyman, while information monotonicity under coarsening relates to principles studied by Pierre-Simon Laplace, Siméon Denis Poisson, Arthur Cayley, and groups at Bell Labs. Matrix inequalities for Fisher information connect to results by John von Neumann, Marshall Hall Jr., E. H. Moore, and researchers at MIT Lincoln Laboratory.
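As a brief sketch, the additivity and reparameterization properties take the following standard forms for i.i.d. samples and a smooth one-to-one change of parameter θ = h(η), where h is a generic symbol used here for illustration.

```latex
% Additivity: n i.i.d. observations carry n times the information of one observation
\[ I_n(\theta) = n\, I_1(\theta) \]

% Reparameterization: under a smooth change of parameter \theta = h(\eta),
% the information transforms by the chain rule
\[ I_\eta(\eta) = I_\theta\bigl(h(\eta)\bigr)\,\bigl(h'(\eta)\bigr)^{2} \]
```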
The Cramér–Rao bound provides a lower bound on the variance of unbiased estimators in terms of Fisher information; it was derived independently by Harald Cramér and C. R. Rao, building on earlier ideas of Ronald A. Fisher. Maximum likelihood estimators attain the bound asymptotically under regularity conditions and were studied by Jerzy Neyman, Egon Pearson, Herman Wold, and modern statisticians at Johns Hopkins University and University College London. The bound informs the design of experiments, a theme in the work of Ronald A. Fisher and of later teams at Bell Laboratories, IBM Research, and Siemens. Extensions to biased estimation and Bayesian bounds connect to studies by Harold Jeffreys, Dennis Lindley, Bruno de Finetti, and the Royal Statistical Society.
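In standard notation, with T(X) denoting any unbiased estimator of θ, the bound reads:

```latex
% Cramér–Rao bound for an unbiased estimator T(X) of a scalar parameter \theta
\[ \mathrm{Var}_\theta\bigl(T(X)\bigr) \;\ge\; \frac{1}{I(\theta)} \]

% Multiparameter form: the covariance matrix dominates the inverse information matrix
% (in the positive semidefinite ordering)
\[ \mathrm{Cov}_\theta\bigl(T(X)\bigr) \;\succeq\; I(\theta)^{-1} \]
```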
Fisher information appears in parametric models such as the normal, exponential, binomial, and Poisson distributions, examples treated in classic texts by Harald Cramér, C. R. Rao, G. S. Watson, and Jerzy Neyman. Applications span signal processing at Bell Laboratories and MIT Lincoln Laboratory, image reconstruction at NASA and the European Space Agency, and genomics at Cold Spring Harbor Laboratory and the Broad Institute. In econometrics it informs estimators used by researchers at the London School of Economics and the University of Chicago; in physics it connects to quantum metrology at the Perimeter Institute and the Niels Bohr Institute. Fields from epidemiology at the Centers for Disease Control and Prevention to machine learning groups at Google Research and DeepMind exploit Fisher information for parameter tuning, natural gradient methods, and uncertainty quantification, following developments at the University of Toronto and Carnegie Mellon University.
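For a single observation, the families named above have well-known closed forms under the usual parameterizations; a brief summary:

```latex
% Per-observation Fisher information for some standard families
\[ \text{Normal } \mathcal{N}(\mu,\sigma^{2}),\ \sigma^{2}\ \text{known:} \quad I(\mu) = \frac{1}{\sigma^{2}} \]
\[ \text{Exponential with rate } \lambda: \quad I(\lambda) = \frac{1}{\lambda^{2}} \]
\[ \text{Bernoulli}(p)\ \text{(multiply by } n \text{ for Binomial}(n,p)\text{):} \quad I(p) = \frac{1}{p(1-p)} \]
\[ \text{Poisson with mean } \lambda: \quad I(\lambda) = \frac{1}{\lambda} \]
```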
Generalizations include the Fisher information metric in information geometry developed by Shun'ichi Amari and Hiroshi Nagaoka, the quantum Fisher information studied by Carl W. Helstrom and Alexander Holevo, and the local asymptotic normality framework of Lucien Le Cam, together with David Blackwell's related comparisons of experiments. Robust versions and Rényi-type divergences trace to Alfréd Rényi and to modern robust statistics programs at the University of Minnesota and ETH Zurich. Connections extend to entropy and the Kullback–Leibler divergence explored by Claude Shannon, Solomon Kullback, and Richard A. Leibler, and to methods used by Bell Labs engineers and AT&T researchers. Recent work links Fisher-type quantities to the optimal transport theory studied by Cédric Villani and to geometric analysis at the Institute for Advanced Study and the Clay Mathematics Institute.
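One standard way to state the connection between Fisher information and the Kullback–Leibler divergence is the local quadratic expansion below, valid under the usual regularity conditions.

```latex
% Kullback–Leibler divergence between nearby members of the family is locally quadratic,
% with the Fisher information matrix as the quadratic form
\[ D_{\mathrm{KL}}\bigl(p_\theta \,\|\, p_{\theta+\delta}\bigr)
   = \tfrac{1}{2}\,\delta^{\top} I(\theta)\,\delta + o\bigl(\lVert\delta\rVert^{2}\bigr) \]
```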