LLMpedia: The first transparent, open encyclopedia generated by LLMs

Maximum likelihood estimation

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Article Genealogy
Parent: Ronald Fisher (hop 4)
Expansion Funnel: Raw 45 → Dedup 3 → NER 2 → Enqueued 0
1. Extracted: 45
2. After dedup: 3
3. After NER: 2 (rejected: 1, not a named entity)
4. Enqueued: 0
Name: Maximum likelihood estimation
Field: Statistics
Introduced: 20th century
Inventor: Sir Ronald A. Fisher
Applications: Statistics, econometrics, biostatistics, machine learning

Maximum likelihood estimation (MLE) is a statistical method for estimating the parameters of a probabilistic model by maximizing the likelihood function given observed data. Developed and popularized in the early 20th century, it is associated with foundational figures and institutions of modern statistics, and its influence extends from early theoretical work associated with the Royal Society to contemporary large-scale inference at companies such as Google. MLE is connected to asymptotic theory developed within the Princeton University mathematics community and the analytical traditions of the University of Cambridge.

Introduction

Maximum likelihood estimation was introduced by Sir Ronald A. Fisher and further analyzed by scholars at Harvard University, the University of Chicago, and Columbia University. It operates by selecting the parameter values that make the observed data most probable under a chosen model, a principle applied across research at organizations such as Bell Labs, IBM Research, and laboratories at the National Institutes of Health. Historical debates involving figures such as Jerzy Neyman and Egon Pearson shaped the interpretation of likelihood methods alongside competing approaches developed at the University of Edinburgh and the London School of Economics.

Mathematical formulation

Given a parametric model with probability density or mass function f(x; θ), the likelihood for independent observations x_1, ..., x_n is L(θ; x) = ∏_{i=1}^{n} f(x_i; θ), and the maximum likelihood estimate is the value of θ that maximizes it; in practice one usually maximizes the log-likelihood ℓ(θ) = ∑_{i=1}^{n} log f(x_i; θ). The score function (the gradient of ℓ) and the Fisher information matrix appear in derivations linking MLE to asymptotic normality; these results were formalized in work connected to the Institute for Advanced Study and influenced by mathematicians at ETH Zurich, the University of Göttingen, and the École Normale Supérieure. For regular models, setting the log-likelihood derivatives to zero yields the estimating equations, often solved with calculus and linear algebra techniques employed at institutions such as the Massachusetts Institute of Technology and Stanford University.
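As a concrete illustration, the following minimal sketch (assuming NumPy and SciPy are available; the simulated data and the rate parameter 2.0 are invented for the example) maximizes the exponential log-likelihood ℓ(λ) = n·log(λ) − λ·∑x_i numerically and checks the result against the closed-form solution λ̂ = 1/x̄.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Simulated i.i.d. draws from an Exponential distribution with rate 2.0.
rng = np.random.default_rng(0)
x = rng.exponential(scale=1 / 2.0, size=500)

def neg_log_likelihood(rate):
    # For f(x; λ) = λ·exp(−λx), the log-likelihood is n·log(λ) − λ·Σx_i.
    return -(len(x) * np.log(rate) - rate * x.sum())

# Numerical maximization of the likelihood (minimization of its negative).
res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 10.0), method="bounded")

# Closed-form MLE for the exponential rate: λ̂ = n / Σx_i = 1 / x̄.
print("numerical MLE:  ", res.x)
print("closed-form MLE:", 1 / x.mean())
```

Both values agree to numerical precision, which is the expected behavior when the likelihood surface is smooth and unimodal, as it is here.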

Properties and theoretical results

Under regularity conditions, MLEs are consistent, asymptotically normal, and asymptotically efficient, achieving the Cramér–Rao lower bound asymptotically. These theorems were proved and extended in the literature associated with scholars from Princeton University and Yale University, and applied in landmark studies at Johns Hopkins University and the University of California, Berkeley. Results such as Wilks' theorem and the likelihood ratio test are central to inferential frameworks used at institutions like the World Health Organization and in major trials reviewed by the Food and Drug Administration. Counterexamples involving nonregular models appeared in research tied to Bell Labs and led to refinements in robust estimation studied at Columbia University.
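A quick simulation makes the efficiency claim concrete. The sketch below (a minimal illustration, assuming NumPy; the rate, sample size, and replication count are arbitrary choices) compares the empirical variance of the exponential-rate MLE λ̂ = 1/x̄ across replications with the Cramér–Rao bound λ²/n implied by the per-observation Fisher information I(λ) = 1/λ².

```python
import numpy as np

# Monte Carlo check of asymptotic efficiency for the exponential-rate MLE.
# Fisher information per observation is I(λ) = 1/λ², so the Cramér–Rao
# bound on the variance of an estimator from n observations is λ²/n.
rng = np.random.default_rng(1)
lam, n, reps = 2.0, 1000, 5000

samples = rng.exponential(scale=1 / lam, size=(reps, n))
mle = 1 / samples.mean(axis=1)            # λ̂ = 1/x̄ for each replication

print("empirical variance of the MLE:", mle.var())
print("Cramér–Rao lower bound (λ²/n):", lam**2 / n)
```

With n = 1000 the two numbers are close, illustrating that the bound is attained asymptotically rather than exactly in finite samples.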

Computational methods and algorithms

Practical maximization of likelihoods employs numerical optimization algorithms such as Newton–Raphson, Fisher scoring, expectation–maximization (EM), and quasi-Newton methods; these algorithms were developed and refined across research groups at AT&T Laboratories, Microsoft Research, and Google Research. Implementation and convergence theory have been advanced within software projects at the University of Oxford and in collaborations with teams at Carnegie Mellon University and the University of Washington. High-dimensional and constrained problems leverage techniques such as stochastic gradient descent, used in industrial labs at Facebook, and coordinate ascent methods seen in publications from Duke University and Imperial College London.
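To show what these iterations look like in practice, here is a minimal sketch of logistic-regression MLE by Newton–Raphson (assuming NumPy; the toy data and the data-generating coefficients 0.5 and 1.5 are invented for the example). For the logit link the observed and expected information coincide, so the Newton step and the Fisher-scoring step are identical.

```python
import numpy as np

def logistic_mle(X, y, n_iter=25, tol=1e-10):
    """Fit logistic-regression coefficients by Newton–Raphson.

    With the logit link, observed and expected information coincide,
    so this update is also the Fisher-scoring / IRLS iteration.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ beta))       # fitted probabilities
        grad = X.T @ (y - p)                  # score (gradient of log-lik)
        W = p * (1 - p)                       # Bernoulli variance weights
        info = X.T @ (X * W[:, None])         # Fisher information matrix
        step = np.linalg.solve(info, grad)    # Newton / scoring step
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Toy data: intercept plus one covariate, true coefficients (0.5, 1.5).
rng = np.random.default_rng(2)
x = rng.normal(size=300)
X = np.column_stack([np.ones_like(x), x])
y = (rng.random(300) < 1 / (1 + np.exp(-(0.5 + 1.5 * x)))).astype(float)
print(logistic_mle(X, y))                     # roughly (0.5, 1.5)
```

Solving the linear system with the information matrix, rather than inverting it, is the standard numerically stable choice for this update.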

Applications and examples

MLE underlies parameter estimation in logistic regression models popularized in applied work at the Centers for Disease Control and Prevention, survival analysis routines used in oncology studies at the Dana-Farber Cancer Institute, and mixed-effects models applied in agricultural experiments tied to Iowa State University. In econometrics, MLE is central to the estimation of discrete choice models developed in collaborations involving the National Bureau of Economic Research and Princeton University. In engineering and signal processing, MLE is employed in radar and sonar problems researched at MIT Lincoln Laboratory and in communications systems advanced by Bell Labs.
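As a small survival-analysis example, the sketch below (a minimal illustration, assuming NumPy; the hazard rate 0.1 and the uniform censoring scheme are invented) fits an exponential survival model under right censoring. With d observed events and total follow-up time ∑t_i, the log-likelihood is ℓ(λ) = d·log(λ) − λ·∑t_i, maximized in closed form by λ̂ = d / ∑t_i.

```python
import numpy as np

# MLE for an exponential survival model with right censoring. Censored
# subjects contribute exp(−λ·t_i) to the likelihood, events contribute
# λ·exp(−λ·t_i), giving ℓ(λ) = d·log(λ) − λ·Σt_i and λ̂ = d / Σt_i.
rng = np.random.default_rng(3)
true_rate = 0.1
event_times = rng.exponential(scale=1 / true_rate, size=200)
censor_times = rng.uniform(0, 20, size=200)

t = np.minimum(event_times, censor_times)     # observed follow-up time
event = event_times <= censor_times           # True if event was observed

rate_hat = event.sum() / t.sum()
print("estimated hazard rate:", rate_hat)     # roughly 0.1
```

The estimator is simply events divided by total time at risk, which is why this quantity appears throughout epidemiology as the incidence rate.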

Limitations and extensions

Limitations of MLE include sensitivity to model misspecification, multimodality of likelihood surfaces, and small-sample bias; these issues prompted developments such as penalized likelihood, Bayesian methods, and profile likelihood approaches studied at University College London, the University of Cambridge, and the Swiss Federal Institute of Technology in Lausanne. Robust alternatives and regularization techniques such as ridge and lasso were developed in part by researchers affiliated with Stanford University and Columbia University to address high-dimensional settings. Semiparametric and nonparametric extensions have been advanced at research centers including Sciences Po and the University of California, Los Angeles.
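One concrete failure mode and its penalized-likelihood remedy: under complete separation the unpenalized logistic MLE diverges to infinity, while a ridge penalty keeps the maximizer finite. The sketch below (a minimal illustration, assuming NumPy and SciPy; the four-point dataset and the penalty weight α = 1.0 are invented) maximizes ℓ(β) − (α/2)·‖β‖² on deliberately separable data.

```python
import numpy as np
from scipy.optimize import minimize

# Ridge-penalized logistic log-likelihood: ℓ(β) − (α/2)·‖β‖². The data
# below are completely separated, so the unpenalized MLE diverges; the
# penalty keeps the maximizer finite.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
alpha = 1.0

def penalized_neg_loglik(beta):
    z = X @ beta
    loglik = y @ z - np.logaddexp(0.0, z).sum()   # Bernoulli log-likelihood
    return -loglik + 0.5 * alpha * beta @ beta

res = minimize(penalized_neg_loglik, x0=np.zeros(2), method="BFGS")
print("penalized MLE:", res.x)                    # finite despite separation
```

From a Bayesian viewpoint the ridge penalty is a Gaussian prior on β, so the penalized MLE here coincides with a maximum a posteriori estimate.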

Category:Statistical inference