| Least Squares | |
|---|---|
| Name | Least squares |
| Caption | Geometric depiction of a linear fit minimizing squared residuals |
| Field | Numerical analysis; Statistics; Data science |
| Inventors | Adrien-Marie Legendre; Carl Friedrich Gauss |
| Introduced | 1805–1809 |
| Complexity | Algorithm-dependent (e.g., O(mn² + n³) for dense normal-equation methods on an m×n design matrix) |
Least Squares

Least squares is a numerical technique for estimating parameters by minimizing the sum of squared residuals between observations and a specified model. It underpins parameter estimation in Astronomy, Geodesy, Econometrics, Psychometrics, and Machine learning, and is foundational to methods developed by Adrien-Marie Legendre and Carl Friedrich Gauss. The approach connects to later probabilistic interpretations in the Bayesian tradition and to computational implementations popularized at institutions such as Bell Labs and Lawrence Livermore National Laboratory.
Early uses of least squares trace to astronomical problems of the kind addressed by Johannes Kepler and to observational programs at the Royal Observatory, Greenwich. The method was first published by Adrien-Marie Legendre in 1805; Carl Friedrich Gauss published an independent account in 1809 in connection with orbit computation for Ceres, claiming prior use of the method, which led to a long-running priority dispute. Subsequent development intersected with the rise of Probability theory through figures like Pierre-Simon Laplace and with surveying needs handled by the Ordnance Survey. In the 20th century, contributions from Andrey Kolmogorov, Norbert Wiener, and practitioners at IBM and AT&T shaped numerical stability and algorithmic practice, while the attribution controversy prompted historical studies by scholars of the Royal Society and the Académie des sciences.
Given observations y_i and model functions f(x_i; β) parameterized by β, least squares finds the estimate β̂ minimizing Σ_i (y_i − f(x_i; β))^2. In linear models with design matrix X and response vector y, the normal equations X^T X β̂ = X^T y yield the closed-form solution β̂ = (X^T X)^{-1} X^T y when X^T X is invertible; this algebraic structure relates to Carl Gustav Jacob Jacobi's work and to matrix theory advanced at the École Polytechnique. The quadratic objective makes X β̂ the orthogonal projection of y onto the column space of X, linking the method to functional-analytic concepts developed by John von Neumann and David Hilbert. For nonlinear f, iterative solvers linearize about the current estimate using the Jacobian of f, techniques informed by research at institutions such as the Courant Institute.
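A minimal sketch of the linear case, assuming NumPy and synthetic data (the coefficient values 2.0 and 3.0 are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 2.0 + 3.0*x plus Gaussian noise (illustrative values).
x = rng.uniform(0.0, 10.0, size=50)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=50)

# Design matrix with an intercept column.
X = np.column_stack([np.ones_like(x), x])

# Normal equations: (X^T X) beta = X^T y.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# np.linalg.lstsq solves the same problem via SVD and is usually
# preferable numerically, especially when X is ill-conditioned.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_normal)  # approximately [2.0, 3.0]
print(beta_lstsq)
```

Both calls recover the same coefficients on well-conditioned data; they differ in how they behave as X^T X approaches singularity, which motivates the factorization methods discussed next.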
Classical solution methods include normal-equation inversion, QR factorization, and singular value decomposition (SVD); QR algorithms were popularized through numerical libraries distributed via Netlib and by researchers at Argonne National Laboratory. QR via Householder reflections or Givens rotations is numerically more stable than explicit inversion of X^T X, building on work by Alston Scott Householder and Wallace Givens. SVD, with roots in the matrix theory of Eugenio Beltrami and Camille Jordan and with modern implementations influenced by developers at the Numerical Algorithms Group, handles rank deficiency robustly. Iterative approaches such as conjugate gradients, Gauss–Newton, and Levenberg–Marquardt are widely used for large-scale problems at CERN and in industrial practice at Siemens. Software ecosystems like MATLAB, R, and Python's scientific stack, along with libraries from Intel and NVIDIA, implement optimized kernels exploiting multicore and GPU architectures.
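A sketch of these solvers, assuming NumPy and SciPy with synthetic data (model forms and starting values are illustrative, not prescribed by any particular library convention):

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.uniform(0.0, 10.0, size=50)])
y = X @ np.array([2.0, 3.0]) + rng.normal(scale=0.5, size=50)

# QR factorization: X = QR with R upper triangular, so R beta = Q^T y.
Q, R = np.linalg.qr(X)  # Householder-based reduced QR
beta_qr = np.linalg.solve(R, Q.T @ y)

# SVD-based pseudoinverse: tolerates rank deficiency by zeroing
# reciprocals of singular values below a tolerance.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
tol = max(X.shape) * np.finfo(float).eps * s.max()
s_inv = np.where(s > tol, 1.0 / s, 0.0)
beta_svd = Vt.T @ (s_inv * (U.T @ y))

# Levenberg-Marquardt for a nonlinear model y = a * exp(b * t).
t = np.linspace(0.0, 1.0, 40)
obs = 1.5 * np.exp(0.8 * t) + rng.normal(scale=0.01, size=40)

def residuals(params):
    a, b = params
    return a * np.exp(b * t) - obs

fit = least_squares(residuals, x0=[1.0, 1.0], method="lm")
print(beta_qr, beta_svd, fit.x)
```

The QR route avoids forming X^T X, whose condition number is the square of that of X; the SVD route additionally survives rank-deficient designs at extra cost.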
Under homoscedastic, uncorrelated Gaussian errors, least squares estimators coincide with maximum likelihood estimators, as established in the literature shaped by Ronald Fisher and Jerzy Neyman. The Gauss–Markov theorem, associated with Carl Friedrich Gauss and Andrey Markov, requires only zero-mean, homoscedastic, uncorrelated errors (normality is not needed) and states that ordinary least squares yields the best linear unbiased estimator (BLUE). Inference tools (standard errors, confidence intervals, hypothesis tests) use covariance estimators that may be adjusted for heteroscedasticity via sandwich (Eicker–Huber–White) estimators. Robustification strategies, inspired by work from Peter Huber and Frank Hampel, modify the loss function to mitigate outliers, while bootstrap methods from Bradley Efron provide nonparametric uncertainty quantification.
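A sketch of the standard OLS inference quantities under the homoscedastic model, assuming NumPy and SciPy (data and dimensions are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 100, 2
X = np.column_stack([np.ones(n), rng.uniform(0.0, 10.0, size=n)])
y = X @ np.array([2.0, 3.0]) + rng.normal(scale=0.5, size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Unbiased estimate of the error variance: RSS / (n - p).
sigma2 = resid @ resid / (n - p)

# Covariance of beta under homoscedastic errors: sigma^2 (X^T X)^{-1}.
cov = sigma2 * np.linalg.inv(X.T @ X)
se = np.sqrt(np.diag(cov))

# 95% confidence intervals from the t distribution with n - p dof.
t_crit = stats.t.ppf(0.975, df=n - p)
print(np.column_stack([beta, se, beta - t_crit * se, beta + t_crit * se]))
```

Each row of the printed array gives a coefficient, its standard error, and the interval endpoints; robust or bootstrap variants replace only the covariance step.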
Extensions include weighted least squares (WLS) for heteroscedastic data; generalized least squares (GLS) for correlated errors, studied at institutions such as Johns Hopkins University; and regularized formulations, notably ridge regression, introduced by Arthur Hoerl and Robert Kennard, and the lasso of Robert Tibshirani, with sparsity theory developed by David Donoho and others. Total least squares addresses errors-in-variables problems encountered in Metrology and at instrumentation laboratories such as NIST. Nonlinear least squares generalizes to parameter estimation in mission planning at the Naval Research Laboratory and NASA; constrained least squares incorporates equality and inequality constraints in control applications studied at the Massachusetts Institute of Technology. Bayesian formulations combine prior distributions in the tradition of Thomas Bayes with quadratic approximations used in posterior mode estimation.
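A sketch of two of these extensions, ridge regression in closed form and weighted least squares, assuming NumPy (the penalty value and weights are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 60
X = np.column_stack([np.ones(n), rng.uniform(0.0, 10.0, size=n)])
y = X @ np.array([2.0, 3.0]) + rng.normal(scale=0.5, size=n)

# Ridge regression: minimize ||y - X beta||^2 + lam * ||beta||^2,
# giving beta = (X^T X + lam * I)^{-1} X^T y.  (In practice the
# intercept is usually left unpenalized; it is penalized here for brevity.)
lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Weighted least squares: scaling rows by sqrt(w_i) and solving OLS
# minimizes sum_i w_i * (y_i - x_i beta)^2.
w = rng.uniform(0.5, 2.0, size=n)  # illustrative precision weights
sw = np.sqrt(w)
beta_wls, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

print(beta_ridge, beta_wls)
```

Both reduce to ordinary least squares when lam = 0 and all weights are equal, which is why they share the same solver machinery.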
Least squares permeates many fields: orbit determination at the Jet Propulsion Laboratory, photogrammetry at the US Geological Survey, system identification at Siemens and Bosch, and signal processing at Bell Labs. In Econometrics, ordinary and instrumental-variable least squares underpin empirical work at the London School of Economics and the University of Chicago. In Computer vision and Robotics, bundle adjustment and pose estimation rely on nonlinear least squares methods developed in research groups at Oxford University and Carnegie Mellon University. Machine learning pipelines use least squares in linear regression, in kernel methods studied at the Courant Institute and the University of Toronto, and in training components of deep learning systems built by teams at Google and Facebook. Geodetic networks, medical imaging reconstruction at the Mayo Clinic and Johns Hopkins Hospital, and spectroscopy inversion in European Southern Observatory operations all depend on least squares formulations.
Category:Numerical analysis
Category:Statistical estimation