| Kolmogorov–Smirnov test | |
|---|---|
| Name | Kolmogorov–Smirnov test |
| Type | Nonparametric test |
| Introduced | 1933 |
| Creators | Andrey Kolmogorov; Nikolai Smirnov |
| Applications | Goodness-of-fit testing; two-sample comparison; model validation |
The Kolmogorov–Smirnov test (KS test) is a nonparametric statistical test for comparing an empirical distribution with a reference probability distribution (the one-sample case) or for comparing two empirical distributions (the two-sample case). Its statistic is the maximal difference between cumulative distribution functions, and when the reference distribution is continuous and fully specified the statistic's null distribution does not depend on that distribution, so a single set of critical values serves for inference. The test is used in fields ranging from statistical quality control to econometrics and climate science.
Andrey Kolmogorov and Nikolai Smirnov developed the foundational theory during the 1930s: Kolmogorov published the measure-theoretic foundations of probability and, in 1933, the limiting distribution of the one-sample statistic, while Smirnov extended the theory of empirical distribution functions and introduced the two-sample statistic in 1939. Their work intersected with that of contemporaries such as Richard von Mises, Paul Lévy, and Harald Cramér, and with the Glivenko–Cantelli theorem on uniform convergence of empirical distribution functions. Institutions including Moscow State University and the Steklov Institute fostered the rigorous probability research behind the test, and the hypothesis-testing framework of Jerzy Neyman, Egon Pearson, and Ronald Fisher shaped how it came to be applied. Later limit-theoretic development connected to work by Joseph Doob, Monroe Donsker, and William Feller, and the test entered routine applied practice with the growth of computational statistics and its treatment in standard textbooks.
The one-sample variant compares an empirical cumulative distribution function to a fully specified reference distribution, following Kolmogorov's 1933 formulation as elaborated by Smirnov; the two-sample variant compares two empirical cumulative distribution functions and is related in spirit to the rank-based procedures of Mann, Whitney, and Wilcoxon. Extensions include the Lilliefors correction for reference distributions with estimated parameters, introduced by Hubert Lilliefors; multivariate generalizations, where uniform convergence over classes of sets connects to Vapnik–Chervonenkis theory; and Kuiper's test, a rotation-invariant variant due to Nicolaas Kuiper that is widely used for circular data and in astronomy. Other related procedures include the Cramér–von Mises test of Harald Cramér and Richard von Mises, the Anderson–Darling test of Theodore Anderson and Donald Darling, and the Shapiro–Wilk test of Samuel Shapiro and Martin Wilk; these variants differ chiefly in how heavily they weight deviations in the distribution tails.
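As a concrete illustration of the two-sample variant described above, the following sketch computes the statistic directly from the two empirical distribution functions by walking their pooled sorted values; the function name is illustrative, not from any particular library.

```python
def ks_two_sample(a, b):
    """Two-sample KS statistic: sup over x of |F_m(x) - G_n(x)|,
    where F_m and G_n are the empirical CDFs of samples a and b.

    The supremum is attained just after some observed value, so it
    suffices to evaluate the ECDF gap after each distinct pooled value.
    """
    a, b = sorted(a), sorted(b)
    m, n = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < m and j < n:
        x = min(a[i], b[j])
        # Advance both counters past ties at x before measuring the gap,
        # so tied values do not inflate the statistic.
        while i < m and a[i] == x:
            i += 1
        while j < n and b[j] == x:
            j += 1
        d = max(d, abs(i / m - j / n))
    return d

# Example: the samples overlap only on {3, 4}, giving D = 0.5.
print(ks_two_sample([1, 2, 3, 4], [3, 4, 5, 6]))
```

Advancing past ties before measuring the gap matters for discrete or rounded data; without it, identical samples could report a spuriously large statistic.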
The test statistic in both forms is a supremum-norm distance between cumulative distribution functions. In the one-sample case, for an empirical distribution function F_n and reference distribution F, the statistic is D_n = sup_x |F_n(x) - F(x)|; in the two-sample case it is D_{m,n} = sup_x |F_{1,m}(x) - F_{2,n}(x)|. The null distribution of the supremum statistic is derived via Donsker's theorem and the theory of the Brownian bridge, and practical critical values were first tabulated by Smirnov. The asymptotic distributions trace back to Kolmogorov and Smirnov, with alternative derivations by William Feller and Joseph Doob, and computational approximations for finite-sample p-values are implemented in standard statistical software such as R, SAS, and MATLAB.
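For the one-sample statistic, the supremum over all x reduces to a check at the order statistics, since the empirical CDF only jumps there. A minimal sketch, with an illustrative function name and a Uniform(0,1) reference chosen for the example:

```python
def ks_one_sample(sample, cdf):
    """One-sample KS statistic D_n = sup_x |F_n(x) - F(x)|.

    F_n jumps by 1/n at each order statistic, so the supremum is the
    largest gap just before or just after some sample point: for the
    i-th order statistic x_(i), compare i/n - F(x_(i)) and
    F(x_(i)) - (i-1)/n.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d

# Reference distribution for the example: Uniform(0,1), F(x) = x on [0,1].
uniform_cdf = lambda x: min(max(x, 0.0), 1.0)

# A sample packed near 0 deviates strongly from Uniform(0,1):
print(ks_one_sample([0.05, 0.10, 0.15, 0.20], uniform_cdf))
```

Evaluating both one-sided gaps at each order statistic is what makes the computation exact rather than a grid approximation.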
Hypothesis testing with the statistic follows the Neyman–Pearson framework: one formulates a null hypothesis of equality to a reference distribution (or of equality between two samples), computes the supremum difference, and compares it to a tabulated critical value. P-values are computed from the asymptotic formulae of Kolmogorov and Smirnov or from exact finite-sample distributions, studied among others by William Feller, and are produced routinely by statistical packages such as R, SAS, IBM SPSS, and MATLAB. When parameters of the reference distribution are estimated from the data, the standard critical values are no longer valid; adjustments include Lilliefors' corrections for the normal and exponential families and bootstrap methods in the tradition of Bradley Efron.
Power analysis contrasts the test's sensitivity under different alternatives and compares it with the Anderson–Darling, Cramér–von Mises, and Shapiro–Wilk tests. Because the variance of the underlying empirical process is largest near the median of the distribution, the Kolmogorov–Smirnov statistic is most sensitive to deviations near the center and comparatively insensitive to tail deviations, which matter in actuarial and financial risk applications and where the tail-weighted Anderson–Darling test is stronger. Limitations include reduced power and conservative p-values for discrete or heavily tied data, as in epidemiological and genetic applications, and invalid critical values when parameters are estimated, as is common in econometric model validation; remedies include permutation tests in the tradition of Ronald Fisher, bootstrap adjustments following Bradley Efron, and alternative distances between distributions used in machine-learning practice.
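The permutation remedy mentioned above can be sketched for the two-sample case: under the null the group labels are exchangeable, so shuffling the pooled sample and recomputing the statistic yields a reference distribution that is valid even with ties or small samples. Names and defaults here are illustrative.

```python
import random

def ks_stat(a, b):
    """Two-sample KS statistic via a merge over distinct pooled values."""
    a, b = sorted(a), sorted(b)
    m, n = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < m and j < n:
        x = min(a[i], b[j])
        while i < m and a[i] == x:
            i += 1
        while j < n and b[j] == x:
            j += 1
        d = max(d, abs(i / m - j / n))
    return d

def ks_permutation_test(a, b, n_perm=1000, seed=1):
    """Permutation p-value: shuffle the pooled sample, resplit into the
    original group sizes, and count how often the permuted statistic
    reaches the observed one. The +1 terms give the standard
    never-exactly-zero permutation p-value.
    """
    rng = random.Random(seed)
    observed = ks_stat(a, b)
    pooled = list(a) + list(b)
    m = len(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_stat(pooled[:m], pooled[m:]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

For completely separated samples the p-value is near its floor of 1/(n_perm + 1); for identical samples it is 1.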
Implementations are widely available: `ks.test` in R, `scipy.stats.kstest` and `scipy.stats.ks_2samp` in SciPy, `kstest` and `kstest2` in MATLAB, and procedures in SAS and IBM SPSS. Example applications include goodness-of-fit testing in climate studies, model validation in econometrics, and quality control in manufacturing; case studies appear in journals of the American Statistical Association, the Royal Statistical Society, and the Institute of Mathematical Statistics. Applied workflows often combine the test with Monte Carlo simulation of finite-sample critical values, bootstrap protocols, and diagnostic visualizations such as ECDF and probability plots in the exploratory tradition of John Tukey.
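The Monte Carlo workflow mentioned above exploits the distribution-free property: under a continuous null, the probability integral transform makes D_n's distribution the same as for Uniform(0,1) data, so simulating uniforms suffices for any continuous reference. A stdlib-only sketch, with illustrative names and defaults:

```python
import random

def ks_stat_uniform(u):
    """One-sample KS statistic against the Uniform(0,1) reference."""
    u = sorted(u)
    n = len(u)
    return max(max(i / n - x, x - (i - 1) / n)
               for i, x in enumerate(u, 1))

def mc_critical_value(n, alpha=0.05, reps=5000, seed=42):
    """Monte Carlo estimate of the level-alpha critical value of D_n.

    Simulate Uniform(0,1) samples of size n (distribution-free under
    the null) and take the empirical (1 - alpha) quantile of the
    simulated statistics.
    """
    rng = random.Random(seed)
    stats = sorted(
        ks_stat_uniform([rng.random() for _ in range(n)])
        for _ in range(reps)
    )
    return stats[int((1 - alpha) * reps)]
```

For n = 20 at the 5% level this lands near the tabulated value of about 0.29, noticeably below the asymptotic approximation 1.36 / sqrt(20) ≈ 0.30, which is the practical argument for simulating small-sample critical values rather than relying on the limit.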
Category:Statistical tests