Glivenko–Cantelli theorem

Name	Glivenko–Cantelli theorem
Field	Probability theory
Introduced	1933
Authors	Valery Glivenko; Francesco Cantelli
Related	Law of large numbers; Donsker theorem; Vapnik–Chervonenkis theory

Contents

Statement
Historical background
Proofs and variants
Applications
Generalizations and related results
Examples and counterexamples

The Glivenko–Cantelli theorem is a fundamental result in probability theory and statistics concerning the uniform convergence of the empirical distribution function to the true distribution function. It states that, for independent identically distributed observations, the supremum of the absolute difference between the empirical and population distribution functions converges almost surely to zero. The theorem underpins consistency results in nonparametric statistics. It is formulated within Kolmogorov's measure-theoretic framework, builds on strong-law ideas going back to Borel, and was followed by later developments due to Doob and Lévy.

Statement

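Let X1, X2, ... be independent identically distributed real-valued random variables with cumulative distribution function F. Denote by Fn the empirical distribution function based on X1, ..., Xn, that is, Fn(x) is the fraction of the observations X1, ..., Xn that are at most x. The Glivenko–Cantelli theorem states that sup_x |Fn(x) − F(x)| → 0 almost surely as n → ∞, where the supremum is taken over all real x. For each fixed x, the convergence Fn(x) → F(x) already follows from Kolmogorov's strong law of large numbers; the theorem strengthens this to convergence that is uniform in x. The Donsker theorem refines the result further by describing the fluctuations of Fn − F at the scale 1/√n.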
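
As a minimal numerical sketch of the statement, the Python snippet below computes sup_x |Fn(x) − F(x)| for simulated samples of increasing size. The choice of the standard normal as F, the fixed seed, and the helper names normal_cdf and sup_distance are illustrative assumptions, not part of the theorem.

```python
import math
import random


def normal_cdf(x: float) -> float:
    """CDF of the standard normal distribution (the assumed true F)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def sup_distance(sample, cdf) -> float:
    """Return sup_x |F_n(x) - F(x)| for a continuous cdf F.

    F_n is a step function and F is nondecreasing, so the supremum is
    reached at a jump of F_n: it suffices to compare F with the value
    of F_n just before and just after each order statistic.
    """
    xs = sorted(sample)
    n = len(xs)
    worst = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # F_n rises from i/n to (i + 1)/n at the (i + 1)-th order statistic.
        worst = max(worst, abs((i + 1) / n - f), abs(f - i / n))
    return worst


rng = random.Random(0)  # fixed seed so the run is reproducible
for n in (100, 1_000, 10_000):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    print(n, round(sup_distance(sample, normal_cdf), 4))
```

The printed distances should shrink as n grows, roughly at the rate 1/√n. A single run illustrates the theorem but does not prove it.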

Historical background

Valery Glivenko published the result in 1933 for continuous distribution functions, and Francesco Cantelli extended it to arbitrary distribution functions in a note the same year; both papers appeared in the Italian actuarial journal Giornale dell'Istituto Italiano degli Attuari. The result extends the law of large numbers, whose antecedents go back to Bernoulli, and it appeared alongside Kolmogorov's 1933 work on the limiting distribution of the maximal deviation of the empirical distribution function. Subsequent exposition and formalization were influenced by the work of Feller, Doob, and Lévy, while later probabilists such as Donsker, Prokhorov, and Skorokhod placed the theorem within a broader functional limit theory. The theorem is cited in the literature on the central limit theorem and ergodic theory, and in the development of statistical learning theory by Vapnik and Chervonenkis.

Proofs and variants

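The classical proof partitions the real line into finitely many intervals, each of probability at most ε under F, applies the strong law of large numbers at the finitely many endpoints, and uses the monotonicity of Fn and F to control the deviation between endpoints. Quantitative proofs instead rely on exponential concentration: the Dvoretzky–Kiefer–Wolfowitz inequality gives a tail bound for sup_x |Fn(x) − F(x)| that decays exponentially in n, which is stronger than what almost sure convergence requires. Alternative approaches draw on martingale arguments based on Doob's inequalities and on orthogonal expansions linked to the Karhunen–Loève theorem. Functional-analytic variants place the theorem in the setting of empirical processes as developed by Donsker and Prokhorov, while modern proofs invoke covering numbers and entropy methods from Vapnik–Chervonenkis theory. Strengthened forms include Massart's distribution-free version of the Dvoretzky–Kiefer–Wolfowitz inequality with the sharp constant, and extensions via Talagrand's concentration inequalities.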
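
The step from the Dvoretzky–Kiefer–Wolfowitz inequality to almost sure convergence can be written out as follows. This is a sketch of one standard route, using the inequality in Massart's form with constant 2, and is not the argument of the original 1933 papers.

```latex
% DKW inequality (Massart's constant), valid for every n >= 1 and eps > 0:
\[
  P\Bigl(\sup_{x \in \mathbb{R}} |F_n(x) - F(x)| > \varepsilon\Bigr)
  \;\le\; 2 e^{-2 n \varepsilon^{2}} .
\]
% The bound is summable in n (geometric series with ratio e^{-2 eps^2} < 1):
\[
  \sum_{n \ge 1} 2 e^{-2 n \varepsilon^{2}}
  \;=\; \frac{2 e^{-2\varepsilon^{2}}}{1 - e^{-2\varepsilon^{2}}}
  \;<\; \infty .
\]
% By the Borel--Cantelli lemma, almost surely the event
% { sup_x |F_n(x) - F(x)| > eps } occurs for only finitely many n.
% Applying this with eps = 1/k for k = 1, 2, ... and intersecting the
% countably many almost sure events gives
\[
  \sup_{x \in \mathbb{R}} |F_n(x) - F(x)| \;\longrightarrow\; 0
  \quad \text{almost surely as } n \to \infty .
\]
```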

Applications

The theorem provides a foundation for the consistency of nonparametric procedures such as the Kolmogorov–Smirnov test, the bootstrap introduced by Efron, and the uniform convergence guarantees of statistical learning theory studied by Vapnik and by Blumer and coauthors. It supports the asymptotic justification of empirical risk minimization in computational learning theory, in theoretical machine learning research at MIT and Stanford University, and in work by Cortes and Vapnik. In econometrics it underlies estimation procedures associated with Cowles Commission scholars, and in survival analysis it relates to the estimation tradition originating with Kaplan and Meier.
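
One concrete use of uniform convergence is a distribution-free confidence band for F. The sketch below builds such a band from the Dvoretzky–Kiefer–Wolfowitz inequality by solving 2 exp(−2nε²) = α for ε. The function names dkw_epsilon and dkw_band and the sample data are hypothetical, and the band is evaluated only at the observed points, which are assumed to be distinct.

```python
import math


def dkw_epsilon(n: int, alpha: float) -> float:
    """Half-width eps solving 2 * exp(-2 * n * eps**2) = alpha."""
    return math.sqrt(math.log(2.0 / alpha) / (2.0 * n))


def dkw_band(sample, alpha: float = 0.05):
    """Return (x, lower, upper) at each sorted sample point.

    With probability at least 1 - alpha, the true F lies between the
    lower and upper curves simultaneously for all x (DKW inequality).
    Assumes the sample values are distinct.
    """
    xs = sorted(sample)
    n = len(xs)
    eps = dkw_epsilon(n, alpha)
    band = []
    for i, x in enumerate(xs, start=1):
        fn = i / n  # F_n at the i-th order statistic
        band.append((x, max(0.0, fn - eps), min(1.0, fn + eps)))
    return band


# Illustrative data only.
for x, lo, hi in dkw_band([2.1, 0.4, 1.7, 3.3, 0.9], alpha=0.05):
    print(f"x={x:.1f}  lower={lo:.3f}  upper={hi:.3f}")
```

Because ε depends only on n and α, the same band width applies whatever the underlying distribution is. This is the sense in which the guarantee is distribution-free.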

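Generalizations and related results

Several generalizations replace the half-lines (−∞, x] by a class of sets or functions. A class over which the empirical measure converges uniformly to the true measure is called a Glivenko–Cantelli class. Classes of finite Vapnik–Chervonenkis dimension are Glivenko–Cantelli classes for every underlying distribution, subject to measurability conditions; this rests on the combinatorial bound of Sauer's lemma and is treated in Pollard's text on empirical processes. Multivariate extensions relate to the Prokhorov metric and weak convergence as developed by Skorokhod and Billingsley. Functional analogues connect to the Donsker theorem and to the strong approximation results of Komlós, Major, and Tusnády. There are further connections with the large deviations results of Cramér and with concentration phenomena as formalized by Ledoux and Boucheron.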
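
To make the role of Sauer's lemma concrete, the sketch below evaluates its bound on the number of distinct labelings a class of Vapnik–Chervonenkis dimension d can induce on n points, namely the sum of C(n, i) for i from 0 to d. It then checks the bound against the class of half-lines (−∞, t], which has dimension 1 and corresponds to the classical theorem. The function names and the sample points are illustrative assumptions.

```python
import math


def sauer_bound(n: int, d: int) -> int:
    """Sauer's lemma: at most sum_{i=0}^{d} C(n, i) labelings of n points."""
    return sum(math.comb(n, i) for i in range(d + 1))


def half_line_dichotomies(points) -> int:
    """Count distinct labelings of the points by half-lines (-inf, t].

    Only thresholds at the points themselves, plus one below the
    minimum, can produce different labelings. Requires a non-empty list.
    """
    pts = sorted(set(points))
    thresholds = [pts[0] - 1.0] + pts
    labelings = {tuple(p <= t for p in points) for t in thresholds}
    return len(labelings)


points = [0.3, 1.2, 2.5, 4.0]
print(half_line_dichotomies(points), sauer_bound(len(points), 1))
```

For half-lines the bound n + 1 is attained exactly on distinct points. The polynomial growth in n, compared with the 2^n labelings available to an unrestricted class, is what drives uniform convergence in the Vapnik–Chervonenkis setting.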

Examples and counterexamples

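Examples: For i.i.d. samples from any distribution on the real line, including the normal (Gaussian), Poisson, Bernoulli, and exponential distributions, the empirical distribution function satisfies the theorem; no continuity or moment assumption is needed. This allows practitioners, for example at industrial laboratories such as Bell Labs, to apply uniform convergence results without knowing the form of F.

Counterexamples: The conclusion can fail for dependent sequences and for overly rich index classes. A stationary sequence that is not ergodic, of the kind studied in the ergodic theory of Kolmogorov and Sinai, need not satisfy it: if every Xn equals X1, then Fn is a single step at X1 for all n and does not converge to F unless F is a point mass. Ergodicity or suitable mixing conditions restore the result. Similarly, a class of sets with infinite Vapnik–Chervonenkis dimension fails to be a Glivenko–Cantelli class uniformly over all distributions, as shown by Vapnik and Chervonenkis, although it may still be a Glivenko–Cantelli class for particular distributions.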
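
The contrast between independent and fully dependent observations can be checked numerically. The sketch below assumes F is the Uniform(0, 1) distribution, so that F(x) = x on [0, 1], and compares an i.i.d. sample with a degenerate dependent sequence in which every observation repeats the first. The function name and the seed are illustrative.

```python
import random


def sup_distance_uniform(sample) -> float:
    """sup_x |F_n(x) - x| against the Uniform(0, 1) CDF F(x) = x.

    Checks both sides of each jump of F_n. Repeated values only add
    intermediate comparisons that cannot exceed the true supremum.
    """
    xs = sorted(sample)
    n = len(xs)
    worst = 0.0
    for i, x in enumerate(xs):
        worst = max(worst, abs((i + 1) / n - x), abs(x - i / n))
    return worst


rng = random.Random(0)  # fixed seed for reproducibility
n = 5_000

iid = [rng.random() for _ in range(n)]  # independent draws
x1 = rng.random()
copies = [x1] * n  # stationary but not ergodic: X_n = X_1

print("i.i.d.:   ", round(sup_distance_uniform(iid), 4))
print("dependent:", round(sup_distance_uniform(copies), 4))
```

For the repeated sequence the distance equals max(X1, 1 − X1), which is at least 1/2 for every n, so it cannot tend to zero. For the i.i.d. sample it is small and shrinks as n increases.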

Category:Probability theorems