
Chi-square distribution

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Chi-square distribution
Name: Chi-square distribution
Type: Continuous probability distribution
Parameters: k > 0 (degrees of freedom)
Support: x ∈ [0, ∞)
PDF: f(x; k) = x^{k/2 - 1} e^{-x/2} / (2^{k/2} Γ(k/2))
Variance: 2k

The chi-square distribution (χ² distribution) is a continuous probability distribution commonly used in statistical inference, named after the Greek letter chi (χ). It arises in sampling theory, analysis of variance, and goodness-of-fit testing connected to the work of Karl Pearson, Ronald Fisher, Jerzy Neyman, and Egon Pearson and to institutions such as University College London and the University of Cambridge. The distribution plays a central role in inferential procedures implemented in software by organizations such as The R Project for Statistical Computing, SAS Institute, and libraries linked to the Python Software Foundation.

Definition and basic properties

The distribution is defined for a nonnegative random variable and parameterized by a positive number of degrees of freedom k, a scalar typically tied to sample size or to the number of estimated parameters in a model, as in the derivation of Student's t-distribution and in the historical work of William Sealy Gosset, Fisher, and Karl Pearson. It is the distribution of the sum of squares of k independent standard normal random variables, a fact proved in texts by Andrey Kolmogorov and in modern expositions at institutions such as the Massachusetts Institute of Technology and Stanford University. Key properties include additivity under independence (a sum of independent chi-square variables is chi-square distributed with the summed degrees of freedom) and nonnegative support, which underpin tests in laboratories such as Los Alamos National Laboratory and analytical frameworks at agencies such as the National Institutes of Health.
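
As a quick illustration (not part of the original article), the following minimal Python sketch uses NumPy and SciPy to check the sum-of-squares characterization and the additivity property by simulation; the seed, sample size, and degrees of freedom are arbitrary choices.

```python
# A minimal simulation sketch checking two claims above:
# (1) the sum of squares of k independent standard normals is chi-square(k),
# (2) the sum of independent chi-square variables is chi-square with summed df.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed
k = 5
n = 100_000

# (1) Sum of squares of k standard normals, tested against chi2(k).
samples = (rng.standard_normal((n, k)) ** 2).sum(axis=1)
print(stats.kstest(samples, stats.chi2(df=k).cdf))  # large p-value expected

# (2) Additivity: chi2(3) + chi2(4) should behave like chi2(7).
combined = stats.chi2(df=3).rvs(n, random_state=rng) + stats.chi2(df=4).rvs(n, random_state=rng)
print(stats.kstest(combined, stats.chi2(df=7).cdf))
```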

Probability density and cumulative distribution

For k degrees of freedom the probability density function (pdf) is f(x; k) = x^{k/2 - 1} e^{-x/2} / (2^{k/2} Γ(k/2)) for x > 0, an expression involving the gamma function Γ used in classic texts by Leonhard Euler and applied in computational routines descending from the era of John von Neumann and Alan Turing. The cumulative distribution function (cdf) is F(x; k) = γ(k/2, x/2) / Γ(k/2), where γ is the lower incomplete gamma function, a special function studied by Carl Friedrich Gauss and implemented in libraries from the GNU Project and Intel Corporation. Evaluating tail probabilities is critical in the hypothesis-testing framework designed by Neyman and Egon Pearson to control error rates in experiments at institutions such as Bell Labs and IBM Research.
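
For concreteness (an illustration not found in the original), the sketch below evaluates the pdf directly from the formula above and the cdf via SciPy's regularized lower incomplete gamma, cross-checking both against scipy.stats.chi2; the test point and degrees of freedom are arbitrary.

```python
# Evaluate the chi-square pdf and cdf from first principles and
# cross-check against scipy.stats.chi2.
import numpy as np
from scipy import special, stats

def chi2_pdf(x, k):
    """pdf: x^{k/2-1} e^{-x/2} / (2^{k/2} Gamma(k/2))."""
    return x ** (k / 2 - 1) * np.exp(-x / 2) / (2 ** (k / 2) * special.gamma(k / 2))

def chi2_cdf(x, k):
    """cdf via the regularized lower incomplete gamma P(k/2, x/2)."""
    return special.gammainc(k / 2, x / 2)

x, k = 3.5, 4  # arbitrary evaluation point and degrees of freedom
print(chi2_pdf(x, k), stats.chi2(df=k).pdf(x))    # should agree
print(chi2_cdf(x, k), stats.chi2(df=k).cdf(x))    # should agree
print(1 - chi2_cdf(x, k), stats.chi2(df=k).sf(x))  # upper-tail probability
```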

Moments and characteristic functions

The mean and variance are simple functions of k (mean = k, variance = 2k), relations appearing in lecture notes from Princeton University and Harvard University. Higher central moments and cumulants involve gamma and polygamma functions developed by Adrien-Marie Legendre and later tabulated in NIST-related projects. The moment-generating function (mgf) and characteristic function (cf) have closed forms, mgf M(t) = (1 - 2t)^{-k/2} for t < 1/2 and cf φ(t) = (1 - 2it)^{-k/2}, formulas appearing in standard treatises and used in asymptotic analyses in publications from Oxford University Press.
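
As a hedged numerical check (NumPy and SciPy assumed, with arbitrary k, sample size, and evaluation point t), the following sketch compares Monte Carlo estimates of the mean, variance, and mgf against the closed forms above.

```python
# Monte Carlo check of mean = k, variance = 2k, and
# M(t) = (1 - 2t)^{-k/2} (valid for t < 1/2).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)   # arbitrary seed
k, n, t = 6, 500_000, 0.2        # arbitrary df, sample size, and t < 1/2

x = stats.chi2(df=k).rvs(n, random_state=rng)
print(x.mean(), k)         # sample mean vs. theoretical mean k
print(x.var(), 2 * k)      # sample variance vs. theoretical variance 2k
print(np.exp(t * x).mean(), (1 - 2 * t) ** (-k / 2))  # E[e^{tX}] vs. closed form
```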

Relationships to other distributions

The chi-square distribution is related to numerous named distributions. It is a special case of the gamma distribution (shape k/2, scale 2) studied by Pierre-Simon Laplace and belongs to the exponential family promoted in the work of Jerzy Neyman. Ratios of independent chi-square variables, each divided by its degrees of freedom, produce the F-distribution introduced by Ronald Fisher, while a standard normal divided by the square root of an independent chi-square over its degrees of freedom produces Student's t-distribution linked to William Sealy Gosset; these relationships underpin the ANOVA methods in Fisher's texts and their subsequent extensions by John Tukey. In multivariate contexts, connections to the Wishart distribution developed by John Wishart and to the Hotelling's T-squared statistic of Harold Hotelling are central in multivariate analysis courses at Columbia University and the University of Chicago.
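
The gamma special case and the F-distribution construction can be made concrete; the sketch below (illustrative only, with arbitrary parameters) compares the cdfs of chi-square(k) and Gamma(k/2, scale 2) and tests the ratio construction with a Kolmogorov-Smirnov statistic.

```python
# Two relationships described above:
# chi2(k) coincides with Gamma(shape=k/2, scale=2), and the ratio of two
# independent chi-squares, each divided by its df, follows an F distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)  # arbitrary seed
k, n = 5, 200_000

# Identical cdfs: chi-square(k) vs. Gamma(k/2, scale 2).
x = np.linspace(0.1, 20, 5)
print(stats.chi2(df=k).cdf(x))
print(stats.gamma(a=k / 2, scale=2).cdf(x))

# F construction: (U/d1) / (V/d2) with U ~ chi2(d1), V ~ chi2(d2).
d1, d2 = 4, 9
u = stats.chi2(df=d1).rvs(n, random_state=rng)
v = stats.chi2(df=d2).rvs(n, random_state=rng)
print(stats.kstest((u / d1) / (v / d2), stats.f(dfn=d1, dfd=d2).cdf))
```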

Estimation and hypothesis testing applications

Chi-square-based tests form the basis of Pearson's chi-squared goodness-of-fit test and of contingency-table analyses, procedures pioneered by Karl Pearson, refined by Fisher and Neyman, and routinely applied in epidemiological studies at the Centers for Disease Control and Prevention and in clinical trials regulated by the Food and Drug Administration. Confidence intervals for the variance in normal models use chi-square quantiles, a method documented in ISO statistical standards and in textbooks from Wiley and Springer. In genetics and population studies influenced by work at Cold Spring Harbor Laboratory and the Max Planck Society, chi-square tests assess deviations in allele frequencies and linkage disequilibrium under models presented by Ronald Fisher and J.B.S. Haldane.
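
The sketch below illustrates both procedures with made-up data (the observed counts, sample, and significance level are assumptions, not from the article): a Pearson goodness-of-fit test via scipy.stats.chisquare and a 95% confidence interval for a normal variance built from chi-square quantiles.

```python
# Pearson's goodness-of-fit test and a chi-square-based variance interval.
import numpy as np
from scipy import stats

# Goodness of fit: observed counts vs. a hypothesized uniform distribution.
observed = np.array([18, 22, 30, 20, 10])  # assumed data
stat, p = stats.chisquare(observed)  # equal expected counts by default
print(f"chi2 = {stat:.2f}, p = {p:.3f}")

# 95% CI for sigma^2: ((n-1)s^2 / chi2_{0.975}, (n-1)s^2 / chi2_{0.025}).
rng = np.random.default_rng(3)
data = rng.normal(loc=0.0, scale=2.0, size=50)  # simulated sample, true sigma^2 = 4
n, s2 = data.size, data.var(ddof=1)
lower = (n - 1) * s2 / stats.chi2(df=n - 1).ppf(0.975)
upper = (n - 1) * s2 / stats.chi2(df=n - 1).ppf(0.025)
print(f"sigma^2 in ({lower:.2f}, {upper:.2f})")  # should usually cover 4.0
```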

Multivariate and generalized chi-square distributions

Extensions include the generalized chi-square distribution, which arises from quadratic forms in normal variables and is studied in depth in mathematical statistics communities such as the Institute of Mathematical Statistics and the Royal Statistical Society. The noncentral chi-square distribution, which adds a noncentrality parameter and was elaborated by Fisher, appears in power calculations for tests used in National Science Foundation-funded projects. Multivariate analogues such as the Wishart distribution and Hotelling's T-squared provide matrix-valued generalizations applied in neuroimaging at the National Institutes of Health and in machine learning work at Google LLC and DeepMind.
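
A minimal power-calculation sketch under assumed values (the degrees of freedom, noncentrality, and significance level below are hypothetical) uses SciPy's noncentral chi-square to compute the probability of exceeding the central critical value.

```python
# Power of a level-alpha chi-square test via the noncentral chi-square
# (scipy.stats.ncx2): the power is the upper-tail probability of
# ncx2(df, nc) beyond the central critical value.
from scipy import stats

df, nc, alpha = 4, 8.0, 0.05                   # assumed df, noncentrality, level
critical = stats.chi2(df=df).ppf(1 - alpha)    # rejection threshold under H0
power = stats.ncx2(df=df, nc=nc).sf(critical)  # P(reject H0 | noncentrality nc)
print(f"critical value = {critical:.2f}, power = {power:.3f}")
```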

Category:Probability distributions