LLMpedia: The first transparent, open encyclopedia generated by LLMs

Analysis of variance

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Article Genealogy
Parent: Ronald Fisher (Hop 4)
Expansion Funnel: Raw 60 → Dedup 7 → NER 5 → Enqueued 0
1. Extracted: 60
2. After dedup: 7 (None)
3. After NER: 5 (None); rejected: 2 (not NE: 2)
4. Enqueued: 0 (None)
Analysis of variance
Name: Analysis of variance
Invented by: Ronald Fisher
Year: 1925
Field: Statistics
Related: Regression analysis, Experimental design

Analysis of variance is a statistical method for comparing group means and partitioning observed variance among sources of variation. It is central to experimental design and inference in fields ranging from agriculture to psychology, and underpins techniques developed at institutions such as Rothamsted Experimental Station, University of Cambridge, and Imperial College London. Key figures associated with its development and dissemination include Ronald Fisher, Gertrude Cox, Frank Yates, John Tukey, and George Box.

Introduction

ANOVA decomposes total variability into components attributable to different factors and their interactions, enabling hypothesis tests about mean differences across levels of categorical predictors. The method is foundational to designs advanced at Rothamsted Experimental Station and taught in curricula at University of Oxford, Harvard University, and Stanford University. It links conceptually and procedurally to the biometric methods popularized by Karl Pearson, to Fisher's work reconciling Mendelian genetics with biometry, and to later extensions promoted at Bell Labs and IBM Research.
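The decomposition described above can be made concrete with a minimal pure-Python sketch (hypothetical data, no library dependencies): it computes the between-group, within-group, and total sums of squares and confirms that the first two add up to the third.

```python
# One-way ANOVA variance partition: SS_total = SS_between + SS_within.

def anova_partition(groups):
    """Return (ss_between, ss_within, ss_total) for a list of numeric groups."""
    pooled = [x for g in groups for x in g]
    grand_mean = sum(pooled) / len(pooled)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Residual variation of observations around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    ss_total = sum((x - grand_mean) ** 2 for x in pooled)
    return ss_between, ss_within, ss_total

# Illustrative data: three treatment groups (values are made up).
ssb, ssw, sst = anova_partition([[4.2, 4.8, 5.1], [5.9, 6.3, 6.1], [4.0, 3.8, 4.4]])
```

The identity `ssb + ssw == sst` (up to floating-point error) is the algebraic heart of ANOVA: a large `ssb` relative to `ssw` suggests the group means differ by more than residual noise would explain.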

History and development

The formal framework emerged in the 1920s and 1930s, when Ronald Fisher synthesized ideas from agricultural experiments at Rothamsted Experimental Station and statistical theory at University of Cambridge. Subsequent development involved Frank Yates's practical randomization schemes, Gertrude Cox's applied experimental designs at Iowa State University, and John Tukey's exploratory data analysis contributions at Princeton University. Later computational and matrix formulations were advanced by researchers at Bell Labs, IBM Research, University of Chicago, and Bellcore, while modern software implementations were driven by the S language at AT&T Bell Laboratories, SAS Institute, the R Project for Statistical Computing, and MATLAB.

Theory and assumptions

ANOVA models partition sums of squares into components associated with factors and residuals, relying on linear model theory articulated by Gauss and extended by Fisher and Pearson. Core assumptions include independence (invoked in designs promoted by Fisher and randomized trials at London School of Hygiene &amp; Tropical Medicine), normality (classical foundations tied to Carl Friedrich Gauss's work on the normal distribution), and homoscedasticity (equal variances), considerations debated in texts by Jerzy Neyman and his contemporaries. The F-distribution used for testing was derived in Fisher's framework and tabulated and named by George W. Snedecor; estimation of fixed and random effects draws on writings by Charles Roy Henderson and later work at Iowa State University.
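For a one-way layout with $k$ groups of sizes $n_i$ and $N$ total observations, the partition and test statistic described above take the standard form:

```latex
\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(y_{ij}-\bar{y}_{\cdot\cdot}\right)^2
= \underbrace{\sum_{i=1}^{k} n_i \left(\bar{y}_{i\cdot}-\bar{y}_{\cdot\cdot}\right)^2}_{SS_{\text{between}}}
+ \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(y_{ij}-\bar{y}_{i\cdot}\right)^2}_{SS_{\text{within}}}

F \;=\; \frac{SS_{\text{between}}/(k-1)}{SS_{\text{within}}/(N-k)}
```

Under the null hypothesis of equal group means, with independent, normal, equal-variance errors, $F$ follows an $F_{k-1,\,N-k}$ distribution, which is what makes the tabulated critical values applicable.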

Types of ANOVA and extensions

Common forms include one-way ANOVA, two-way ANOVA with interaction, repeated measures ANOVA, and nested ANOVA, techniques refined in experimental programs at Rothamsted Experimental Station, Iowa State University, and USDA. Extensions encompass mixed-effects models popularized by George Box and Douglas Bates, multivariate ANOVA (MANOVA) influenced by Harold Hotelling, and generalized linear models developed by John Nelder and Robert Wedderburn. Nonparametric alternatives include the Kruskal–Wallis test (William Kruskal and W. Allen Wallis) and the Friedman test (Milton Friedman), while modern high-dimensional and permutation-based variants credit contributions from Bradley Efron, Jerome Friedman, and research teams at Stanford University and University of California, Berkeley.
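The Kruskal–Wallis alternative mentioned above replaces raw values with ranks, so no normality assumption is needed. A minimal sketch (illustrative data; it assumes all observations are distinct, omitting the usual tie correction):

```python
# Kruskal-Wallis H statistic, a rank-based one-way ANOVA alternative.
# Sketch only: assumes distinct observations, so no tie-correction is applied.

def kruskal_wallis_h(groups):
    """H = 12/(N(N+1)) * sum(R_i^2 / n_i) - 3(N+1), R_i = rank sum of group i."""
    pooled = sorted(x for g in groups for x in g)
    rank = {v: i + 1 for i, v in enumerate(pooled)}  # ranks 1..N; ties would collide here
    n = len(pooled)
    term = sum(sum(rank[x] for x in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n * (n + 1)) * term - 3 * (n + 1)

# Fully separated groups give the largest H attainable for these sizes.
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6]])
```

Under the null hypothesis, H is compared against a chi-squared distribution with k − 1 degrees of freedom; production code would use a library implementation that also handles ties.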

Computation and interpretation

ANOVA computation typically tabulates sums of squares, degrees of freedom, mean squares, and F-statistics, as formalized in textbooks from Princeton University and Cambridge University Press. Implementations in software such as R, SAS, SPSS, and Stata incorporate model-fitting routines influenced by algorithms from Gene H. Golub and numerical linear algebra advances at Massachusetts Institute of Technology. Interpreting main effects and interactions often invokes the experimental paradigms advanced at Iowa State University and reporting standards from journals like Biometrika and Journal of the Royal Statistical Society.
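The tabulation described above can be sketched in pure Python (hypothetical data; a p-value would additionally require the F-distribution tail probability, e.g. from a statistics library, which is omitted here):

```python
# One-way ANOVA table: sums of squares, degrees of freedom, mean squares, F.

def one_way_anova_table(groups):
    """Return the classic one-way ANOVA table entries as a dict."""
    k = len(groups)
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    grand = sum(pooled) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    ms_between, ms_within = ss_between / df_between, ss_within / df_within
    return {
        "SS": (ss_between, ss_within),   # (between, within)
        "df": (df_between, df_within),
        "MS": (ms_between, ms_within),
        "F": ms_between / ms_within,
    }

table = one_way_anova_table([[1, 2, 3], [2, 3, 4], [3, 4, 5]])  # F = 3.0 here
```

The observed F is then compared against the F(k − 1, N − k) critical value at the chosen significance level, exactly as in the printed tables the classical texts formalized.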

Applications and examples

ANOVA has been applied in agricultural trials at Rothamsted Experimental Station, clinical trials coordinated by National Institutes of Health, industrial quality studies at General Electric, and psychological research at University of Chicago and Columbia University. Examples include comparing fertilizer treatments in agronomy experiments overseen by Ronald Fisher, assessing drug dosages in trials funded by National Health Service (England), and evaluating manufacturing processes at Toyota Motor Corporation and Ford Motor Company. Multivariate and repeated-measures ANOVA are used in neuroimaging research at National Institutes of Health and experimental psychology studies at Yale University.

Limitations and alternatives

Limitations include sensitivity to violated assumptions noted by John Tukey and the tendency to conflate statistical and practical significance discussed in venues like Royal Statistical Society meetings. Alternatives and complements include regression analysis at University of California, Berkeley, generalized linear models advocated by John Nelder and Robert Wedderburn, nonparametric tests from William Kruskal and Milton Friedman, and Bayesian hierarchical models developed in work by Andrew Gelman and groups at Columbia University. Robust and permutation approaches used in modern genomics research cite implementations from Broad Institute and computational frameworks from Stanford University.
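The permutation approach mentioned above can be sketched as follows: the one-way F statistic is recomputed under random reshufflings of the group labels, yielding a p-value that does not rely on the normality assumption (function name, data, and permutation count are illustrative):

```python
import random

def perm_f_test(groups, n_perm=2000, seed=0):
    """Permutation p-value for the one-way ANOVA F statistic."""
    def f_stat(gs):
        k = len(gs)
        pooled = [x for g in gs for x in g]
        n = len(pooled)
        grand = sum(pooled) / n
        ssb = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in gs)
        ssw = sum((x - sum(g) / len(g)) ** 2 for g in gs for x in g)
        return (ssb / (k - 1)) / (ssw / (n - k))

    rng = random.Random(seed)
    sizes = [len(g) for g in groups]
    pooled = [x for g in groups for x in g]
    observed = f_stat(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                     # randomly reassign group labels
        it = iter(pooled)
        shuffled = [[next(it) for _ in range(s)] for s in sizes]
        if f_stat(shuffled) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Because the null distribution is built from the data itself, the test remains valid under non-normal errors, at the cost of extra computation.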

Category:Statistical methods