F-statistics
Name	F-statistics
Field	Population genetics
Introduced	1920s–1950s
Developed by	Sewall Wright

Contents

Definition and Origins
Mathematical Formulation
Types and Variants (FST, FIT, FIS)
Estimation Methods and Software
Applications in Population Genetics
Limitations and Criticisms

F-statistics are a set of coefficients that quantify genetic structure and relatedness among populations and individuals, originating in early 20th-century quantitative genetics debates. They summarize patterns of allele frequency variation across hierarchies such as individuals, subpopulations, and total populations, and connect to pedigree-based measures of inbreeding, kinship, and genetic drift. Developed in the context of statistical genetics and evolutionary theory, these coefficients have been adapted and applied across empirical studies in fields from conservation biology to human population history.

Definition and Origins

F-statistics were formalized by Sewall Wright, whose work developed alongside that of contemporaries such as Ronald Fisher and J. B. S. Haldane during debates on genetic drift, selection, and inbreeding. Building on the inbreeding coefficient he had introduced in the 1920s, Wright defined coefficients that partition heterozygosity at nested levels (individual, subpopulation, total) and link population subdivision to departures from the Hardy–Weinberg expectations established independently by G. H. Hardy and Wilhelm Weinberg. Gustave Malécot later reinterpreted the coefficients as probabilities of identity by descent. Subsequent work by population geneticists including Motoo Kimura and C. C. Li extended these ideas, and the coefficients have since been applied to organisms such as Drosophila and Arabidopsis and to human variation data from efforts such as the Human Genome Project and the International HapMap Project.

Mathematical Formulation

The classical formulation expresses the coefficients either as ratios of variance components or as probabilities of identity by descent, following Wright’s notation. In the most common form they are defined from three heterozygosities: the observed heterozygosity of individuals, the expected heterozygosity within subpopulations under Hardy–Weinberg proportions, and the expected heterozygosity of the total population. Each coefficient is one minus the ratio of a lower-level heterozygosity to a higher-level one. The variance-component view parallels Fisher’s analysis of variance, and the heterozygosity view was developed further in work influenced by J. F. Crow and Motoo Kimura. The coefficients can also be related to coalescence times in the coalescent models of John Kingman and Richard Hudson. In practice they are estimated from samples using moment or likelihood methods.
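The heterozygosity-based definitions can be written out explicitly. The symbols below are a common textbook convention, assumed here for illustration rather than taken from a specific source: H_I is the mean observed heterozygosity of individuals, H_S the mean expected heterozygosity within subpopulations, and H_T the expected heterozygosity of the pooled population.

```latex
F_{IS} = 1 - \frac{H_I}{H_S}, \qquad
F_{ST} = 1 - \frac{H_S}{H_T}, \qquad
F_{IT} = 1 - \frac{H_I}{H_T}

% Multiplying the two lower-level ratios gives the identity that links
% the three coefficients:
(1 - F_{IT}) = (1 - F_{IS})\,(1 - F_{ST})
```

The identity follows directly because H_I/H_T = (H_I/H_S)(H_S/H_T).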

Types and Variants (FST, FIT, FIS)

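Three canonical coefficients are traditionally reported: FIS, FST, and FIT. FIS measures the deficit or excess of heterozygotes in individuals relative to their own subpopulation and so reflects nonrandom mating within subpopulations. FST measures the reduction in heterozygosity of subpopulations relative to the total population and so reflects differentiation among subpopulations. FIT measures the reduction in heterozygosity of individuals relative to the total population and combines both effects; it ties back most directly to Wright’s inbreeding coefficient. Discussions of FST in the literature include contrasts with the differentiation measure of Lou Jost and with related measures and methods from researchers such as Mark Beaumont and Montgomery Slatkin. F-statistics, chiefly FST, have been used in human population studies by Luigi Luca Cavalli-Sforza and Marcus Feldman, among others. FIS is widely used in plant population studies, where self-fertilization produces heterozygote deficits, and in pedigree analyses that follow Wright’s original approach.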
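To make the three coefficients concrete, the following is a minimal sketch for a single biallelic locus, assuming genotype counts are available for each subpopulation. The function name `f_statistics`, the equal weighting of subpopulations, and the absence of any small-sample correction are illustrative assumptions, not the procedure of a particular package.

```python
from typing import Sequence, Tuple


def f_statistics(
    genotype_counts: Sequence[Tuple[int, int, int]],
) -> Tuple[float, float, float]:
    """Return (F_IS, F_ST, F_IT) for one biallelic locus.

    genotype_counts holds one (n_AA, n_Aa, n_aa) tuple per subpopulation.
    Assumptions of this sketch: subpopulations are weighted equally and no
    small-sample correction is applied. A monomorphic locus (zero expected
    heterozygosity) would cause a division by zero.
    """
    obs_het = []  # observed heterozygote frequency per subpopulation
    exp_het = []  # Hardy-Weinberg expected heterozygosity per subpopulation
    freqs = []    # frequency of allele A per subpopulation

    for n_AA, n_Aa, n_aa in genotype_counts:
        n = n_AA + n_Aa + n_aa
        p = (2 * n_AA + n_Aa) / (2 * n)
        obs_het.append(n_Aa / n)
        exp_het.append(2 * p * (1 - p))
        freqs.append(p)

    k = len(freqs)
    h_i = sum(obs_het) / k            # mean observed heterozygosity
    h_s = sum(exp_het) / k            # mean within-subpopulation expectation
    p_bar = sum(freqs) / k
    h_t = 2 * p_bar * (1 - p_bar)     # expectation for the pooled population

    return 1 - h_i / h_s, 1 - h_s / h_t, 1 - h_i / h_t


# Two subpopulations of 100 individuals, each in Hardy-Weinberg proportions,
# with allele A at frequency 0.8 and 0.2 respectively.
f_is, f_st, f_it = f_statistics([(64, 32, 4), (4, 32, 64)])
print(round(f_is, 3), round(f_st, 3), round(f_it, 3))
```

In this example mating is random within each subpopulation, so FIS is 0, while the difference in allele frequencies gives FST and FIT of about 0.36.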

Estimation Methods and Software

Estimators include the analysis-of-variance approach of C. C. Cockerham and the weighted estimator developed by Weir and Cockerham; moment-based, likelihood, and Bayesian estimators are all implemented in widely used packages. Programs that report F-statistics include Arlequin, GENEPOP, PLINK, and EIGENSOFT, and analyses often run through R packages such as hierfstat and adegenet. Model-based clustering programs such as STRUCTURE and ADMIXTURE are commonly used alongside them to describe population structure. Coalescent and forward-time simulators such as ms (written by Richard Hudson), simuPOP, and SLiM are used to check how estimators behave under known demographic models.
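As an illustration of a moment-based estimator, the sketch below implements a two-population estimator of the kind usually attributed to Hudson, combined across loci as a ratio of averages. It is not the Weir and Cockerham estimator and is not taken from any of the programs named above; the function name `hudson_fst` and its argument layout are assumptions made for this example.

```python
from typing import Sequence


def hudson_fst(
    p1: Sequence[float], n1: int,
    p2: Sequence[float], n2: int,
) -> float:
    """Two-population F_ST estimate, combined across loci.

    p1, p2: sample frequencies of the same allele at each locus.
    n1, n2: number of sampled allele copies (chromosomes) per population,
            assumed constant across loci in this sketch.
    The numerator subtracts a sampling-variance term so that the estimate
    does not grow merely because samples are small; numerators and
    denominators are summed over loci before dividing.
    """
    num = 0.0
    den = 0.0
    for a, b in zip(p1, p2):
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        den += a * (1 - b) + b * (1 - a)
    return num / den


# Three loci, 40 sampled chromosomes per population (made-up frequencies).
estimate = hudson_fst([0.10, 0.55, 0.80], 40, [0.30, 0.50, 0.45], 40)
print(round(estimate, 4))
```

Because of the sampling correction, the estimate can be slightly negative when the two samples have nearly identical allele frequencies; such values are usually read as no detectable differentiation.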

Applications in Population Genetics

F-statistics are applied to infer population subdivision in studies of human evolution, including comparative analyses involving Neanderthal and Denisovan genomes; in conservation genetics of threatened taxa, including species covered by IUCN assessments; in landscape genetics; and in crop domestication research on Zea mays and Oryza sativa by researchers affiliated with CGIAR centers. In forensic genetics, an FST-based correction adjusts DNA match probabilities for population substructure, with practice shaped by agencies such as the FBI and INTERPOL. The coefficients are also used in epidemiological genetics, including studies linked to the Centers for Disease Control and Prevention, and in investigations of adaptation cited in papers by Peter R. Grant, Rosemary Grant, and Michael Turelli.

Limitations and Criticisms

Critiques highlight that FST is sensitive to marker diversity and mutation rates, an issue emphasized by Lou Jost. Because FST cannot exceed the within-population homozygosity, highly polymorphic markers such as microsatellites can give low values even when populations share no alleles, whereas single-nucleotide polymorphisms of the kind catalogued by the 1000 Genomes Project are less affected. Other limitations concern assumptions of equilibrium, the idealized migration patterns of Wright’s island model and of the isolation-by-distance frameworks of Wright and Gustave Malécot, and interpretational ambiguity when values are compared across disparate taxa, a concern raised by conservation biologists. These criticisms have led to alternative metrics such as Hedrick’s standardized G'ST and Jost’s D, and to expanded inferential frameworks, including bootstrap resampling of the kind introduced by Bradley Efron and Bayesian approaches of the kind advocated by Andrew Gelman.
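The dependence on marker diversity can be seen in a small numerical sketch. It computes an FST-like ratio (Nei's GST) and Jost's D from known allele frequencies, ignoring sampling error; the function names and the example frequencies are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def heterozygosity(freqs: Sequence[float]) -> float:
    """Expected heterozygosity: one minus the sum of squared frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def gst_and_jost_d(pops: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (G_ST, Jost's D) for equally weighted populations.

    pops holds one list of allele frequencies per population, with alleles
    in the same order in every list. Frequencies are treated as known, so
    no sampling correction is applied (an assumption of this sketch).
    """
    k = len(pops)
    h_s = sum(heterozygosity(p) for p in pops) / k
    mean_freqs: List[float] = [sum(col) / k for col in zip(*pops)]
    h_t = heterozygosity(mean_freqs)
    gst = 1 - h_s / h_t
    jost_d = (h_t - h_s) / (1 - h_s) * k / (k - 1)
    return gst, jost_d


# Two populations that share no alleles, each with ten equally common
# private alleles (a highly polymorphic, microsatellite-like locus).
pop_a = [0.1] * 10 + [0.0] * 10
pop_b = [0.0] * 10 + [0.1] * 10
gst, jost_d = gst_and_jost_d([pop_a, pop_b])
print(round(gst, 3), round(jost_d, 3))
```

Here the populations are completely differentiated, so D is about 1, yet GST is only about 0.05 because the within-population heterozygosity of 0.9 caps the ratio. With two populations fixed for different alleles at a biallelic locus, both measures equal 1.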

Category:Population genetics