LLMpedia: The first transparent, open encyclopedia generated by LLMs

Median

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
[Image: Median (Blythwood, CC BY-SA 4.0)]
Name: Median
Type: Measure of central tendency
Introduced: Ancient mathematics
Related: Mean, Mode, Quantile, Percentile, Order statistic

The median is a measure of central tendency. For a dataset it is the middle value once the data are ordered, and for a probability distribution it is the 50th percentile. It serves as a location parameter in descriptive statistics, robust estimation, and applied analysis in fields such as demography, economics, epidemiology, sociology, and finance. Institutions such as the United States Census Bureau, the OECD, and the World Bank report medians widely because the median is insensitive to extreme observations and gives a more typical value than the mean for skewed distributions.

Definition

In a finite sample sorted by magnitude, the median is the middle observation. For an odd sample size n it is the single central element, the ((n + 1)/2)-th order statistic. For an even sample size it is commonly taken as the average of the two central elements, the (n/2)-th and (n/2 + 1)-th order statistics. For a probability distribution, a median is any value m satisfying P(X ≤ m) ≥ 1/2 and P(X ≥ m) ≥ 1/2. When the cumulative distribution function F is continuous, this is equivalent to F(m) = 1/2. A median need not be unique: if F(x) = 1/2 for every x in an interval, every point of that interval is a median. The median is treated in the work of twentieth-century statisticians, including Ronald Fisher, Jerzy Neyman, John Tukey, Frank Wilcoxon, and Bradley Efron, as well as in the work of earlier statisticians.
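The two definitions above can be sketched in Python. This is a minimal illustration. The function names `sample_median` and `population_median` and the `convention` parameter are hypothetical, not taken from any library. The sample version defaults to the averaging convention for even n and can also return the lower or upper median. The population version solves F(m) = 1/2 by bisection and assumes a continuous CDF and a bracket [lo, hi] with F(lo) < 1/2 ≤ F(hi). The example distribution is an exponential with an assumed rate of 2, chosen because its median is known in closed form as ln 2 / rate.

```python
import math


def sample_median(data, convention="average"):
    """Median of a finite sample (illustrative helper, not a library API).

    convention: "average" (mean of the two central values when n is even),
    "lower", or "upper".
    """
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]  # odd n: the single central element
    lower, upper = xs[mid - 1], xs[mid]  # even n: the two central elements
    if convention == "lower":
        return lower
    if convention == "upper":
        return upper
    return (lower + upper) / 2


def population_median(cdf, lo, hi, tol=1e-12):
    """Solve F(m) = 1/2 by bisection for a continuous CDF F.

    Assumes F(lo) < 1/2 <= F(hi); this invariant is kept while the bracket shrinks.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


RATE = 2.0  # assumed rate of the example exponential distribution


def exp_cdf(x):
    """CDF of the exponential distribution with rate RATE, for x >= 0."""
    return 1 - math.exp(-RATE * x)


print(sample_median([3, 1, 2]))              # 2
print(sample_median([4, 1, 3, 2]))           # 2.5
print(sample_median([4, 1, 3, 2], "lower"))  # 2
print(round(population_median(exp_cdf, 0.0, 10.0), 4))  # 0.3466, i.e. ln(2) / RATE
```

Bisection is used only because it needs nothing beyond evaluating F. For distributions with a known quantile function, the median can be read off directly.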

Calculation Methods

For an already ordered list, exact computation reduces to indexing. Unsorted data require a sorting or selection algorithm. For in-memory arrays, the median routines in R, Python, MATLAB, and SAS rely on sorting or on selection algorithms such as quickselect, which runs in expected linear time. The median-of-medians algorithm of Manuel Blum, Robert Floyd, Vaughan Pratt, Ronald Rivest, and Robert Tarjan guarantees linear time in the worst case. For streaming data or very large datasets, approximate methods include reservoir sampling, the GK algorithm of Michael Greenwald and Sanjeev Khanna, and the quantile sketches used in Apache Hadoop and Apache Spark. For observations x_i with positive weights w_i, a weighted median is any m satisfying Σ_{i: x_i ≤ m} w_i ≥ (1/2) Σ_i w_i and Σ_{i: x_i ≥ m} w_i ≥ (1/2) Σ_i w_i. Equivalently, m minimizes Σ_i w_i |x_i − m|. Weighted medians arise in optimization problems addressed by researchers such as Peter J. Huber and Michael T. Heath. The median is not unique for an even-sized sample, or for a distribution whose CDF equals 1/2 over an interval. In those cases, conventions include the lower median, the upper median, or the full set of medians. Statistical packages such as Stata and SPSS offer options that control how medians and other percentiles are computed.
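The median-of-medians idea can be sketched as follows. This is a plain Python illustration of the worst-case linear-time selection algorithm, not the routine used by any of the packages named above. The function names `select` and `median_by_selection` are hypothetical. The pivot is the median of the group-of-five medians, found recursively. A three-way partition handles duplicate values.

```python
def select(data, k):
    """Return the k-th smallest element (0-based) of data in worst-case
    linear time, using the median-of-medians pivot rule."""
    xs = list(data)
    if not 0 <= k < len(xs):
        raise IndexError("k out of range")
    while True:
        if len(xs) <= 5:
            return sorted(xs)[k]
        # 1. Median of each group of (at most) five elements.
        groups = [sorted(xs[i:i + 5]) for i in range(0, len(xs), 5)]
        medians = [g[len(g) // 2] for g in groups]
        # 2. The median of those medians, found recursively, is the pivot;
        #    it guarantees each partition step discards a constant fraction.
        pivot = select(medians, len(medians) // 2)
        # 3. Three-way partition so that duplicates of the pivot are handled.
        lows = [x for x in xs if x < pivot]
        highs = [x for x in xs if x > pivot]
        n_equal = len(xs) - len(lows) - len(highs)
        if k < len(lows):
            xs = lows
        elif k < len(lows) + n_equal:
            return pivot
        else:
            k -= len(lows) + n_equal
            xs = highs


def median_by_selection(data):
    """Sample median (averaging convention for even n) via selection,
    without fully sorting the data."""
    xs = list(data)
    n = len(xs)
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    if n % 2 == 1:
        return select(xs, n // 2)
    return (select(xs, n // 2 - 1) + select(xs, n // 2)) / 2


data = [(i * 17) % 37 for i in range(37)]  # a shuffled permutation of 0..36
print(median_by_selection(data))  # 18
```

In practice, quickselect with random or median-of-three pivots is usually faster. The median-of-medians rule is shown here because it is the variant with the worst-case linear-time guarantee.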

Properties

The median has several mathematical and statistical properties that distinguish it from other statistics. It is the 1/2-quantile and a monotone functional of the empirical distribution: increasing any observation never decreases the sample median. It is equivariant under strictly monotone transformations. If g is strictly increasing and m is a median of X, then g(m) is a median of g(X). For samples this holds exactly when n is odd. With the even-sample averaging convention it can fail, because g applied to the average of the two central values generally differs from the average of their images. The median minimizes the sum of absolute deviations: it solves argmin_m Σ|x_i − m| and is the L1 estimator of location. The mean, by contrast, minimizes the sum of squared deviations Σ(x_i − m)² and is the L2 estimator. Influence-function analysis by Huber and asymptotic theory in treatises by Peter Hall show that the median has bounded influence. Under normality its asymptotic variance, πσ²/(2n), is larger than the mean's σ²/n. This gives an asymptotic relative efficiency of 2/π ≈ 64% relative to the mean for Gaussian populations. The median is robust to outliers and heavy tails. Breakdown-point analysis, as discussed by David L. Donoho and Peter J. Huber, assigns it a 50% breakdown point, the highest possible for translation-equivariant location estimators.
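The L1 characterization can be checked numerically with a weighted median. With unit weights, a weighted median reduces to the ordinary sample median. The sketch below is an assumption-laden illustration. The helper names `weighted_median` and `weighted_abs_loss` are hypothetical, and it returns the lower weighted median under the assumption that all weights are positive. Because Σ w_i |x_i − m| is convex and piecewise linear in m, its minimum is attained at a data point. Comparing the objective over the observed values is therefore enough to confirm the minimization property.

```python
def weighted_median(values, weights):
    """Lower weighted median: the first value, in sorted order, at which the
    cumulative weight of observations <= that value reaches half the total.
    Assumes all weights are positive."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and of equal length")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    cumulative = 0
    for x, w in pairs:
        cumulative += w
        if cumulative >= half:
            return x


def weighted_abs_loss(m, values, weights):
    """The L1 objective sum_i w_i * |x_i - m| that a weighted median minimizes."""
    return sum(w * abs(x - m) for x, w in zip(values, weights))


values = [1, 2, 3, 10]
weights = [1, 1, 1, 5]
print(weighted_median(values, weights))  # 10: it carries more than half the total weight
# Convex, piecewise linear objective: checking the data points suffices.
print(min(values, key=lambda m: weighted_abs_loss(m, values, weights)))  # 10
```

With unit weights, the same data give 2, the lower sample median. Any point between 2 and 3 attains the same minimal loss, which illustrates the set-valued median for even n.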

Comparison with Other Measures of Central Tendency

Unlike the arithmetic mean used in analyses by Karl Pearson and Francis Galton, the median resists distortion from extreme values, an advantage emphasized in applications by Amartya Sen and Joseph Stiglitz. The mode, studied by Ronald Fisher and applied by E. H. Moore, is the most frequent value and can differ markedly from the median in skewed or multimodal distributions. In income data, such as the figures reported by Forbes and The Economist, median income typically lies below mean income because income distributions are right-skewed. Trimmed means and M-estimators, studied in work by Peter Huber and Frank Hampel, interpolate between mean and median behavior, trading efficiency against robustness. Quantile regression, introduced by Roger Koenker and Gilbert Bassett, extends the median to conditional settings. Median regression minimizes the sum of absolute residuals (least absolute deviations), in contrast to the ordinary least squares method of Gauss and Legendre.
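The contrast between the mean, a trimmed mean, and the median can be illustrated on a small skewed dataset. The income figures below are invented for illustration. The `trimmed_mean` helper is a hypothetical sketch that discards floor(proportion × n) observations from each end. A proportion of 0 reproduces the arithmetic mean, and proportions approaching 0.5 approach the median. The mean and median come from Python's standard `statistics` module.

```python
import statistics


def trimmed_mean(data, proportion):
    """Mean after discarding floor(proportion * n) observations from each end.

    proportion = 0 gives the arithmetic mean; as proportion approaches 0.5 the
    result approaches the median.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must lie in [0, 0.5)")
    xs = sorted(data)
    if not xs:
        raise ValueError("cannot average an empty sample")
    k = int(proportion * len(xs))
    kept = xs[k:len(xs) - k]  # non-empty because k < n / 2
    return sum(kept) / len(kept)


# Hypothetical incomes (in thousands) with one extreme value.
incomes = [30, 32, 35, 38, 40, 41, 45, 50, 52, 1000]
print(statistics.mean(incomes))    # 136.3
print(trimmed_mean(incomes, 0.1))  # 41.625
print(statistics.median(incomes))  # 40.5
```

A single extreme value pulls the mean far above every other observation. Trimming one value from each end already brings the estimate close to the median.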

Applications

The median is ubiquitous in empirical reporting and methodological work. Government agencies such as the United States Department of Labor and Eurostat publish median wages and median household incomes to summarize typical values without undue influence from the top of the distribution. In medicine, clinical studies in journals such as The Lancet and the New England Journal of Medicine report median survival times estimated with the Kaplan–Meier method of Edward L. Kaplan and Paul Meier. In engineering and signal processing, median filters replace each sample by the median of a sliding window, which removes impulsive noise while preserving sharp edges. They are used in image denoising and time-series smoothing, trace back to the running medians of John W. Tukey, and are implemented in toolkits from MathWorks and OpenCV. In operations research, the Weber problem and the 1-median problem, studied by Alfred Weber and Hugo Steinhaus, seek a facility location that minimizes the weighted sum of distances to demand points. The resulting geometric median also appears in the computational geometry literature, including work by Jack Edmonds and Nicos Christofides.
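A one-dimensional running median shows why median filters remove impulsive noise while keeping edges. This is a minimal sketch, not the MathWorks or OpenCV implementation. The name `median_filter` is hypothetical. Padding the edges by repeating the first and last samples is an assumption, since libraries use various edge conventions.

```python
import statistics


def median_filter(signal, window=3):
    """One-dimensional running median with an odd window length.

    Edges are padded by repeating the first and last samples; this is one of
    several common edge conventions and an assumption of this sketch.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    if not signal:
        return []
    half = window // 2
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    return [statistics.median(padded[i:i + window]) for i in range(len(signal))]


# A step edge with one impulsive spike (sample index 2).
noisy = [1, 1, 9, 1, 1, 5, 5, 5]
print(median_filter(noisy))  # [1, 1, 1, 1, 1, 5, 5, 5]
```

The spike is removed completely, and the step from 1 to 5 stays sharp. A moving average over the same window would instead smear both the spike and the edge.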

Statistical Inference and Robustness

Inference for medians uses nonparametric and robust techniques. The exact sign test assesses hypotheses about the median without assuming normality. The Wilcoxon signed-rank test of Frank Wilcoxon does the same under an additional symmetry assumption. Large-sample inference relies on the asymptotic normality of the sample median, whose variance is approximately 1/(4n f(m)²) for a density f that is positive at the median m. Because this depends on the unknown density, it is often estimated with the bootstrap methods developed by Bradley Efron or with influence-function-based estimators from the robust statistics literature. Confidence intervals for the median can be built from order statistics as distribution-free intervals derived from the sign test, or as bootstrap percentile intervals used in applied work by Andrew Gelman and Donald B. Rubin. The median's robustness makes it suitable for heavy-tailed models studied by Murray Rosenblatt and for contamination models formalized by Peter Huber. In high-dimensional settings, recent research by Emmanuel Candès and Terence Tao investigates median-based aggregation and breakdown in adversarial contexts.
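The order-statistic interval can be sketched from the sign-test argument. For a continuous distribution, the k-th order statistic exceeds the median only when fewer than k observations fall below it. That event has probability P(B ≤ k − 1) with B ~ Binomial(n, 1/2). The upper endpoint is symmetric. The function names `binom_half_cdf` and `median_ci` are hypothetical. The sketch picks the largest k whose two-sided error stays at or below an assumed level alpha, and it reports the exact coverage achieved.

```python
from math import comb


def binom_half_cdf(k, n):
    """P(B <= k) for B ~ Binomial(n, 1/2), using exact integer binomial sums."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) for i in range(min(k, n) + 1)) / 2 ** n


def median_ci(data, alpha=0.05):
    """Distribution-free confidence interval [x_(k), x_(n-k+1)] for the median.

    For a continuous distribution the interval misses the median with
    probability 2 * P(B <= k - 1), B ~ Binomial(n, 1/2). The largest k keeping
    that error at or below alpha gives the narrowest such interval.
    Returns (lower, upper, coverage).
    """
    xs = sorted(data)
    n = len(xs)
    best_k = None
    for k in range(1, n // 2 + 1):
        if 2 * binom_half_cdf(k - 1, n) <= alpha:
            best_k = k
        else:
            break  # the tail probability only grows with k
    if best_k is None:
        raise ValueError("sample too small for the requested confidence level")
    coverage = 1 - 2 * binom_half_cdf(best_k - 1, n)
    return xs[best_k - 1], xs[n - best_k], coverage


lo, hi, cov = median_ci(range(1, 11))
print(lo, hi, round(cov, 4))  # 2 9 0.9785
```

Because the binomial distribution is discrete, the achieved coverage (here about 97.9%) usually exceeds the nominal 95%. With very small samples, such as n = 5 at alpha = 0.05, no interval of this form reaches the requested level.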
