| Cummean | |
|---|---|
| Name | Cummean |
| Type | Statistical measure |
| Domain | Statistics, Probability, Data Analysis |
| Introduced | Unknown |
| Related | Moving average, Cumulative sum, Running mean |
Cummean
Cummean (the cumulative mean) is a statistical summary consisting of the sequence of cumulative arithmetic means computed from a data stream or ordered sample. It appears in time series analysis, sequential estimation, and online algorithms as a simple running estimator of central tendency, and it relates closely to the cumulative sum techniques used in change detection, sequential hypothesis testing, and control charts. The cummean traces how the sample mean evolves as observations arrive, enabling comparisons with population parameters, benchmarks, and alternative estimators.
The cummean of an ordered sample x_1, x_2, ..., x_n is the sequence m_1, m_2, ..., m_n where m_k = (x_1 + x_2 + ... + x_k)/k; for example, the sample (2, 4, 9) has cummean (2, 3, 5). For independent, identically distributed samples with finite mean, the cummean converges to the population mean by the Law of Large Numbers, proved in weak form by Jakob Bernoulli and in strong form by Émile Borel and Andrey Kolmogorov. For dependent processes such as Markov chains or Wiener-type processes, the cummean smooths raw observations and exhibits reduced variance relative to them. The cummean is not permutation invariant: although the final value m_n is, the trajectory depends on the order of the data, a property exploited in the sequential analysis developed by Abraham Wald.
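A direct vectorized computation follows immediately from the definition. The sketch below uses NumPy; the function name cummean mirrors the identically named verb in R's dplyr package but is illustrative rather than a standard API.

```python
import numpy as np

def cummean(x):
    """Cumulative means m_k = (x_1 + ... + x_k) / k for an ordered sample."""
    x = np.asarray(x, dtype=float)
    return np.cumsum(x) / np.arange(1, len(x) + 1)

print(cummean([2, 4, 9]))  # [2. 3. 5.]
```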
Formally, given real-valued observations {x_i}_{i=1}^n, define m_k = (1/k) Σ_{i=1}^k x_i. Equivalently, m_k = m_{k-1} + (x_k - m_{k-1})/k for k ≥ 2 with m_1 = x_1. This recursion connects to the stochastic approximation schemes of Herbert Robbins and Sutton Monro and to the martingale decompositions used by Joseph Doob. When the x_i are random variables with E[x_i] = μ and Var(x_i) = σ^2, E[m_k] = μ, and Var(m_k) = σ^2/k under independence. For nonstationary sequences arising in econometrics, studied by Clive Granger and Christopher Sims, bias and variance expressions require time-varying formulations.
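A minimal sketch of this recursion, useful when observations arrive one at a time and storing the full prefix sum is undesirable (names are illustrative):

```python
def cummean_stream(xs):
    """Yield m_1, m_2, ... via the O(1)-memory recursion m_k = m_{k-1} + (x_k - m_{k-1})/k."""
    m, k = 0.0, 0
    for x in xs:
        k += 1
        m += (x - m) / k  # incremental update of the running mean
        yield m

assert list(cummean_stream([2, 4, 9])) == [2.0, 3.0, 5.0]
```

The incremental form can also behave better numerically than dividing an accumulated raw sum when the stream is long and the values are large.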
Cummean computation uses O(1) memory and O(n) time via the incremental update m_k = m_{k-1} + (x_k - m_{k-1})/k, a recurrence described in Donald Knuth's The Art of Computer Programming and closely related to Welford's online algorithm; it is a standard pattern in streaming systems such as Apache Kafka Streams and Apache Flink for online analytics. Numerically stable batch variants use pairwise (divide-and-conquer) summation of the kind found in numerical libraries. In distributed environments such as Doug Cutting's Hadoop, cummeans combine via count-weighted averages of partition means, mirroring the MapReduce aggregation patterns introduced by Jeff Dean and Sanjay Ghemawat at Google.
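A sketch of the merge step under these assumptions: each partition reports only its count and local mean, and the global mean is their count-weighted average (function and variable names are illustrative).

```python
def merge_means(parts):
    """Combine (count, mean) partition summaries into the global mean."""
    total = sum(n for n, _ in parts)
    return sum(n * m for n, m in parts) / total

# Two workers summarize [2, 4] and [9] locally, then merge:
print(merge_means([(2, 3.0), (1, 9.0)]))  # 5.0
```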
Cummean appears in exploratory data analysis at institutions such as NASA, CERN, and the National Institutes of Health to visualize the convergence of repeated measurements. In finance, practitioners use running means to monitor asset returns and to flag possible regime shifts, complementing model-based analyses in the tradition of Eugene Fama and cautionary perspectives such as Nassim Nicholas Taleb's. In quality control, engineers following Walter Shewhart and W. Edwards Deming interpret cummean plots alongside CUSUM charts and the control limits used in Six Sigma programs. Examples include monitoring sensor arrays at Tesla, Inc., climate anomaly tracking by the Intergovernmental Panel on Climate Change, and online A/B testing at Meta Platforms, Inc. and Alphabet Inc.
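A minimal sketch of the pairing described above, assuming a known target mean; it tracks the running mean alongside a one-sided CUSUM and omits the slack and decision-threshold parameters of a production control chart (all names are illustrative).

```python
def mean_and_cusum(xs, target):
    """Track the running mean m_k and the one-sided CUSUM S_k = max(0, S_{k-1} + x_k - target)."""
    m, s, out = 0.0, 0.0, []
    for k, x in enumerate(xs, start=1):
        m += (x - m) / k
        s = max(0.0, s + (x - target))
        out.append((m, s))
    return out

for m, s in mean_and_cusum([5.1, 4.9, 5.0, 6.2, 6.3], target=5.0):
    print(f"mean={m:.3f}  cusum={s:.3f}")
```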
From a probabilistic perspective, the cummean sequence is a sequence of estimators m_k of the population mean μ; under independence and identical distribution, m_k is unbiased and its variance decays as 1/k, a fact used in confidence interval construction originating with William Sealy Gosset ("Student") and further developed by Jerzy Neyman. The centered cumulative sum k(m_k - μ) is a martingale, enabling optional stopping results associated with Paul Lévy and Joseph Doob. In Bayesian settings influenced by Thomas Bayes and Pierre-Simon Laplace, the cummean interacts with conjugate priors such as the normal–normal model: the sequential posterior mean is a precision-weighted average of the prior mean and the cummean, reducing to the cummean when the prior is noninformative.
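In the conjugate normal–normal model (prior μ ~ N(μ_0, τ²), observations x_i | μ ~ N(μ, σ²) i.i.d.), the posterior mean after k observations can be written in terms of the cummean m_k:

```latex
\mathbb{E}[\mu \mid x_1, \dots, x_k]
  = \frac{\mu_0/\tau^2 + k\, m_k/\sigma^2}{1/\tau^2 + k/\sigma^2}
  \;\longrightarrow\; m_k \quad \text{as } \tau^2 \to \infty .
```

As τ² → ∞ (a noninformative prior), the weight on μ_0 vanishes and the posterior mean coincides with the cummean.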
As k → ∞, m_k → μ almost surely under the Strong Law of Large Numbers proved by Andrey Kolmogorov; the Central Limit Theorem, originating with Abraham de Moivre and Pierre-Simon Laplace and refined by Lyapunov and Lindeberg, implies √k(m_k - μ) → N(0, σ^2) in distribution for i.i.d. sequences with finite variance. Large deviations principles studied by Srinivasa Varadhan quantify the tail probabilities of deviations of m_k from μ. For dependent processes such as Markov chains and other stationary sequences, ergodic theorems, notably Birkhoff's pointwise ergodic theorem, provide convergence conditions; rates may be slower or require mixing assumptions.
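A short simulation sketch of these rates: across many i.i.d. paths, the cross-path variance of m_k should track σ²/k (the parameters below are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_obs, mu, sigma = 2000, 400, 1.0, 2.0

# Simulate iid N(mu, sigma^2) paths and form the cummean along each path.
x = rng.normal(mu, sigma, size=(n_paths, n_obs))
m = np.cumsum(x, axis=1) / np.arange(1, n_obs + 1)

# Empirical Var(m_k) versus the theoretical sigma^2 / k.
for k in (10, 100, 400):
    print(k, round(m[:, k - 1].var(), 4), sigma**2 / k)
```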
Related constructs include the moving averages of technical analysis (as in John Bollinger's bands), the exponentially weighted moving averages developed by Robert G. Brown and applied in Holt–Winters forecasting, and the cumulative sums (CUSUM) introduced by E. S. Page for change detection. Robust extensions replace arithmetic means with running medians or trimmed cummeans, in the spirit of the robust statistics of Peter Huber and John Tukey. Multivariate analogues connect to sample mean vectors and empirical covariance estimators used in the multivariate analysis of Harold Hotelling and in the principal component methods of Karl Pearson. In online learning, running mean estimators appear inside stochastic gradient schemes and adaptive optimizers such as Geoffrey Hinton's RMSProp, which maintain exponentially weighted running means of gradient statistics.
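For contrast with the equal-weight cummean, a minimal EWMA sketch (the smoothing factor alpha is illustrative): each new point receives weight alpha and older history decays geometrically, whereas the cummean gives every point equal weight 1/k.

```python
def ewma(xs, alpha=0.5):
    """Exponentially weighted moving average s_k = alpha*x_k + (1 - alpha)*s_{k-1}."""
    s, out = None, []
    for x in xs:
        s = x if s is None else alpha * x + (1 - alpha) * s
        out.append(s)
    return out

print(ewma([2, 4, 9]))  # [2, 3.0, 6.0] vs. cummean [2, 3, 5]
```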
Category:Statistical estimators