
Jensen's inequality

[Image: illustration of Jensen's inequality. Credit: Eli Osherovich, CC BY-SA 3.0]
Name: Jensen's inequality
Field: Mathematics
Introduced: 1906
Named after: Johan Jensen

Jensen's inequality is a fundamental result in convex analysis and probability theory that compares the value of a convex function at an average with the average of the function's values. It has broad impact across measure theory, probability theory, functional analysis, economics, statistics, information theory, control theory, optimization and machine learning. The inequality underlies many classical results, including the Cauchy–Schwarz inequality, the arithmetic mean–geometric mean inequality and Chebyshev's inequality, and it supplies key estimates in the study of Markov chains, the Kullback–Leibler divergence, Fisher information, the central limit theorem and the law of large numbers.

Statement

Let φ be a convex function defined on a convex subset of a real vector space. For any finite set of points x1, x2, ..., xn in the domain and nonnegative weights α1, α2, ..., αn summing to 1, Jensen's inequality states that φ(Σ αi xi) ≤ Σ αi φ(xi). In integral form, for a probability measure μ on a measurable space and an integrable random variable X taking values in the domain of φ, the inequality reads φ(E[X]) ≤ E[φ(X)]. Both forms go back to Johan Jensen (1906), and the statement extends to Banach-space settings in the tradition of Banach and Steinhaus, where tools such as the Hahn–Banach theorem and the Riesz representation theorem operate within the integration theory of Fréchet and Lebesgue.
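The two forms are easy to check numerically. The following is a minimal sketch, assuming only the Python standard library, that Monte Carlo-estimates both sides of φ(E[X]) ≤ E[φ(X)] for the convex function φ(x) = x²; the Uniform(−1, 3) distribution, sample size and seed are illustrative choices, not part of the statement.

```python
import random

# Minimal numerical sketch of Jensen's inequality, phi(E[X]) <= E[phi(X)],
# for the convex function phi(x) = x**2.  Distribution, sample size and
# seed below are arbitrary illustrative choices, not part of the theorem.

def phi(x):
    return x * x  # convex on all of R

random.seed(0)
samples = [random.uniform(-1.0, 3.0) for _ in range(100_000)]

mean_x = sum(samples) / len(samples)                      # estimate of E[X]
mean_phi_x = sum(phi(x) for x in samples) / len(samples)  # estimate of E[phi(X)]

print(f"phi(E[X]) ~ {phi(mean_x):.4f}")   # ~ 1.00 for Uniform(-1, 3)
print(f"E[phi(X)] ~ {mean_phi_x:.4f}")    # ~ 1.00 + Var(X) ~ 2.33
assert phi(mean_x) <= mean_phi_x          # Jensen's inequality
```

For φ(x) = x² the assertion holds exactly, since E[X²] − (E[X])² is the (nonnegative) variance of the empirical distribution.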

Proofs

Several proofs are available, drawing on different techniques. A standard convexity-based proof uses supporting hyperplanes: for convex φ there is an affine function supporting φ at any interior point x0, obtained from a subgradient as in Fenchel conjugate and subdifferential theory; comparing φ at convex combinations against this affine minorant yields the inequality. Another proof uses induction on n together with the two-point case, which itself follows from the definition of convexity, or from the mean value theorem when φ is differentiable; differentiable proofs expand φ at x0 in the spirit of Taylor's theorem in finite dimensions. Measure-theoretic proofs use conditional expectation and monotone class arguments in the tradition of Kolmogorov and Doob, or exploit convexity in the setting of the Radon–Nikodym and Tonelli theorems. Geometric proofs invoke separation theorems such as the Hahn–Banach theorem and supporting-hyperplane arguments prominent in the work of Minkowski and Carathéodory. A short version of the supporting-line argument is sketched below.
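For the real-valued integral form, the supporting-line argument fits in a few lines. The sketch assumes φ is convex on an interval containing the range of X, with m = E[X] finite and s any subgradient of φ at m:

```latex
% Supporting-line proof of the integral form.
% Convexity gives an affine minorant of \varphi touching at m = E[X]:
\varphi(x) \;\ge\; \varphi(m) + s\,(x - m) \qquad \text{for all } x,
% so substituting x = X and taking expectations,
\mathbb{E}[\varphi(X)] \;\ge\; \varphi(m) + s\,\big(\mathbb{E}[X] - m\big)
\;=\; \varphi\big(\mathbb{E}[X]\big).
```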

Special cases and corollaries

Many classical inequalities appear as special cases or corollaries. The arithmetic mean–geometric mean inequality follows by applying Jensen's inequality to the concave function φ = log, for which the direction of the inequality reverses (see the sketch below); the result has classical antecedents in the work of Gauss and Jacques Bernoulli. Hölder's inequality and the Cauchy–Schwarz inequality can be obtained through convexity arguments involving φ(x) = x^p and duality, in the tradition of Hölder, Hermann Schwarz and Young's inequality. The convexity of the exponential function yields bounds used in the concentration inequalities developed by Hoeffding, Chernoff and Azuma. In information theory, applying Jensen's inequality to the negative logarithm connects to concepts introduced by Shannon and developed by Kullback and Leibler in the form of divergence inequalities. In mathematical finance, risk measures use Jensen's inequality to compare expected utilities in the framework of von Neumann and Morgenstern and of the expected utility theory elaborated by Samuelson.
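As an illustration, here is the AM–GM derivation referred to above, for positive reals x1, ..., xn with equal weights 1/n; since log is concave, Jensen's inequality holds with the direction reversed:

```latex
% AM-GM from Jensen: \log is concave, so the inequality reverses.
\log\!\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)
\;\ge\; \frac{1}{n}\sum_{i=1}^{n}\log x_i
\;=\; \log\!\left(\prod_{i=1}^{n} x_i\right)^{\!1/n},
% and applying the increasing function \exp to both sides gives
\frac{x_1 + \cdots + x_n}{n} \;\ge\; \sqrt[n]{x_1 \cdots x_n}.
```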

Applications

Jensen's inequality is ubiquitous. In probability and statistics it bounds expectations and variances in settings treated by Kolmogorov, Feller, Karl Pearson, and Neyman and Pearson. In information theory it underpins proofs of data-processing inequalities and entropy bounds in the tradition of Shannon and of the Kraft–McMillan theorem; a representative instance is sketched below. In optimization and convex analysis it supports algorithms in the lineage of Kuhn and Tucker, Dantzig and Nesterov, appears in machine learning analyses such as those of Yann LeCun, and enters duality frameworks built on Lagrange multipliers. In economics it justifies comparisons of expected utility and the treatment of risk aversion introduced by Kenneth Arrow and John W. Pratt. In statistical physics and thermodynamics, convexity arguments related to Jensen produce free-energy bounds in the tradition of Boltzmann and Gibbs, with modern developments in renormalization group theory. In control and signal processing, Jensen-type bounds appear in stability criteria linked to Lyapunov and in performance bounds for estimators developed by Kalman.
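The information-theoretic instance is representative: nonnegativity of the Kullback–Leibler divergence (Gibbs' inequality) follows from one application of Jensen's inequality to the convex function −log. The sketch assumes discrete distributions p and q with q(x) > 0 wherever p(x) > 0:

```latex
% Nonnegativity of KL divergence via Jensen applied to the convex -\log:
D_{\mathrm{KL}}(p \,\|\, q)
= \mathbb{E}_p\!\left[-\log \frac{q(X)}{p(X)}\right]
\;\ge\; -\log \mathbb{E}_p\!\left[\frac{q(X)}{p(X)}\right]
= -\log \sum_{x:\,p(x)>0} q(x)
\;\ge\; 0.
```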

Generalizations and extensions

Extensions include Jensen's inequality for vector-valued functions and for operator convex functions in the setting of C*-algebras and von Neumann algebras; these lead to noncommutative variants studied by Kadison and Elliott Lieb. Martingale versions rest on the conditional form of the inequality (stated below), in the tradition of Doob and of the Burkholder–Davis–Gundy inequalities. Integral refinements and reverse Jensen inequalities connect to results of Karamata and to Hardy–Littlewood–Pólya majorization theory. Functional generalizations include variants phrased with the Bregman divergences of Lev Bregman, convexity in metric spaces as treated by Alexandrov and Gromov, and formulations in optimal transport due to Cédric Villani. Operator means, matrix inequalities and relations to the Loewner order extend Jensen's inequality to matrix arguments, as explored in the work of Tsuyoshi Ando and Rajendra Bhatia.
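For reference, the conditional form underlying the martingale versions reads as follows; it assumes X integrable, φ convex with φ(X) integrable, and 𝒢 a sub-σ-algebra:

```latex
% Conditional Jensen, the form used in martingale theory:
\varphi\big(\mathbb{E}[X \mid \mathcal{G}]\big)
\;\le\; \mathbb{E}\big[\varphi(X) \mid \mathcal{G}\big] \quad \text{a.s.}
% Consequence: if (M_n) is a martingale and each \varphi(M_n) is
% integrable, then (\varphi(M_n)) is a submartingale (Doob).
```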

Category:Mathematical inequalities