| k-Median | |
|---|---|
| Name | k-Median |
| Field | Operations research; Theoretical computer science |
| Introduced | 1960s |
| Problems | Clustering; Facility location; Combinatorial optimization |
k-Median
k-Median is a combinatorial optimization problem from operations research and theoretical computer science: given a set of demand points and an integer k, choose k center locations minimizing the total distance from each demand point to its nearest chosen center. The problem was formalized in the facility-location literature of the 1960s, where it is usually called the p-median problem following Hakimi's work, and has since become a central benchmark in approximation algorithms and clustering, with results appearing regularly at venues such as STOC, FOCS, and SODA.
Formally, the k-median problem is defined on a finite metric space (V, d), where d is a nonnegative, symmetric distance function satisfying the triangle inequality. Given an integer k, the goal is to select a subset S ⊆ V with |S| = k minimizing Σ_{v ∈ V} min_{s ∈ S} d(v, s), the sum of distances from every point to its nearest center in S. A common generalization attaches a nonnegative demand weight to each point and minimizes the weighted sum. The decision version (is there a solution of cost at most a given bound B?) is NP-complete on general metrics, which places k-median alongside the closely related uncapacitated facility location problem, where centers carry opening costs and the number of centers is not fixed in advance.
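The objective above is straightforward to evaluate for a candidate center set. The following is a minimal sketch in Python; the function name `kmedian_cost` and the toy one-dimensional instance are illustrative, not from any standard library:

```python
def kmedian_cost(points, centers, dist):
    """Sum, over all points, of the distance to the nearest chosen center."""
    return sum(min(dist(p, c) for c in centers) for p in points)

# Toy 1-D metric: points on a line with absolute-difference distance.
points = [0, 1, 5, 6]
centers = [1, 5]
cost = kmedian_cost(points, centers, dist=lambda a, b: abs(a - b))
# Each point is served by its nearest center: 0->1, 1->1, 5->5, 6->5.
```

With demand weights, each term in the sum would simply be multiplied by the corresponding point's weight.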
Exact algorithms for k-median include brute-force enumeration of all size-k center sets and integer programming formulations; these are usable on small and medium instances but scale poorly, as the problem is NP-hard on general metrics (Kariv and Hakimi, 1979). Polynomial-time algorithms exist for special cases such as tree and line metrics, typically via dynamic programming. Parameterized approaches have also been studied: the problem is W[2]-hard when parameterized by k alone on general inputs, so fixed-parameter results rely on additional structure such as bounded doubling dimension or planarity. In practice, heuristics based on local search (notably the partitioning-around-medoids, or PAM, family), LP relaxation, and primal-dual schemes are the workhorses.
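Brute-force enumeration, as mentioned above, can be sketched in a few lines; the function name `kmedian_brute_force` is illustrative, and the sketch assumes centers must be chosen from the input points themselves (the discrete variant):

```python
import itertools

def kmedian_brute_force(points, k, dist):
    """Exact solver: enumerate every size-k subset of candidate centers.
    Runs in O(C(n, k) * n * k) time, so it is viable only for tiny n."""
    best_cost, best_centers = float("inf"), None
    for centers in itertools.combinations(points, k):
        cost = sum(min(dist(p, c) for c in centers) for p in points)
        if cost < best_cost:
            best_cost, best_centers = cost, centers
    return best_centers, best_cost

# Small 1-D instance: an optimal pair of centers costs 3 in total.
centers, cost = kmedian_brute_force([0, 1, 5, 6, 7], k=2,
                                    dist=lambda a, b: abs(a - b))
```

The exponential growth of C(n, k) is exactly why the approximation algorithms and heuristics discussed below matter in practice.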
Approximation algorithms for k-median have a long history of constant-factor guarantees. The first constant factor, 6⅔, was obtained by Charikar, Guha, Tardos, and Shmoys (1999) via LP rounding; Jain and Vazirani gave a primal-dual algorithm achieving factor 6 through a Lagrangian relaxation of facility location; and Arya et al. showed that local search with p simultaneous swaps achieves a (3 + 2/p)-approximation. The best known ratio, roughly 2.675, is due to Byrka et al., refining the approach of Li and Svensson. On the hardness side, it is NP-hard to approximate k-median within a factor better than 1 + 2/e ≈ 1.736 (Jain, Mahdian, and Saberi). Bicriteria approximations, which open slightly more than k centers in exchange for stronger cost guarantees, have also been studied extensively.
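The local-search approach can be sketched concretely. The version below uses single swaps, which in metric spaces yields a 5-approximation (the p-swap generalization gives 3 + 2/p); the helper names and the arbitrary initialization are illustrative choices, not part of the published algorithm's specification:

```python
def total_cost(points, centers, dist):
    """k-median objective: total distance to nearest open center."""
    return sum(min(dist(p, c) for c in centers) for p in points)

def kmedian_local_search(points, k, dist):
    """Single-swap local search: starting from an arbitrary set of k
    centers, repeatedly apply the best improving swap of one open
    center for one closed point, stopping at a local optimum."""
    centers = list(points[:k])          # arbitrary initial solution
    best = total_cost(points, centers, dist)
    while True:
        improvement = None
        for i in range(k):              # candidate center to close
            for new in points:          # candidate point to open
                if new in centers:
                    continue
                trial = centers[:i] + [new] + centers[i + 1:]
                c = total_cost(points, trial, dist)
                if c < best:
                    improvement, best = trial, c
        if improvement is None:         # local optimum reached
            return centers, best
        centers = improvement

centers, cost = kmedian_local_search([0, 1, 5, 6, 7], 2,
                                     lambda a, b: abs(a - b))
```

Each pass costs O(n·k·n·k) distance evaluations, and the objective strictly decreases until a swap-optimal solution is reached; on this tiny instance the local optimum happens to be globally optimal, which is not guaranteed in general.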
Variants include the capacitated k-median, in which each center can serve only a bounded amount of demand; the uncapacitated facility location problem, where opening costs replace the hard cardinality constraint; the k-means problem, which minimizes the sum of squared distances and is the standard objective in applied clustering; and the k-center problem, which minimizes the maximum distance from any point to its center. Related continuous formulations include the classical Weber problem of location theory, and submodular facility location connects the problem to the broader combinatorial optimization literature. k-Median also serves as a reference objective in clustering benchmarks.
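The distinction between the k-median, k-means, and k-center objectives comes down to how the per-point distances are aggregated. A minimal sketch, with illustrative helper names and a toy instance:

```python
def nearest_distances(points, centers, dist):
    """Distance from each point to its nearest chosen center."""
    return [min(dist(p, c) for c in centers) for p in points]

# One fixed assignment; the three classic objectives aggregate the
# same per-point distances as a sum, a sum of squares, or a maximum.
d = nearest_distances([0, 1, 5, 9], [1, 5], lambda a, b: abs(a - b))
kmedian_obj = sum(d)                 # k-median: total distance
kmeans_obj = sum(x * x for x in d)   # k-means: total squared distance
kcenter_obj = max(d)                 # k-center: worst-case distance
```

Squaring penalizes outliers more heavily, which is why k-means solutions are pulled toward distant points while k-median is comparatively robust, and why k-center cares only about the single worst-served point.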
Empirical evaluations of k-median algorithms appear in logistics and supply-chain optimization case studies, in network design and content-distribution placement analyses, and in municipal location-planning reports. In machine learning and data mining, k-median heuristics (often appearing as k-medoids) are compared against k-means baselines in empirical papers at venues such as NeurIPS, ICML, and KDD. Benchmark datasets, including those from the UCI Machine Learning Repository, are commonly used to evaluate scalability and solution quality, with data-management-oriented studies appearing in proceedings of SIGMOD and VLDB.