| Graph partitioning | |
|---|---|
| Name | Graph partitioning |
| Domain | Computer science |
| Related | Combinatorial optimization, Spectral graph theory, Parallel computing |
Graph partitioning
Graph partitioning is a problem in computer science and operations research concerned with dividing the vertices of a graph into disjoint subsets while optimizing criteria such as edge cut, balance, or conductance. It connects to theoretical computer science, numerical linear algebra, and high-performance computing, and partitioning methods underpin large-scale distributed systems at companies such as Amazon, Google, and Microsoft.
Graph partitioning seeks a partition of a graph's vertex set that minimizes an objective (for example, cut size) subject to constraints (for example, balance between parts). The problem arises when distributing work across the nodes of a compute cluster, laying out data on parallel machines, and preprocessing meshes for parallel numerical solvers. Historically, influential heuristics such as the Kernighan–Lin algorithm originated at Bell Labs around 1970.
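One common formalization of the balanced k-way objective (the exact convention varies across the literature; here ε denotes the allowed imbalance) is:

```latex
\min_{V_1,\dots,V_k} \; \operatorname{cut}(V_1,\dots,V_k)
  \;=\; \bigl|\{\,\{u,v\}\in E \;:\; u\in V_i,\ v\in V_j,\ i\neq j\,\}\bigr|
\qquad \text{s.t.}\qquad
|V_i| \;\le\; (1+\varepsilon)\,\Bigl\lceil \tfrac{n}{k} \Bigr\rceil
\quad \text{for all } i .
```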
Common formalizations include k-way partitioning, bisection, minimum cut, and community detection. Notable objective measures include edge cut, edge expansion, conductance, and modularity. Constraints typically require the parts to be balanced by vertex count, vertex weight, or capacity. Specialized variants include partitioning directed graphs, hypergraph partitioning (widely used in VLSI design), and dynamic or streaming partitioning for graphs that evolve over time, as in large social networks.
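To make two of these measures concrete, the sketch below computes the edge cut and the conductance of a two-way partition. The adjacency-dict representation and function names are illustrative, not drawn from any particular library.

```python
def edge_cut(adj, part):
    """Number of edges whose endpoints lie in different parts.
    adj: dict mapping vertex -> set of neighbours (undirected graph).
    part: dict mapping vertex -> block id.
    Vertices are assumed comparable so each edge is counted once."""
    return sum(1 for u in adj for v in adj[u]
               if u < v and part[u] != part[v])

def conductance(adj, side):
    """Conductance of the cut (side, V \\ side): cut edges divided by
    the smaller side's total degree (its 'volume')."""
    cut = sum(1 for u in side for v in adj[u] if v not in side)
    vol_side = sum(len(adj[u]) for u in side)
    vol_rest = sum(len(adj[u]) for u in adj) - vol_side
    denom = min(vol_side, vol_rest)
    return cut / denom if denom else float("inf")

# Example: a 4-cycle 0-1-2-3-0 split into adjacent pairs {0,1} and {2,3}.
adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
part = {0: 0, 1: 0, 2: 1, 3: 1}
print(edge_cut(adj, part))        # 2: the cut edges are (1,2) and (0,3)
print(conductance(adj, {0, 1}))   # 0.5: 2 cut edges / volume 4
```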
Algorithms span combinatorial, spectral, and multilevel frameworks. Spectral methods use eigenvectors of the graph Laplacian; in particular, the Fiedler vector (the eigenvector of the second-smallest eigenvalue) yields a natural bisection. Multilevel schemes coarsen the graph, partition the small coarse graph, and then uncoarsen while refining; influential implementations include METIS, Chaco, and Scotch. Flow-based algorithms build on classic max-flow/min-cut duality. Local refinement heuristics such as Kernighan–Lin (introduced at Bell Labs) and Fiduccia–Mattheyses iteratively swap or move vertices to reduce the cut, and remain the workhorses of multilevel refinement.
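A minimal spectral bisection along these lines, assuming NumPy is available, builds the Laplacian L = D − A and splits the vertices around the median of the Fiedler vector. This is a sketch of the idea, not a production partitioner.

```python
import numpy as np

def spectral_bisection(adj):
    """Bisect a graph by the Fiedler vector: the eigenvector of the
    second-smallest eigenvalue of the Laplacian L = D - A.
    adj: dict mapping vertex -> set of neighbours (undirected, connected).
    Returns one side of the bisection as a set of vertices."""
    nodes = sorted(adj)
    idx = {v: i for i, v in enumerate(nodes)}
    L = np.zeros((len(nodes), len(nodes)))
    for u in nodes:
        L[idx[u], idx[u]] = len(adj[u])    # degree on the diagonal
        for v in adj[u]:
            L[idx[u], idx[v]] = -1.0       # -1 for each edge
    # eigh returns eigenvalues of a symmetric matrix in ascending order,
    # so column 1 of the eigenvector matrix is the Fiedler vector.
    _, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]
    # Splitting at the median keeps the two sides balanced.
    return {v for v in nodes if fiedler[idx[v]] <= np.median(fiedler)}

# Two triangles joined by the bridge edge (2,3): the spectral split
# should recover the triangles (either one may be returned, since the
# sign of an eigenvector is arbitrary).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(sorted(spectral_bisection(adj)))
```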
Graph partitioning is integral to parallel finite-element solvers, to social network analysis, and to load balancing and query planning in distributed databases. It has long informed VLSI circuit layout, where hypergraph partitioning drives circuit placement. In machine learning, partitioning supports mini-batch construction and model parallelism, and bioinformatics pipelines exploit it for clustering and sequence assembly.
Many variants are NP-hard; in particular, balanced graph bisection is NP-hard, so practical algorithms are heuristics or approximations. Approximation algorithms and inapproximability results are known for sparsest cut and related objectives. Spectral bounds linking conductance to the second-smallest Laplacian eigenvalue follow from discrete analogues of Cheeger's inequality, adapted from Riemannian geometry to graphs.
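The Cheeger-type bound for graphs can be stated as follows, where φ(G) is the conductance of the graph and λ₂ is the second-smallest eigenvalue of its normalized Laplacian:

```latex
\frac{\lambda_2}{2} \;\le\; \phi(G) \;\le\; \sqrt{2\,\lambda_2}
```

The lower bound is the "easy" direction from the variational characterization of λ₂; the upper bound is the graph analogue of Cheeger's inequality and certifies that a small λ₂ implies a sparse cut exists.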
Implementations balance quality, speed, and memory for deployment on hardware ranging from GPUs to large CPU clusters. Widely used software packages include METIS and its parallel variant ParMETIS, Scotch, and KaHIP. Practical engineering addresses partition quality metrics, rebalancing strategies for evolving graphs, and integration with distributed storage systems. Benchmarking relies on standard datasets such as those from the DIMACS implementation challenges and the Stanford Large Network Dataset Collection (SNAP).