set disjointness problem

Name	Set disjointness problem
Field	Theoretical computer science; Communication complexity; Algorithmic complexity
Introduced	1980s
Notable	Razborov, Kalyanasundaram, Schnitger, Babai, Nisan

Contents

Definition and problem statement
Complexity and communication complexity
Algorithms and protocols
Variations and related problems
Applications and significance

The set disjointness problem is a fundamental decision problem in theoretical computer science that asks whether the subsets held by two parties have an empty intersection. It arises in the study of communication complexity, lower bounds, streaming algorithms, and circuit complexity, and it is connected to work by Razborov, Kalyanasundaram, Nisan, Szegedy, Babai, Saks, Srinivasan, Schwartz, Yao, and Karp. The problem underpins separations and lower bounds in models studied by researchers connected to Princeton University, MIT, IBM Research, Microsoft Research, and Bell Labs.

Definition and problem statement

In the canonical two-party formulation, one party (conventionally called Alice) receives a subset A of a universe [n] = {1, ..., n}, while the other party (conventionally called Bob) receives a subset B of [n]; they must determine whether A ∩ B = ∅ using as little communication as possible. This decision task is studied in the context of models introduced by Andrew Yao and developed in seminars at Rutgers University and the University of Chicago, with formalizations appearing in workshops sponsored by ACM and IEEE. The standard promise-free version asks for exact determination; randomized and approximate variants were investigated in work presented at conferences such as STOC and FOCS and influenced research at Microsoft Research Redmond and Google Research.
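As a concrete illustration of the definition, the sketch below encodes A and B as n-bit characteristic vectors and simulates the trivial protocol in which the first party sends its whole vector and the second party replies with a one-bit answer, for n + 1 bits in total. The function names and the transcript representation are illustrative assumptions, and the universe is taken to be {0, ..., n-1} with n at least 1 for convenience.

```python
from typing import Iterable, Tuple


def to_bitmask(subset: Iterable[int], n: int) -> int:
    """Encode a subset of {0, ..., n-1} as an n-bit characteristic vector."""
    mask = 0
    for x in subset:
        if not 0 <= x < n:
            raise ValueError(f"element {x} is outside the universe of size {n}")
        mask |= 1 << x
    return mask


def disj(a_mask: int, b_mask: int) -> bool:
    """Return True exactly when the two encoded sets are disjoint."""
    return (a_mask & b_mask) == 0


def trivial_protocol(a: Iterable[int], b: Iterable[int], n: int) -> Tuple[bool, int]:
    """Simulate the naive protocol and return (answer, bits_communicated)."""
    # Message 1: the first party sends all n bits of its characteristic vector.
    message = format(to_bitmask(a, n), f"0{n}b")
    # The second party decodes the message and compares it with its own input.
    answer = disj(int(message, 2), to_bitmask(b, n))
    # Message 2: the second party announces the one-bit answer.
    return answer, len(message) + 1
```

For example, `trivial_protocol({0, 2}, {1, 3}, 4)` returns `(True, 5)`, while `trivial_protocol({0, 2}, {2, 3}, 4)` returns `(False, 5)`. The interest of the problem lies in whether any protocol can do substantially better than this baseline.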

Complexity and communication complexity

Set disjointness became a cornerstone for proving lower bounds in deterministic, randomized, and quantum communication models, building on methods by Razborov and others affiliated with IHÉS and Cornell University. Deterministic protocols need a number of bits linear in n; these results relate to frameworks developed at Bell Labs Research and are often contrasted with the randomized bounds, which trace to work by researchers at University of California, San Diego and Tel Aviv University. The randomized lower bounds rest on corruption (rectangle) and information-theoretic arguments, because the discrepancy method on its own yields only weak bounds for disjointness. The randomized two-party communication complexity of the problem is Θ(n): sending one set in full gives the upper bound, and the matching lower bound, due to Kalyanasundaram and Schnitger and later simplified by Razborov, was shown using techniques connected to the probabilistic method and analytic tools used by scholars at Princeton University and ETH Zurich. Quantum communication complexity studies, influenced by teams at University of Waterloo and Perimeter Institute, show that Θ(√n) qubits are sufficient and necessary, a quadratic separation from the classical randomized bound that is obtained through protocols based on quantum search.
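The linear deterministic lower bound can be seen through a standard fooling-set argument: the 2^n pairs (A, complement of A) are all disjoint, yet crossing two different pairs always produces at least one intersecting pair, so no two of them can lie in the same monochromatic rectangle and any deterministic protocol needs at least n bits. The sketch below checks this property by brute force for a small universe; the function names are illustrative assumptions, and sets are represented as bitmasks.

```python
from itertools import combinations
from math import log2
from typing import List, Tuple


def fooling_set(n: int) -> List[Tuple[int, int]]:
    """All pairs (A, complement of A) over an n-element universe, as bitmasks."""
    full = (1 << n) - 1
    return [(a, full ^ a) for a in range(1 << n)]


def is_fooling_set(pairs: List[Tuple[int, int]]) -> bool:
    """Check the fooling-set conditions for the output value 'disjoint'."""
    # Every pair in the set must itself be disjoint.
    if any(a & b for a, b in pairs):
        return False
    # For two distinct pairs, at least one crossed pair must intersect.
    for (a1, b1), (a2, b2) in combinations(pairs, 2):
        if (a1 & b2) == 0 and (a2 & b1) == 0:
            return False
    return True


def deterministic_lower_bound(n: int) -> float:
    """log2 of the fooling-set size, a lower bound on deterministic communication."""
    pairs = fooling_set(n)
    if not is_fooling_set(pairs):
        raise AssertionError("the constructed set is not a fooling set")
    return log2(len(pairs))
```

Calling `deterministic_lower_bound(4)` returns `4.0`, matching the bound of n bits for a universe of size 4. The brute-force check is quadratic in 2^n and is meant only for small n.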

Algorithms and protocols

Efficient protocols for variants of the problem, such as the case in which both sets are small, have been designed using fingerprinting methods introduced by investigators at IBM T.J. Watson Research Center and hashing techniques related to work by researchers at Google and Yahoo!. Streaming algorithms that test disjointness in sublinear space for restricted or approximate variants were developed in collaborations associated with EPFL and University of Cambridge, employing sketches tied to research at Bell Labs and AT&T Labs Research. Parallel and distributed protocols have been implemented in systems inspired by projects at Microsoft Azure and Amazon Web Services and are analyzed via models promoted in papers from Stanford and the MIT Computer Science and Artificial Intelligence Laboratory. Lower-level algorithmic constructions trace their roots to combinatorial designs studied at the Institute for Advanced Study and coding-theoretic techniques from the Tata Institute of Fundamental Research.
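A minimal sketch of how hashing can shorten communication when both sets are small, assuming shared (public) randomness: both parties derive the same hash function from a common seed, the first party sends only the hash values of its elements, and the second party looks for collisions. A "disjoint" answer is always correct, while an "intersecting" answer can be a false positive with probability roughly at most |A||B|/m for a hash range of size m. This illustrates the general idea only and is not the specific protocol of any work mentioned here; the names, the choice of prime, and the range size are assumptions.

```python
import random
from typing import Callable, Set, Tuple

# A Mersenne prime assumed to exceed every element that gets hashed.
PRIME = (1 << 61) - 1


def shared_hash(seed: int, m: int) -> Callable[[int], int]:
    """Derive a hash function into range(m) from a seed known to both parties."""
    rng = random.Random(seed)
    a = rng.randrange(1, PRIME)
    b = rng.randrange(0, PRIME)
    return lambda x: ((a * x + b) % PRIME) % m


def hashed_disjointness(
    set_a: Set[int], set_b: Set[int], m: int, seed: int
) -> Tuple[bool, int]:
    """Return (reported_disjoint, bits_sent) for one run of the sketch protocol."""
    h = shared_hash(seed, m)
    # First party's message: the distinct hash values of its elements.
    message = {h(x) for x in set_a}
    bits_per_value = max(1, (m - 1).bit_length())
    # Second party reports "disjoint" only if none of its hashes appear in the message.
    reported_disjoint = all(h(y) not in message for y in set_b)
    # Count the hash values sent plus the one-bit answer.
    return reported_disjoint, len(message) * bits_per_value + 1
```

With sets of size at most k and m on the order of k², the message has about k log k bits regardless of the universe size. The false-positive rate can be driven down by repeating the run with independent seeds and reporting "disjoint" if any run does.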

Variations and related problems

Variants include the multi-party number-in-hand and number-on-forehead models investigated by groups at Rutgers University and University of Illinois Urbana-Champaign, approximate intersection counting considered by teams at Facebook AI Research and DeepMind, and homomorphic encryption-based protocols explored by cryptographers at RSA Laboratories and University of Waterloo. Related problems in the literature include the set intersection query studied at Amazon Research and theoretical primitives such as equality and indexing analyzed at Caltech and Columbia University. Reductions connect the problem to work on streaming heavy hitters associated with Yale University and to graph streaming problems pursued at University of Washington and New York University.
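The reduction pattern behind many streaming lower bounds can be sketched as follows: if a one-pass algorithm computes the number of distinct elements exactly, two parties can decide disjointness by having the first party run the algorithm on A, send its memory state together with |A|, and letting the second party resume the stream on B. The sets are disjoint exactly when the final count equals |A| + |B|, so the memory state must be large enough to carry the communication that disjointness requires. The counter class below is a hypothetical stand-in for an arbitrary streaming algorithm, and the JSON encoding is only a convenient way to measure the state in this illustration.

```python
import json
from typing import Iterable, List, Set, Tuple


class ExactDistinctCounter:
    """A naive one-pass algorithm that counts distinct elements exactly."""

    def __init__(self, state: Iterable[int] = ()) -> None:
        self.seen: Set[int] = set(state)

    def update(self, item: int) -> None:
        self.seen.add(item)

    def result(self) -> int:
        return len(self.seen)

    def dump_state(self) -> List[int]:
        """Return the full memory state in a serializable form."""
        return sorted(self.seen)


def disjointness_via_streaming(a: Iterable[int], b: Iterable[int]) -> Tuple[bool, int]:
    """Decide disjointness by handing a streaming state from one party to the other.

    Returns (is_disjoint, bytes_of_state_sent).
    """
    a_set, b_set = set(a), set(b)

    # First party: stream over A, then serialize the memory state and |A|.
    alice = ExactDistinctCounter()
    for x in a_set:
        alice.update(x)
    message = json.dumps({"state": alice.dump_state(), "size_a": len(a_set)}).encode("utf-8")

    # Second party: restore the state and continue the same stream over B.
    payload = json.loads(message.decode("utf-8"))
    bob = ExactDistinctCounter(payload["state"])
    for y in b_set:
        bob.update(y)

    # The distinct count equals |A| + |B| exactly when no element was repeated.
    return bob.result() == payload["size_a"] + len(b_set), len(message)
```

Read in the contrapositive, the same construction says that an exact distinct-elements algorithm using far fewer than n bits of memory would give a disjointness protocol cheaper than the known lower bounds allow.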

Applications and significance

The set disjointness problem serves as a canonical hard problem used to prove communication lower bounds for tasks in databases researched at Oracle Corporation and IBM Research, distributed systems studied at Intel Labs and Cisco Systems, and privacy-preserving computations explored by cryptography groups at MIT and Harvard. Its complexity constrains resource trade-offs in big data frameworks used by Netflix and LinkedIn and influences protocol design for secure multi-party computation developed by teams at Microsoft Research and Stanford University. Foundational results about set disjointness have shaped curricula and research agendas at universities such as the University of Oxford, the University of Cambridge, Princeton University, ETH Zurich, and UCLA, and they continue to inform theoretical advances presented at venues such as SODA, ICALP, COLT, and PODS.

Category:Theoretical computer science
