LLMpedia: The first transparent, open encyclopedia generated by LLMs

Bulk Synchronous Parallel

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Article Genealogy
Parent: Random Access Machine (Hop 5)
Expansion Funnel: Raw 80 → Dedup 0 → NER 0 → Enqueued 0
1. Extracted: 80
2. After dedup: 0
3. After NER: 0
4. Enqueued: 0
Name: Bulk Synchronous Parallel
Introduced: 1990
Developer: Leslie Valiant
Paradigm: Parallel computing model
Influences: PRAM, message passing, MapReduce
Notable implementations: BSPonMPI, BSPlib, Apache Hama


Bulk Synchronous Parallel (BSP) is a theoretical model, introduced by Leslie Valiant around 1990, for designing and analysing parallel algorithms; it structures computation into synchronized supersteps. It provides an abstract bridging framework that links algorithm design, hardware architecture and performance analysis, enabling comparisons among systems such as IBM supercomputers, Cray platforms, Intel clusters and distributed systems used by Google and Facebook. The model aims to balance computation, communication and synchronization so that scalability and cost can be predicted on large-scale platforms such as those at Oak Ridge National Laboratory and Lawrence Livermore National Laboratory.

Overview

The model was proposed to bridge theoretical models such as PRAM and practical systems, including MPI-based clusters and early message-passing machines such as the Thinking Machines Corporation CM-5. It formalizes computation as a sequence of global synchronization points, with conceptual ties to models later used in MapReduce and to frameworks developed at Microsoft Research and in DARPA programs. BSP emphasizes three cost components (local computation, communication volume, and synchronization latency), making it relevant to performance studies at centers such as Stanford University, MIT, UC Berkeley, Lawrence Berkeley National Laboratory and industry labs at IBM Research.

Model Definition

In the model, computation proceeds in a sequence of supersteps; each superstep consists of independent local computation on processors, point-to-point or collective communication among processors, and a global barrier synchronization. The formal parameters often include the number of processors P, the computation cost per processor, a communication cost parameter g (gap) representing bandwidth characteristics as measured on systems like Cray XT and Blue Gene installations, and a synchronization latency parameter L influenced by interconnects such as InfiniBand, Ethernet and proprietary networks from Mellanox. The cost metric aggregates local work, message counts and sizes, and synchronization delays, permitting performance bounds and complexity analysis similar to techniques used in theoretical work at Bell Labs and Princeton University. Designers map algorithmic steps to supersteps to analyze asymptotic costs and practical constants, facilitating comparisons to worst-case bounds in models referenced by researchers at Cornell University and University of Texas at Austin.
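The superstep structure above yields the standard BSP cost expression. Writing w for the maximum local work performed by any processor in a superstep, h for the largest number of words any processor sends or receives (an h-relation), g for the per-word communication gap, and L for the barrier cost, the cost of one superstep and of a whole program with S supersteps can be written as:

```latex
T_{\text{superstep}} = w + h \cdot g + L
\qquad
T_{\text{total}} = \sum_{i=1}^{S} w_i + g \sum_{i=1}^{S} h_i + S \cdot L
```

The additive form makes the design trade-off explicit: merging supersteps reduces the S·L synchronization term, while splitting work across more processors reduces the w terms at the price of larger h-relations.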

Applications and Implementations

BSP has been applied to numerical linear algebra, graph analytics, sorting, sparse matrix operations and scientific simulations developed at organizations such as Los Alamos National Laboratory and Argonne National Laboratory. Implementations include libraries like BSPlib and research systems such as Hama (built in the Apache Software Foundation ecosystem) and adaptations for MPI in projects at universities including University of Cambridge and University of Edinburgh. Large-scale data processing frameworks at Google and Yahoo! have inspired BSP-like iterative processing in systems underpinned by work from groups at Carnegie Mellon University and University of Washington. BSP has influenced production tools used by companies like Amazon and Microsoft for distributed analytics and by research teams in high-performance computing groups at NASA and European Organisation for Nuclear Research (CERN).
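As a concrete illustration of the superstep discipline, the following sketch simulates a BSP-style all-reduce sum with threads standing in for processors and a global barrier between the communication and read phases. The function name and mailbox layout are illustrative, not taken from any BSP library:

```python
import threading

def bsp_allreduce_sum(local_values):
    """Simulate a BSP computation per processor: deposit a message into a
    shared mailbox (communication phase), wait at a global barrier, then
    combine all received messages (next superstep's local computation)."""
    p = len(local_values)
    barrier = threading.Barrier(p)
    mailbox = [None] * p          # one message slot per processor
    results = [None] * p

    def worker(pid):
        # Superstep 1: local computation (trivial here), then "send".
        mailbox[pid] = local_values[pid]
        barrier.wait()            # global barrier synchronization
        # Superstep 2: every processor now sees all p messages.
        results[pid] = sum(mailbox)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(p)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

print(bsp_allreduce_sum([1, 2, 3, 4]))  # every processor ends with 10
```

Real BSP libraries such as BSPlib expose analogous primitives (registered memory puts/gets plus a collective synchronization call), but the barrier-separated phases are the essential pattern.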

Performance and Scalability

Performance analysis under the model yields cost expressions that combine computation time, communication overhead and synchronization delays; such expressions have been validated against measurements on platforms from vendors like Dell, HP and SGI. Scalability studies contrast BSP predictions with empirical results on petascale systems such as Titan and Sequoia, and with exascale projections considered by projects at NERSC and PRACE. The model helps identify bottlenecks linked to network contention seen on fabrics like Myrinet and Cray Aries, and to latency effects studied in cluster experiments at Lawrence Livermore National Laboratory and industrial testbeds maintained by Intel Labs. Cost parameters g and L enable calibration against microbenchmarks such as those developed at University of Illinois and ETH Zurich, allowing designers to estimate the superstep granularity needed to achieve strong and weak scaling on specific hardware profiles like multi-socket nodes from AMD and Intel.
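The granularity estimate described above can be sketched as a small cost calculator. The machine parameters below (g = 4 time units per word, L = 100 per barrier) are hypothetical, chosen only to show how many fine-grained supersteps pay more in synchronization than a few coarse ones with the same total work and communication:

```python
def bsp_cost(supersteps, g, L):
    """Predicted BSP running time: for each superstep (w, h),
    add max local work w, communication cost h * g, and one barrier L."""
    return sum(w + h * g + L for (w, h) in supersteps)

# Same total work (20,000) and communication (400 words) in both schedules.
coarse = [(10_000, 50)] * 2    # two large supersteps
fine   = [(1_000, 5)] * 20     # twenty small supersteps

print(bsp_cost(coarse, g=4, L=100))  # 20600
print(bsp_cost(fine, g=4, L=100))    # 22400
```

Calibrating g and L from microbenchmarks and plugging them into this kind of formula is how BSP-based designs choose superstep granularity for a given machine.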

Limitations and Criticisms

Critics argue BSP’s global synchronization abstraction may be too coarse for asynchronous workloads and irregular computations found in graph processing at scale studied by teams at Facebook and LinkedIn, or in streaming systems developed at Netflix and Twitter. Empirical studies at Columbia University and UC San Diego show that frequent barriers can degrade performance on heterogeneous clusters used by Dropbox and Box, and that the model abstracts away memory hierarchies emphasized in research at HP Labs and Google Brain. Other critiques, in literature from University College London and KTH Royal Institute of Technology, note that BSP’s cost parameters can be difficult to estimate accurately on cloud platforms provided by Amazon Web Services, Google Cloud Platform and Microsoft Azure, where virtualization and multi-tenancy affect network and synchronization behavior.

Extensions and Related Models

Several extensions relax strict synchronization or augment communication modeling, including asynchronous BSP variants and hybrid models combining BSP with MPI or shared-memory paradigms studied at ETH Zurich and Tsinghua University. Related models and frameworks include PRAM, LogP, the GASPI model, and programming systems like Pregel (influenced by work at Google), Giraph (originating from Facebook research), and MapReduce, which influenced iterative adaptations at Yahoo! and Cloudera. Research at University of California, Santa Barbara and Northwestern University explores fault tolerance, elasticity and dynamic load balancing within BSP-like frameworks, while projects at Imperial College London and University of Southampton investigate compiler support and automated mapping from high-level languages to BSP-style execution.

Category:Parallel computing models