| ScaLAPACK | |
|---|---|
| Name | ScaLAPACK |
| Developer | University of Tennessee; University of California, Berkeley; Oak Ridge National Laboratory |
| Released | 1995 (version 1.0) |
| Programming language | Fortran (programming language); C (programming language) |
| Operating system | Unix; Linux; Microsoft Windows |
| Platform | Distributed computing; High-performance computing |
| License | Modified BSD (reference implementation); vendor-tuned builds also distributed |
ScaLAPACK
ScaLAPACK is a library of high-performance routines for dense linear algebra on distributed-memory supercomputers and clusters, designed to extend the LAPACK project to parallel environments and to interoperate with BLAS and BLACS. The project arose from collaborations among researchers at the University of Tennessee, the University of California, Berkeley, and Oak Ridge National Laboratory, and the library has run on systems ranging from the Intel Paragon, IBM SP, and Cray T3 series to modern x86-based Beowulf clusters. It targets dense linear systems, eigenproblems, and singular value decompositions, with a focus on scalability across Message Passing Interface-based networks and distributed-memory topologies.
ScaLAPACK provides distributed-memory implementations of routines for solving systems of linear equations, computing eigenvalues and singular values, and performing matrix factorizations, building on algorithms from LAPACK, distributed basic operations from the PBLAS, and communication layers from BLACS. The library relies on the standardized BLAS interface for local kernels and on MPI or vendor-supplied communication libraries for inter-node messaging, enabling deployment on architectures ranging from Cray XT and IBM Blue Gene to Intel Xeon Phi and ARM-based clusters. ScaLAPACK influenced subsequent efforts including PLASMA and Elemental and is cited in performance studies alongside tools such as PETSc and Trilinos.
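Every distributed matrix in ScaLAPACK is described by a nine-integer array descriptor, set up by the library routine DESCINIT, with local extents computed by the tool routine NUMROC. The sketch below is a Python transliteration of the documented logic of those two Fortran routines (the `descinit` signature here is simplified for illustration and omits the error checking the real routine performs):

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Number of entries of an n-long dimension, dealt out in nb-sized
    blocks cyclically over nprocs processes, that land on process iproc.
    Python transliteration of ScaLAPACK's NUMROC tool routine."""
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb                      # number of complete blocks
    num = (nblocks // nprocs) * nb         # full rounds everyone receives
    extra = nblocks % nprocs
    if mydist < extra:
        num += nb                          # one extra complete block
    elif mydist == extra:
        num += n % nb                      # the trailing partial block
    return num

def descinit(m, n, mb, nb, rsrc, csrc, ctxt, myrow, nprow):
    """Build the nine-element array descriptor for a dense distributed
    matrix (DTYPE = 1 denotes the dense block-cyclic type)."""
    lld = max(1, numroc(m, mb, myrow, rsrc, nprow))  # local leading dimension
    return [1, ctxt, m, n, mb, nb, rsrc, csrc, lld]
```

For example, a 10-element dimension split into blocks of 2 over 3 processes yields local sizes 4, 4, and 2, which sum back to 10 regardless of the block size chosen.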
ScaLAPACK was conceived in the early 1990s as an extension of LAPACK to distributed-memory machines by teams centered at the University of Tennessee, the University of California, Berkeley, and Oak Ridge National Laboratory, with contributions from other U.S. national laboratories and industrial partners. Early development targeted architectures from vendors such as Cray Research, IBM, and Intel and was documented in technical reports and in proceedings of conferences such as the International Conference on Parallel Processing and the Supercomputing Conference. The project progressed alongside standards like MPI and libraries such as BLAS and BLACS, and its routines became standard material in numerical software courses.
ScaLAPACK adopts a two-level design that separates local computation from global communication: optimized BLAS kernels perform node-local operations while BLACS or MPI manage process-grid communication, a model that runs on machines from Cray and IBM systems to commodity clusters built by Dell and HP. Data is distributed using a block-cyclic scheme across a two-dimensional process grid, which balances load and bounds communication volume. The architecture supports mixed-language use between Fortran and C, enabling integration with numerical environments such as MATLAB and R.
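Under the block-cyclic scheme, global index i (0-based, block size mb) lives on process-row coordinate (rsrc + i // mb) mod nprow, at a local offset determined by how many of that process's blocks precede it. A minimal sketch of this standard mapping (the function names are illustrative, not part of the ScaLAPACK API):

```python
def owner(i, mb, nprow, rsrc=0):
    """Process-row coordinate that stores global index i (0-based)."""
    return (rsrc + i // mb) % nprow

def local_index(i, mb, nprow):
    """Local offset of global index i on its owning process (rsrc = 0):
    count the complete block-cyclic rounds before i, then the position
    of i within its own block."""
    return (i // (mb * nprow)) * mb + i % mb
```

With mb = 2 and nprow = 2, global rows 0..7 are owned in the pattern 0, 0, 1, 1, 0, 0, 1, 1, so each process holds two blocks of two contiguous rows, the interleaving that gives block-cyclic layouts their load balance during factorizations.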
ScaLAPACK implements parallel variants of classical dense linear algebra algorithms, including LU, QR, and Cholesky factorizations, the singular value decomposition, and eigensolvers for symmetric and nonsymmetric matrices, drawing on algorithmic foundations from classic texts such as Golub and Van Loan's Matrix Computations. Routines echo LAPACK naming conventions with a leading P (for example, PDGETRF is the parallel counterpart of DGETRF) and rely on scalable panel factorization with partial pivoting and look-ahead techniques. The library also includes utility routines for distributed matrix redistribution and condition-number estimation used in workflows with PETSc and Trilinos.
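The parallel LU driver distributes the classical right-looking, partially pivoted factorization across the process grid. As a point of reference, here is the serial algorithm it parallelizes, in plain Python (a teaching sketch, not the library's tuned Fortran):

```python
def lu_factor(A):
    """Right-looking LU with partial pivoting, in place.
    Returns (A, piv): A holds the unit-lower factor L below the diagonal
    and U on and above it; row i of the result came from original row piv[i]."""
    n = len(A)
    piv = list(range(n))
    for k in range(n):
        # Partial pivoting: pick the row with the largest |A[i][k]|, i >= k.
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        if A[p][k] == 0.0:
            raise ZeroDivisionError("matrix is singular")
        if p != k:
            A[k], A[p] = A[p], A[k]
            piv[k], piv[p] = piv[p], piv[k]
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]                    # multiplier, stored as L
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]      # trailing-submatrix update
    return A, piv
```

In the parallel setting, the pivot search and the column of multipliers become a panel factorization on one process column, and the trailing-submatrix update becomes a distributed rank-update, which is where the block-cyclic layout pays off.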
ScaLAPACK’s performance depends on BLAS efficiency, the latency and bandwidth of interconnects such as InfiniBand and Myrinet, and the quality of the process-grid mapping onto topologies such as dragonfly and torus networks. Benchmarks on platforms including IBM Blue Gene and Cray XC systems demonstrate strong scaling for large dense problems but reveal limits at very large process counts, where communication overhead dominates; this motivated follow-on work on communication-avoiding algorithms. Performance studies compare ScaLAPACK against libraries such as Elemental and PLASMA and against vendor-tuned packages from Intel and NVIDIA.
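The scaling behaviour described above is often reasoned about with a latency-bandwidth ("alpha-beta") cost model. The sketch below is a generic illustrative model for dense LU on p processes, with assumed term shapes (flops ~ 2n³/3p, message count ~ n·log₂p, words moved ~ n²/√p); the coefficients and exact terms are a textbook-style approximation, not ScaLAPACK's documented performance model:

```python
import math

def lu_time(n, p, gamma, alpha, beta):
    """Modeled wall time for parallel LU of an n x n matrix on p processes.
    gamma: seconds per flop; alpha: per-message latency; beta: per-word cost.
    Term shapes are assumptions of this illustrative model."""
    flops = 2.0 * n**3 / (3.0 * p)                 # arithmetic, perfectly split
    msgs = n * math.log2(p) if p > 1 else 0.0      # pivot/panel broadcasts
    words = n * n / math.sqrt(p)                   # data moved per process
    return flops * gamma + msgs * alpha + words * beta

def efficiency(n, p, gamma, alpha, beta):
    """Parallel efficiency T1 / (p * Tp), counting only flops in T1."""
    t1 = 2.0 * n**3 / 3.0 * gamma
    return t1 / (p * lu_time(n, p, gamma, alpha, beta))
```

The model reproduces the qualitative observation in the benchmarks: for a fixed matrix size, efficiency decays as the process count grows, while growing the problem with the machine recovers it, which is exactly the weak-scaling regime where ScaLAPACK performs best.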
Implementations of ScaLAPACK are distributed in both reference and vendor-tuned forms, including the reference package from Netlib and optimized builds from vendors such as Intel (in the Math Kernel Library), IBM, and Cray. Users integrate ScaLAPACK into scientific applications at institutions such as CERN and NASA to solve dense linear systems in large-scale simulations, including projects at Los Alamos National Laboratory and Lawrence Livermore National Laboratory. The library is also reached through higher-level environments, including MATLAB, third-party Python wrappers, and batch workflows managed by schedulers such as Slurm and PBS Professional on clusters provisioned by Amazon Web Services and Microsoft Azure for cloud HPC.
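A ScaLAPACK application is typically launched as an ordinary MPI job under a scheduler. The batch-script fragment below is a hedged Slurm example: the application name, its command-line flags, and the module names are site-specific placeholders, not part of any real package.

```shell
#!/bin/bash
#SBATCH --job-name=scalapack-demo   # hypothetical job running a ScaLAPACK code
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=16        # 64 MPI ranks, e.g. an 8x8 process grid
#SBATCH --time=00:30:00

# Module names vary by site; these are placeholders for a tuned BLAS/ScaLAPACK
# stack and an MPI implementation.
module load intel-oneapi-mkl openmpi

# Launch the (hypothetical) application, which would internally set up the
# BLACS process grid, build array descriptors, and call a driver routine.
srun ./my_scalapack_app --matrix-size 40000 --block-size 64
```

Because ScaLAPACK itself has no runtime configuration, all parallelism choices (grid shape, block size) are made by the calling program, so they usually surface as application flags like the placeholders above.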
ScaLAPACK is widely used in computational fields requiring dense linear algebra, including computational fluid dynamics codes at the National Renewable Energy Laboratory, climate modeling at NOAA, and electronic structure calculations at Argonne National Laboratory and Oak Ridge National Laboratory. Its limitations include reduced efficiency for sparse problems, where libraries such as SuiteSparse and Trilinos excel, and sensitivity to network characteristics on exascale-class platforms. Ongoing research, including work at the University of Illinois Urbana–Champaign and ETH Zurich, seeks to extend scalability with communication-avoiding algorithms and GPU-aware implementations influenced by work at NVIDIA and AMD.
Category:Numerical linear algebra libraries