LLMpedia: The first transparent, open encyclopedia generated by LLMs

High Performance Linpack

Generated by DeepSeek V3.2
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Article Genealogy
Expansion Funnel: Raw 59 → Dedup 20 → NER 8 → Enqueued 8
1. Extracted: 59
2. After dedup: 20
3. After NER: 8 (rejected: 12, all non-named-entities)
4. Enqueued: 8
High Performance Linpack
Name: High Performance Linpack
Authors: Jack Dongarra, Jim Bunch, Cleve Moler, Pete Stewart
Developer: Netlib
Released: 1979
Programming language: Fortran
Operating system: Cross-platform
Genre: Benchmark

High Performance Linpack (HPL) is a benchmark for measuring the floating-point performance of supercomputers by solving a dense system of linear equations. The benchmark is derived from the LINPACK library, a collection of Fortran subroutines for solving linear algebra problems. Its results are famously used to rank the world's most powerful systems on the TOP500 list, providing a standardized metric for comparing high-performance computing capabilities across different architectures and generations.

Overview

The benchmark solves a dense system of linear equations, **Ax = b**, using Gaussian elimination with partial pivoting. It is designed to stress the floating-point computation and memory subsystem of a supercomputer, providing a measure of its performance in FLOPS. The problem size, *N*, can be scaled to fit the memory of the system being tested, with a larger *N* typically yielding a higher percentage of the machine's theoretical peak performance. The result is a single number, the achieved Rmax, which is reported to the TOP500 project. This project, maintained by researchers including Jack Dongarra, Hans Meuer, and Erich Strohmaier, has become the definitive authority on supercomputer performance since its inception in 1993.
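The conversion from solve time to a FLOPS figure follows from the operation count of the algorithm: roughly 2/3·N³ floating-point operations for the LU factorization plus 2·N² for the triangular solves. A minimal sketch of that arithmetic (the function name and the sample run are illustrative, not part of the benchmark code):

```python
def hpl_gflops(n, seconds):
    """Approximate HPL performance for solving an n x n dense system:
    ~2/3 n^3 operations for LU factorization plus ~2 n^2 for the
    forward/back substitutions, divided by wall-clock time."""
    ops = (2.0 / 3.0) * n**3 + 2.0 * n**2
    return ops / seconds / 1e9  # GFLOPS

# Hypothetical run: N = 100,000 solved in 1,000 seconds
print(round(hpl_gflops(100_000, 1_000), 1))  # 666.7 GFLOPS
```

This also shows why larger N approaches peak more closely: the O(N³) compute grows faster than the O(N²) data movement, so bigger problems amortize memory traffic better.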

Algorithm and Implementation

The core algorithm is a right-looking version of LU factorization with partial pivoting, a standard method for solving linear systems. The implementation is highly optimized to exploit cache memory hierarchies and parallel computing paradigms, often utilizing Basic Linear Algebra Subprograms (BLAS) for critical matrix operations. For parallel systems, the data matrix is distributed across processors using a two-dimensional block-cyclic distribution, a scheme effectively implemented in libraries like ScaLAPACK. Tuning for specific architectures, such as those from NVIDIA (using CUDA) or AMD, often involves optimizing these BLAS kernels, particularly DGEMM, to maximize data reuse and computational throughput on hardware like GPUs or many-core CPUs from Intel.
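The right-looking scheme can be illustrated with an unblocked NumPy sketch: at each step, pick the largest pivot in the current column, eliminate below it, then update the entire trailing submatrix (the step that optimized implementations cast as a blocked DGEMM). This is a teaching sketch, not HPL's tuned, distributed code:

```python
import numpy as np

def lu_partial_pivot(a):
    """Right-looking LU factorization with partial pivoting.
    L (unit lower) and U overwrite a copy of `a`; `piv` records
    the row permutation so that A[piv] == L @ U."""
    a = np.array(a, dtype=float)
    n = a.shape[0]
    piv = np.arange(n)
    for k in range(n - 1):
        # Partial pivoting: bring the largest |entry| in column k to the diagonal
        p = k + np.argmax(np.abs(a[k:, k]))
        if p != k:
            a[[k, p]] = a[[p, k]]
            piv[[k, p]] = piv[[p, k]]
        # Compute multipliers and update the trailing submatrix;
        # in HPL this rank-update dominates the runtime as a DGEMM kernel
        a[k+1:, k] /= a[k, k]
        a[k+1:, k+1:] -= np.outer(a[k+1:, k], a[k, k+1:])
    return a, piv
```

HPL's actual implementation blocks this loop into panels of width NB and distributes the panels block-cyclically over a P×Q process grid, but the arithmetic per step is the same.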

Performance Metrics and the TOP500

The primary performance result is **Rmax**, the sustained performance in FLOPS achieved when solving the largest problem that fits in the system's memory. A secondary metric, **Rpeak**, represents the theoretical peak performance. The TOP500 list, published twice yearly at conferences like the International Supercomputing Conference and the Supercomputing Conference, ranks systems based on their Rmax value. This ranking has driven competition among nations, laboratories like Lawrence Livermore National Laboratory and Oak Ridge National Laboratory, and vendors such as IBM, Cray, and Fujitsu. The pursuit of exascale computing has made achieving high scores a key goal for projects like Frontier at Oak Ridge National Laboratory and Fugaku at the RIKEN Center for Computational Science.
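Rpeak is a pencil-and-paper number: every core retiring its maximum double-precision FLOPs per cycle at nominal clock. A minimal sketch of that calculation and the derived HPL efficiency (the function names and the 32 FLOPs/cycle figure are illustrative, e.g. a core with two 512-bit FMA units):

```python
def rpeak_gflops(nodes, cores_per_node, clock_ghz, flops_per_cycle):
    """Theoretical peak: all cores at full vector throughput, all cycles.
    flops_per_cycle depends on the vector/FMA units of the architecture."""
    return nodes * cores_per_node * clock_ghz * flops_per_cycle

def hpl_efficiency(rmax, rpeak):
    """Fraction of theoretical peak actually sustained by the HPL run."""
    return rmax / rpeak

# Hypothetical single node: 64 cores, 2.0 GHz, 32 DP FLOPs/cycle
peak = rpeak_gflops(1, 64, 2.0, 32)   # 4096.0 GFLOPS
print(hpl_efficiency(3000.0, peak))    # e.g. a measured Rmax of 3000 GFLOPS
```

Well-tuned CPU systems typically sustain a large fraction of Rpeak on HPL, which is precisely why critics note the benchmark flatters compute-bound workloads.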

Historical Development

The benchmark originated from the LINPACK library, created in the 1970s by Jack Dongarra, Jim Bunch, Cleve Moler, and Pete Stewart. Its adaptation for benchmarking parallel computers began in the late 1980s. The formal establishment of the TOP500 list in 1993 by Jack Dongarra, Hans Meuer, Erich Strohmaier, and Horst Simon institutionalized its use. Over decades, it has tracked the evolution from vector processor systems like those from Cray Research to massively parallel MPP machines, and now to hybrid systems using GPU accelerators from NVIDIA and AMD. This history mirrors the broader trends in high-performance computing architecture.

Several variants have been developed to address different constraints or components. The **HPL-AI** benchmark measures mixed-precision performance, reflecting the needs of artificial intelligence workloads. The **LINPACK for Clusters** project facilitated running on Beowulf clusters. Other important benchmarks for a fuller system assessment include the High Performance Conjugate Gradient (HPCG) benchmark, which emphasizes memory bandwidth and latency rather than peak FLOPS, and the Graph500 list for data-intensive applications. Projects like the SPEC CPU suite also provide complementary metrics for processor performance.

Software and Usage

The source code is freely available from Netlib. Successful execution requires significant system-specific tuning, including optimization of BLAS libraries (e.g., OpenBLAS, MKL, BLIS), configuration of Message Passing Interface parameters, and adjustment of problem size and block size. It is widely used by system integrators, research institutions like the National Science Foundation-funded centers, and vendors for acceptance testing and public performance demonstrations. While criticized for not representing all real-world workloads, its simplicity and historical continuity ensure its enduring role in the high-performance computing community.
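The tunables mentioned above are set in the benchmark's input file, HPL.dat. A representative excerpt might look like the following (the numeric values are illustrative, chosen for a hypothetical 128-rank run, not a recommendation):

```
1            # of problems sizes (N)
100000       Ns      (problem size; sized to fill most of system memory)
1            # of NBs
192          NBs     (block size; governs the panel width fed to DGEMM)
1            # of process grids (P x Q)
8            Ps
16           Qs      (P x Q must equal the number of MPI ranks)
```

In practice, N is chosen as large as memory allows, NB is swept to find the sweet spot of the local DGEMM, and the P×Q grid is kept close to square (with P ≤ Q) to balance communication in the row and column directions.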

Category:Computer benchmarks Category:Supercomputing Category:Numerical linear algebra