| High Performance Linpack | |
|---|---|
| Name | High Performance Linpack |
| Acronyms | HPL |
| Developer | Antoine Petitet, Clint Whaley, Jack Dongarra, Andrew Cleary (Innovative Computing Laboratory, University of Tennessee) |
| Released | 2000 (HPL 1.0) |
| Genre | Benchmark |
| License | BSD-style, freely available from Netlib |
High Performance Linpack (HPL) is a benchmark program used to measure the floating-point computing performance of supercomputers and high-performance computing systems. It exercises dense linear algebra routines and produces a single performance figure that has been central to the Top500 list, to procurement assessments at laboratories such as Oak Ridge National Laboratory and Lawrence Berkeley National Laboratory, and to system architecture comparisons among vendors including Cray Inc., IBM, Fujitsu, Hewlett Packard Enterprise, and Lenovo. Its results have influenced procurement decisions by organizations such as NASA, the United States Department of Energy, Argonne National Laboratory, Sandia National Laboratories, and Los Alamos National Laboratory.
HPL implements a distributed-memory version of the LINPACK dense linear algebra benchmark, using the Message Passing Interface (MPI) for communication and the Basic Linear Algebra Subprograms (BLAS) for local computation. It solves a dense system of linear equations via LU factorization with partial pivoting, on matrix sizes chosen to maximize performance on a target system. Metrics produced by HPL have historically underpinned comparisons among architectures from Intel Corporation, AMD, NVIDIA Corporation, and ARM Holdings, as well as accelerators such as NVIDIA's Tesla-series GPUs and Xilinx FPGAs. HPL's role in benchmarking intersects with centers such as the National Energy Research Scientific Computing Center, the Pawsey Supercomputing Centre, and the Leibniz Supercomputing Centre, and with procurement programs at the European Centre for Medium-Range Weather Forecasts.
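The numerical core of HPL can be illustrated on a single process. The sketch below, in C, solves a small Ax = b system with LAPACK's dgesv routine, which performs exactly the LU factorization with partial pivoting described above; it is a minimal stand-in for what HPL does at scale, where the factorization is distributed across MPI ranks and the local work is dispatched to tuned BLAS. The matrix values are arbitrary illustrative data.

```c
/* Minimal single-process sketch of HPL's numerical task: solve A x = b
 * by LU factorization with partial pivoting, via LAPACK's dgesv.
 * Build with, e.g.:  cc sketch.c -llapack -lblas
 * HPL itself distributes this factorization across MPI ranks. */
#include <stdio.h>

/* Fortran LAPACK prototype: column-major storage, arguments by reference. */
extern void dgesv_(const int *n, const int *nrhs, double *a, const int *lda,
                   int *ipiv, double *b, const int *ldb, int *info);

int main(void) {
    int n = 3, nrhs = 1, info;
    int ipiv[3];
    /* A stored column by column, as LAPACK expects. */
    double a[9] = { 4, 2, 1,     /* column 1 */
                    3, 5, 2,     /* column 2 */
                    1, 1, 6 };   /* column 3 */
    double b[3] = { 13, 15, 23 };  /* right-hand side; solution is (1, 2, 3) */

    dgesv_(&n, &nrhs, a, &n, ipiv, b, &n, &info);
    if (info != 0) {
        fprintf(stderr, "dgesv failed: info = %d\n", info);
        return 1;
    }
    printf("x = (%g, %g, %g)\n", b[0], b[1], b[2]);  /* b overwritten with x */
    return 0;
}
```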
HPL traces its lineage to the original LINPACK project, developed by teams associated with the University of Tennessee, Knoxville, Oak Ridge National Laboratory, and the National Institute of Standards and Technology. Key contributors include Jack Dongarra, Piotr Luszczek, and collaborators from the University of Tennessee, the University of Manchester, and industry partners at Intel and IBM Research. Over successive decades, HPL evolved to accommodate distributed-memory clusters built by vendors such as Cray Research, SGI, NEC Corporation, and Fujitsu, as well as hyperscale deployments at Google, Amazon Web Services, and Microsoft Azure. Milestones include adaptation to the MPI standard, tuning for multicore processors such as Intel Xeon and AMD EPYC, and extensions to exploit NVIDIA accelerators and Xilinx FPGAs.
HPL solves Ax = b for a dense matrix A using LU decomposition; it relies on tuned implementations of Level 3 BLAS routines and an MPI communication layer. Benchmark runs select a problem size N and a process grid (P × Q) to maximize use of system memory and compute resources, on platforms ranging from clusters at CERN to national machines at RIKEN and the Jülich Research Centre. Input parameters, supplied in an HPL.dat file, include the block size NB, the process-grid layout, and a residual threshold used to verify the computed solution (see the abbreviated HPL.dat below); these are tuned for systems such as HPE Cray EX installations, Fujitsu Fugaku-class designs, and hybrid CPU–GPU deployments at the Oak Ridge Leadership Computing Facility. The methodology prescribes reporting the best sustained result, Rmax, alongside the theoretical peak, Rpeak, for comparability across submissions to the Top500.
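An abbreviated HPL.dat shows the main tuning knobs. The values below are illustrative placeholders rather than a recommended configuration: N and NB must be sized to the target machine, and the full file contains further algorithmic options (panel factorization, broadcast topology, look-ahead depth) that are elided here. A commonly cited rule of thumb sizes N so the matrix occupies roughly 80% of aggregate memory, i.e. N ≈ sqrt(0.8 × memory_in_bytes / 8) for double precision.

```
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
100000       Ns
1            # of NBs
192          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
4            Ps
8            Qs
16.0         threshold
...          (remaining algorithmic tuning lines omitted)
```

With this layout the run uses 4 × 8 = 32 MPI processes; conventional tuning guidance suggests a roughly square, slightly flat grid with P ≤ Q.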
Primary metrics are Rmax (the sustained performance measured by the benchmark, reported in teraflops or petaflops) and Rpeak (the theoretical peak derived from manufacturer specifications for processors and accelerators such as Intel Xeon Phi, AMD Instinct, or NVIDIA A100). Efficiency is computed as Rmax/Rpeak and is cited in comparisons among systems such as those at Los Alamos National Laboratory and Lawrence Livermore National Laboratory. Aggregate rankings also feed into related lists and programs such as the Green500 and into procurement evaluations by European Commission research infrastructures. Secondary metrics include memory footprint, matrix size N, and time-to-solution, all of which matter in comparisons between architectures from Arm Ltd. licensees and proprietary designs by HPE and IBM.
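The reduction from a timed run to these figures is simple arithmetic. The sketch below assumes HPL's fixed operation count of 2N³/3 + 3N²/2 floating-point operations for a problem of order N (the count used by the reference implementation when reporting results); the sample numbers are illustrative, not a real submission.

```c
/* Reduce a timed HPL run to Rmax and efficiency, assuming the
 * benchmark's fixed operation count of 2N^3/3 + 3N^2/2 flops. */
#include <stdio.h>

int main(void) {
    double N       = 100000.0;  /* problem size (matrix order), illustrative */
    double seconds = 1200.0;    /* measured wall-clock time, illustrative */
    double rpeak   = 800.0;     /* theoretical peak in Gflop/s:
                                   sockets x cores x clock x flops/cycle */

    double flops = (2.0 / 3.0) * N * N * N + 1.5 * N * N;
    double rmax  = flops / seconds / 1.0e9;     /* sustained Gflop/s */

    printf("Rmax       = %.1f Gflop/s\n", rmax);            /* ~555.6 */
    printf("Efficiency = %.1f %%\n", 100.0 * rmax / rpeak); /* ~69.4  */
    return 0;
}
```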
Reference and optimized HPL implementations are distributed as source code that links against optimized BLAS libraries such as OpenBLAS, the Intel Math Kernel Library, AMD BLIS, and vendor-provided packages such as NVIDIA cuBLAS and the Arm Performance Libraries. Community-driven ports target runtimes including Open MPI, MPICH, and the vendor MPI stacks used at facilities such as NERSC and TACC (home of the former Ranger system). Tuning guides and scripts for HPL are maintained by developers at the University of Tennessee, Netlib, and collaborators in the HPC community, adapting HPL for MPI+X hybrid models, multi-GPU nodes, and exascale platforms such as Frontier.
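For the reference distribution from Netlib, a build-and-run session typically looks like the sketch below. The architecture tag Linux_GCC is an arbitrary user-chosen label; compiler, BLAS, and MPI paths are set by editing the copied makefile, and launcher flags vary by site and scheduler.

```
# Build the reference HPL (version 2.3) against a local BLAS and MPI stack.
cd hpl-2.3
cp setup/Make.Linux_PII_CBLAS Make.Linux_GCC   # then edit compiler/BLAS paths
make arch=Linux_GCC

# Run: the MPI process count must equal P x Q from HPL.dat.
cd bin/Linux_GCC
mpirun -np 32 ./xhpl
```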
HPL results are the primary benchmark for the Top500 list, which ranks the world's fastest supercomputers and is published twice a year, at the International Supercomputing Conference and the SC conference. Vendors submit HPL runs from installations at national laboratories, including Oak Ridge, Lawrence Livermore, and Argonne, and at international centers such as RIKEN and Fujitsu-partnered sites, to claim positions in the top tiers. HPL performance has driven publicity around systems such as IBM's Summit, Fujitsu's Fugaku, Cray's Titan, and HPE Apollo deployments, shaping architecture choices at the European Centre for Medium-Range Weather Forecasts and at cloud providers such as IBM Cloud and Google Cloud Platform.
Critics from institutions including the ACM, the IEEE, and national-laboratory benchmarking groups note that HPL emphasizes dense floating-point throughput and may not reflect performance on application workloads such as Los Alamos scientific codes, climate modeling at ECMWF, or data-analytic pipelines at CERN. Alternative benchmarks and suites, such as High Performance Conjugate Gradients (HPCG), Graph500, SPEC, and application proxies, have been proposed to complement HPL for heterogeneous systems and the I/O-bound workloads found at NERSC and PRACE-supported centers. Debates continue among system architects at Intel Labs, NVIDIA Research, and academic groups about HPL's role in procurement, in energy-efficiency comparisons such as the Green500, and about its relevance to exascale initiatives such as the US Exascale Computing Project.
Category:Benchmarks