HPC — LLMpedia

HPC
Name	High-performance computing
Abbreviation	HPC
Type	Technology
First	1960s
Developer	Cray Research; Lawrence Livermore National Laboratory; Los Alamos National Laboratory
Key people	Seymour Cray
Platform	Cray-1; Summit; Fugaku
Languages and programming models	Fortran; C; C++; MPI; OpenMP; CUDA

Contents

Overview
History
Architecture and Components
Programming Models and Software
Applications
Performance Metrics and Benchmarking
Energy Efficiency and Cooling
Security and Reliability

HPC

High-performance computing (HPC) is the use of massively parallel and vectorized computer systems to solve computationally intensive problems that exceed the capacity of standard workstations. It underpins research and operations across scientific, industrial, and national-security domains, and draws together supercomputers, accelerators, interconnect fabrics, and specialized software stacks. Modern deployments combine hardware advances from vendors with research from national laboratories and academic centers to tackle grand challenges in simulation, data analysis, and modeling.

Overview

HPC installations combine custom processors, vector architectures, and accelerator technologies to maximize floating-point throughput, memory bandwidth, and inter-node communication. Leading facilities often host systems from Cray (now part of HPE), IBM, Fujitsu, NVIDIA, and Intel, and are sited at institutions such as Oak Ridge National Laboratory, Argonne National Laboratory, Lawrence Berkeley National Laboratory, and RIKEN. The ecosystem includes community projects such as the TOP500 ranking, software toolchains developed at Los Alamos National Laboratory and Lawrence Livermore National Laboratory, and international collaborations exemplified by PRACE and EuroHPC.

History

The field traces its origins to efforts at Lawrence Livermore National Laboratory and Los Alamos National Laboratory in the 1960s, and to commercial milestones such as the Cray-1 (1976), developed by Seymour Cray at Cray Research. Cold War priorities shaped funding through programs at Oak Ridge National Laboratory and through procurement by defense agencies connected to DARPA initiatives. The rise of massively parallel processing in the 1980s and 1990s involved projects at IBM and at academic centers including Stanford University and the Massachusetts Institute of Technology, while open-source middleware emerged from collaborations involving the National Energy Research Scientific Computing Center (NERSC) and Sandia National Laboratories. The 21st century saw leadership systems such as Jaguar and Summit at Oak Ridge National Laboratory and Fugaku at RIKEN, driven by co-design partnerships with industrial firms.

Architecture and Components

A typical installation integrates multicore CPU nodes, manycore accelerators from NVIDIA and AMD, high-bandwidth memory (HBM) from suppliers such as SK Hynix, Samsung, and Micron, and low-latency interconnects such as InfiniBand and proprietary fabrics. Storage tiers include parallel file systems such as Lustre and IBM's GPFS, often backed by disk arrays from Seagate or flash from Samsung Electronics. System software stacks include firmware from vendors such as Lenovo and orchestration tools used by centers such as NERSC. Cooling, power distribution, and facility design are coordinated between engineering teams at sites such as Oak Ridge National Laboratory and commercial partners such as Schneider Electric.

Programming Models and Software

Developers typically write in Fortran, C, and C++, using MPI for message passing between nodes and OpenMP for shared-memory parallelism within a node. Accelerator programming uses NVIDIA's CUDA and the open standards OpenCL and SYCL, both published by the Khronos Group, with SYCL implementations from vendors such as Codeplay Software. Scientific libraries such as BLAS and LAPACK are optimized by vendor teams at Intel and NVIDIA, while application frameworks come from research groups at Los Alamos National Laboratory and Sandia National Laboratories. Job scheduling is handled by workload managers such as Slurm, and workflow and data management draw on provenance systems influenced by projects at Lawrence Berkeley National Laboratory.
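The decompose, compute, and reduce pattern that underlies message-passing programs can be sketched in a few lines. The example below is only an illustration of that structure, not MPI or OpenMP code: it uses Python's standard thread pool as a stand-in for ranks, and the function names (`block_range`, `partial_sum_of_squares`, `parallel_sum_of_squares`) are hypothetical. CPython threads do not give real CPU parallelism here, so the sketch shows the partitioning logic rather than any speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def block_range(n_items: int, n_workers: int, rank: int) -> range:
    """Return the contiguous index range owned by worker `rank`.

    The first (n_items % n_workers) workers receive one extra item,
    so block sizes differ by at most one.
    """
    base, extra = divmod(n_items, n_workers)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


def partial_sum_of_squares(indices: range) -> int:
    """Local work for one worker: sum i*i over its own block only."""
    return sum(i * i for i in indices)


def parallel_sum_of_squares(n_items: int, n_workers: int) -> int:
    """Hand one block to each worker, then combine the partial results."""
    blocks = [block_range(n_items, n_workers, r) for r in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, blocks))
    # The final sum plays the role of a sum reduction across ranks.
    return sum(partials)
```

Calling `parallel_sum_of_squares(10, 3)` splits the indices 0 to 9 into blocks of sizes 4, 3, and 3 and returns the same total as a serial loop. In a real MPI program each rank would compute its own block boundaries in the same way and the final step would be a collective reduction.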

Applications

High-performance systems run large-scale simulations for climate modeling by teams at NOAA and NASA, astrophysics codes developed at Princeton University and Caltech, and materials discovery initiatives linked to Argonne National Laboratory. Computational chemistry applications are advanced by researchers at MIT and the University of California, Berkeley, while genomics pipelines accelerate discovery at the Broad Institute and the Wellcome Sanger Institute. Industry adopters include automotive simulation groups at BMW, aerodynamics teams at Boeing, and finance firms that use Monte Carlo methods developed in collaboration with Goldman Sachs and academic centers.

Performance Metrics and Benchmarking

Performance is evaluated with benchmarks such as High-Performance LINPACK (HPL), used by the TOP500 project, and with application-specific proxies such as those from SPEC and community kernels curated by DOE laboratories. Metrics include floating-point operations per second (FLOPS, reported for TOP500 in 64-bit IEEE 754 double precision), memory bandwidth, network latency, and the throughput tests used by centers such as NERSC. Co-design activities between hardware vendors such as IBM and software teams at Argonne National Laboratory refine benchmarking suites for real-world workloads.
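The FLOPS figures discussed above come from simple arithmetic, sketched below under stated assumptions. Theoretical peak is taken as nodes × cores per node × clock rate × floating-point operations per cycle, and the sustained rate divides an operation count by the measured run time. The operation count 2n³/3 + 2n² is the figure conventionally quoted for LINPACK-style LU solves; implementations differ in the lower-order term, so treat it as an assumption. All function names and sample numbers are hypothetical.

```python
def peak_flops(nodes: int, cores_per_node: int,
               clock_hz: float, flops_per_cycle: int) -> float:
    """Theoretical peak rate (often called Rpeak), in FLOPS."""
    return nodes * cores_per_node * clock_hz * flops_per_cycle


def linpack_operation_count(n: int) -> float:
    """Approximate operations to solve a dense n-by-n linear system by LU.

    Assumes the commonly quoted count 2n^3/3 + 2n^2.
    """
    return 2.0 * n**3 / 3.0 + 2.0 * n**2


def sustained_flops(n: int, seconds: float) -> float:
    """Achieved rate (often called Rmax) for a solve of size n."""
    return linpack_operation_count(n) / seconds


def efficiency(rmax: float, rpeak: float) -> float:
    """Fraction of theoretical peak that the benchmark actually reached."""
    return rmax / rpeak
```

For example, a hypothetical machine with 2 nodes, 4 cores per node, a 2 GHz clock, and 16 operations per cycle has a peak of 2.56e11 FLOPS; dividing a measured sustained rate by that value gives the efficiency figure that benchmark tables usually report alongside the raw numbers.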

Energy Efficiency and Cooling

Energy and thermal management are critical, driving innovations such as liquid-immersion cooling trialed by NVIDIA and facility-level optimizations at Oak Ridge National Laboratory and Lawrence Berkeley National Laboratory. The Green500 list complements the TOP500 by ranking systems on FLOPS per watt rather than raw performance, and companies including HPE and Fujitsu compete on energy efficiency. Research projects at ETH Zurich and Imperial College London explore waste-heat reuse and modular data-center designs.
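The difference between ranking by raw performance and ranking by FLOPS per watt can be shown with a small sketch. The `System` record, the two systems, and their numbers are invented for illustration and do not describe any real machine or the exact methodology of any published list.

```python
from typing import List, NamedTuple


class System(NamedTuple):
    name: str           # hypothetical system label
    rmax_gflops: float  # sustained benchmark performance, in GFLOPS
    power_kw: float     # average power during the run, in kilowatts


def gflops_per_watt(system: System) -> float:
    """Energy efficiency: sustained GFLOPS divided by power in watts."""
    return system.rmax_gflops / (system.power_kw * 1000.0)


def rank_by_performance(systems: List[System]) -> List[System]:
    """Order systems by raw sustained performance, fastest first."""
    return sorted(systems, key=lambda s: s.rmax_gflops, reverse=True)


def rank_by_efficiency(systems: List[System]) -> List[System]:
    """Order systems by GFLOPS per watt, most efficient first."""
    return sorted(systems, key=gflops_per_watt, reverse=True)


# Invented numbers: "big" is faster overall, "lean" does more per watt.
EXAMPLE_SYSTEMS = [
    System("big", rmax_gflops=5_000_000.0, power_kw=1000.0),
    System("lean", rmax_gflops=1_000_000.0, power_kw=100.0),
]
```

With these sample values, `rank_by_performance` puts "big" first while `rank_by_efficiency` puts "lean" first, which is the kind of reordering an efficiency-based list is meant to expose.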

Security and Reliability

Operational security relies on hardened firmware from vendors such as Intel and AMD, access controls and audit tools developed at Sandia National Laboratories and Los Alamos National Laboratory, and supply-chain practices coordinated with agencies such as NIST. Resilience strategies include ECC memory from manufacturers such as Micron Technology, checkpoint/restart frameworks developed at Argonne National Laboratory, and fault-injection testing performed in collaboration with Lawrence Livermore National Laboratory. Access for user communities is managed through allocation programs such as XSEDE (succeeded by ACCESS in 2022) and European partnerships such as PRACE.
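The checkpoint/restart strategy mentioned above can be sketched at application level. The code below is a minimal illustration, not the design of any particular framework: it assumes the whole state fits in a small JSON file, writes each checkpoint to a temporary file and renames it into place so that a crash mid-write cannot corrupt the previous checkpoint, and resumes from the last saved step. The function names and the `stop_after` parameter (which simulates a failure) are hypothetical.

```python
import json
import os
import tempfile


def save_checkpoint(path: str, state: dict) -> None:
    """Write state to a temp file, then atomically rename it over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push the data to stable storage first
        os.replace(tmp_path, path)  # atomic when both are on one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path: str, default):
    """Return the saved state, or `default` if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default


def run(path: str, total_steps: int, checkpoint_every: int,
        stop_after=None) -> dict:
    """Accumulate 0 + 1 + ... + (total_steps - 1), resuming if possible.

    `stop_after` ends the loop early without saving, simulating a failure.
    """
    state = load_checkpoint(path, {"step": 0, "value": 0})
    executed = 0
    while state["step"] < total_steps:
        if stop_after is not None and executed >= stop_after:
            break  # simulated crash: progress since the last save is lost
        state["value"] += state["step"]
        state["step"] += 1
        executed += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state
```

If a run of 10 steps with a checkpoint every 5 steps is interrupted after 7 steps, only the state at step 5 is on disk; a second call to `run` with the same path reloads it, repeats steps 5 and 6, and finishes with the same result as an uninterrupted run. The interval is a trade-off between time spent writing checkpoints and work lost on failure.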

Category:Supercomputing