| High Performance Computing and Communications | |
|---|---|
| Name | High Performance Computing and Communications |
| Field | Computational Science |
| Related | Supercomputing, Parallel Computing, Networking |
High Performance Computing and Communications is a multidisciplinary area focused on the design, deployment, and use of advanced supercomputer systems, high-speed research networks such as Internet2, and scalable parallel software to solve computationally intensive problems. It spans research and engineering at vendors including Cray Research, IBM, Intel Corporation, NVIDIA, AMD, Hewlett-Packard, Sun Microsystems, Microsoft Research, and Google, and it integrates innovations from the TOP500 rankings, DARPA initiatives, and programs at national laboratories such as Lawrence Livermore, Oak Ridge, Argonne, Los Alamos, and the National Renewable Energy Laboratory.
High-performance systems combine architectures exemplified by the Cray-1, the Fujitsu K computer, IBM Blue Gene, Summit, and Frontier with networking advances from Internet2, National LambdaRail, ESnet, GEANT, and commercial backbones operated by AT&T, Verizon Communications, and NTT. Research communities within the Association for Computing Machinery (including SIGARCH and SIGPLAN), IEEE, SIAM, the European Centre for Medium-Range Weather Forecasts, CERN, and NASA drive the standards and benchmarks used in collaborations involving the University of Illinois Urbana-Champaign, the Massachusetts Institute of Technology, Stanford University, the University of California, Berkeley, the California Institute of Technology, Princeton University, the University of Cambridge, ETH Zurich, Tsinghua University, and the University of Tokyo.
System design integrates multicore processors such as Intel Xeon and AMD EPYC with accelerators such as NVIDIA Tesla, AMD Radeon Instinct, and specialized units like the Google TPU. Interconnect fabrics include InfiniBand, Omni-Path Architecture, Cray Aries, and the custom designs used in Fujitsu systems. Storage hierarchies draw on technologies from Seagate Technology and Western Digital and on parallel file systems such as Lustre and GPFS. Facility design covers cooling solutions like those used at Oak Ridge National Laboratory and power planning similar to deployments at Lawrence Livermore National Laboratory and Los Alamos National Laboratory, with procurement shaped by agencies such as the National Science Foundation, the Department of Energy, and European Union research programs.
High-performance networking relies on protocols, hardware, and middleware developed by communities around the IETF, IEEE 802, Mellanox Technologies, Cisco Systems, Juniper Networks, and backbone projects including Internet2 and ESnet. Technologies such as RDMA, TCP/IP, MPI, UCX, and Open MPI mediate communication in clusters deployed at centers such as Argonne National Laboratory and NERSC. Grid and cloud computing efforts built on the Globus Toolkit, OpenStack, Amazon Web Services, Microsoft Azure, and Google Cloud Platform integrate with HPC workflows used in collaborations with the European Organization for Nuclear Research, the Max Planck Society, and RIKEN, including its Advanced Institute for Computational Science (AICS).
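A minimal sketch of the message-passing style these stacks support, written in C against the standard MPI API (the ring-exchange pattern and buffer names here are illustrative, not drawn from any specific production code):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank sends its ID to the next rank in a ring and
       receives from the previous one, a common HPC pattern. */
    int next = (rank + 1) % size;
    int prev = (rank + size - 1) % size;
    int sendbuf = rank, recvbuf = -1;

    MPI_Sendrecv(&sendbuf, 1, MPI_INT, next, 0,
                 &recvbuf, 1, MPI_INT, prev, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("rank %d received %d from rank %d\n", rank, recvbuf, prev);

    MPI_Finalize();
    return 0;
}
```

Compiled with mpicc and launched with, for example, mpirun -np 4, each rank prints the value received from its ring neighbor; the same MPI calls run unchanged over TCP/IP, InfiniBand, or other RDMA-capable fabrics, with the transport selected by the MPI runtime.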
Programming models include MPI, OpenMP, CUDA, OpenCL, HIP, Chapel, and UPC, along with languages from research at Lawrence Berkeley National Laboratory and Sandia National Laboratories. Compiler and runtime systems from the GNU Compiler Collection, LLVM, Intel Parallel Studio, and the PGI compilers support optimizations used in numerical libraries such as BLAS, LAPACK, ScaLAPACK, PETSc, and Trilinos, and in I/O libraries such as HDF5. Algorithmic advances draw on work by researchers affiliated with the Courant Institute, Los Alamos National Laboratory, Stanford University, the University of Illinois, and Cornell University on solvers, multigrid methods, fast multipole methods, and machine learning frameworks such as TensorFlow, PyTorch, and MXNet.
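On the shared-memory side, a minimal OpenMP sketch in C (array sizes and variable names are illustrative) shows the directive-based fork-join style these compilers implement:

```c
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(void) {
    static double a[N], b[N];
    double sum = 0.0;

    /* Initialize in parallel; each thread handles a chunk of iterations. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        a[i] = 1.0;
        b[i] = 2.0;
    }

    /* Parallel dot product: the reduction clause safely combines
       per-thread partial sums. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i] * b[i];

    printf("dot product = %f (max threads: %d)\n", sum, omp_get_max_threads());
    return 0;
}
```

Built with a flag such as gcc -fopenmp, the reduction clause gives each thread a private partial sum that is combined when the loop ends, avoiding a data race on sum.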
HPC accelerates science and engineering across many domains: particle physics at CERN's Large Hadron Collider; genomics in the Human Genome Project; weather and climate prediction with the Weather Research and Forecasting Model at the European Centre for Medium-Range Weather Forecasts, NOAA, and NASA; pharmaceutical research in collaborations with Pfizer, Merck & Co., and GlaxoSmithKline; computational chemistry at BASF, Dow Chemical Company, and ExxonMobil; and financial simulations at Goldman Sachs, JPMorgan Chase, and Morgan Stanley. Emerging uses include deep learning research at OpenAI, DeepMind, and the Allen Institute for AI, and climate modeling projects linked to Intergovernmental Panel on Climate Change (IPCC) assessments and initiatives at IPCC-affiliated centers.
Performance evaluation uses metrics and suites such as LINPACK, HPL, SPEC, and IOzone, along with community efforts like the TOP500 and Green500 lists, to assess throughput, energy efficiency, and scalability. Scalability studies draw on parallel scaling results from the National Center for Supercomputing Applications and the Pawsey Supercomputing Centre, and on benchmarks run on systems such as Pangea, Blue Waters, and Tianhe-2. Procurement and evaluation often align with standards from ISO and IEEE and with requirements from funding bodies including the NSF and DOE.
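The scalability these benchmarks measure is commonly summarized by speedup and parallel efficiency, with Amdahl's law bounding strong scaling. In the standard formulation below, T_1 is the single-processor runtime, T_p the runtime on p processors, and s the serial fraction of the work:

```latex
% Speedup, parallel efficiency, and Amdahl's bound for serial fraction s
S(p) = \frac{T_1}{T_p}, \qquad
E(p) = \frac{S(p)}{p}, \qquad
S_{\mathrm{Amdahl}}(p) = \frac{1}{s + (1 - s)/p} \;\le\; \frac{1}{s}
```

For instance, a serial fraction of s = 0.05 caps speedup below 20 no matter how many processors are added, which is one reason weak-scaling (Gustafson-style) measurements are also reported.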
Challenges include exascale readiness pursued by DOE exascale projects; supply-chain concerns involving Semiconductor Manufacturing International Corporation, TSMC, and GlobalFoundries; and workforce development at universities such as Carnegie Mellon University and the Georgia Institute of Technology. Security and resilience work addresses threats investigated by the National Institute of Standards and Technology, US Cyber Command, and researchers at the MITRE Corporation and RAND Corporation, covering side-channel attacks, firmware vulnerabilities, and trusted execution environments exemplified by Intel SGX. Future directions highlight quantum computing research at IBM Quantum, Google Quantum AI, IonQ, and Rigetti Computing; neuromorphic efforts at Intel Labs and IBM Research; and convergence with AI initiatives at OpenAI and DeepMind, as pursued by consortiums including EuroHPC, PRACE, and national programs in Japan, China, India, and the European Union.
Category:Supercomputing