LLMpedia
The first transparent, open encyclopedia generated by LLMs

NVLink

Generated by DeepSeek V3.2
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Article Genealogy
Parent: Nvidia Hop 3
Expansion Funnel: Raw 51 → Dedup 19 → NER 7 → Enqueued 7
1. Extracted: 51
2. After dedup: 19
3. After NER: 7
Rejected: 12 (not NE: 12)
4. Enqueued: 7
NVLink
Name: NVLink
Developer: NVIDIA
Type: Point-to-point interconnect
Generations: 1, 2, 3, 4, 5
Width: Up to 72 lanes (NVLink 4.0)
Speed: Up to 900 GB/s (bidirectional)
Protocol: Packet switching
Related: PCI Express, InfiniBand, AMD Infinity Fabric

NVLink is a high-bandwidth, energy-efficient interconnect technology developed by NVIDIA to enable fast communication between CPUs and GPUs, and among multiple GPUs. The technology was created to overcome the bandwidth limitations of the PCI Express bus, enabling more efficient data movement for HPC and AI workloads. Since its introduction, it has become a foundational element in NVIDIA DGX systems and various supercomputing platforms.

Overview

NVLink was first publicly introduced by NVIDIA in 2014 alongside the Pascal GPU architecture, marking a strategic shift from reliance on PCI Express. The primary motivation was to provide a direct, high-speed pathway for data to move between the CPU memory and GPU memory, and between GPUs themselves, which is critical for scalable parallel computing. Its development was driven by the demands of emerging workloads in scientific research, deep learning, and computational simulation. The technology has evolved through several generations, each significantly increasing bandwidth and reducing latency to support ever-larger AI models and simulations.

Technical specifications

Each generation of the interconnect has substantially increased the available bandwidth. The first generation, used in Pascal-based Tesla P100 products, offered 160 GB/s of aggregate bidirectional bandwidth across its links. The second generation, featured in the Volta GV100 GPU, increased this to 300 GB/s. The third generation, implemented in Ampere-architecture GPUs such as the A100, provided 600 GB/s. The fourth generation, found in the Hopper architecture, including the H100 GPU and the GH200 Grace Hopper Superchip, reaches 900 GB/s. The physical link utilizes multiple lanes, with NVLink 4.0 employing up to 72 lanes, and uses a packet-switched protocol for efficient data routing. These specifications far exceed the bandwidth of contemporary PCI Express standards.
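The aggregate figures above arise from multiplying the number of links on the flagship GPU of each generation by the per-link bandwidth. The sketch below illustrates this arithmetic; the link counts and per-link rates are commonly cited figures for each architecture, not values stated in this article, so treat them as assumptions.

```python
# Assumed per-generation link counts and per-link bidirectional bandwidth
# (GB/s), as commonly cited for the flagship GPU of each architecture.
GENERATIONS = {
    "NVLink 1.0 (Pascal P100)": (4, 40),
    "NVLink 2.0 (Volta V100)": (6, 50),
    "NVLink 3.0 (Ampere A100)": (12, 50),
    "NVLink 4.0 (Hopper H100)": (18, 50),
}


def total_bandwidth(links: int, per_link_gbps: int) -> int:
    """Aggregate bidirectional bandwidth = link count x per-link rate."""
    return links * per_link_gbps


for name, (links, per_link) in GENERATIONS.items():
    print(f"{name}: {links} links x {per_link} GB/s = "
          f"{total_bandwidth(links, per_link)} GB/s total")
```

Note that the headline numbers (160, 300, 600, 900 GB/s) are system-level aggregates across all links, not the throughput of a single link.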

Architecture and design

The architecture employs a point-to-point or switched network topology, allowing multiple GPUs within a system to communicate directly without passing data through the CPU or the PCI Express root complex. This design is central to the NVIDIA DGX platform, where it creates a unified, high-bandwidth fabric. Key design elements include support for cache-coherent operations between CPU and GPU memory in systems utilizing IBM Power processors or NVIDIA Grace CPUs, enabling a unified memory space. The physical layer is engineered for energy efficiency, providing more bandwidth per watt than traditional interconnects, which is crucial for large-scale deployments in data centers and supercomputers like the Perlmutter system.
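The benefit of direct GPU-to-GPU links can be sketched with a toy topology model. The code below is a hypothetical illustration, not an NVIDIA API: it models a fully connected four-GPU NVLink mesh and contrasts it with a PCIe path that routes every transfer through the CPU root complex.

```python
# Hypothetical 4-GPU system: NVLink provides a direct edge between each
# GPU pair, while the PCIe path routes transfers via the CPU root complex.
from itertools import combinations

GPUS = ["gpu0", "gpu1", "gpu2", "gpu3"]

# Fully connected NVLink mesh: one direct edge per GPU pair.
nvlink_edges = {frozenset(pair) for pair in combinations(GPUS, 2)}


def hops(src: str, dst: str, use_nvlink: bool = True) -> int:
    """Hop count for a src -> dst copy under each topology."""
    if use_nvlink and frozenset((src, dst)) in nvlink_edges:
        return 1  # direct point-to-point NVLink transfer
    return 2      # src -> CPU root complex -> dst over PCIe


print(hops("gpu0", "gpu3"))                    # direct NVLink: 1 hop
print(hops("gpu0", "gpu3", use_nvlink=False))  # via root complex: 2 hops
```

In real systems this direct path is what CUDA peer-to-peer memory access exploits, and NVSwitch extends the same idea to all-to-all connectivity across larger GPU counts.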

Comparison with other interconnects

When compared to the ubiquitous PCI Express standard, NVLink offers substantially higher bandwidth and lower latency, which is vital for GPU-centric computing. Unlike PCI Express, which is a general-purpose I/O bus, NVLink is a specialized, high-performance interconnect designed explicitly for processor-to-processor communication. Compared to open-standard network interconnects like InfiniBand or Ethernet, which connect nodes across a network, NVLink is designed for intra-node connectivity within a single server or cabinet. Competitor AMD offers a similar technology, Infinity Fabric, for connecting its CPUs and GPUs. NVLink's integration with NVIDIA's overall SoC and platform strategy, such as in the Grace Hopper Superchip, gives it a tightly optimized performance profile for the company's ecosystem.
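The bandwidth gap over PCI Express can be quantified with the theoretical maxima of a x16 slot per PCIe generation. The figures below are standard per-generation bidirectional maxima for a x16 link, compared against the 900 GB/s NVLink 4.0 figure cited above.

```python
# Theoretical bidirectional bandwidth (GB/s) of a x16 PCIe slot per
# generation, versus the 900 GB/s aggregate figure for NVLink 4.0.
PCIE_X16_BIDIR = {"PCIe 3.0": 32, "PCIe 4.0": 64, "PCIe 5.0": 128}
NVLINK4_BIDIR = 900

for gen, bw in PCIE_X16_BIDIR.items():
    ratio = NVLINK4_BIDIR / bw
    print(f"NVLink 4.0 ({NVLINK4_BIDIR} GB/s) is {ratio:.1f}x "
          f"a {gen} x16 slot ({bw} GB/s)")
```

Even against PCIe 5.0, the contemporary standard for Hopper-era systems, NVLink 4.0 offers roughly seven times the bidirectional bandwidth of a x16 slot.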

Applications and implementations

Its primary application is in accelerating large-scale HPC and AI workloads. It is a core component of NVIDIA DGX integrated systems, which are turnkey appliances for deep learning research and development. Major supercomputing installations leveraging the technology include the Summit and Sierra systems at Oak Ridge National Laboratory and Lawrence Livermore National Laboratory, respectively, and the more recent Perlmutter system at NERSC. The technology is also fundamental to the NVIDIA HGX platform, a baseboard design used by major OEMs like HPE, Dell, and Lenovo to build powerful servers for data centers. Its role is critical in training massive generative AI models and conducting complex scientific simulations in fields like climate science and genomics.

Category:Computer hardware
Category:Computer buses
Category:NVIDIA