| Hopper (microarchitecture) | |
|---|---|
| Name | Hopper |
| Designer | Nvidia |
| Model | GPU |
| Succeeded by | Blackwell |
| Preceded by | Ampere |
| Released | 2022 |
| Fab | TSMC 4N |
| Compute capability | CUDA 9.0 |
| Type | Parallel computing |
| Used in | Nvidia H100, Nvidia GH200 |
Hopper is a GPU microarchitecture developed by Nvidia, succeeding the Ampere architecture and officially unveiled in 2022. It is named for computer science pioneer Grace Hopper and is designed primarily for high-performance computing (HPC) and data center AI workloads. The architecture introduced several new technologies, including a dedicated Transformer engine and the NVLink-C2C chip-to-chip interconnect, to accelerate the largest-scale computational tasks.
The Hopper architecture represents a significant step in Nvidia's data center strategy, targeting Exascale computing and large generative AI models. Its development was driven by the computational demands of large language models and large-scale scientific simulation. Key to its design is the integration of a dedicated Transformer engine for dynamic mixed-precision calculation and a coherent memory system via NVLink-C2C technology. The first product based on Hopper, the Nvidia H100, began shipping in late 2022 to cloud partners such as Amazon Web Services and Microsoft Azure.
At its core, the Hopper architecture is built on TSMC's customized 4N manufacturing process. It introduces a redesigned Streaming Multiprocessor (SM) with fourth-generation Tensor Cores supporting new 8-bit floating-point (FP8) formats and Transformer engine acceleration. While the H100 GPU itself is a monolithic die, Hopper extends to multi-chip packaging through the NVLink-C2C interconnect, which provides a coherent interface to Grace CPUs. The architecture also features a revamped Memory hierarchy with HBM3 support, thread block clusters with distributed shared memory, and advanced security via Confidential computing capabilities.
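The two FP8 formats Hopper's Tensor Cores support, E4M3 and E5M2, trade mantissa precision against exponent range. A minimal Python sketch (format parameters taken from the public FP8 definitions, not from any Nvidia API) computing the largest finite value each format can represent:

```python
# Maximum representable magnitudes of the two FP8 formats used by
# Hopper's fourth-generation Tensor Cores.

def max_e4m3():
    # E4M3: 4 exponent bits (bias 7), 3 mantissa bits. The top exponent
    # code still encodes finite values (only mantissa = 111 is NaN), so
    # the largest finite value is 1.110_2 * 2^(15-7) = 1.75 * 256 = 448.
    return (1 + 6 / 8) * 2 ** (15 - 7)

def max_e5m2():
    # E5M2: 5 exponent bits (bias 15), 2 mantissa bits, IEEE-style
    # (top exponent code reserved for inf/NaN), so the largest finite
    # value is 1.11_2 * 2^(30-15) = 1.75 * 32768 = 57344.
    return (1 + 3 / 4) * 2 ** (30 - 15)

print(max_e4m3())  # 448.0
print(max_e5m2())  # 57344.0
```

E4M3's narrower range but extra mantissa bit suits forward-pass activations and weights, while E5M2's wider range suits gradients; the Transformer engine chooses between them dynamically per layer.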
Hopper delivers a substantial performance increase over its predecessor, Ampere, particularly for AI training and AI inference. The Transformer engine can accelerate transformer model training by up to six times, while new DPX instructions speed up dynamic programming algorithms. Its fourth-generation NVLink interconnect provides 900 GB/s of bandwidth per GPU, facilitating scaling across large GPU clusters. Support for PCIe 5.0, alongside HBM3 memory with ECC, ensures high data throughput and reliability for workloads run on platforms like Microsoft Azure and Google Cloud Platform.
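The dynamic programming workloads targeted by Hopper's DPX instructions share a characteristic inner recurrence: a minimum (or maximum) over a few additions. A scalar Python illustration of that pattern using edit distance, one algorithm in this class; this is only a sketch of the recurrence, not a DPX implementation:

```python
def edit_distance(a, b):
    """Levenshtein distance via the classic DP recurrence.

    The inner min-of-sums step is the fused compare-and-add pattern
    that DPX instructions accelerate in hardware on Hopper.
    """
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete from a
                           cur[j - 1] + 1,            # insert into a
                           prev[j - 1] + (ca != cb))) # substitute/match
        prev = cur
    return prev[-1]

print(edit_distance("kitten", "sitting"))  # 3
```

Genomics alignment algorithms such as Smith-Waterman, which Nvidia cites as a DPX use case, follow the same min/max-plus structure.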
The flagship product implementing the Hopper architecture is the Nvidia H100 accelerator, available in PCIe and SXM form factors. This was followed by the Nvidia GH200 Grace Hopper Superchip, which pairs a Hopper GPU with a Grace CPU using NVLink-C2C. These products are integrated into servers and supercomputers from original equipment manufacturers like Hewlett Packard Enterprise, Dell Technologies, and Lenovo. Systems powered by Hopper are deployed in major facilities, including the Alps supercomputer at the Swiss National Supercomputing Centre (CSCS) and the Vista system at the Texas Advanced Computing Center.
Hopper is supported by Nvidia's comprehensive software stack, including the CUDA 12.0 toolkit and associated libraries like cuDNN and NCCL. Key frameworks such as PyTorch, TensorFlow, and JAX are optimized to leverage the Transformer engine and new data formats. The platform also benefits from the Nvidia AI Enterprise suite and the Nvidia Omniverse for simulation workloads. System management is handled through tools like Nvidia Base Command Manager and the Nvidia DGX platform, which provides a full-stack solution for AI research institutions like OpenAI.
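Frameworks in this stack typically gate Hopper-specific paths, such as FP8 kernels, on the device's CUDA compute capability, which Hopper reports as 9.0 (see the table above). A hedged sketch of that gating logic; `supports_fp8` is a hypothetical helper, not a real library function:

```python
# Illustrative sketch: dispatch on CUDA compute capability.
# Hopper devices report major version 9; Ampere reports 8.

def supports_fp8(compute_capability):
    """compute_capability is a (major, minor) tuple, e.g. (9, 0) for H100."""
    major, _minor = compute_capability
    return major >= 9

print(supports_fp8((9, 0)))  # True  (H100, Hopper)
print(supports_fp8((8, 0)))  # False (A100, Ampere)
```

In practice a framework would obtain the tuple from its own device-query API before selecting FP8 or FP16 kernels.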
Upon its announcement at Nvidia GTC 2022, the Hopper architecture was widely praised for its targeted advancements in AI acceleration. Industry analysts, including Moor Insights & Strategy, highlighted its potential to redefine data center economics for large-scale Machine learning. Its adoption by major Cloud computing providers, including Oracle Cloud Infrastructure and Alibaba Cloud, underscored its market impact. The architecture set the stage for its successor, the multi-die Blackwell architecture, cementing Nvidia's leadership in the AI accelerator market.

Category:Nvidia microarchitectures
Category:Graphics processing units
Category:2022 in computing