LLMpedia
The first transparent, open encyclopedia generated by LLMs

Volta (microarchitecture)

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Article Genealogy
Parent: cuDNN (hop 5)
Expansion funnel: Raw 81 → Dedup 0 → NER 0 → Enqueued 0
1. Extracted: 81
2. After dedup: 0
3. After NER: 0
4. Enqueued: 0
Volta (microarchitecture)
Name: Volta
Developer: NVIDIA
Codename: Volta
Architecture: GPU microarchitecture
Introduced: 2017
Process: 12 nm (TSMC)
Successor: Turing

Volta (microarchitecture) is a GPU microarchitecture developed by NVIDIA and launched in 2017 for high-performance computing and deep learning. It was introduced with products targeting HPC, AI research, and data center workloads, succeeding the Pascal architecture and preceding Turing (microarchitecture) in NVIDIA's roadmap. Volta emphasized mixed-precision acceleration, memory bandwidth, and programmability for frameworks such as TensorFlow, PyTorch, and Caffe.

Overview

Volta was announced by Jensen Huang at NVIDIA GTC and positioned for customers including Oak Ridge National Laboratory, Lawrence Livermore National Laboratory, and Argonne National Laboratory. The architecture followed design work influenced by previous NVIDIA architectures like Kepler, Maxwell (microarchitecture), and Pascal (microarchitecture), while anticipating features later seen in Turing (microarchitecture) and Ampere (microarchitecture). Volta targeted markets served by competitors such as AMD and Intel for compute, and integrated with software ecosystems including CUDA, cuDNN, and cuBLAS. Products based on Volta were adopted by organizations like Facebook, Google, Microsoft, and Amazon Web Services for model training and inferencing.

Architecture and Design

Volta introduced a new streaming multiprocessor design and architectural blocks that expanded on concepts from Fermi (microarchitecture), Kepler (microarchitecture), and Pascal (microarchitecture). The microarchitecture incorporated specialized hardware for matrix operations akin to accelerators in Google TPU projects and leveraged mixed-precision units inspired by research from Stanford University and MIT. Volta's die integrated thousands of CUDA cores alongside hundreds of Tensor Cores, and used a high-bandwidth memory interface connected to HBM2 stacks, similar to implementations by SK Hynix, Samsung Electronics, and Micron Technology. The design also reflected influences from academic work at institutions such as Berkeley and ETH Zurich on parallel processing and memory hierarchies.

Performance and Benchmarks

Benchmarks for Volta-based cards like the NVIDIA Tesla V100 were published by laboratories including Lawrence Berkeley National Laboratory and vendors such as Dell Technologies, Hewlett Packard Enterprise, and Supermicro. Results showed substantial gains over Pascal in deep learning training throughput for networks used by DeepMind, OpenAI, and research groups at University of Toronto. Comparative benchmarks measured Volta against offerings from AMD Radeon Instinct and projections from Intel Nervana efforts. Standardized suites such as SPEC, MLPerf, and vendor-specific benchmarks from NVIDIA highlighted performance in workloads including image recognition models used in competitions like ImageNet and sequence models from groups at Carnegie Mellon University.

Compute Features (Tensor Cores, FP formats)

A defining feature of Volta was the introduction of Tensor Core hardware to accelerate mixed-precision matrix multiply-accumulate operations used in deep learning training and inference. Tensor Cores multiplied FP16 input matrices and accumulated the results in FP32, supporting accumulation schemes relevant to numerical analysis research at Lawrence Livermore National Laboratory and Argonne National Laboratory. Volta also supported FP64 double precision for scientific computing workloads common in weather modeling at NOAA and simulations run at CERN. The architecture enabled integration with software stacks such as cuDNN, TensorRT, Horovod, and libraries developed by researchers at University of California, Berkeley and Stanford University.
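The mixed-precision operation described above can be thought of numerically as D = A·B + C, where A and B are stored in FP16 but the products and the accumulator C are kept in FP32. A minimal NumPy sketch of this numerical model follows; it illustrates the precision behavior only and is not the hardware instruction (on Volta the actual operation is exposed through CUDA's warp matrix functions):

```python
import numpy as np

def mixed_precision_mma(a, b, c):
    """Numerical model of a Volta-style mixed-precision multiply-accumulate:
    inputs are rounded to FP16, but products and accumulation are FP32."""
    a16 = a.astype(np.float16)  # precision is lost only here, on the inputs
    b16 = b.astype(np.float16)
    # Products and the running sum are carried in FP32.
    return a16.astype(np.float32) @ b16.astype(np.float32) + c.astype(np.float32)

# Volta Tensor Cores operate on 4x4 tiles per step, so a 4x4 example is apt.
a = np.random.rand(4, 4)
b = np.random.rand(4, 4)
c = np.zeros((4, 4))
d = mixed_precision_mma(a, b, c)
```

The key design point this illustrates is that rounding only the inputs to FP16, while accumulating in FP32, keeps the result close to the full-precision product for typical network weights and activations.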

Memory and Cache Subsystem

Volta employed a high-bandwidth memory subsystem using HBM2 stacks and a widened memory interface to deliver bandwidth critical to large-model training used by Facebook AI Research and Google Brain. On-die cache structures and shared memory improvements borrowed concepts from past NVIDIA designs and academic cache coherence work from MIT CSAIL and ETH Zurich. The memory hierarchy supported unified addressing for programming models in CUDA and interoperability with interconnects like NVLink and system fabrics used by Cray and HPE in multi-GPU clusters. Error correction and reliability features aligned with datacenter expectations set by Amazon Web Services and Microsoft Azure.
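As a rough illustration of why the HBM2 interface matters, peak bandwidth follows directly from bus width and per-pin data rate. The figures below are the commonly cited Tesla V100 (SXM2) values and are assumptions for this sketch, not authoritative specifications:

```python
# Peak memory bandwidth = (bus width in bits / 8) * effective data rate per pin.
# Assumed figures: commonly cited Tesla V100 (SXM2) HBM2 configuration.
bus_width_bits = 4096   # 4 HBM2 stacks, each with a 1024-bit interface
data_rate_gbps = 1.75   # effective transfer rate per pin, in Gbit/s

bandwidth_gb_s = bus_width_bits / 8 * data_rate_gbps
print(f"Peak bandwidth ≈ {bandwidth_gb_s:.0f} GB/s")  # 896 GB/s, marketed as ~900 GB/s
```

The very wide (4096-bit) but moderately clocked interface is the characteristic HBM trade-off, contrasting with the narrower, faster GDDR buses used on consumer GPUs.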

Implementation and Products

The flagship product implementing Volta was the NVIDIA Tesla V100, available in PCIe and SXM2 form factors, deployed in systems by vendors including IBM, Dell EMC, HPE, and Lenovo. Cloud offerings from Google Cloud Platform, Amazon EC2, and Microsoft Azure provided Volta-based instances for customers in AI and HPC. Volta also powered specialized appliances and supercomputers, featuring in systems like Summit and contributing to projects coordinated by Oak Ridge National Laboratory and University of Illinois Urbana-Champaign.

Reception and Legacy

Volta received praise from research communities at Stanford University, MIT, and University of Toronto for accelerating deep learning research and was cited in papers by groups at DeepMind and OpenAI. Its Tensor Cores influenced subsequent NVIDIA architectures including Turing (microarchitecture) and Ampere (microarchitecture), and the microarchitecture shaped expectations for accelerators developed by competitors such as AMD Instinct and proposals from Intel labs. Volta's impact persisted in production deployments across cloud computing providers and research institutions, and it is frequently referenced in academic and industrial literature on GPU-accelerated computing.

Category:NVIDIA microarchitectures