| NVIDIA Volta | |
|---|---|
| Name | Volta |
| Designer | NVIDIA |
| Model | GPU Microarchitecture |
| Succeeded by | Turing |
| Preceded by | Pascal |
| Date | May 10, 2017 |
| Fab | TSMC 12 nm FinFET |
| Products | Tesla V100, Titan V, Quadro GV100 |
| Cache | L1 (128 KB per SM), L2 (6 MB) |
| Shader | CUDA cores (up to 5120) |
| Brand | Tesla, Quadro, Titan |
**Volta** is a GPU microarchitecture developed by NVIDIA, succeeding the Pascal architecture and preceding Turing. It was formally announced in May 2017 at the GPU Technology Conference, with a primary focus on accelerating artificial intelligence and high-performance computing workloads. The architecture's flagship innovation was the Tensor Core, a dedicated processing unit designed for mixed-precision matrix multiplication.
The Volta architecture represented a significant redesign from its predecessor, Pascal, introducing several foundational changes. It was manufactured on a custom version of the TSMC 12 nm FinFET process, allowing for greater transistor density and efficiency. A major structural innovation was the new Streaming Multiprocessor (SM) design, which partitioned integer and floating-point datapaths to improve instruction-level parallelism and scheduling efficiency. The architecture also featured a new memory subsystem with an enlarged 6 MB L2 cache and an updated NVLink 2.0 interconnect for significantly higher bandwidth between GPUs or between a GPU and a CPU like the IBM Power9. This design was optimized for the massive parallelism required by deep learning frameworks and scientific computing applications.
Volta's most defining feature was the introduction of dedicated Tensor Cores, which could perform 4x4 matrix multiply-accumulate operations with FP16 inputs and FP32 accumulation, dramatically accelerating the operations central to neural network training. The architecture also debuted Independent Thread Scheduling in its redesigned Streaming Multiprocessor, enabling finer-grained synchronization and more efficient execution of divergent threads, a capability exposed through programming models like CUDA. Other key features included enhanced HBM2 memory support with improved ECC, and support for the PCI Express 3.0 interface. For virtualization and cloud deployments, Volta incorporated hardware capabilities for GPU virtualization, improving isolation and performance in multi-tenant environments used by cloud providers like Amazon Web Services.
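As an illustration, the mixed-precision semantics of a single Tensor Core operation (FP16 inputs, FP32 accumulation) can be sketched in NumPy. This is a minimal sketch of the arithmetic, not actual GPU code; the 4x4 shapes mirror the per-operation tile size described above, and all variable names are illustrative:

```python
import numpy as np

# Sketch of one Tensor Core operation: D = A @ B + C, where A and B
# are 4x4 FP16 matrices and multiply-accumulate happens in FP32.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)).astype(np.float16)
B = rng.standard_normal((4, 4)).astype(np.float16)
C = rng.standard_normal((4, 4)).astype(np.float32)

# Inputs are stored in FP16, but products are accumulated in FP32,
# preserving precision compared to a pure-FP16 pipeline.
D = A.astype(np.float32) @ B.astype(np.float32) + C
print(D.shape, D.dtype)
```

The FP32 accumulator is the key design choice: it keeps the memory and bandwidth savings of FP16 storage while avoiding the loss of precision that summing many small FP16 products would cause.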
The first and primary product based on Volta was the Tesla V100 accelerator, launched in 2017 and available in PCI Express and SXM2 form factors, with 16 GB or 32 GB of HBM2 memory. For the professional visualization market, NVIDIA released the Quadro GV100. The architecture also reached consumer enthusiasts with the Titan V, unveiled by NVIDIA CEO Jensen Huang in December 2017, which featured 12 GB of HBM2 memory. These products were integrated into high-performance systems from partners like Dell Technologies and Hewlett Packard Enterprise, and into supercomputing installations worldwide, including the U.S. Department of Energy's Summit system at Oak Ridge National Laboratory and Sierra system at Lawrence Livermore National Laboratory.
The performance leap with Volta, particularly in AI workloads, was substantial. The Tesla V100's Tensor Cores delivered up to 125 TFLOPS of deep learning performance, a claimed 12x increase over the previous-generation Pascal-based Tesla P100 for certain training tasks. In traditional high-performance computing, the architecture also showed significant gains, with the V100 delivering up to 7.8 TFLOPS of double-precision (FP64) throughput. The improved NVLink 2.0 allowed multiple GPUs to share memory at high bandwidth, boosting performance for applications like molecular dynamics simulations run on software such as NAMD and AMBER.
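The 125 TFLOPS figure can be sanity-checked from published V100 (SXM2) specifications. A back-of-the-envelope sketch, assuming 640 Tensor Cores (80 SMs x 8 per SM) each performing 64 fused multiply-adds per clock at a ~1530 MHz boost clock:

```python
# Rough check of the quoted 125 TFLOPS deep learning figure for the
# Tesla V100 (SXM2), based on its published specifications.
tensor_cores = 640                 # 80 SMs x 8 Tensor Cores per SM
flops_per_core_per_clock = 64 * 2  # 64 FMAs per clock, 2 FLOPs each
boost_clock_hz = 1.53e9            # ~1530 MHz boost clock

peak = tensor_cores * flops_per_core_per_clock * boost_clock_hz
print(f"{peak / 1e12:.0f} TFLOPS")  # ~125 TFLOPS
```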
Volta's capabilities were unlocked through updates to NVIDIA's software stack. Key libraries like the CUDA Toolkit, cuDNN, and NVIDIA TensorRT were optimized to leverage the new Tensor Cores. The architecture was fully supported by major deep learning frameworks, including TensorFlow, PyTorch, and MXNet. For scientific computing, support was extended in platforms like OpenACC and through directives-based programming. NVIDIA also introduced the NGC catalog, a hub for GPU-optimized containers and software, simplifying deployment for researchers and enterprises using systems from Dell EMC or Super Micro Computer.
Volta was widely hailed as a groundbreaking architecture that redefined the landscape of accelerated computing. Reviewers from outlets like AnandTech and TechPowerUp praised its revolutionary Tensor Cores and massive performance gains in AI. It quickly became the standard in academic and industrial AI research, powering breakthroughs from organizations like OpenAI and Google Brain. The architecture's deployment in the Summit supercomputer, which claimed the top spot on the TOP500 list in 2018, underscored its impact on computational science. Volta's focus on specialized AI hardware directly influenced the industry, prompting responses from competitors like AMD and Intel, and set the stage for its successor, Turing, which would bring Tensor Cores to the broader consumer market.
Category:NVIDIA microarchitectures Category:Graphics processing units Category:2017 in computing