LLMpedia
The first transparent, open encyclopedia generated by LLMs

CDNA (microarchitecture)

Generated by DeepSeek V3.2
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Article Genealogy
Parent: Tensor Core (Hop 4)
Expansion funnel: 74 raw extractions → 0 after dedup → 0 after NER → 0 enqueued
CDNA (microarchitecture)
Name: CDNA
Designer: Advanced Micro Devices
Bits: 64-bit
Introduced: 2020
Design: ISA
Predecessor: Graphics Core Next

CDNA is a microarchitecture developed by Advanced Micro Devices (AMD) specifically for high-performance computing (HPC) and data center accelerator workloads. It represents a bifurcation from the company's consumer-focused RDNA graphics architecture, prioritizing compute efficiency and double-precision performance for scientific and enterprise applications. The architecture is implemented in AMD Instinct series accelerators, such as the MI100 and MI200, which compete with Nvidia's data center GPUs, such as the Ampere-based A100 and Hopper-based H100.

Overview and Development

The development of CDNA was announced by Advanced Micro Devices in 2020 as a strategic move to create a dedicated architecture for the data center and high-performance computing segments. The decision was influenced by the growing market for accelerators in exascale computing projects, such as those funded by the United States Department of Energy. The architecture evolved from the foundational Graphics Core Next (GCN) design, which had long powered AMD's professional and consumer GPUs, but was optimized for radically different performance characteristics. Under the leadership of CEO Lisa Su, AMD's engineering teams aimed to challenge Nvidia's dominance in the artificial intelligence and supercomputing arenas. The first product based on the architecture was the AMD Instinct MI100, launched in 2020; its CDNA 2-based successors went on to power leadership-class systems such as Oak Ridge National Laboratory's Frontier.

Architecture and Design

The CDNA architecture is fundamentally designed for massive parallel compute throughput, eschewing many graphics-specific hardware elements found in consumer RDNA designs. Its core is built around a scalable array of compute units (CUs) featuring matrix cores that accelerate the mixed-precision operations central to machine learning training and inference. A major innovation is the implementation of Infinity Fabric links on the processor die itself, enabling high-bandwidth, low-latency communication between multiple GPUs in a system without relying solely on PCI Express. The memory subsystem employs High Bandwidth Memory (HBM) stacks, such as HBM2 and HBM2E, connected through a wide memory controller interface to feed the compute units. The design is complemented by full error-correcting code (ECC) memory support, a necessity for the reliability demands of scientific computing and enterprise environments.
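The arithmetic behind such a wide HBM interface can be sketched as follows. The figures used here (a 4096-bit bus running at 2.4 Gbit/s per pin, roughly MI100-class) are illustrative assumptions, not quoted specification values:

```python
# Rough peak-memory-bandwidth estimate for a wide HBM2 interface.
# Assumed illustrative figures (roughly MI100-class): 4096-bit bus,
# 2.4 Gbit/s per pin.

def peak_bandwidth_gbs(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: pins x bits-per-second-per-pin / 8 bits-per-byte."""
    return bus_width_bits * pin_rate_gbps / 8

bw = peak_bandwidth_gbs(4096, 2.4)
print(f"{bw:.1f} GB/s")  # 1228.8 GB/s, i.e. roughly 1.23 TB/s
```

The wide-but-slow-per-pin trade-off is the point of HBM: the same bandwidth over a conventional 384-bit GDDR bus would require far higher per-pin signaling rates.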

Generations and Features

The first generation, CDNA 1, debuted in the AMD Instinct MI100 accelerator, introducing the foundational matrix core technology and support for PCI Express 4.0. Its successor, CDNA 2, represented a significant leap and powered the AMD Instinct MI200 series, including the MI250X. Codenamed "Aldebaran," it was AMD's first GPU built as a multi-chip module, combining two compute dies in a single package. Its enhanced matrix cores added full-rate double-precision (FP64) matrix operations alongside data types such as BFLOAT16, and it was the first to offer coherent memory access between CPU and GPU when paired with suitably equipped AMD EPYC processors over AMD Infinity Fabric. The subsequent CDNA 3 generation, announced for the MI300 series of accelerators, promises further integration, leveraging chiplet technology, a large AMD Infinity Cache, and advanced packaging such as 3D stacking to target upcoming exascale computing systems.
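A back-of-envelope way to see the generational scaling is to estimate peak vector throughput as CUs × lanes per CU × 2 (one fused multiply-add counts as two operations) × clock. The CU counts and clocks below are assumed round figures in the right neighborhood for MI100- and MI250X-class parts, not official specifications:

```python
# Back-of-envelope peak vector throughput: CUs x 64 lanes x 2 ops/FMA x clock.
# Assumed illustrative figures: MI100-class ~120 CUs at ~1.5 GHz,
# MI250X-class ~220 CUs (across two dies) at ~1.7 GHz.

def peak_tflops(cus: int, clock_ghz: float, lanes_per_cu: int = 64) -> float:
    return cus * lanes_per_cu * 2 * clock_ghz / 1000  # TFLOPS

mi100_fp32 = peak_tflops(120, 1.5)    # ~23 TFLOPS FP32
mi100_fp64 = mi100_fp32 / 2           # CDNA 1 runs vector FP64 at half the FP32 rate
mi250x_fp64 = peak_tflops(220, 1.7)   # CDNA 2 runs vector FP64 at full rate, ~47.9 TFLOPS
print(f"{mi100_fp64:.1f} vs {mi250x_fp64:.1f} TFLOPS FP64")
```

The roughly fourfold FP64 jump comes from two compounding factors: nearly twice the compute units (two dies) and the move from half-rate to full-rate double precision.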

Applications and Implementations

CDNA-based accelerators are primarily deployed in large-scale supercomputer installations and private data centers for computationally intensive tasks. Notable deployments include the Frontier system at Oak Ridge National Laboratory and the LUMI system in Finland, both of which leverage AMD Instinct MI250X accelerators to achieve exascale computing milestones. These systems are used for simulations in fields like computational fluid dynamics, climate modeling, and molecular dynamics, as well as for training large artificial intelligence models. In the commercial sector, companies utilize these accelerators for data analytics, financial modeling, and genomic sequencing through partnerships with original equipment manufacturers like Hewlett Packard Enterprise and Supermicro. The architecture also supports key software ecosystems, including ROCm, AMD's open software platform for GPU computing, which competes with Nvidia's CUDA platform.
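Part of ROCm's portability story is that the HIP API deliberately mirrors CUDA's, so tools such as HIPIFY can translate CUDA source largely by systematic renaming. The toy sketch below illustrates that idea; the mapping table is a small hand-picked subset for illustration, not the real tool's table:

```python
import re

# Tiny illustrative subset of the CUDA -> HIP renamings that hipify-style
# tools perform; the real tools cover the full runtime and library APIs
# and handle headers, kernel launch syntax, and more.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
}

def hipify(source: str) -> str:
    # \b word boundaries keep e.g. the cudaMemcpy rule from matching
    # inside the longer identifier cudaMemcpyHostToDevice.
    for cuda_name, hip_name in CUDA_TO_HIP.items():
        source = re.sub(rf"\b{cuda_name}\b", hip_name, source)
    return source

print(hipify("cudaMalloc(&d_a, n); cudaMemcpy(d_a, a, n, cudaMemcpyHostToDevice);"))
# hipMalloc(&d_a, n); hipMemcpy(d_a, a, n, hipMemcpyHostToDevice);
```

That near-mechanical correspondence is what lets much existing CUDA code target CDNA accelerators with modest porting effort.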

Comparison with RDNA Architecture

While both architectures share a lineage from Graphics Core Next, CDNA and RDNA are optimized for distinctly different markets. The RDNA architecture, used in Radeon RX series graphics cards such as the RX 6000 series, prioritizes high frame rates and visual fidelity in video games, featuring dedicated hardware for geometry processing and pixel shading. In contrast, CDNA removes most graphics-focused hardware, reallocating the transistor budget towards additional compute units, larger matrix cores, and high-bandwidth Infinity Fabric interconnects. This gives CDNA a decisive advantage in double-precision (FP64) performance, a key metric for scientific computing, whereas RDNA emphasizes the single-precision (FP32) throughput common in gaming. Furthermore, CDNA incorporates extensive reliability features such as full error-correcting code (ECC) memory support, which are typically absent or reduced in consumer-focused RDNA designs.
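The FP64 gap is mostly a matter of hardware rate ratios: CDNA 1 executes vector FP64 at half its FP32 rate, while consumer RDNA 2 parts execute it at roughly one sixteenth. A minimal sketch, with assumed round FP32 peaks chosen only to make the ratio visible:

```python
# Illustrative FP64 throughput from FP32 peak and the FP64:FP32 rate ratio.
# Ratios: CDNA 1 runs vector FP64 at 1/2 the FP32 rate; consumer RDNA 2
# parts at roughly 1/16. FP32 peaks are assumed round figures.

def fp64_tflops(fp32_peak_tflops: float, fp64_ratio: float) -> float:
    return fp32_peak_tflops * fp64_ratio

cdna1 = fp64_tflops(23.0, 1 / 2)    # ~11.5 TFLOPS FP64 on an MI100-class part
rdna2 = fp64_tflops(23.0, 1 / 16)   # ~1.4 TFLOPS FP64 at a comparable FP32 peak
print(f"{cdna1:.1f} vs {rdna2:.1f} TFLOPS FP64")
```

Two parts with comparable FP32 peaks can thus differ by nearly an order of magnitude in double precision, which is why the ratio, not the headline FLOPS number, decides suitability for scientific workloads.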