
Compute Express Link

Generated by DeepSeek V3.2
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Article Genealogy
Parent: Intel Corporation (Hop 4)
Compute Express Link
Name: Compute Express Link
Caption: Official CXL Consortium logo
Other names: CXL
Inventor: Intel Corporation
Based on: PCI Express
Related: Gen-Z, CCIX
Website: https://www.computeexpresslink.org

Compute Express Link (CXL) is a high-speed, cache-coherent interconnect for processors, memory, and accelerators, designed to address the growing performance demands of data-centric workloads. Developed by a consortium of industry leaders, it builds on the physical and electrical interfaces of the widely adopted PCI Express standard while adding new protocols for coherent memory sharing. This enables efficient, low-latency resource pooling and sharing across heterogeneous computing elements, a key requirement for modern data centers, high-performance computing, and artificial intelligence systems.

Overview

The standard was introduced by Intel Corporation in 2019 to overcome bottlenecks in traditional system architectures, where accelerators such as GPUs, FPGAs, and SmartNICs operated on isolated, inefficiently used memory pools. The CXL Consortium, whose founding promoters included Intel, Alibaba, Cisco, Dell EMC, Facebook (now Meta), Google, HPE, Huawei, and Microsoft, was formed to steward its development as an open industry standard; AMD, Arm, NVIDIA, and others joined soon after. Its primary goal is to maintain a unified, coherent memory space between the host CPU and attached devices, dramatically improving performance for workloads in cloud computing, scientific computing, and machine learning. This approach contrasts with earlier interconnects that required complex software-managed memory copies between separate address domains.

Technical Specifications

The specification leverages the physical layer of PCI Express 5.0 and later generations, ensuring electrical compatibility with existing infrastructure. It defines three distinct protocols: CXL.io, which is essentially PCI Express and handles initialization, device discovery, and I/O operations; CXL.cache, which allows a device to cache host memory; and CXL.mem, which enables the host processor to access device-attached memory coherently. Later versions add major capabilities: CXL 2.0 introduced switching and memory pooling, and CXL 3.0 added memory sharing and multi-level fabric topologies for multi-host systems. Raw link rates track the underlying PHY: CXL 1.1 and 2.0 run over PCIe 5.0 at 32 GT/s per lane (roughly 63 GB/s per direction for a x16 link after 128b/130b encoding), while CXL 3.0 adopts PCIe 6.0 signaling at 64 GT/s per lane. Coherent CXL.cache and CXL.mem accesses also incur significantly lower latency than traditional PCIe DMA round trips.
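Because CXL.io rides on standard PCI Express, a CXL device first appears through ordinary PCIe enumeration; on Linux, the kernel's CXL core driver then registers memory devices, ports, and decoders on a dedicated bus under /sys/bus/cxl/devices. The minimal sketch below simply lists those entries. It assumes a kernel built with the upstream CXL driver (CONFIG_CXL_BUS); entry names such as mem0 or decoder0.0 vary by platform.

    /* List CXL devices registered by the Linux CXL core driver.
     * On systems without CXL hardware or the driver, the directory
     * is simply absent. Build: cc list_cxl.c -o list_cxl */
    #include <stdio.h>
    #include <dirent.h>

    int main(void) {
        const char *path = "/sys/bus/cxl/devices";
        DIR *dir = opendir(path);
        if (!dir) {
            fprintf(stderr, "no CXL bus at %s (driver not loaded or no devices)\n", path);
            return 1;
        }
        struct dirent *entry;
        while ((entry = readdir(dir)) != NULL) {
            if (entry->d_name[0] == '.')
                continue;  /* skip "." and ".." */
            /* Entries look like mem0, root0, port1, decoder0.0, ... */
            printf("%s\n", entry->d_name);
        }
        closedir(dir);
        return 0;
    }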

Architecture and Protocol

The architecture is built on a layered model that operates in conjunction with the CPU's coherence domain, typically managed by the processor's memory controller. The CXL.cache protocol allows an accelerator to snoop and cache data from the host's DDR memory, while CXL.mem makes high-bandwidth memory on an accelerator, such as HBM or GDDR, appear as part of the system's physical address space. This coherence is managed using a directory-based or snoop-based model, extending the CPU's existing cache coherence mechanisms. The CXL.io layer handles all standard PCI-SIG enumeration and I/O traffic, ensuring backward compatibility with the vast ecosystem of PCI Express devices and software like the Linux kernel.
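In practice, CXL.mem capacity mapped into the physical address space typically surfaces on Linux as either a devdax character device or a CPU-less NUMA node, so ordinary allocation interfaces can target it. The following is a minimal sketch using libnuma; the choice of node 1 as the CXL-backed node is an assumption for illustration, and the real topology should be checked with numactl --hardware.

    /* Allocate from a CXL-backed NUMA node with libnuma.
     * Assumption: node 1 is the CPU-less node backed by CXL.mem.
     * Build: cc cxl_alloc.c -o cxl_alloc -lnuma */
    #include <stdio.h>
    #include <string.h>
    #include <numa.h>

    int main(void) {
        if (numa_available() < 0) {
            fprintf(stderr, "NUMA support unavailable\n");
            return 1;
        }
        const int cxl_node = 1;        /* hypothetical CXL memory node */
        const size_t len = 64 << 20;   /* 64 MiB */
        void *buf = numa_alloc_onnode(len, cxl_node);
        if (!buf) {
            fprintf(stderr, "allocation on node %d failed\n", cxl_node);
            return 1;
        }
        memset(buf, 0, len);           /* fault pages in on the CXL node */
        printf("64 MiB resident on node %d via ordinary loads/stores\n", cxl_node);
        numa_free(buf, len);
        return 0;
    }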

Implementations and Products

Initial implementations have been led by major semiconductor and system vendors. Intel's Xeon Scalable processors, starting with the Sapphire Rapids microarchitecture, integrate CXL 1.1 controllers directly into the CPU. AMD added CXL 1.1+ support with its fourth-generation EPYC (Genoa) server processors. Accelerator and device vendors, including NVIDIA with its BlueField DPUs, Xilinx (now part of AMD) for FPGAs, and Samsung with CXL memory expanders, have announced compatible products. Major OEMs such as HPE, Dell, and Lenovo have begun shipping CXL-enabled servers.

Comparison with Alternatives

Several other cache-coherent interconnects emerged to solve similar problems, leading to a period of competition and convergence. Gen-Z was a memory-semantic protocol that also aimed at memory pooling, but its consortium's assets were transferred to the CXL Consortium in 2022. CCIX, backed by Arm, Xilinx, and others, provided coherence over PCI Express but saw limited adoption. NVLink, developed by NVIDIA, offers high-bandwidth coherence primarily between NVIDIA GPUs, and between GPUs and IBM POWER processors, making it a more proprietary solution. CXL's key advantages are its strong industry consortium backing, its PCI Express compatibility, and its position as an open, CPU-agnostic standard, which has led to broad support across the x86-64 and emerging Arm server ecosystems.

Industry Adoption and Ecosystem

Adoption is being driven by the demand for efficient heterogeneous computing in enterprise and hyperscale environments. The CXL Consortium has grown to more than 200 members, including key players such as IBM, Meta, AWS, SK Hynix, and Micron Technology. The technology is central to emerging architectures for composable, disaggregated infrastructure, in which resources like memory and accelerators can be dynamically pooled and allocated across servers. Industry groups such as the Open Compute Project are developing specifications for CXL-enabled hardware. While widespread data-center deployment is still under way, the ecosystem is maturing rapidly, with development tools from companies like Synopsys and Cadence Design Systems and operating system support in the Linux kernel and the Microsoft Windows driver model.