Artificial intelligence (AI) is entering an era of unprecedented scale. From training trillion-parameter large language models (LLMs) to enabling real-time multimodal inference, AI workloads are reshaping the very foundations of data center infrastructure. While GPUs and accelerators have become the face of AI, a critical bottleneck lies behind the scenes: memory. Capacity, bandwidth, latency, and scalability challenges often determine the success or limits of AI systems. This is where Compute Express Link (CXL) steps in, offering a transformative solution.

The Memory Bottleneck in AI

Several factors combine to create the memory bottleneck in AI:

  • Training foundation models requires enormous memory capacity, often exceeding what is available in a single GPU.
  • Inference at scale demands rapid access to large datasets without duplicating memory across GPUs.
  • Traditional architectures force CPUs, GPUs, and accelerators to operate in silos, creating inefficiencies.
Figure 1. A diagram comparing traditional siloed memory (CPU, GPU islands) vs. CXL-enabled pooled memory (Source: Rambus)

As Figure 1 shows, in siloed architectures each CPU or GPU is tied to its own isolated memory, often leaving capacity underused. With CXL pooling, all processors can access a unified shared memory space. This shift enables flexible scaling, better utilization, and improved performance for AI and data-intensive workloads.
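To make the utilization argument concrete, the short Python sketch below compares how much capacity goes stranded when four accelerators each own a fixed slice of memory versus when the same total capacity sits in one shared pool. The capacities and demand figures are illustrative assumptions, not measurements from any real system.

```python
# Illustrative comparison of stranded memory in a siloed configuration vs. a
# CXL-style shared pool. All capacities and demands are made-up example values.

def stranded_siloed(per_device_capacity_gb, demands_gb):
    """Each device can use only its own local memory; leftover capacity is stranded."""
    stranded = 0
    unmet = 0
    for demand in demands_gb:
        if demand <= per_device_capacity_gb:
            stranded += per_device_capacity_gb - demand
        else:
            unmet += demand - per_device_capacity_gb
    return stranded, unmet

def stranded_pooled(per_device_capacity_gb, demands_gb):
    """All capacity sits in one shared pool that any device can draw from."""
    total_capacity = per_device_capacity_gb * len(demands_gb)
    total_demand = sum(demands_gb)
    return max(total_capacity - total_demand, 0), max(total_demand - total_capacity, 0)

# Four accelerators with 96 GB each and uneven workload demands (example values).
demands = [30, 60, 140, 90]
print("siloed (stranded, unmet):", stranded_siloed(96, demands))  # -> (108, 44)
print("pooled (stranded, unmet):", stranded_pooled(96, demands))  # -> (64, 0)
```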

How CXL Comes to the Rescue

Compute Express Link is an open industry standard interconnect created to address the growing performance and scalability demands of modern workloads, especially in AI and high-performance computing. Unlike PCIe (Peripheral Component Interconnect Express), which is optimized for general-purpose I/O, CXL is specifically designed to connect CPUs, GPUs, accelerators, and memory devices with low latency and full cache coherency, ensuring that data remains synchronized across heterogeneous processors. By extending beyond traditional device-to-host communication, CXL enables a unified fabric where compute and memory resources can be shared seamlessly across an entire system or even a cluster of servers.
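For readers experimenting on Linux, the sketch below shows one way to check whether a host exposes CXL devices. It assumes a kernel built with CXL support that registers devices under /sys/bus/cxl/devices; the exact sysfs layout and device naming vary by kernel version and platform, so treat the paths here as assumptions to verify on your own system.

```python
# A minimal sketch of discovering CXL devices on a Linux host, assuming the
# kernel exposes them under /sys/bus/cxl/devices (layout varies by kernel).
from pathlib import Path

CXL_SYSFS = Path("/sys/bus/cxl/devices")

def list_cxl_devices():
    """Return the device names the kernel has registered on the CXL bus."""
    if not CXL_SYSFS.is_dir():
        return []  # no CXL-enabled kernel or no CXL hardware present
    return sorted(entry.name for entry in CXL_SYSFS.iterdir())

if __name__ == "__main__":
    devices = list_cxl_devices()
    if devices:
        for name in devices:
            print(name)  # e.g. memdev0, root0, port1 on typical CXL-enabled systems
    else:
        print("No CXL devices visible (kernel support or hardware may be missing).")
```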

One of the most impactful capabilities CXL introduces is memory pooling, which allows memory to be dynamically allocated to different devices based on workload needs rather than being statically bound to a single CPU or GPU. This eliminates stranded capacity and enables higher utilization of memory resources. Another key feature is tiered memory, where high-performance local DRAM can be complemented by larger pools of CXL-attached memory, such as DDR5 or persistent memory, delivering a balance of speed, capacity, and cost efficiency. Finally, CXL is a cornerstone of composable infrastructure, where resources such as compute, accelerators, and memory can be assembled and reassembled in real time to match shifting workload demands, from training massive large language models to running latency-sensitive inference tasks.
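The sketch below illustrates the tiering idea as a toy placement policy: hot, small allocations prefer local DRAM, while colder or larger ones spill to a bigger CXL-attached tier. The tier names, capacities, and hot/cold rule are illustrative assumptions rather than any real allocator API.

```python
# A minimal sketch of a tiered-memory placement policy: keep hot allocations in
# local DRAM and spill colder or overflow allocations to a larger CXL-attached
# tier. Sizes and names are illustrative assumptions, not a real allocator API.
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    capacity_gb: float
    used_gb: float = 0.0

    def try_alloc(self, size_gb: float) -> bool:
        if self.used_gb + size_gb <= self.capacity_gb:
            self.used_gb += size_gb
            return True
        return False

def place(size_gb: float, hot: bool, dram: Tier, cxl: Tier) -> str:
    """Prefer DRAM for hot data; otherwise (or on overflow) use the CXL tier."""
    preferred = [dram, cxl] if hot else [cxl, dram]
    for tier in preferred:
        if tier.try_alloc(size_gb):
            return tier.name
    return "unmet"

dram = Tier("local DRAM", capacity_gb=64)
cxl = Tier("CXL-attached DDR5", capacity_gb=512)
print(place(16, hot=True, dram=dram, cxl=cxl))    # -> local DRAM
print(place(200, hot=False, dram=dram, cxl=cxl))  # -> CXL-attached DDR5
```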

In short, CXL represents a shift from static, siloed architectures towards flexible, fabric-based computing, paving the way for next-generation AI and data-intensive systems.

Table 1. Comparison of PCIe and CXL in terms of latency, coherency, and scalability for AI infrastructure:

Feature | PCIe technology | CXL technology
Primary Use Case | High-speed I/O for peripherals (GPUs, NICs, SSDs) | Memory and accelerator interconnect (CPU ↔ GPU ↔ Memory)
Latency | Higher latency (optimized for throughput, not cache coherency) | Low latency with cache coherency for memory sharing
Memory Coherency | No built-in coherency across devices | Full cache coherency between CPU, GPU, accelerators, and memory
Scalability | Point-to-point, limited memory sharing | Memory pooling, disaggregation, and fabric-based scalability
Data Movement | Requires copies between device memory spaces | Direct memory access across heterogeneous devices
Flexibility | Fixed per-device memory | Dynamic, composable infrastructure (shared memory pools)
AI/ML Impact | Memory duplication overhead, inefficient scaling | Efficient training/inference, reduced duplication, scalable LLMs

Why CXL Matters for AI Infrastructure

  1. Large Language Models – CXL enables memory pooling across nodes, reducing the need for costly memory duplication.
  2. Multi-GPU Inference – shared memory pools simplify deployment and reduce infrastructure overhead.
  3. Composable AI Data Centers – instead of over-provisioning memory, CXL allows flexible scaling.
Figure 2. Flow diagram showing how memory pooling supports multi-GPU LLM training.

As Figure 2 illustrates, CXL memory pooling allows multiple GPUs to share a unified memory pool, enabling efficient scaling of large language models.
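A rough back-of-the-envelope calculation shows why this matters. The sketch below contrasts the memory footprint of replicating a model's weights on every GPU with keeping a single copy in a shared pool; the model size, GPU count, and per-GPU working-set figures are made-up example values, not benchmarks.

```python
# Illustrative footprint estimate for one large model served across several
# GPUs: per-GPU weight replication vs. a single shared copy in a memory pool.
# All sizes below are example values only.

def replicated_footprint_gb(model_gb, num_gpus, working_set_gb):
    # Every GPU holds its own full copy of the weights plus working buffers.
    return num_gpus * (model_gb + working_set_gb)

def pooled_footprint_gb(model_gb, num_gpus, working_set_gb):
    # One shared copy of the weights in the pool; each GPU keeps only its
    # local working buffers.
    return model_gb + num_gpus * working_set_gb

model_gb, num_gpus, working_set_gb = 350, 8, 40  # example values only
print("replicated:", replicated_footprint_gb(model_gb, num_gpus, working_set_gb), "GB")  # 3120 GB
print("pooled:    ", pooled_footprint_gb(model_gb, num_gpus, working_set_gb), "GB")      # 670 GB
```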

The Road Ahead: CXL 3.0 and Beyond

With CXL 3.0, the industry is moving beyond incremental improvements to a fundamental shift in data center architecture. By introducing fabric topologies, multi-level switching, and coherent memory sharing across hosts, CXL 3.0 allows entire racks of servers to function as a unified, flexible AI fabric. This is especially significant for AI workloads such as large language models, where traditional GPU islands are constrained by memory limits and forced into complex model parallelism.

With shared, coherent memory pools accessible across GPUs, training becomes faster, duplication is reduced, and larger models can be supported more efficiently. Hyperscalers, cloud providers, and HPC facilities are already piloting CXL-enabled deployments, with vendors from Intel and AMD to Samsung and HPE building support into their roadmaps. Looking further ahead, CXL is expected to evolve toward even faster interconnects, finer-grained composability, and AI frameworks natively optimized for pooled memory.
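To illustrate the composability idea, the sketch below models the kind of bookkeeping a rack-level fabric manager might perform with CXL 3.0-style shared memory: hosts borrow capacity from a common pool and return it when jobs finish. The class, method names, and capacities are illustrative assumptions, not a real fabric-management API.

```python
# A minimal sketch of rack-level memory bookkeeping under a shared CXL pool.
# The classes and numbers are illustrative assumptions, not a real API.

class RackMemoryPool:
    def __init__(self, capacity_gb: int):
        self.capacity_gb = capacity_gb
        self.allocations = {}  # host name -> GB currently granted

    def free_gb(self) -> int:
        return self.capacity_gb - sum(self.allocations.values())

    def request(self, host: str, size_gb: int) -> bool:
        """Grant capacity to a host if the pool still has room."""
        if self.free_gb() >= size_gb:
            self.allocations[host] = self.allocations.get(host, 0) + size_gb
            return True
        return False

    def release(self, host: str) -> None:
        """Return everything a host holds back to the pool."""
        self.allocations.pop(host, None)

pool = RackMemoryPool(capacity_gb=4096)              # example: 4 TB shared across the rack
print(pool.request("host-a", 1024), pool.free_gb())  # True 3072
print(pool.request("host-b", 2048), pool.free_gb())  # True 1024
pool.release("host-a")
print(pool.free_gb())                                # 2048
```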

In summary, CXL 3.0 is a pivot point—from server-centric computing to fabric-centric AI infrastructure—laying the foundation for the next generation of scalable AI and LLM systems.

Table 2. Timeline of CXL evolution (1.0 → 2.0 → 3.0) with AI-specific use cases:

Timeline | CXL Version | Evolution Highlights | AI-Relevant Use Cases
2019 | CXL 1.0 | First release with basic coherency and memory protocols. | Enables unified memory access by CPU and device; foundational for heterogeneous compute.
2020–2021 | CXL 2.0 | Introduced memory pooling, switching, and persistent memory support. | Supports flexible memory pooling, which is critical for large-scale LLM training and inference.
2022–2023+ | CXL 3.0 | Advanced fabric architecture, coherent memory sharing, multi-host scaling. | Designed for disaggregated AI clusters, multi-GPU sharing, and coherent memory access at rack scale.

CXL Provides the Missing Link

AI’s future depends on more than GPUs—it requires rethinking how memory is connected, shared, and scaled. CXL provides the missing link, transforming isolated resources into a coherent, flexible AI infrastructure. For the AI infrastructure community, CXL represents not just technology, but a foundation for building the next generation of data centers.


Originally published: EE Times, From GPUs to Memory Pools: Why AI Needs Compute Express Link (CXL), October 27, 2025

SMART Modular Technologies helps customers around the world enable high performance computing through the design, development, and advanced packaging of integrated memory solutions. Our portfolio ranges from today’s leading edge memory technologies like CXL to standard and legacy DRAM and Flash storage products. For more than three decades, we’ve provided standard, ruggedized, and custom memory and storage solutions that meet the needs of diverse applications in high-growth markets. Contact us today for more information.
