Integrated Memory
Accelerate your AI applications and scale efficiently with Penguin Solutions’ big memory server appliances.
AI system performance for inference is fundamentally constrained by memory.
As AI models grow more complex and organizations deploy enterprise-scale inference, clusters hit the "memory wall": limitations in memory capacity and bandwidth that throttle GPU performance and drastically increase latency.
Penguin Solutions leverages 30 years of experience in advanced memory to solve this problem with breakthrough CXL-based memory servers that enable large pools of shared system memory. This disaggregation frees GPUs from current memory limitations to give each node what it needs, when it needs it, and provides new flexibility and efficiency to cluster architectures.
By deploying abundant, high-performance memory, you can maximize existing GPU investments and efficiently scale new AI infrastructure for enterprise-scale, production inference.
Disaggregated memory is accessible to all nodes, enabling efficient resource utilization and supporting memory-intensive AI workloads.
Delivers consistent, low-latency performance to support demanding real-time AI inference requirements.
Maximize throughput and scalability with balanced memory architecture and reduced compute bottlenecks.
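To make the disaggregation idea concrete: on Linux hosts, CXL expander memory is typically surfaced as a CPU-less NUMA node, so software can discover and target it with standard NUMA tooling. The sketch below is a minimal illustration of that convention only; it walks sysfs, lists NUMA nodes, and flags nodes with no attached CPUs as candidate expander tiers. It is not Penguin Solutions software, and it assumes the standard sysfs layout.

```python
# Minimal sketch: find CPU-less NUMA nodes, the usual Linux presentation
# of CXL expander memory. Assumes the standard sysfs layout; this is an
# illustration, not Penguin Solutions tooling.
from pathlib import Path

SYSFS_NODES = Path("/sys/devices/system/node")

def numa_nodes():
    """Yield (node_id, cpulist, mem_total_kib) for each NUMA node."""
    for node_dir in sorted(SYSFS_NODES.glob("node[0-9]*")):
        node_id = int(node_dir.name[len("node"):])
        cpulist = (node_dir / "cpulist").read_text().strip()
        mem_kib = 0
        for line in (node_dir / "meminfo").read_text().splitlines():
            if "MemTotal" in line:
                mem_kib = int(line.split()[-2])  # value is reported in kB
        yield node_id, cpulist, mem_kib

if __name__ == "__main__":
    for node_id, cpulist, mem_kib in numa_nodes():
        tier = "CPU-less (candidate CXL expander)" if not cpulist else "local DRAM"
        print(f"node{node_id}: cpus=[{cpulist or 'none'}] "
              f"mem={mem_kib / 2**20:.1f} GiB -> {tier}")
```

Allocations can then be steered to such a node with standard interfaces like numactl --membind or libnuma, which is how an expanded CXL memory tier is commonly consumed today.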
Penguin Solutions’ patent-pending MemoryAI KV cache server (Altus® XE4318GT-KVC) is the industry’s first production-ready KV cache server built on CXL memory: a high-capacity appliance purpose-built to break through the memory wall and support high-performance inference at scale.
The MemoryAI KV cache server stores and reuses computed key/value (KV) pairs, offloading the KV cache from GPU memory. This relieves GPU memory constraints, reduces time-to-first-token (TTFT), and avoids costly recomputation, enabling superior performance for demanding AI inference workloads.
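As a rough illustration of the mechanism, here is a minimal, hypothetical sketch of prefix-keyed KV cache reuse: computed key/value tensors are parked in a large off-GPU pool keyed by a hash of the token prefix, so a repeated prefix is served from the pool instead of being recomputed during prefill, which is where the TTFT savings come from. The class and function names are illustrative, not the product's API, and numpy arrays stand in for real GPU KV tensors.

```python
# Hypothetical sketch of prefix-keyed KV cache reuse; names are illustrative,
# not Penguin's product API. numpy arrays stand in for GPU KV tensors.
import hashlib
import numpy as np

class OffloadedKVCache:
    """Stores per-prefix KV tensors in a large host/CXL-backed pool."""

    def __init__(self):
        self._pool = {}  # prefix hash -> (keys, values)

    @staticmethod
    def _prefix_key(token_ids):
        return hashlib.sha256(
            np.asarray(token_ids, dtype=np.int64).tobytes()
        ).hexdigest()

    def lookup(self, token_ids):
        """Return cached (K, V) for this exact token prefix, or None."""
        return self._pool.get(self._prefix_key(token_ids))

    def store(self, token_ids, keys, values):
        self._pool[self._prefix_key(token_ids)] = (keys, values)

def prefill(token_ids, cache, n_layers=4, n_kv_heads=2, head_dim=8):
    """Compute (or reuse) KV tensors for a prompt prefix."""
    hit = cache.lookup(token_ids)
    if hit is not None:
        return hit            # cache hit: no recompute, lower TTFT
    rng = np.random.default_rng(0)
    shape = (n_layers, len(token_ids), n_kv_heads, head_dim)
    keys, values = rng.standard_normal(shape), rng.standard_normal(shape)
    cache.store(token_ids, keys, values)  # offload for future requests
    return keys, values

cache = OffloadedKVCache()
prompt = [101, 7, 42, 9]
prefill(prompt, cache)            # first request: computes and stores KV
k, v = prefill(prompt, cache)     # repeated prefix: served from the pool
print("reused KV shape:", k.shape)
```

In a real deployment the pool would live in the CXL memory tier rather than a Python dict, but the access pattern is the same: hash the prefix, fetch the cached tensors, and skip straight to decoding.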
Designed to expand local RAM accessible to GPUs, it alleviates memory bottlenecks and allows organizations to fully leverage existing GPU investments or design clusters optimized for high-throughput inference.
As models, context windows, and concurrency demands grow, the KV cache server is essential for enterprises looking to meet stringent latency and performance expectations in AI-driven environments.
Offload the KV cache to a dedicated, high-capacity CXL-based server, eliminating AI inference memory bottlenecks.
Significantly reduce latency and boost throughput with intelligent KV cache reuse.
Deploy with up to 11 TB of total memory, including an impressive 8 TB of advanced DDR5 CXL memory (see the sizing sketch below).
Prevent compute idle time by keeping GPUs fed from the cache instead of stalling on recomputation.
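For a rough sense of what 8 TB of CXL memory buys, the back-of-envelope calculation below estimates cached-token capacity for a Llama-3-70B-class model (80 layers, 8 grouped-query KV heads, head dimension 128, FP16 cache). The model parameters are assumptions chosen for illustration, not a product benchmark.

```python
# Back-of-envelope KV cache sizing; the model parameters are assumptions
# (Llama-3-70B-class), not a Penguin Solutions benchmark.
n_layers, n_kv_heads, head_dim = 80, 8, 128
bytes_per_elem = 2                      # FP16
kv_factor = 2                           # one K and one V tensor per layer

bytes_per_token = kv_factor * n_layers * n_kv_heads * head_dim * bytes_per_elem
pool_bytes = 8 * 10**12                 # 8 TB of DDR5 CXL memory

print(f"KV cache per token: {bytes_per_token / 1024:.0f} KiB")
print(f"Tokens cached in 8 TB: {pool_bytes / bytes_per_token / 1e6:.1f} million")
```

At roughly 320 KiB per token, 8 TB holds on the order of 24 million cached tokens, about 100x what would fit in a single 80 GB GPU's HBM even if it held nothing else.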

Reach out today to learn more about how we can help you reach your AI & HPC infrastructure project goals. Our team designs, builds, deploys, and manages high-performance, high-availability enterprise solutions, empowering customers to achieve their breakthrough innovations.