Managing AI Factory Memory

Break Through the Memory Wall. Unleash Next-Level AI Inference.

Accelerate your AI applications and scale efficiently with Penguin Solutions’ big memory server appliances.

Let's Talk
Request Pricing
Why Big Memory Servers

Deploy Abundant Memory to Unlock AI Performance

AI system performance for inference is fundamentally constrained by memory.

As the complexity of AI models increases and organizations deploy enterprise-scale inference, clusters face the "memory wall"—current limitations in memory capacity and bandwidth, which constrain GPU performance and drastically increase latency.

Penguin Solutions leverages 30 years of experience in advanced memory to solve this problem with breakthrough CXL-based memory servers that enable large pools of shared system memory. This disaggregation frees GPUs from current memory limitations to give each node what it needs, when it needs it, and provides new flexibility and efficiency to cluster architectures.

By deploying abundant, high-performance memory, you can maximize existing GPU investments and efficiently scale new AI infrastructure for enterprise-scale, production inference.

Download datasheet

Key Benefits of Big Memory Servers

Create Pooled Memory for Shared System Use

Disaggregated memory is accessible to all nodes, enabling efficient resource utilization and supporting memory-intensive AI workloads.

Meet Latency SLAs for Inference

Delivers consistent, low-latency performance to support demanding real-time AI inference requirements.

Optimize Cluster Performance

Maximize throughput and scalability with balanced memory architecture and reduced compute bottlenecks.

Penguin Solutions MemoryAI™ KV Cache Server: Redefining AI Inference Speed and Scalability

Penguin Solutions’ patent-pending MemoryAI KV cache server (Altus® XE4318GT-KVC) is the industry’s first production-ready KV cache server built on CXL memory: a high-capacity memory appliance purpose-built to break through the memory wall and support high-performance inference at scale.

The MemoryAI KV cache server works by storing and reusing computed key/value (KV) pairs, offloading the KV cache from GPU memory. This relieves GPU memory constraints, reduces time-to-first-token (TTFT), and avoids costly re-computation of previously processed context, enabling superior performance for demanding AI inference workloads.

Designed to expand local RAM accessible to GPUs, it alleviates memory bottlenecks and allows organizations to fully leverage existing GPU investments or design clusters optimized for high-throughput inference.

As models, context windows, and concurrency demands grow, the KV cache server is essential for enterprises looking to meet stringent latency and performance expectations in AI-driven environments.
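The store-and-reuse pattern described above can be sketched in a few lines. The sketch below is illustrative only and is not Penguin Solutions’ implementation: `TieredKVCache`, its capacity parameter, and the two dictionaries are hypothetical stand-ins for GPU memory and a large CXL memory pool. The key ideas it demonstrates are the ones named in the text: a cache hit reuses computed KV pairs instead of recomputing them (cutting TTFT), and entries that no longer fit in the fast tier are offloaded to the big pool instead of being discarded.

```python
# Minimal sketch (assumed, not Penguin's implementation) of tiered KV cache
# reuse: computed key/value pairs live in a small fast tier (a stand-in for
# GPU memory) and overflow into a large slow tier (a stand-in for a CXL
# memory pool) instead of being dropped and recomputed.

from collections import OrderedDict


class TieredKVCache:
    def __init__(self, gpu_capacity):
        self.gpu_capacity = gpu_capacity   # entries the fast tier can hold
        self.gpu_tier = OrderedDict()      # prefix -> KV blob, in LRU order
        self.cxl_tier = {}                 # overflow tier: large, slower
        self.hits = 0
        self.misses = 0

    def _key(self, token_ids):
        # Entries are keyed by the exact token prefix they were computed for.
        return tuple(token_ids)

    def get(self, token_ids):
        key = self._key(token_ids)
        if key in self.gpu_tier:
            self.gpu_tier.move_to_end(key)  # refresh LRU position
            self.hits += 1
            return self.gpu_tier[key]
        if key in self.cxl_tier:
            # Promote from the big pool back into the fast tier on reuse.
            self.hits += 1
            self.put(token_ids, self.cxl_tier.pop(key))
            return self.gpu_tier[key]
        self.misses += 1
        return None                         # caller must recompute the KV pairs

    def put(self, token_ids, kv_blob):
        key = self._key(token_ids)
        self.gpu_tier[key] = kv_blob
        self.gpu_tier.move_to_end(key)
        while len(self.gpu_tier) > self.gpu_capacity:
            # Offload the least recently used entry instead of dropping it.
            old_key, old_blob = self.gpu_tier.popitem(last=False)
            self.cxl_tier[old_key] = old_blob
```

In practice, many inference requests share a common prefix (a system prompt, for example), so a new request can hit the cache for that prefix and skip the prefill compute entirely; a real server would additionally handle tensor layout, eviction policy, and transfer scheduling.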

Let's Talk
Request Pricing

Break Through Memory Barriers

Offload the KV cache to a dedicated, high-capacity CXL-based server, eliminating AI inference memory bottlenecks.

Accelerated AI Processing

Significantly reduce latency and boost throughput with intelligent KV cache reuse.

Scale Without Limits

Deploy with up to 11 TB of total memory, including an impressive 8 TB of advanced DDR5 CXL memory.

Unlock GPU Efficiency

Prevent GPU idle time by eliminating redundant KV cache re-computation.

CXL-Enabled Memory Servers

4U
  • Altus XE4318GT-KVC


    Penguin Solutions’ Altus XE4318GT-KVC is a 4U rackmount server designed specifically for use as a KV cache server during AI inference. It features dual AMD EPYC CPUs and supports up to 11 TB of total memory, including CXL-based memory expansion (SMART CXA-8F2W).

    Rack Unit Size:
    4U
    Processor:
    Dual AMD EPYC™ 9005 Series
    PCIe Slots:
    8x PCIe Gen5 x16 FHFL, 2x PCIe Gen5 x16 LP
    Memory Capacity:

    11 TB DDR5-6400 MT/s (88 DIMMs)

    Download Datasheet
  • Altus XE4318GT-CXL

    The Altus® XE4318GT-CXL server is a big-memory powerhouse. Leveraging the latest compute technology from AMD and the latest memory technology from SMART Modular, it is a fully validated and supported CXL server from Penguin Solutions with CXL memory expansion up to 22 TB per server.

    Rack Unit Size:
    4U
    Processor:
    Dual AMD EPYC™ 9005 Series
    PCIe Slots:
    8x PCIe Gen5 x16 FHFL, 2x PCIe Gen5 x16 LP
    Memory Capacity:

    11 TB DDR5-6400 MT/s (88 DIMMs)

    Download Datasheet
Open Compute Project Infrastructure

21-inch CXL-Enabled OCP Server

Leading-edge organizations can choose Open Compute Project (OCP) infrastructure to scale out cost-effectively. The case for OCP-based hardware in the data center is strong: it is less expensive to buy and maintain, reduces points of failure, is designed for more efficient power management, and reduces the security attack surface.

Request a Callback

Talk to the CXL Experts at Penguin Solutions

Reach out today to learn more about how we can help you reach your AI and HPC infrastructure project goals. Our team designs, builds, deploys, and manages high-performance, high-availability enterprise solutions, empowering customers to achieve their breakthrough innovations.

Let's Talk