Graid Technology Unveils AI Storage Portfolio


SupremeRAID Technology Enhances GPU Performance

On April 21, 2026, Graid Technology announced the release of its Agentic AI Storage Portfolio from Sunnyvale, California. The suite of KV cache solutions is designed to remove the storage bottlenecks that stall continuous, long-running AI operations.

Innovative Deployment Tiers for AI Systems

Designed with SupremeRAID technology, the portfolio includes three deployment tiers: KV Cache Server, KV Cache Rack, and KV Cache Platform. These tiers enhance NVMe storage capabilities, supporting the increasing demands of multi-step AI tasks.

SupremeRAID aggregates up to 32 NVMe drives into a single 280 GB/s virtual pool. This setup delivers KV cache reads at speeds 77 times faster than standard NVMe, with latency reduced to 1.3 milliseconds.
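As a rough sanity check on the announced figures, 280 GB/s spread across 32 drives implies about 8.75 GB/s of read bandwidth per drive, which is in line with a PCIe Gen 5 NVMe SSD (the per-drive number is our back-of-envelope inference, not part of the announcement):

```python
# Back-of-envelope check of the announced aggregate figures.
NUM_DRIVES = 32
POOL_BANDWIDTH_GBPS = 280  # GB/s aggregate, as announced

per_drive = POOL_BANDWIDTH_GBPS / NUM_DRIVES
print(f"Implied per-drive read bandwidth: {per_drive:.2f} GB/s")
# 280 / 32 = 8.75 GB/s, plausible for a PCIe Gen 5 NVMe drive
```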

The KV Cache Server provides single-node NVMe acceleration, making it suitable for individual inference servers and edge AI deployments. This tier is currently available for integration.

For enterprise multi-GPU clusters, the KV Cache Rack offers a rack-scale solution. Developed in collaboration with leading server OEM partners, this tier is also available now.

The highest tier, KV Cache Platform, is purpose-built for NVIDIA’s STX reference architecture. Native BlueField-4 DPU execution and rack-scale storage expansion are planned for the second half of 2026.

“A year ago, at GTC 2025, Jensen Huang predicted that storage would become GPU-accelerated for the first time. This year, NVIDIA turned that concept into an architecture with STX and CMX,” said Leander Yu, CEO of Graid Technology. “Our KV Cache Portfolio is built for precisely this moment, delivering the storage performance that agentic AI demands, at storage-tier economics.”

As agentic AI transitions from experimentation to production, infrastructure assumptions are changing. Models must now maintain context continuously for hours, driving KV cache footprints that overflow GPU HBM. The result can be latency spikes of up to 18x, GPU utilization falling to 50%, and breakdowns in model reasoning.

SupremeRAID technology addresses these issues by bypassing the CPU via GPUDirect Storage, keeping KV cache reads on a direct path between NVMe and GPU memory. According to the company, this lets AI models sustain their performance without latency spikes or reasoning errors, sparing enterprises costly recovery efforts.
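The general pattern behind KV cache offloading — spilling entries from scarce GPU memory to a large NVMe tier and promoting them back on a miss — can be sketched with a toy two-tier cache. The class and method names below are purely illustrative and are not Graid's API:

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy two-tier KV cache: a small fast tier (standing in for GPU HBM)
    spills least-recently-used entries to a large slow tier (NVMe)."""

    def __init__(self, hbm_capacity: int):
        self.hbm = OrderedDict()   # fast tier, bounded, LRU-ordered
        self.nvme = {}             # slow tier, effectively unbounded
        self.hbm_capacity = hbm_capacity

    def put(self, key, value):
        self.hbm[key] = value
        self.hbm.move_to_end(key)
        while len(self.hbm) > self.hbm_capacity:
            old_key, old_val = self.hbm.popitem(last=False)  # evict LRU entry
            self.nvme[old_key] = old_val                     # spill to NVMe tier

    def get(self, key):
        if key in self.hbm:        # HBM hit: fast path
            self.hbm.move_to_end(key)
            return self.hbm[key]
        if key in self.nvme:       # miss: promote from NVMe back into HBM
            value = self.nvme.pop(key)
            self.put(key, value)
            return value
        raise KeyError(key)

cache = TieredKVCache(hbm_capacity=2)
cache.put("turn-1", "kv-blob-1")
cache.put("turn-2", "kv-blob-2")
cache.put("turn-3", "kv-blob-3")  # evicts "turn-1" to the NVMe tier
print(cache.get("turn-1"))        # promoted back into HBM → kv-blob-1
```

In a real deployment, the "promotion" step is where the announced read bandwidth and latency matter most: the faster the NVMe tier can return a spilled entry, the less an HBM miss disrupts an agent mid-task.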

Enterprises evaluating agentic AI deployments can find detailed deployment architecture, technical specifications, and NVIDIA STX compatibility details in the solution brief on Graid Technology’s website.

Daniel Rolph
Daniel Rolph is the editor of Melbourne Insider, covering hospitality, venue openings and events across Melbourne. With over 15 years’ experience in marketing and media, he brings a commercial, newsroom-focused approach to accurate and timely local reporting.