NVIDIA eyes next-gen AI memory: Kioxia’s ultra-fast SSDs could rival HBM
The race to supercharge AI systems is accelerating, and the next big leap may come from an unexpected place: SSDs. According to a recent report, NVIDIA is working with Kioxia on “AI SSDs” that could be up to 100 times faster than today’s conventional solid-state drives, with the potential to challenge or even replace HBM in certain workloads.
The concept is bold. Instead of leaning solely on high-bandwidth memory, NVIDIA is exploring storage-class memory that lives closer to the GPU—potentially mounted directly on the accelerator—to deliver massive throughput and capacity. The target is staggering: 200 million IOPS. Kioxia is expected to hit that figure by pairing two SSDs rated at 100 million IOPS each, connected via next-gen PCIe 7.0 to keep data moving at breakneck speeds.
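For a sense of scale, a quick back-of-envelope calculation shows what 200 million IOPS implies in raw throughput. The access sizes below are illustrative assumptions on my part; the report doesn’t specify one.

```python
# Back-of-envelope throughput implied by the reported 200M IOPS target.
# NOTE: the access sizes are illustrative assumptions, not from the report.
target_iops = 2 * 100_000_000  # two drives at 100M IOPS each, per the report

for access_bytes in (512, 4_096):  # common NAND access granularities
    gb_per_s = target_iops * access_bytes / 1e9
    print(f"{access_bytes:>5}-byte accesses -> ~{gb_per_s:,.0f} GB/s")
```

Even at small 512-byte accesses, that works out to roughly 100 GB/s of sustained random-access throughput, far beyond what today’s fastest NVMe drives deliver.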
Achieving that goal will require more than incremental tweaks. The architecture needs a ground-up rethink to make an SSD-like solution behave like high-bandwidth memory. That’s where High-Bandwidth Flash, or HBF, comes in. Originally developed by SanDisk and now championed by Kioxia, HBF is designed to overcome traditional NAND bottlenecks while scaling to terabytes of capacity per device. For data centers running massive inferencing workloads, that combination of speed and scale could be transformative.
HBF isn’t the only arrow in Kioxia’s quiver. The company is also investing in XL-Flash, a high-performance NAND technology aimed at slashing latency and boosting throughput—both crucial for feeding data-hungry GPUs without the usual storage stalls.
Why this matters: HBM has been the gold standard for raw bandwidth, but it comes with constraints that limit how far and how fast AI infrastructure can scale. By merging the strengths of flash—immense capacity and improving performance—with ultra-fast interconnects like PCIe 7.0, NVIDIA and Kioxia are laying the groundwork for a new tier in the memory hierarchy tailored for AI.
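To put the bandwidth gap in perspective, here’s a rough comparison built on publicly stated signaling rates. The HBM3E per-stack figure and the assumption of a single x16 link are mine, not the report’s, and real-world effective bandwidth will be lower than these raw numbers.

```python
# Rough comparison of raw link bandwidth (per direction) for a PCIe 7.0 x16
# link versus a single HBM3E stack. Figures come from public specs; the
# comparison itself is illustrative, not from the report.
PCIE7_GTS_PER_LANE = 128        # PCIe 7.0 signaling rate, GT/s per lane
LANES = 16                      # assumed x16 link
pcie7_gbs = PCIE7_GTS_PER_LANE * LANES / 8   # ~1 byte per 8 transfers per lane

HBM3E_PIN_GBPS = 9.6            # per-pin data rate, Gb/s
HBM3E_PINS = 1024               # I/O width of one stack
hbm3e_gbs = HBM3E_PIN_GBPS * HBM3E_PINS / 8

print(f"PCIe 7.0 x16: ~{pcie7_gbs:.0f} GB/s per direction")  # ~256 GB/s
print(f"HBM3E stack:  ~{hbm3e_gbs:.0f} GB/s")                 # ~1229 GB/s
```

The point isn’t that flash over PCIe matches HBM’s bandwidth; it’s that terabytes of capacity sitting on a link this fast could make a workable new tier just below HBM.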
If these “AI SSDs” deliver as promised, data centers could tap into vast pools of near-GPU storage to accelerate inferencing, streamline model serving, and push more workloads through the same hardware. It’s a glimpse of an AI future where NAND-based technologies play a central role in overcoming the bandwidth and capacity walls that stand in the way of larger, faster, and more efficient models.