d-Matrix and Alchip have validated a new 3D-stacked DRAM technology designed to supercharge AI inference, and it is now ready for commercial launch. The breakthrough memory, based on a 3D-stacked digital in-memory compute architecture (3DIMC), will debut in d-Matrix's upcoming Raptor inference accelerators and is claimed to deliver up to 10x faster inference performance than today's fastest HBM4-based solutions.
With demand for generative and agentic AI exploding, data centers are running headlong into the memory wall: moving data back and forth between compute and memory eats up time, power, and cost. d-Matrix, which gained traction with its Corsair C8 accelerator during recent AI hardware shortages, is positioning Raptor as its next major step, an accelerator built around mass-produced 3D-stacked DRAM that brings compute closer to where data lives.
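The memory wall can be made concrete with a roofline-style back-of-envelope estimate. The sketch below uses purely illustrative numbers (not d-Matrix or HBM4 specifications): for a weight-streaming decode step, time per token is bounded by the slower of memory transfer and arithmetic.

```python
# Back-of-envelope roofline estimate for one LLM decode step.
# All figures are illustrative assumptions, not vendor specifications.

def decode_step_time(params_gb, bytes_per_param, mem_bw_gbs, flops, compute_tflops):
    """Time per token ~= max(time to stream weights, time to compute)."""
    mem_time = params_gb * bytes_per_param / mem_bw_gbs   # seconds
    compute_time = flops / (compute_tflops * 1e12)        # seconds
    return max(mem_time, compute_time), mem_time > compute_time

# A hypothetical 70B-parameter model with int8 weights (1 byte/param):
t, memory_bound = decode_step_time(
    params_gb=70, bytes_per_param=1.0,
    mem_bw_gbs=3000,     # HBM-class bandwidth in GB/s (assumed)
    flops=140e9,         # roughly 2 FLOPs per parameter per token
    compute_tflops=500)  # accelerator int8 throughput (assumed)

print(f"{t * 1000:.1f} ms/token, memory-bound: {memory_bound}")
```

With these assumed numbers the step takes about 23 ms and is entirely bandwidth-limited; the compute units finish in well under a millisecond and then wait on DRAM. That gap is why raising effective memory bandwidth, as 3DIMC aims to do, can matter far more than adding FLOPS.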
The core of the advance is 3DIMC. Instead of relying solely on external high-bandwidth memory, the new design integrates digital in-memory compute directly within a vertically stacked DRAM structure. By minimizing data movement and exploiting massive on-stack bandwidth, 3DIMC targets the chief bottlenecks of modern inference: latency, throughput, and energy efficiency. The companies say this approach not only boosts performance, but also improves total cost of ownership by reducing the need for oversized, power-hungry memory subsystems.
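The energy side of this argument can be sketched with coarse, literature-style per-operation energy figures (assumptions for illustration, not measurements of 3DIMC): an off-chip DRAM access costs orders of magnitude more energy than the arithmetic performed on the data it delivers.

```python
# Rough per-operation energy comparison in picojoules.
# Figures are coarse illustrative estimates, not d-Matrix data.
ENERGY_PJ = {
    "int8_mac": 0.2,          # one multiply-accumulate on-chip
    "sram_read_byte": 1.0,    # local / near-compute memory access
    "dram_read_byte": 100.0,  # off-chip DRAM access
}

def energy_uj(n_macs, bytes_moved, memory="dram_read_byte"):
    """Total energy in microjoules: the MACs plus the data moved to feed them."""
    pj = n_macs * ENERGY_PJ["int8_mac"] + bytes_moved * ENERGY_PJ[memory]
    return pj / 1e6

# Feeding one million MACs with 1 MB of operands:
off_chip = energy_uj(1_000_000, 1_000_000, "dram_read_byte")
near_mem = energy_uj(1_000_000, 1_000_000, "sram_read_byte")
print(f"off-chip: {off_chip:.1f} uJ, near-memory: {near_mem:.1f} uJ")
```

Under these assumptions the off-chip case burns roughly 80x more energy, nearly all of it on data movement rather than computation; that is the waste a tightly coupled compute-in-DRAM stack is designed to eliminate.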
Developed in collaboration with Taiwan-based AI infrastructure ASIC integrator Alchip, the 3D DRAM has moved from concept to a commercially viable component. It is currently running on d-Matrix's Pavehawk test chips, clearing the path for full-scale production. The first product to ship with the technology will be the Raptor inference accelerator, set to succeed the Corsair series in d-Matrix's lineup.
According to d-Matrix, Raptor powered by 3DIMC could achieve up to a 10x speedup in inference workloads when stacked against leading HBM4 accelerators. While independent benchmarks will be key to validating those numbers, the direction is clear: collapsing compute and memory into a tightly coupled 3D stack is emerging as one of the most promising ways to sustain the pace of AI at scale.
Company leadership frames the milestone as both a performance and a sustainability win. By cutting data movement, the single biggest source of energy waste in many AI pipelines, 3DIMC aims to deliver more tokens per second per watt, larger model support without ballooning memory costs, and higher throughput per rack, all vital for production-scale generative AI.
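"Tokens per second per watt" reduces to a simple ratio (tokens per joule), which is useful for normalizing vendor claims measured on different hardware. The numbers below are hypothetical, chosen only to show the arithmetic:

```python
def tokens_per_joule(tokens_per_s, power_w):
    """Normalize throughput by power draw: tokens/s/W equals tokens per joule."""
    return tokens_per_s / power_w

# Hypothetical cards, not real benchmark results:
baseline = tokens_per_joule(2_000, 700)   # HBM-based accelerator (assumed)
candidate = tokens_per_joule(8_000, 550)  # 3DIMC-based accelerator (assumed)
print(f"efficiency gain: {candidate / baseline:.1f}x tokens per joule")
```

A metric like this is what to demand from the forthcoming independent benchmarks, since raw throughput alone hides the power cost per rack.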
What to watch next:
– Real-world benchmarks comparing Raptor to top-tier HBM4 accelerators across LLM and vision inference
– Software stack and framework support, including compiler optimizations that exploit in-memory compute
– Form factors, capacity options, and power envelopes suited to mainstream data center deployments
– Availability timelines and volume ramp, given the push toward mass production
If the claims hold, Raptor could give cloud providers and enterprises a compelling alternative path to scale inference without relying exclusively on traditional HBM-centric designs. For teams wrestling with latency, throughput, and TCO in production AI services, 3D-stacked DRAM with in-memory compute may be the kind of architectural shift that keeps up with the breakneck growth of modern models.