NVIDIA’s GB200 NVL72 Leaves AMD’s MI355X Behind with a 28× Performance Leap and Best-in-Class Intelligence-per-Dollar

Fresh benchmark data from an MoE (Mixture of Experts) testing environment suggests NVIDIA’s Blackwell GB200 NVL72 AI racks are delivering a major performance lead over AMD’s Instinct MI355X, especially in the kind of workload many next-generation AI models are rapidly moving toward.

Why MoE models are changing the AI hardware conversation
MoE architectures are gaining momentum because they can be far more compute-efficient than traditional dense models. Instead of activating the entire network for every request, an MoE model routes each token to a small subset of specialized sub-networks known as “experts.” The tradeoff is that MoE introduces a different kind of scaling pain: heavy all-to-all communication. As the model spans more nodes, activations must move constantly between experts, increasing latency and putting extreme pressure on interconnect bandwidth. For hyperscalers and enterprise AI buyers, this makes performance-per-dollar and total cost of ownership (TCO) central to deciding which platform wins.
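To make the routing idea concrete, here is a minimal sketch of top-k gating, the mechanism most MoE models use to pick experts per token. It assumes a simple softmax router; the sizes, names, and random weights are illustrative, not taken from any benchmarked model.

```python
import numpy as np

def top_k_route(x, gate_weights, k=2):
    """Route one token to its top-k experts via a softmax gate.

    x: (hidden_dim,) token activation
    gate_weights: (hidden_dim, num_experts) learned router matrix
    Returns the chosen expert indices and their normalized weights.
    """
    logits = x @ gate_weights                # score every expert
    top = np.argsort(logits)[-k:]            # keep the k best-scoring experts
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                     # softmax over the selected experts
    return top, probs

# Illustrative sizes only: 8 experts, 16-dim hidden state.
rng = np.random.default_rng(0)
token = rng.normal(size=16)
router = rng.normal(size=(16, 8))
experts, weights = top_k_route(token, router)
print(experts, weights)  # only 2 of 8 experts run for this token
```

Because different tokens land on different experts, and the experts of a large model are spread across many GPUs, every layer ends with a dispatch-and-combine exchange across the cluster, which is the all-to-all traffic described above.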

In a recent analysis from Signal65, NVIDIA’s GB200 NVL72 is positioned as the strongest choice for MoE deployments today because it addresses those communication and scaling bottlenecks more effectively than competing options.

Blackwell GB200 NVL72 vs. AMD Instinct MI355X: the reported throughput gap
Using benchmark figures from SemiAnalysis’s InferenceMAX, the report claims NVIDIA’s Blackwell-based AI servers can reach around 28x higher throughput per GPU at an interactivity level of roughly 75 tokens per second per user, compared with AMD’s MI355X in a similar cluster configuration. In practical terms, higher token throughput at usable latency can translate into serving more users, completing more jobs, or reducing the number of GPUs needed for the same workload, depending on how an organization deploys inference.
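A back-of-envelope sketch shows how a per-GPU throughput gap translates into fleet size. Every number below is a hypothetical placeholder, not a measured value from the report; only the 28x ratio comes from the claim above.

```python
# How a per-GPU throughput gap changes the number of GPUs needed
# to serve a fixed aggregate token demand. All inputs are hypothetical.
import math

target_tokens_per_s = 1_000_000   # assumed aggregate demand across all users
baseline_tps_per_gpu = 500        # hypothetical per-GPU throughput, platform A
speedup = 28                      # the reported relative gap

gpus_a = math.ceil(target_tokens_per_s / baseline_tps_per_gpu)
gpus_b = math.ceil(target_tokens_per_s / (baseline_tps_per_gpu * speedup))
print(f"Platform A: {gpus_a} GPUs, Platform B: {gpus_b} GPUs")
# A 28x per-GPU gap means roughly 1/28th as many GPUs for the same demand,
# before accounting for price per GPU, power, or latency constraints.
```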

The “co-design” strategy behind NVIDIA’s MoE advantage
The large gap in MoE-oriented results is tied to how NVIDIA has built the platform for scale-out AI. According to the report’s description, NVIDIA leans on an “extreme co-design” approach: aligning hardware configuration, memory architecture, and communication fabric around the unique demands of MoE. The GB200 NVL72 links 72 Blackwell GPUs into a single NVLink domain and pairs them with an unusually large pool of fast shared memory, reported as roughly 30TB, built to keep experts fed with data while reducing communication overhead. The goal is to push expert parallelism further without hitting the same latency and bandwidth walls that often appear as MoE models grow.
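To see why the communication fabric matters so much here, a rough estimate of MoE all-to-all traffic helps. Every value in this sketch is an assumption chosen for illustration; real models vary widely in hidden size, expert count, and precision.

```python
# Rough estimate of MoE dispatch/combine traffic per token, to show why
# interconnect bandwidth dominates at scale. All inputs are assumptions.

hidden_dim = 7_168        # hypothetical model hidden size
bytes_per_activation = 2  # fp16/bf16 activations
top_k = 8                 # experts activated per token (assumed)
tokens_per_s = 100_000    # assumed aggregate decode rate across the rack

# Each token's activation is sent to k experts, and the expert outputs
# are gathered back: roughly 2 transfers per (token, expert) pair.
bytes_per_token = hidden_dim * bytes_per_activation * top_k * 2
gb_per_s = bytes_per_token * tokens_per_s / 1e9
print(f"~{bytes_per_token/1e3:.0f} KB moved per token, ~{gb_per_s:.1f} GB/s total")
# When experts sit on different nodes, all of that traffic crosses the
# fabric, which is why keeping the experts inside one fast NVLink domain
# is central to the co-design argument.
```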

Cost-per-token and AI economics: where GB200 NVL72 is said to shine
In AI infrastructure buying, raw speed is only half the story. The bigger question is often how much it costs to produce each unit of useful output. Signal65, referencing Oracle Cloud pricing, claims GB200 NVL72 racks can deliver about 1/15th the relative cost per token while also achieving a higher interactivity rate. If those economics hold across real deployments, they help explain why NVIDIA’s AI stack remains widely adopted: better responsiveness and lower cost per unit of work are exactly what large-scale inference customers want.
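The arithmetic behind cost per token is simple, and a small sketch shows why a more expensive platform can still win on it. The hourly rates and throughput figures below are placeholders, not taken from the report or from any real cloud price list.

```python
# Cost-per-token arithmetic with placeholder prices; none of these
# numbers come from Signal65 or from Oracle Cloud's actual price list.

def usd_per_million_tokens(hourly_rate_usd, tokens_per_s):
    tokens_per_hour = tokens_per_s * 3600
    return hourly_rate_usd / tokens_per_hour * 1e6

# Hypothetical: an expensive-but-fast rack vs. a cheaper-but-slower one.
fast = usd_per_million_tokens(hourly_rate_usd=600, tokens_per_s=1_500_000)
slow = usd_per_million_tokens(hourly_rate_usd=250, tokens_per_s=50_000)
print(f"fast rack: ${fast:.2f}/M tokens, slow rack: ${slow:.2f}/M tokens")
# The faster platform costs more per hour yet far less per token,
# which is the metric the Signal65 comparison is built around.
```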

This also ties into NVIDIA’s rapid product rhythm. With frequent platform updates, the company has been able to stay competitive as AI workloads shift, whether the optimization target is prefill, decode, or other emerging phases of inference.

Important context: this isn’t the final word on AMD vs. NVIDIA
These MoE-focused results don’t settle the broader debate across every AI workload or deployment scenario. AMD hasn’t yet introduced a newer generation of rack-scale solutions to counter the GB200 NVL72 directly, and the Instinct MI355X remains an aggressive option for dense-model deployments, especially given its high HBM3e capacity, which can be attractive for memory-heavy workloads.

Still, if your priority today is MoE performance and efficiency at rack scale, the current picture painted by these benchmarks is clear: NVIDIA is setting the pace. And with next-wave rack-scale platforms expected from both sides (AMD’s Helios and NVIDIA’s Vera Rubin among them), the competition in large-scale AI infrastructure is likely to intensify.