NVIDIA’s Blackwell AI chips have made a remarkable debut in the MLPerf Inference v4.1 benchmarks, delivering record-breaking performance across a range of AI tasks. Slated for data center availability later this year, the Blackwell chips show a generational performance increase of up to 4x over the previous-generation Hopper parts.
In the latest MLPerf Inference v4.1 evaluation, NVIDIA topped all AI benchmark categories, which include:
– Llama 2 70B (Dense LLM)
– Mixtral 8x7B MoE (Sparse Mixture of Experts LLM)
– Stable Diffusion XL (Text-to-Image)
– DLRMv2 (Recommendation)
– BERT (NLP)
– RetinaNet (Object Detection)
– GPT-J 6B (Dense LLM)
– 3D U-Net (Medical Image Segmentation)
– ResNet-50 v1.5 (Image Classification)
In the Llama 2 70B benchmark in particular, NVIDIA’s Blackwell GPUs demonstrate significant gains over the previous-generation Hopper H100. In the server scenario, a single Blackwell GPU achieves a 4x boost at 10,756 tokens per second, and in the offline scenario a 3.7x boost at 11,264 tokens per second. Additionally, the Blackwell submissions mark the first publicly reported performance measurements using FP4 precision.
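To put those multipliers in context, the per-GPU Hopper baseline they imply can be back-calculated from the figures quoted above. The short Python sketch below is purely illustrative arithmetic on the numbers reported in this article, not additional MLPerf data.

```python
# Illustrative arithmetic only: derive the implied per-GPU Hopper baseline
# from the Blackwell Llama 2 70B figures quoted above.
blackwell_server_tps = 10_756   # single Blackwell GPU, server scenario
blackwell_offline_tps = 11_264  # single Blackwell GPU, offline scenario

server_speedup = 4.0    # "4x" generational gain (server)
offline_speedup = 3.7   # "3.7x" generational gain (offline)

implied_h100_server = blackwell_server_tps / server_speedup     # ~2,689 tokens/s
implied_h100_offline = blackwell_offline_tps / offline_speedup  # ~3,045 tokens/s

print(f"Implied H100 baseline: {implied_h100_server:,.0f} tokens/s (server), "
      f"{implied_h100_offline:,.0f} tokens/s (offline)")
```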
While Blackwell is exceedingly powerful, NVIDIA’s Hopper GPUs have also seen impressive performance improvements thanks to continuous optimizations in the CUDA stack. Both the H200 and H100 outperform all competition in newly added benchmarks such as the 46.7-billion-parameter Mixtral 8x7B LLM.
The NVIDIA HGX H200, equipped with eight Hopper H200 GPUs and NVSwitch, delivers significant gains in the Llama 2 70B benchmark, generating 34,864 (offline) and 32,790 (server) tokens per second in a 1,000W configuration, and 31,303 (offline) and 30,128 (server) tokens per second in a 700W configuration. That represents a 50% performance boost over the H100 solution, which itself still outperforms AMD’s Instinct MI300X in AI workloads.
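Since the HGX H200 figures are aggregates for an eight-GPU system, dividing by the GPU count gives a rough per-GPU view, and comparing the two power configurations shows how much the extra headroom buys. The sketch below is again back-of-the-envelope arithmetic on the numbers reported here, not an official breakdown.

```python
# Rough per-GPU throughput from the 8-GPU HGX H200 aggregates quoted above.
GPUS = 8
hgx_h200_tps = {
    "1000W offline": 34_864,
    "1000W server": 32_790,
    "700W offline": 31_303,
    "700W server": 30_128,
}

for config, total_tps in hgx_h200_tps.items():
    print(f"{config}: {total_tps / GPUS:,.0f} tokens/s per GPU")

# Uplift of the 1,000W configuration over the 700W configuration (offline scenario).
uplift = hgx_h200_tps["1000W offline"] / hgx_h200_tps["700W offline"] - 1
print(f"1000W vs 700W offline uplift: {uplift:.1%}")  # ~11.4%
```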
In the Mixtral 8x7B multi-GPU server tests, the H200 and H100 deliver 59,022 and 52,416 tokens per second, respectively. AMD did not submit results for its Instinct MI300X in this workload. Similarly, NVIDIA’s Hopper chips have seen up to a 27% performance increase in the Stable Diffusion XL benchmark thanks to new full-stack improvements, again with no AMD submission in this workload.
NVIDIA’s continuous software enhancements have played a crucial role in these advancements, underlining the importance of a robust software ecosystem to complement powerful hardware. This combination is vital for enterprises investing heavily in AI infrastructure.
NVIDIA also announced general availability of the HGX H200 through various partners, highlighting its commitment to providing comprehensive AI solutions to enterprises globally.
Notably, even NVIDIA’s edge solutions like the Jetson AGX Orin have benefited from these optimizations, achieving a performance boost of up to 6x since MLPerf v4.0, significantly benefiting generative AI workloads at the edge.
Looking ahead, Blackwell’s impressive pre-launch performance leaves room for further optimization gains, much like the progress seen with Hopper, and those enhancements are expected to carry over to Blackwell Ultra in the coming years.