NVIDIA and AMD have both unveiled their latest MLPerf Inference results, showcasing their newest GPUs: NVIDIA’s Blackwell B200 and AMD’s Instinct MI325X.
The MLPerf Inference v5.0 benchmarks put a spotlight on the fierce competition between these tech giants, as they strive to demonstrate not only raw computing power but also the seamless integration of hardware with cutting-edge software optimizations and support for emerging AI technologies.
NVIDIA has raised the bar with its Blackwell GPU series, setting new records in the process. The GB200 NVL72 system, which links 72 NVIDIA Blackwell GPUs into a single, unified unit, achieved a 30-fold increase in throughput on the Llama 3.1 405B benchmark compared to NVIDIA’s H200 NVL8 submission. This leap comes from two compounding factors: more than tripling each GPU’s performance and expanding the NVIDIA NVLink interconnect domain nine times, from 8 GPUs to 72.
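The arithmetic behind that headline figure can be sketched roughly. The factors below are the approximate ones cited in NVIDIA's description, not exact measurements:

```python
# Approximate scaling factors from NVIDIA's description (assumptions, not measurements)
per_gpu_gain = 3.0        # each Blackwell GPU delivers "more than triple" an H200's performance
nvlink_domain_gain = 9.0  # NVLink domain grows 9x: 72 GPUs in GB200 NVL72 vs. 8 in H200 NVL8

# Multiplying the two yields the system-level throughput gain
system_gain = per_gpu_gain * nvlink_domain_gain
print(f"~{system_gain:.0f}x system throughput")
```

A flat 3x per GPU gives ~27x; the "more than tripling" closes the gap to the reported ~30x.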
While numerous firms test their hardware with MLPerf benchmarks, NVIDIA and its partners were the only ones to submit and publish results for the demanding Llama 3.1 405B benchmark. That matters because it shows not just hardware strength, but how effectively that power can be harnessed to minimize latency in practical inference deployments. The benchmark tracks two key metrics: Time to First Token (TTFT), how long a user waits before the first token appears, and Time Per Output Token (TPOT), the pace at which subsequent tokens stream out. Both are vital for a smooth, responsive interaction with large language models.
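To make the two metrics concrete, here is a minimal sketch of how TTFT and TPOT can be derived from per-token arrival timestamps. This is a hypothetical helper for illustration, not MLPerf's actual measurement harness:

```python
def latency_metrics(request_start, token_times):
    """Compute TTFT and TPOT (in seconds) for one request.

    request_start: time the request was issued
    token_times:   arrival time of each output token, in order
    """
    if not token_times:
        raise ValueError("no output tokens")
    # TTFT: wait from request submission until the first token appears
    ttft = token_times[0] - request_start
    # TPOT: average gap between successive output tokens after the first
    if len(token_times) > 1:
        tpot = (token_times[-1] - token_times[0]) / (len(token_times) - 1)
    else:
        tpot = 0.0
    return ttft, tpot
```

For example, a request issued at t=0 whose tokens arrive at 0.5, 0.6, 0.7, and 0.8 seconds has a TTFT of 0.5 s and a TPOT of 0.1 s.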
On another demanding benchmark, the Llama 2 70B Interactive test, NVIDIA again excelled. Its DGX B200 system with eight Blackwell GPUs delivered triple the performance of a comparable eight-GPU H200 setup, setting new records under this test’s tighter latency constraints. The boost hints at the transformative potential of pairing NVIDIA’s Blackwell architecture with its finely tuned software stack.
Meanwhile, AMD isn’t standing still either. With the Instinct MI325X, a 256 GB accelerator submitted in an eight-GPU configuration, AMD’s results go head-to-head with NVIDIA’s H200 systems, with the larger memory capacity proving especially useful for hosting substantial language models. Still, the MI325X trails NVIDIA’s Blackwell B200, and with NVIDIA’s B300 on the horizon, AMD will need to sustain its momentum in both hardware and software development.
The Hopper-based H200 also shines this round, delivering a 50% improvement in inference performance over its results from the previous year. That is a meaningful gain for companies that rely on the platform for robust, efficient computing.
The constant advancements from NVIDIA and AMD indicate a thrilling race in the technology realm, with each brand vying to push the boundaries of AI and machine learning accelerators. These innovations not only promise to enhance user experiences but also pave the way for a future rich with possibilities.