In a significant development for artificial intelligence and data centers, AMD has unveiled performance benchmarks of its latest hardware, including the Instinct MI300X AI accelerators and the next-gen EPYC “Turin” CPUs, in MLPerf Inference v4.1. The MLPerf suite is designed to measure the capabilities of state-of-the-art hardware from prominent vendors such as AMD, Intel, and NVIDIA.
AMD’s introduction of the Instinct MI300X accelerator at this benchmark round, alongside the upcoming EPYC Turin CPUs based on the Zen 5 core architecture, marks a pivotal moment. The benchmarks were run on a Supermicro AS-8125GS-TNMR2 system and covered both the Offline and Server scenarios, with comparisons between the current 4th Gen EPYC “Genoa” CPUs and the upcoming 5th Gen EPYC “Turin” CPUs.
### Performance Evaluations
**Powerful GPU-CPU Synergy:**
- **Configuration:** 8x AMD Instinct MI300X accelerators with 2x AMD EPYC 9374F (Genoa) CPUs.
- **Category:** Available.
- **Performance:** Delivered performance within 2-3% of the NVIDIA DGX H100 (equipped with 4th Gen Intel Xeon CPUs) for AI workloads, in both the Server and Offline scenarios at FP8 precision.
**Preview with Next-Gen CPU:**
- **Configuration:** 8x AMD Instinct MI300X with 2x AMD EPYC “Turin” CPUs.
- **Category:** Preview.
- **Performance:** Demonstrated gains from the 5th Gen “Turin” CPUs, slightly outperforming the NVIDIA DGX H100 with Intel Xeon in the Server scenario and remaining competitive in the Offline scenario.
**Single GPU Efficiency:**
- **Configuration:** 1x AMD Instinct MI300X accelerator with 2x 4th Gen AMD EPYC 9374F (Genoa) CPUs.
- **Category:** Available.
- **Highlight:** Showcased the 192 GB memory capacity of the MI300X, which runs the LLaMA2-70B model efficiently on a single GPU, avoiding the network overhead of splitting the model across multiple GPUs.
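A rough back-of-the-envelope check illustrates why the model fits on one GPU. This is a sketch, not an official sizing: it counts only the weights at FP8 (1 byte per parameter) and ignores KV cache and activation memory, which add more in practice.

```python
# Assumption: weight memory dominates; KV cache and activations are extra.
PARAMS_BILLIONS = 70    # LLaMA2-70B parameter count
BYTES_PER_PARAM = 1     # FP8 precision: one byte per weight
MI300X_MEM_GB = 192     # HBM capacity of a single Instinct MI300X

weights_gb = PARAMS_BILLIONS * BYTES_PER_PARAM  # ~70 GB of weights at FP8
fits_on_one_gpu = weights_gb < MI300X_MEM_GB
print(f"FP8 weights: ~{weights_gb} GB; fits in {MI300X_MEM_GB} GB: {fits_on_one_gpu}")
```

With roughly 70 GB of weights against 192 GB of on-package memory, there is ample headroom for the KV cache, which is why no multi-GPU model split (and its interconnect overhead) is needed.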
**Compelling Dell Server Results:**
- **Configuration:** 8x AMD Instinct MI300X accelerators with 2x Intel Xeon Platinum 8460Y+ CPUs.
- **Category:** Available.
- **Highlight:** Dell’s PowerEdge XE9680 server validated the platform-level performance of AMD Instinct accelerators, demonstrating robust results with the LLaMA2-70B model.
### Comparison and Future Prospects
The detailed performance results indicate that with the 4th Gen EPYC Genoa CPUs, AMD achieved 21,028 tokens per second in server and 23,514 tokens per second in offline scenarios. With the 5th Gen EPYC Turin CPUs, these numbers improved to 22,021 tokens per second in server and 24,110 tokens per second in offline scenarios, reflecting a 4.7% and 2.5% improvement, respectively.
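The quoted improvement percentages follow directly from the stated tokens-per-second figures; a quick calculation reproduces them:

```python
# Tokens-per-second figures as reported in the article.
genoa = {"server": 21028, "offline": 23514}   # 4th Gen EPYC "Genoa"
turin = {"server": 22021, "offline": 24110}   # 5th Gen EPYC "Turin"

for scenario in ("server", "offline"):
    gain_pct = (turin[scenario] / genoa[scenario] - 1) * 100
    print(f"{scenario}: +{gain_pct:.1f}%")  # server: +4.7%, offline: +2.5%
```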
Compared to the NVIDIA H100, AMD’s Instinct MI300X was slightly slower in the Server scenario and trailed more noticeably in the Offline scenario. The Turin configuration, however, outpaced the H100 by about 2% in Server performance.
An integral advantage of the Instinct MI300X lies in its substantial memory capacity, which surpasses that of the NVIDIA H100, thereby meeting the demands of the largest language models across diverse data formats.
### Future Directions
AMD has expressed intentions to further enhance its ROCm stack with AI optimizations, aiming to boost performance in upcoming MLPerf submissions. The anticipated MI325X, set to debut next quarter, promises a 50% capacity upgrade over the MI300X, hinting at even stronger future performance metrics. Moreover, the EPYC Turin “Zen 5” CPUs are expected to hit the market later this year, adding to the anticipation surrounding AMD’s continued advancements in AI and data center technologies.
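The quoted 50% capacity uplift can be sanity-checked against the MI300X’s 192 GB; this is simple arithmetic on the article’s own figures, not a confirmed specification:

```python
MI300X_MEM_GB = 192
CAPACITY_UPLIFT = 0.50  # 50% capacity upgrade claimed for the MI325X

mi325x_mem_gb = MI300X_MEM_GB * (1 + CAPACITY_UPLIFT)
print(f"Implied MI325X memory: {mi325x_mem_gb:.0f} GB")  # 288 GB
```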