AMD has officially unveiled ROCm 7, the latest version of its open software stack, designed to accelerate AI workloads and boost developer productivity. The release marks a significant step up from ROCm 6 and reflects AMD’s commitment to advancing AI computing.
ROCm 7 introduces a range of features aimed at improving AI inference. The stack supports frameworks such as vLLM v1, llm-d, and SGLang, with optimizations for distributed inference and prefill/decode disaggregation. Notable additions include new kernels and algorithms for GEMM autotuning, mixture-of-experts (MoE) models, and attention, along with Python-based kernel authoring.
A key highlight is support for the Instinct MI350 series, including low-precision datatypes such as FP8, FP6, and FP4, as well as mixed precision. These updates underline AMD’s focus on inference performance: AMD cites gains averaging roughly 3.5x over ROCm 6, including 3.2x on Llama 3.1 70B, 3.4x on Qwen2-72B, and up to 3.8x on DeepSeek R1.
In a direct comparison published by AMD, ROCm 7 running on an Instinct MI355X GPU delivered 30% higher FP8 throughput on DeepSeek R1 than an NVIDIA Blackwell B200 running CUDA.
While ROCm 7 focuses heavily on inference, it also improves training performance, with a 3x uplift on models such as Llama 2 70B, Llama 3.1 8B, and Qwen 1.5 7B.
The new stack also targets enterprise AI, offering end-to-end solutions, secure data integration, and simplified deployment. It is designed to work across GPUs, CPUs, and DPUs and supports a variety of workloads, with a particular focus on generative AI. AMD is positioning ROCm 7 as a key platform for enterprises that want to stay at the forefront of AI technology.