Intel has released its latest MLPerf Inference v6.0 results, and the numbers put a spotlight on two things the company is pushing hard right now: new Arc Pro GPUs built for AI workloads, and steady software tuning that improves performance on existing hardware.
The new entries are Intel Arc Pro B70 and Arc Pro B65, recently introduced workstation-class GPUs based on the company’s “Big Battlemage” design. In MLPerf Inference v6.0 testing, Intel validated these GPUs in a four-GPU server configuration paired with the latest Intel Xeon 6 CPUs. With up to 128GB of total VRAM across the four cards, Intel says the setup is capable of running very large models—up to 120B parameters—while targeting the kind of multi-GPU scaling AI teams rely on for modern inference.
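Intel doesn't publish the arithmetic behind the 120B-parameter claim, but a back-of-the-envelope check makes it plausible: at 8-bit precision, model weights alone for 120B parameters land just under the 128GB pooled VRAM ceiling (leaving little headroom for activations or KV cache, so real deployments would likely lean on quantization). A rough sketch, not Intel's methodology:

```python
# Back-of-the-envelope VRAM check (not Intel's methodology): does a
# 120B-parameter model fit in 128GB of pooled VRAM across four GPUs?
def model_weight_gib(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in GiB, ignoring activations and KV cache."""
    return params_billion * 1e9 * bytes_per_param / 2**30

# At 8-bit quantization, weights alone take roughly ~112 GiB -> fits under 128GB.
fp8_gib = model_weight_gib(120, 1.0)
# At 16-bit the same model would not fit (~224 GiB).
fp16_gib = model_weight_gib(120, 2.0)
print(f"8-bit: {fp8_gib:.1f} GiB, 16-bit: {fp16_gib:.1f} GiB")
```

This also illustrates why per-card VRAM matters so much for the multi-GPU scaling pitch: weight shards are divided across the four cards, so total capacity is the binding constraint.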
According to Intel, the four-GPU Arc Pro B70/B65 system delivers about an 80% inference performance uplift compared to the previous flagship Arc Pro B60 (which offers 24GB per GPU). In the reported figures, the four-card Arc Pro B70 configuration reached 1536.90 tokens/s in the Offline scenario and 951.67 tokens/s in the Server scenario. For comparison, a four-card Arc Pro B60 configuration (96GB total) posted 841.04 tokens/s Offline and 452.19 tokens/s Server.
Beyond the generational gain from new silicon, Intel is also emphasizing "continuous AI optimizations" on the software side. The company reports an 18% performance boost for existing Arc Pro GPUs like the Arc Pro B60 thanks to ongoing tuning—an important point for buyers who want performance gains to arrive through drivers, libraries, and the broader inference stack rather than through a full hardware refresh.
Intel’s announcement also highlights how it’s positioning these Arc Pro GPU systems as an all-in-one inference platform, combining validated hardware and software in a containerized Linux-focused solution. The target is simpler deployment and stronger scaling behavior, with features meant to matter in enterprise environments: multi-GPU support, PCIe peer-to-peer transfers, ECC, SR-IOV, telemetry, and remote firmware updates. Intel also claims Arc Pro B70 can support significantly larger models and longer context windows in comparable multi-GPU setups, citing up to 1.6x more KV cache capacity for larger-model runs.
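The 1.6x KV-cache figure is Intel's own claim, but the general relationship it rests on is standard transformer-inference accounting: KV cache grows linearly with context length, so extra free VRAM translates directly into longer supported context windows. A minimal sketch of that formula, with a hypothetical model configuration (80 layers, 8 KV heads, head dimension 128) chosen purely for illustration:

```python
# Standard transformer KV-cache sizing; the model configuration below is
# hypothetical, chosen only to illustrate the linear growth with context length.
def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 context_len: int, batch: int, bytes_per_elem: int = 2) -> float:
    """KV cache footprint in GiB: 2x (keys + values) per layer, head, and token."""
    elems = 2 * layers * kv_heads * head_dim * context_len * batch
    return elems * bytes_per_elem / 2**30

# 80 layers, 8 KV heads, head_dim 128, 32k context, batch 1, fp16 -> 10 GiB.
cache = kv_cache_gib(80, 8, 128, 32768, 1)
print(f"{cache:.1f} GiB")
```

Because the footprint scales linearly with `context_len`, 1.6x more cache capacity maps to roughly 1.6x longer context at the same batch size, which is the practical upshot of Intel's claim.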
On the CPU side, Intel also submitted results for Xeon 6 processors, arguing that AI inference performance is increasingly shaped by the entire system, not just GPU throughput. CPUs still handle key responsibilities such as memory management, orchestration, workload scheduling, and the security and reliability needs of production infrastructure. Intel points to built-in acceleration such as AMX and AVX-512 as part of why Xeon 6 can deliver strong inference results without relying exclusively on dedicated accelerators, and says the latest Xeon 6 lineup with P-cores offers up to a 90% generation-over-generation performance gain.
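For readers who want to check whether their own Linux host exposes these extensions, the kernel advertises them as flags in /proc/cpuinfo (`avx512f`, `avx512_vnni`, `amx_tile`, `amx_int8`, `amx_bf16`). A small sketch that parses those flags:

```python
# Parse the /proc/cpuinfo "flags" line for AI-relevant ISA extensions.
# Flag names are the ones the Linux kernel uses for AVX-512 and AMX features.
def detect_isa_flags(cpuinfo_text: str) -> dict:
    """Return which AI-relevant ISA extensions the flags line advertises."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
            break
    return {
        "avx512f": "avx512f" in flags,          # AVX-512 foundation
        "avx512_vnni": "avx512_vnni" in flags,  # int8 dot-product instructions
        "amx_tile": "amx_tile" in flags,        # AMX tile registers
        "amx_int8": "amx_int8" in flags,        # AMX int8 matrix math
        "amx_bf16": "amx_bf16" in flags,        # AMX bfloat16 matrix math
    }

# On a Linux machine:
# with open("/proc/cpuinfo") as f:
#     print(detect_isa_flags(f.read()))
```

On a Xeon 6 P-core part you would expect the AMX flags to be present; on older or consumer CPUs they typically are not, which is part of Intel's system-level pitch here.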
Overall, Intel’s MLPerf Inference v6.0 showing is designed to send a clear message: Arc Pro B70/B65 is aimed at scalable, enterprise-friendly AI inference—while ongoing optimization work is also helping current Arc Pro owners squeeze more performance out of the hardware they already have.