Microsoft Supercharges Azure with NVIDIA Blackwell Ultra GB300s: 4,600-GPU Clusters to Power Trillion-Parameter AI

Microsoft has switched on its first at-scale production cluster built on NVIDIA’s GB300 Blackwell Ultra GPUs, bringing unprecedented AI training and inference power to Azure. Designed to handle models with hundreds of trillions of parameters, this new Azure deployment is engineered to cut training times from months to weeks and accelerate the rollout of advanced AI systems.

At the heart of the rollout is a large production cluster featuring more than 4,600 NVIDIA Blackwell Ultra GPUs, configured as GB300 NVL72 rack-scale systems and connected by next-generation NVIDIA Quantum-X800 InfiniBand networking. This foundation is built to scale first to tens of thousands, and ultimately hundreds of thousands, of Blackwell Ultra GPUs across global datacenters, all focused on AI workloads that demand extreme performance and reliability.

The new Azure ND GB300 v6 virtual machines are tuned for reasoning-heavy models, agentic AI, and multimodal generative AI with longer context windows. Each rack integrates a tightly coupled set of VMs and GPUs to maximize throughput and minimize latency, enabling larger models and faster iteration.

Key capabilities at a glance (a quick per-GPU breakdown follows the list):
– 72 NVIDIA Blackwell Ultra GPUs per rack, paired with 36 NVIDIA Grace CPUs
– Up to 800 Gbps per GPU of cross-rack scale-out bandwidth via NVIDIA Quantum-X800 InfiniBand
– 130 TB/s of NVLink bandwidth within a rack
– 37 TB of fast memory per rack
– Up to 1,440 PFLOPS of FP4 Tensor Core performance per rack
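
Taken per GPU, those rack-level figures are easy to sanity-check. Below is a minimal back-of-envelope sketch; the per-GPU numbers are derived by simple division and are not quoted specifications (the memory figure pools GPU HBM and Grace CPU memory across the rack, assuming an even split):

```python
# Rack-level specs from the announcement; per-GPU splits are plain division
# and assume resources are shared evenly across the rack.
GPUS_PER_RACK = 72
NVLINK_TB_S = 130        # intra-rack NVLink bandwidth, TB/s
FAST_MEMORY_TB = 37      # pooled fast memory per rack (HBM + Grace LPDDR), TB
FP4_PFLOPS = 1_440       # FP4 Tensor Core performance per rack, PFLOPS

print(f"NVLink per GPU: {NVLINK_TB_S / GPUS_PER_RACK:.2f} TB/s")         # ~1.81 TB/s
print(f"Memory per GPU: {FAST_MEMORY_TB * 1000 / GPUS_PER_RACK:.0f} GB")  # ~514 GB
print(f"FP4 per GPU:    {FP4_PFLOPS / GPUS_PER_RACK:.0f} PFLOPS")         # 20 PFLOPS
```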

Within each rack, NVIDIA NVLink and NVSwitch minimize memory and bandwidth bottlenecks, providing 130 TB/s of intra-rack bandwidth across the rack's 37 TB of fast memory. The result is higher inference throughput, lower latency, and better responsiveness for large-scale models and long-context workloads.

Beyond a single rack, Azure employs a full fat-tree, non-blocking topology using NVIDIA Quantum-X800 InfiniBand, one of the fastest networking fabrics available. This architecture allows efficient scaling to tens of thousands of GPUs with minimal communication and synchronization overhead, improving end-to-end training throughput and enabling better GPU utilization. Azure’s co-engineered software stack—featuring custom protocols, collective libraries, and in-network computing—helps ensure the fabric is fully utilized by real-world applications. Technologies like NVIDIA SHARP accelerate collective operations by performing calculations in the switch, effectively boosting bandwidth and improving the efficiency and reliability of large-scale training and inference.
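
To make "collective operations" concrete: the dominant communication pattern in distributed training is an all-reduce over gradients, and that is exactly the operation SHARP can execute inside the switch. Below is a minimal, illustrative PyTorch sketch of the pattern; it is not Azure- or SHARP-specific code, and it assumes torchrun launches one process per GPU:

```python
# Minimal illustration of the all-reduce collective that in-network
# computing (e.g., NVIDIA SHARP) accelerates on InfiniBand fabrics.
# Launch with: torchrun --nproc_per_node=8 allreduce_demo.py
import os
import torch
import torch.distributed as dist

def main():
    # torchrun sets RANK, WORLD_SIZE, and LOCAL_RANK for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Stand-in for a shard of gradients; real jobs reduce many such tensors.
    grads = torch.ones(1024, device="cuda") * dist.get_rank()

    # Sum across all GPUs; with in-network computing, the reduction can be
    # performed inside the switch instead of on the endpoints.
    dist.all_reduce(grads, op=dist.ReduceOp.SUM)

    if dist.get_rank() == 0:
        print(f"after all-reduce, grads[0] = {grads[0].item()}")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```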

The platform is matched with advanced datacenter engineering. Azure uses standalone heat exchangers and facility-level cooling to maintain thermal stability for dense GB300 NVL72 clusters while reducing water use. New power distribution models support high energy density and dynamic load balancing, aligning with the demanding power profiles of ND GB300 v6 GPU clusters.

NVIDIA’s strong inference performance—validated in industry benchmarks—pairs with Azure’s scale to enable faster deployment of state-of-the-art AI models, from multimodal generators to complex reasoning and agentic systems. According to NVIDIA, this collaboration underscores a leadership moment for the United States in the global AI race.

The latest ND GB300 v6 VMs are now available on Azure, opening the door for enterprises, research labs, and AI startups to train and deploy the next generation of ultra-large models with greater speed, scale, and cost efficiency.
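
For teams that want to verify availability in their own subscription, the Azure Python SDK can enumerate VM SKUs by region. A minimal sketch, assuming azure-identity and azure-mgmt-compute are installed; the subscription ID is a placeholder, and the exact ND GB300 v6 SKU string is not confirmed here and should be checked against Azure's documentation:

```python
# Sketch: list ND-family VM SKUs visible in a region to check whether the
# new ND GB300 v6 sizes have appeared in your subscription.
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

SUBSCRIPTION_ID = "<your-subscription-id>"  # placeholder
REGION = "eastus"                           # example region; pick your own

client = ComputeManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# resource_skus.list accepts an OData location filter.
for sku in client.resource_skus.list(filter=f"location eq '{REGION}'"):
    # The "Standard_ND" prefix matches Azure's GPU-accelerated ND series;
    # the precise GB300 v6 SKU name may differ.
    if sku.resource_type == "virtualMachines" and (sku.name or "").startswith("Standard_ND"):
        print(sku.name)
```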