Google and NVIDIA are significantly expanding their AI partnership, giving customers access to nearly one million NVIDIA GPUs through Google Cloud’s newly introduced A5X instances. The goal behind the collaboration is clear: reduce AI inference costs while boosting token throughput, especially for the fast-growing wave of agentic AI workloads.
A5X is part of Google Cloud’s AI Hypercomputer portfolio, the same platform foundation used to power Gemini and many of Google’s consumer and enterprise AI products. Alongside the A5X debut, Google also outlined broader AI Hypercomputer upgrades, including new virtual machines built on custom Arm-based CPUs, eighth-generation Tensor Processing Units (TPUs), and native PyTorch support for TPUs. Taken together, these updates are aimed at making large-scale AI training and inference more efficient, more flexible, and easier to deploy across modern cloud infrastructure.
What makes the A5X instances stand out is their focus on agentic artificial intelligence. Agentic AI systems typically use multiple specialized agents that work step-by-step to solve complex problems, rather than relying on a single model response. That approach can demand significantly more inference capacity and faster interconnects—exactly what Google and NVIDIA are trying to address with this launch.
A key detail is that A5X is Google’s first instance type designed to work with NVIDIA’s latest Vera Rubin AI GPUs. On the networking side, A5X will use NVIDIA ConnectX-9 network interface cards (NICs), built to accelerate AI workloads across Ethernet-based cloud environments. Combined with Google’s Virgo platform, this setup is intended to let customers scale far beyond traditional single-cluster limits.
According to the announcement, Virgo and the A5X architecture will allow access to as many as 80,000 Rubin GPUs within a single cluster, and up to 960,000 GPUs across a multi-site, multi-cluster deployment. Virgo is also designed to connect massive numbers of AI chips within Google’s data centers and isn’t limited to GPUs; it supports Google TPUs as well. Google says Virgo can connect up to 134,000 TPUs in one data center and more than a million AI chips across multiple locations.
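The stated limits imply how single-cluster and multi-site scale relate. A minimal arithmetic sketch, using only the figures from the announcement (the implied cluster count is derived here, not an announced number):

```python
# Figures as stated in the announcement.
GPUS_PER_CLUSTER = 80_000    # max Rubin GPUs in a single cluster
MULTI_SITE_TOTAL = 960_000   # max GPUs across a multi-site deployment

# Derived, not announced: how many full-size clusters would be needed
# to reach the multi-site ceiling.
implied_clusters = MULTI_SITE_TOTAL // GPUS_PER_CLUSTER
print(implied_clusters)  # 12
```

In other words, the multi-site figure corresponds to roughly a dozen maximum-size clusters federated across locations.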
On performance and efficiency, NVIDIA states that A5X can deliver up to 10 times lower inference cost per token and up to 10 times higher throughput per megawatt than the previous generation. If those gains hold in real-world deployments, they could significantly improve the economics of large-scale inference, especially for enterprises running agentic workflows that generate high token volumes.
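To make the claimed gains concrete, here is an illustrative back-of-the-envelope calculation. The 10x multipliers come from NVIDIA's stated claims; the baseline cost and throughput values below are made-up placeholders, not real pricing or benchmark data:

```python
# Placeholder baselines for illustration only (not real figures).
baseline_cost_per_million_tokens = 1.00   # USD, hypothetical
baseline_tokens_per_sec_per_mw = 50_000   # hypothetical throughput per megawatt

# Multipliers from NVIDIA's stated claims.
COST_IMPROVEMENT = 10        # "up to 10x lower inference cost per token"
THROUGHPUT_IMPROVEMENT = 10  # "up to 10x higher throughput per megawatt"

new_cost = baseline_cost_per_million_tokens / COST_IMPROVEMENT
new_throughput = baseline_tokens_per_sec_per_mw * THROUGHPUT_IMPROVEMENT

print(f"cost per 1M tokens: ${new_cost:.2f}")    # $0.10
print(f"tokens/sec per MW:  {new_throughput:,}")  # 500,000
```

For a high-volume agentic workload, compounding a 10x cost reduction with a 10x per-megawatt throughput gain is what would materially change deployment economics, if the "up to" figures are realized in practice.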
The announcement also highlights how this infrastructure supports broader AI use cases, including physical and industrial AI. Software and engineering platforms from companies like Cadence and Siemens are cited as being powered by NVIDIA infrastructure and available through Google Cloud. Google’s Gemini platform is also positioned to deploy agentic models and workflows across industries, including areas such as cybersecurity.
By combining next-generation NVIDIA Rubin GPUs, high-speed ConnectX-9 networking, and Google’s scalable Virgo interconnect, A5X is designed to give organizations a path to run advanced agentic AI systems with better performance, lower costs, and the ability to scale from single clusters to multi-site deployments.