Google is betting that the next big leap in artificial intelligence won’t be powered by traditional “supercomputers,” but by something more flexible and far more scalable: the AI Hypercomputer. Revealed at Google Cloud Next ’26, the new approach is built for the agentic AI era, in which AI systems don’t just respond to prompts but plan, coordinate, and take action across tools, workflows, and massive datasets.
At its core, Google’s AI Hypercomputer is a high-performance data center architecture that brings together compute, storage, networking, open software, and machine learning frameworks into one performance-optimized platform. The big idea is choice and scale: customers can tap into Google’s latest in-house TPU technology, its Arm-based Axion cloud CPUs, and powerful NVIDIA Rubin GPUs—depending on what their workloads need.
A major part of the announcement is Google’s 8th-generation TPU lineup, launching in two versions designed for different AI jobs: TPU 8t for training and TPU 8i for inference.
TPU 8t is Google’s new training-focused chip, built to shrink the time it takes to bring frontier-scale models into deployment—from months down to weeks. It’s engineered for maximum throughput, high shared memory, and huge interconnect bandwidth, while staying power efficient. In a single pod, TPU 8t delivers 121 exaflops of FP4 compute, which Google says is 2.84x higher than the prior Ironwood generation.
What makes TPU 8t stand out is how aggressively it scales. A single TPU 8t superpod can expand to 9,600 chips backed by two petabytes of shared high-bandwidth memory. Google also doubled inter-chip bandwidth compared to the last generation, so giant models can operate as if they’re working inside one massive memory pool rather than being chopped into smaller, slower pieces.
Google is also targeting the real-world bottlenecks that slow training down. Storage access is now 10x faster, and TPUDirect helps move data directly into the TPU more efficiently, supporting better end-to-end utilization. On the networking side, Google’s new Virgo Network works with JAX and Pathways software to drive near-linear scaling to as many as a million chips in a single logical cluster.
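For a sense of what that software stack looks like from the developer’s side, here is a minimal JAX sketch of the single-program, multi-chip model that Pathways extends across hosts: one logical device mesh, sharded arrays, and compiler-inserted communication. The mesh axis name and array shapes are illustrative assumptions, and nothing below is specific to TPU 8t or the Virgo network.

```python
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec

# Arrange every visible accelerator into a 1D logical mesh with a "data" axis.
devices = mesh_utils.create_device_mesh((jax.device_count(),))
mesh = Mesh(devices, axis_names=("data",))

# Shard the batch across the mesh and replicate the weights on every chip.
x = jax.device_put(jnp.ones((64, 4096)),
                   NamedSharding(mesh, PartitionSpec("data", None)))
w = jax.device_put(jnp.ones((4096, 4096)),
                   NamedSharding(mesh, PartitionSpec(None, None)))

@jax.jit
def forward(x, w):
    # One program for the whole mesh; XLA inserts any cross-chip communication.
    return jnp.tanh(x @ w)

y = forward(x, w)  # executes across every chip in the mesh
```

Per Google’s description, Pathways’ contribution is stretching this same one-program abstraction across pods, so that even a million-chip cluster still presents as a single logical machine.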
Another key improvement is native 4-bit floating point (FP4). TPU 8t introduces FP4 to reduce memory bandwidth pressure and energy-hungry data movement, while still preserving accuracy for large models even under lower-precision quantization. The payoff is better throughput, more efficient scaling, and model layers that can fit into local buffers for higher peak utilization.
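To make the memory arithmetic concrete, the sketch below rounds weights onto the E2M1 grid commonly used for 4-bit floats, with a per-tensor scale. This simulates the numerics in plain NumPy under that assumption; it is not Google’s hardware implementation, and the shapes are illustrative.

```python
import numpy as np

# Non-negative magnitudes representable in E2M1, a common 4-bit float format
# (a sign bit plus these eight values gives the 16 FP4 codes).
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float32)

def quantize_fp4(w: np.ndarray):
    """Round each weight to the nearest FP4 magnitude after per-tensor scaling."""
    scale = np.abs(w).max() / 6.0                      # map the largest |w| to 6.0
    idx = np.abs(np.abs(w / scale)[..., None] - FP4_GRID).argmin(axis=-1)
    return np.sign(w) * FP4_GRID[idx], scale           # dequantize as q * scale

w = np.random.randn(512, 512).astype(np.float32)
q, scale = quantize_fp4(w)
print("mean abs error:", np.abs(w - q * scale).mean())
# Stored at 4 bits per value, these 512*512 weights take 128 KiB vs 1 MiB in
# FP32: an 8x cut in the bytes that must cross the memory bus per layer.
```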
For inference, Google introduced TPU 8i—built for serving, sampling, and reasoning workloads where low latency and high throughput are essential. TPU 8i combines 288 GB of HBM with 384 MB of on-chip SRAM, a 3x increase in on-chip memory capacity versus the previous generation. With that much SRAM, more model activity—and especially large KV caches—can stay on the chip, reducing delays caused by constant memory shuffling.
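Some back-of-envelope arithmetic shows why that memory hierarchy matters for serving. The model dimensions below are illustrative assumptions, not a specific Google model:

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bytes_per_val=1):
    """Size of a transformer KV cache: 2x for keys and values; FP8 => 1 byte each."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_val

# A large batched workload lives comfortably in 288 GB of HBM...
big = kv_cache_bytes(layers=64, kv_heads=8, head_dim=128, seq_len=32_768, batch=8)
# ...while a single short-context request can sit entirely in 384 MB of SRAM.
small = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, seq_len=4_096, batch=1)

print(f"batched 32k-context cache: {big / 2**30:.0f} GiB")   # 32 GiB
print(f"single 4k-context cache:   {small / 2**20:.0f} MiB")  # 256 MiB
```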
In FP8 compute, TPU 8i reaches 331.8 exaflops per pod, which Google says is 6.74x higher than Ironwood. Google also redesigned the system around efficiency, doubling the physical CPU hosts per server and shifting to its custom Arm-based Axion CPUs, with NUMA-based isolation between hosts to keep performance predictable as the platform scales.
TPU 8i is also tuned for Mixture of Experts (MoE) models, where communication overhead can become the limiting factor. Google doubled interconnect bandwidth to 19.2 Tb/s and introduced a new “Boardfly” architecture that cuts the maximum network diameter by more than half, keeping latency down and helping the system behave like one cohesive unit at scale. To trim lag further, TPU 8i includes an on-chip Collectives Acceleration Engine (CAE) that offloads global collective operations, cutting their latency by as much as 5x.
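For context on what those collectives are, the sketch below uses stock JAX to express the all-to-all token exchange at the heart of MoE dispatch. The “ep” axis name and the shapes are illustrative assumptions, and this runs on the ordinary software path rather than anything CAE-specific.

```python
from functools import partial
import jax
import jax.numpy as jnp

n_dev = jax.device_count()

@partial(jax.pmap, axis_name="ep")            # "ep": expert-parallel axis
def dispatch(routed):
    # routed: [n_dev, tokens, d_model] on each device, where slice i holds
    # the tokens this device routed to the expert living on device i.
    # all_to_all swaps slice i to device i, so afterward each device holds
    # the tokens every peer sent to *its* expert. This exchange (plus the
    # reverse "combine" step) is the traffic MoE serving is gated on.
    return jax.lax.all_to_all(routed, "ep", split_axis=0, concat_axis=0)

tokens = jnp.zeros((n_dev, n_dev, 16, 512))   # [device, dest_expert, tokens, d_model]
gathered = dispatch(tokens)                   # [device, src_device, tokens, d_model]
```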
On value and efficiency, Google claims TPU 8t delivers about 2.7x better performance per dollar for large-scale training than the prior generation, while TPU 8i improves inference performance per dollar by 80% for low-latency MoE serving. Both chips also deliver roughly 2x better performance per watt, an increasingly critical factor as AI infrastructure costs and power constraints rise.
Cooling is also part of the design. Both TPU 8t and TPU 8i support Google’s 4th-generation liquid cooling technology, aimed at sustaining the higher compute densities that air cooling can’t practically handle.
While Google’s TPUs are a centerpiece, the AI Hypercomputer is not limited to in-house silicon. NVIDIA GPUs remain a core part of the accelerator portfolio: Google says it will be among the first providers to offer NVIDIA Vera Rubin NVL72 systems, built around the VR200 accelerator, complementing existing instances based on earlier NVIDIA architectures. Paired with the new Virgo network, these systems are meant to enable massive-scale training clusters alongside the 8th-gen TPU family.
Beyond accelerators, Google is positioning Axion as the CPU backbone for modern AI systems—especially agentic workloads that need sustained, efficient operation. The company highlighted its N4A Axion instances, which it says deliver significantly better price-performance than comparable x86 instances.
Network-heavy workloads get new options as well. Google is expanding its lineup with the new C4N and M4N machine series, targeted at high-volume agent communication, network-intensive telecom 5G core deployments, and enterprise databases. Google says C4N can provide close to a 4x increase in network bandwidth per vCPU compared to standard C4 instances.
Storage innovations are another major pillar. Managed Lustre has been improved to deliver up to 10 TB/s of throughput to A5X or TPU 8t over RDMA, helping remove data pipeline friction during large training runs. Rapid Storage performance has also been boosted—from 6 TB/s to 15 TB/s—to accelerate both training and inference. Google also introduced Smart Storage, which applies semantic meaning to unstructured data and is positioned as a foundation for an Enterprise Knowledge Graph—useful for agentic systems that must retrieve, connect, and reason over large bodies of enterprise information.
Tying the system together is the Virgo Network, a purpose-built, AI-optimized network designed to connect either NVIDIA Vera Rubin NVL72 systems or TPU 8t superpods into massive training complexes with hundreds of thousands of accelerators. The goal is straightforward: faster, more efficient distributed training for the world’s largest and most capable models.
Google says the AI Hypercomputer already has a roster of major early adopters, including the US Department of Energy, Boston Dynamics, Citadel Securities, Thinking Machines Lab, and Axia Energy. That roster is an early signal that the platform is aimed at both cutting-edge research and high-stakes commercial AI deployment.
With the AI Hypercomputer, Google is effectively presenting a full-stack answer to the biggest problems in modern AI infrastructure: scaling beyond a single cluster, feeding data fast enough to keep accelerators busy, reducing latency for real-time inference, and controlling power and cost as models continue to grow. For teams building agentic AI systems that must reason, act, and communicate at scale, the combination of TPU 8t, TPU 8i, Axion CPUs, NVIDIA Rubin GPUs, Virgo networking, and upgraded storage is meant to deliver a more unified path from experimentation to production.