Google’s Two-Chip TPU Bet: Why Marvell’s Rumored Role Could Redefine AI Inference ASICs

Google is reportedly teaming up with Marvell on two new chips designed to push its Tensor Processing Unit (TPU) platform further, with a strong focus on faster, more efficient AI inference.

According to a report from The Information, discussions between Google and Marvell have begun around a two-chip approach. While it’s not clear how far along the talks are, the proposal outlines two very different pieces of silicon: one meant to enhance today’s TPU deployments, and another that would serve as a next-generation TPU built for future AI workloads.

The first chip isn’t described as a replacement for the TPU. Instead, it’s said to be a memory processing unit (MPU) intended to work alongside one. The idea centers on easing one of the biggest bottlenecks in modern AI systems: moving data between memory and compute, and the bandwidth that requires. By offloading certain memory-related processing to a dedicated companion chip, Google could reduce pressure on the main accelerator and make data access more efficient during inference. In practical terms, that could help AI models respond faster and serve real-time results more reliably at scale.
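To see why memory movement, rather than raw compute, tends to set the pace of inference, here is a rough back-of-the-envelope sketch. Every number in it is a hypothetical placeholder chosen for round arithmetic (a 70-billion-parameter model with one byte per weight, an accelerator with 1 PFLOP/s of compute and 1 TB/s of memory bandwidth); none of them describes a Google or Marvell part, and the report does not say how the rumored MPU would work internally.

```python
def step_time_ms(flops: float, bytes_moved: float,
                 peak_flops_per_s: float, bandwidth_bytes_per_s: float):
    """Estimate the time for one decode step as the slower of compute and memory.

    Simplified roofline-style model: it assumes compute and memory traffic
    overlap perfectly, so the step takes as long as the slower of the two.
    """
    compute_ms = flops / peak_flops_per_s * 1e3
    memory_ms = bytes_moved / bandwidth_bytes_per_s * 1e3
    bound = "memory" if memory_ms >= compute_ms else "compute"
    return compute_ms, memory_ms, bound


# Hypothetical workload: generate one token at batch size 1.
params = 70e9                 # assumed model size (parameters)
bytes_per_param = 1           # assumed 8-bit weights
flops_per_param = 2           # one multiply and one add per weight

# Hypothetical accelerator, not a real product specification.
peak_flops_per_s = 1e15       # 1 PFLOP/s
bandwidth_bytes_per_s = 1e12  # 1 TB/s

compute_ms, memory_ms, bound = step_time_ms(
    flops=params * flops_per_param,
    bytes_moved=params * bytes_per_param,
    peak_flops_per_s=peak_flops_per_s,
    bandwidth_bytes_per_s=bandwidth_bytes_per_s,
)
print(f"compute: {compute_ms:.2f} ms, memory: {memory_ms:.2f} ms, bound by {bound}")
```

Under these assumptions, reading the weights takes hundreds of times longer than the math performed on them, which is the kind of gap a dedicated memory-side chip would be aiming at.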

The second chip under discussion is more straightforward: a next-generation TPU focused specifically on AI inference. Google’s current flagship accelerator is the TPU v7, also known as Ironwood. As a reference point for where Google is today, each TPU v7 chip is reported to feature 192 GB of HBM and up to 4,614 TFLOPs of peak performance, with large-scale deployments packaged into a “Superpod” configuration containing 9,216 chips.
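For a sense of scale, the per-chip figures above can be multiplied out to a full Superpod. The short sketch below uses only the numbers reported in this article and assumes ideal linear scaling, so the results are theoretical ceilings, not measured performance.

```python
# Per-chip figures as reported in the article for TPU v7 (Ironwood).
HBM_PER_CHIP_GB = 192
PEAK_TFLOPS_PER_CHIP = 4_614
CHIPS_PER_SUPERPOD = 9_216


def superpod_totals(chips: int = CHIPS_PER_SUPERPOD,
                    hbm_gb: float = HBM_PER_CHIP_GB,
                    tflops: float = PEAK_TFLOPS_PER_CHIP):
    """Return (total HBM in PB, total peak compute in exaFLOPs).

    Uses decimal units: 1 PB = 1,000,000 GB and 1 exaFLOP = 1,000,000 TFLOPs.
    Assumes every chip contributes its full peak, which real workloads rarely reach.
    """
    total_hbm_pb = chips * hbm_gb / 1_000_000
    total_exaflops = chips * tflops / 1_000_000
    return total_hbm_pb, total_exaflops


hbm_pb, exaflops = superpod_totals()
print(f"{hbm_pb:.2f} PB of HBM, {exaflops:.1f} exaFLOPs peak")
# prints "1.77 PB of HBM, 42.5 exaFLOPs peak"
```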

All of this is happening as more companies look to custom ASIC accelerators to improve inference efficiency and reduce dependence on general-purpose alternatives. But even with growing demand for Google’s TPUs, major constraints remain: manufacturing capacity is stretched across the semiconductor industry, and scaling production is difficult when every leading chipmaker is operating near its limits.

If these efforts move forward, pairing next-generation Google TPUs with specialized MPUs could significantly improve the performance of the memory subsystem, an area that increasingly determines how fast and cost-effective AI inference can be. That could translate into quicker model responses, better throughput in data centers, and a stronger TPU ecosystem for the next wave of AI applications.