FuriosaAI Ditches GPU Playbook For 2nm Broadcom-Built Inference Chip, Claims HBM4/E Bandwidth Beats Even The Most Efficient GPUs

FuriosaAI is partnering with Broadcom on a third-generation AI accelerator. The chip is aimed at data centers running large-scale AI inference workloads and at compute clusters that demand high memory bandwidth and energy efficiency.

The upcoming accelerator will build on FuriosaAI’s current RNGD platform, a second-generation AI chip already in mass production using TSMC’s 5nm process. RNGD is a 180W PCIe-based accelerator aimed at large language models and agentic AI workloads. With demand for AI inference continuing to rise, FuriosaAI’s next chip is being positioned as a more advanced solution for the next wave of AI infrastructure.

The third-generation FuriosaAI accelerator is expected to combine 2nm compute chiplets with next-generation HBM4 or HBM4E memory. This pairing is designed to deliver the high bandwidth needed for rack-scale AI systems, where huge amounts of data must move quickly between processors, memory, and networking hardware.

One of the most important parts of this design is its chiplet-based architecture. Instead of relying on a single large piece of silicon, the accelerator is expected to use multiple dies integrated into one high-performance package. Broadcom’s advanced packaging expertise will play a key role in connecting these components efficiently, helping FuriosaAI build a multi-die system-in-package for AI workloads.

Early information about the design suggests that the accelerator may feature two large 2nm compute chiplets, two I/O controller dies, and 12 HBM4 or HBM4E memory stacks. If FuriosaAI uses 12-high 36GB memory stacks, the chip could offer up to 432GB of high-bandwidth memory. That would make it suitable for demanding AI inference tasks where memory capacity and bandwidth are just as important as raw compute power.
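
The capacity figure follows from simple multiplication, which the short Python sketch below reproduces. The stack count and per-stack capacity are the rumored values from the paragraph above, not confirmed specifications, and the function names are illustrative only. The second helper shows what a 12-high 36GB stack implies about the density of each DRAM die.

```python
# Back-of-the-envelope HBM capacity check.
# Inputs are the rumored configuration described in the article,
# not confirmed specifications. Function names are hypothetical.


def total_hbm_capacity_gb(num_stacks: int, gb_per_stack: int) -> int:
    """Total on-package HBM capacity in GB."""
    return num_stacks * gb_per_stack


def dram_die_density_gbit(gb_per_stack: int, dies_per_stack: int) -> float:
    """Density of one DRAM die in gigabits, given the stack capacity and height."""
    return gb_per_stack * 8 / dies_per_stack


if __name__ == "__main__":
    print(total_hbm_capacity_gb(12, 36))   # 432
    print(dram_die_density_gbit(36, 12))   # 24.0
```

If the final product ships with fewer stacks or shorter stacks, the same two functions give the revised totals.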

FuriosaAI is focusing heavily on AI inference, especially real-world workloads such as post-training sampling and high-throughput token generation. As AI models become larger and more complex, the ability to move data efficiently becomes a major performance factor. The company believes that prioritizing memory bandwidth and data movement can provide better performance per watt and higher token density than traditional GPU-based approaches.
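
To see why memory bandwidth matters so much for token generation, consider a rough upper-bound estimate, sketched below in Python. It assumes a simplified model in which every decoding step must read all model weights from memory once and the whole batch shares that read. It ignores KV-cache traffic, compute limits, and interconnect overhead. The bandwidth and model-size values in the example are hypothetical placeholders, not figures for any FuriosaAI product.

```python
# Simplified, bandwidth-bound estimate of decode throughput.
# Assumption: each decoding step streams all model weights from memory once,
# and every sequence in the batch shares that single read.
# KV-cache traffic and compute limits are ignored, so this is an upper bound.


def decode_tokens_per_second(
    bandwidth_gb_s: float,
    params_billion: float,
    bytes_per_param: float,
    batch_size: int = 1,
) -> float:
    """Upper bound on generated tokens per second for a memory-bound decoder."""
    weight_gb = params_billion * bytes_per_param   # GB read per decoding step
    steps_per_second = bandwidth_gb_s / weight_gb  # decoding steps per second
    return steps_per_second * batch_size           # one token per sequence per step


if __name__ == "__main__":
    # Hypothetical inputs: 1000 GB/s of memory bandwidth,
    # a 70-billion-parameter model stored at 1 byte per parameter.
    single = decode_tokens_per_second(1000, 70, 1)
    batched = decode_tokens_per_second(1000, 70, 1, batch_size=8)
    print(f"{single:.1f} tokens/s at batch size 1")
    print(f"{batched:.1f} tokens/s at batch size 8")
```

Under this model, doubling memory bandwidth doubles the throughput ceiling regardless of how much compute is available, which is the intuition behind a bandwidth-first inference design.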

This strategy is especially important for what FuriosaAI describes as the “token factory” era, where data centers are optimized to generate AI responses at massive scale. In this environment, infrastructure must deliver fast output, low latency, and strong efficiency while serving large numbers of users and applications.

Broadcom will also contribute Ethernet and PCIe technologies to support high-bandwidth connectivity across large AI clusters. This networking capability is critical for modern AI data centers, where individual accelerators must work together across racks and clusters to process large-scale AI tasks.

Another important part of FuriosaAI’s plan is its software stack. The company says its developer tools are designed to make it easier to deploy new AI models while meeting strict performance and latency goals. Its SDK includes a general compiler that can automatically map high-level PyTorch code to the accelerator’s hardware.

For developers who need deeper control, FuriosaAI also offers a Virtual ISA with a declarative programming model. This approach is intended to provide more predictable hardware control while avoiding some of the complexity often associated with traditional GPU programming.

FuriosaAI CEO and cofounder June Paik said the collaboration brings together Broadcom’s infrastructure capabilities with Furiosa’s Tensor Contraction Processor architecture and software platform. According to Paik, the goal is to move beyond building a single chip and instead deliver a complete solution for large-scale AI inference infrastructure.

The third-generation FuriosaAI accelerator is expected to begin sampling in the first half of 2028. If the timeline holds, the chip could arrive at a time when AI data centers are looking for more specialized, power-efficient alternatives to conventional GPU systems.

With 2nm chiplets, HBM4E memory support, advanced packaging, and rack-scale networking, FuriosaAI’s next AI accelerator could become a significant option for companies building next-generation AI infrastructure. Its focus on inference, memory bandwidth, and token throughput reflects where the AI hardware market is heading as demand for faster and more efficient AI services continues to grow.