Elon's 'Colossus' Supercomputer Built With 100K NVIDIA H100 GPUs Goes Online, H200 Upgrade Coming Soon

Elon Musk’s xAI Unleashes Colossal Supercomputer with Unmatched Power

Elon Musk’s ambitious xAI project has unveiled its groundbreaking “Colossus” supercomputer, which has now gone online, boasting an impressive arsenal of 100,000 NVIDIA H100 GPUs. The supercomputer, which has been described as the most powerful AI training system globally, was astonishingly completed in just 122 days.

The Colossus supercomputer hit a major milestone when it went live on Labor Day, a significant achievement for xAI that reflects the sheer scale and rapid pace of its development. But the excitement doesn’t stop there: Musk has announced plans to double Colossus to 200,000 GPUs in the coming months, with 50,000 of the additional units being NVIDIA’s newer H200s.

This latest upgrade is set to take Colossus to new heights. The H200 GPUs are built on the same NVIDIA Hopper architecture as the H100 but pair it with faster HBM3e memory, delivering up to 45% higher performance for generative AI and high-performance computing (HPC) workloads than the already formidable H100.

NVIDIA has lauded the xAI team for this extraordinary accomplishment, highlighting Colossus as a testament to accelerated computing and significant advancements in energy efficiency. The rapid development timeline, from breaking ground in Memphis in June to running training operations by July, positions Colossus as a premier AI training facility, set to replace the earlier Grok 2 system with Grok 3 by December.

This evolution follows the conclusion of a previous arrangement with Oracle, where xAI rented server capacity. The new supercluster now surpasses what Oracle could offer, and the pending upgrades will further catapult its capabilities, cementing Colossus’s status at the pinnacle of AI training systems.

A closer look at the hardware reveals that H200 GPUs bring a substantial upgrade: 141GB of HBM3e memory, 61GB more than the H100’s 80GB, and considerably higher memory bandwidth at 4.8TB/s versus the H100’s 3.35TB/s. These enhancements, however, come with increased power and thermal demands, requiring sophisticated liquid cooling to manage GPUs that can draw up to 700W each.
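The spec comparison above can be sanity-checked with simple arithmetic. A minimal sketch, using the memory and bandwidth figures quoted in this article (which match NVIDIA's published H100 SXM and H200 spec sheets):

```python
# Spec figures as quoted above: H100 with 3.35 TB/s of bandwidth,
# H200 with 141 GB of HBM3e at 4.8 TB/s (an assumption drawn from
# NVIDIA's public datasheets, not measured here).
h100 = {"memory_gb": 80, "bandwidth_tbps": 3.35}
h200 = {"memory_gb": 141, "bandwidth_tbps": 4.8}

# Memory delta: 141 - 80 = 61 GB, matching the "61GB more" claim.
extra_memory_gb = h200["memory_gb"] - h100["memory_gb"]

# Bandwidth uplift: 4.8 / 3.35 - 1, roughly a 43% increase.
bandwidth_uplift = h200["bandwidth_tbps"] / h100["bandwidth_tbps"] - 1

print(f"Extra memory: {extra_memory_gb} GB")
print(f"Bandwidth uplift: {bandwidth_uplift:.0%}")
```

Note that the ~45% performance figure cited above is a benchmark result, not a direct consequence of this bandwidth math, though the two track closely for memory-bound workloads.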

Currently, Colossus stands as the only supercomputer globally to harness 100,000 NVIDIA GPUs, outpacing Google AI’s reported 90,000 GPUs, OpenAI’s 80,000, and Meta AI’s and Microsoft AI’s 70,000 and 60,000, respectively.

With this remarkable advancement, Elon Musk’s xAI is not only pushing the boundaries of artificial intelligence but is also setting new standards in the realm of supercomputing technology. This project is anticipated to drive substantial innovations, reshaping what’s possible in AI research and applications.