Huawei has been making impressive advances in the artificial intelligence sector, and its recently unveiled CloudMatrix rack-scale cluster marks a significant breakthrough.
Huawei’s latest AI cluster, dubbed the Atlas 900 A3 SuperPod, boasts roughly twice the FP16 performance of the GB200 but comes with a hefty price tag. As Chinese AI firms strive to match and surpass NVIDIA’s offerings, Huawei stands out not just on performance but also on availability. Known for its “Ascend” AI accelerators, Huawei has expanded into rack-scale solutions and is making substantial waves in the industry; its progress has even captured the attention of NVIDIA’s leadership.
At the WAIC conference in Shanghai, Huawei publicly showcased its CloudMatrix cluster for the first time. The CloudMatrix 384 (CM384) AI cluster features 384 Ascend 910C chips connected in an “all-to-all” topology. By deploying roughly five times as many accelerators as NVIDIA’s GB200 NVL72, Huawei compensates for the per-chip performance gap with NVIDIA’s silicon. The cluster reportedly delivers 300 PetaFLOPS of BF16 compute, nearly double that of the GB200 NVL72. However, the CloudMatrix 384’s power consumption is 3.9 times that of the GB200 NVL72, resulting in less favorable performance-per-watt metrics across AI workloads.
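As a rough sanity check, the article’s own ratios can be combined into a performance-per-watt comparison. This is a back-of-the-envelope sketch only: the ~180 PFLOPS NVL72 baseline is an assumption inferred from the “nearly double” claim, not a vendor specification.

```python
# Back-of-the-envelope efficiency comparison using the figures
# quoted in the article. The GB200 NVL72 baseline is an assumption
# ("nearly double" 180 is ~300), not an official NVIDIA spec.

CM384_BF16_PFLOPS = 300          # quoted for CloudMatrix 384
GB200_NVL72_BF16_PFLOPS = 180    # assumed baseline for the NVL72
POWER_RATIO = 3.9                # CM384 draws 3.9x the NVL72's power

compute_ratio = CM384_BF16_PFLOPS / GB200_NVL72_BF16_PFLOPS
perf_per_watt_ratio = compute_ratio / POWER_RATIO

print(f"Compute advantage: {compute_ratio:.2f}x")          # ~1.67x
print(f"Perf-per-watt vs NVL72: {perf_per_watt_ratio:.2f}x")  # ~0.43x
```

Under these assumptions, the CM384 delivers roughly two-thirds more raw BF16 compute while achieving less than half the NVL72’s efficiency per watt, which matches the article’s framing of raw scale over power efficiency.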
A single CloudMatrix 384 cluster costs around $8 million, nearly triple the price of NVIDIA’s GB200 NVL72. Huawei’s goal is not cost-effective performance but a powerful in-house solution that can compete with Western alternatives. This achievement has even been acknowledged by NVIDIA’s CEO, underscoring Huawei’s competitive standing against other leading systems.