Zyphra has teamed up with AMD to roll out a new AI cloud platform designed to compete with popular inference services for cutting-edge open-weight models. Built in the US and powered by AMD Instinct MI355X GPUs, Zyphra Cloud is positioning itself as a high-performance option for teams that need fast, scalable AI inference for production workloads.
At its core, Zyphra Cloud is an inference-optimized service for running frontier open-weight models such as DeepSeek V3.2, Kimi K2.6, and GLM 5.1. The platform targets the real-world demands of modern AI applications, especially agentic workflows, deep research tasks, and long-horizon projects that benefit from strong long-context performance. To get there, Zyphra says it combines custom kernels, new long-context inference methods, and advanced parallelism techniques to deliver higher throughput while keeping latency low, two of the biggest factors in whether an AI system feels responsive at scale.
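The announcement doesn't specify Zyphra Cloud's API surface, but many inference clouds expose an OpenAI-compatible endpoint. As a rough sketch under that assumption, calling a hosted open-weight model might look like the following (the base URL, model identifier, and key handling are illustrative placeholders, not confirmed details of the service):

```python
# Minimal sketch of querying a hosted inference endpoint, assuming an
# OpenAI-compatible API. The base_url and model name are placeholders
# for illustration; they are not documented Zyphra Cloud values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.zyphra.example/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",                    # placeholder credential
)

response = client.chat.completions.create(
    model="deepseek-v3.2",  # illustrative model identifier
    messages=[
        {"role": "user", "content": "Summarize the key risks in this filing."},
    ],
    max_tokens=512,
)

print(response.choices[0].message.content)
```

For the agentic and long-context workloads the platform highlights, the metrics to watch on any such endpoint are time-to-first-token and sustained tokens-per-second at large prompt sizes.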
The compute backbone behind Zyphra Cloud comes from TensorWave, which hosts large deployments of AMD Instinct accelerators. Zyphra’s service will tap into 15 MW of compute capacity from TensorWave’s MI355X installation, and it’s built with an eye toward future expansion to newer generations such as MI450 and beyond. That forward-looking approach matters for customers planning to scale over time, since AI infrastructure decisions are often multi-year bets tied to model growth and rising context lengths.
While inference is the main focus today, Zyphra is also signaling that it wants to grow beyond being “just” an inference endpoint. The company plans to expand Zyphra Cloud into a more integrated AI platform, adding capabilities like reinforcement learning and fine-tuning. Those upcoming features are expected to run on AMD EPYC CPUs alongside dedicated GPU clusters, aiming to support a wider range of AI development workflows, from deployment to iteration and improvement.
TensorWave describes the partnership as a way to give AI-native companies dedicated access to high-performance AMD compute without compromises, enabling production-ready AI deployments at scale on the latest accelerators. The collaboration also ties into TensorWave’s broader buildout plans announced in 2024, which targeted a massive AMD GPU cluster by 2025 across multiple Instinct generations, including MI300X, MI325X, and MI350X. With that capacity coming online, platforms like Zyphra Cloud are positioned to take advantage of the infrastructure for large-scale agentic AI use cases.
Alongside the cloud service, Zyphra has already introduced several of its own models: ZAYA1-8B (Reasoning), ZAYA1-74B (a mixture-of-experts model with up to 74B parameters), and ZAYA1-VL, the company’s first vision-language model. Together with the new cloud platform, this lineup shows Zyphra aiming to serve both builders who want ready-to-run AI infrastructure and teams exploring open models for reasoning and multimodal tasks.
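If the ZAYA1 weights are published openly, experimenting locally could be as simple as a standard Hugging Face transformers load. The repository ID below is a hypothetical placeholder, not a confirmed listing:

```python
# Hypothetical sketch: loading an open-weight model with Hugging Face
# transformers. The repo ID is a placeholder assumption; check Zyphra's
# actual model hub pages for real identifiers. Requires the `accelerate`
# package for device_map="auto".
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Zyphra/ZAYA1-8B"  # placeholder, not a confirmed repo ID

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

prompt = "Explain mixture-of-experts routing in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```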
Zyphra Cloud is available now, offering inference access to organizations that want high-throughput, low-latency performance on open-weight frontier models, powered by AMD Instinct MI355X GPUs.