NVIDIA RTX GPUs: Unmatched Speed for OpenAI’s Cutting-Edge “gpt-oss” AI Models

NVIDIA and OpenAI have teamed up to release the gpt-oss family of open AI models, optimized to run fast on consumer RTX GPUs. The collaboration aims to bring cutting-edge AI capabilities to more users by letting these models run efficiently on RTX-powered PCs and workstations.

Jensen Huang, NVIDIA’s founder and CEO, said the release builds on OpenAI’s earlier achievements by advancing innovation in open-source software. He expects it to strengthen U.S. technology leadership in AI worldwide, drawing on the largest AI computing infrastructure available.

The models bring fast, capable on-device AI to PCs and workstations powered by GeForce RTX and RTX PRO GPUs. Two variants target different classes of hardware:

1. The gpt-oss-20b model is tailored for optimal performance on NVIDIA RTX AI PCs with at least 16GB of VRAM, achieving up to 250 tokens per second on an RTX 5090 GPU.

2. The larger gpt-oss-120b model is designed for professional workstations equipped with NVIDIA RTX PRO GPUs.

These models, trained on NVIDIA H100 GPUs, are among the first to support MXFP4 precision, a 4-bit floating-point format that is said to improve model quality and accuracy at no additional performance cost. They also support context lengths of up to 131,072 tokens, among the longest available for local inference, and use a flexible mixture-of-experts (MoE) architecture. The models offer chain-of-thought reasoning, instruction following, and tool use.
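To make the MXFP4 idea concrete, here is a minimal Python sketch of block quantization in the style of the OCP Microscaling format: each block of 32 values shares one power-of-two scale, and each value is stored as a 4-bit E2M1 number. The function names are hypothetical, the rounding is a simple nearest-value search, and the scale rule is one common choice; this is an illustration, not the implementation used in gpt-oss or in NVIDIA's software.

```python
import math

# Magnitudes representable by a 4-bit E2M1 element (the sign is a separate bit).
E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 32  # number of elements that share one scale in MXFP4
E2M1_EMAX = 2    # exponent of the largest E2M1 power of two (4.0 == 2**2)


def quantize_block(block):
    """Return (shared_exponent, quantized E2M1 values) for one block.

    Hypothetical helper for illustration only.
    """
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return 0, [0.0] * len(block)
    # Pick a power-of-two scale so the largest element sits near the top
    # of the E2M1 range.
    shared_exp = math.floor(math.log2(max_abs)) - E2M1_EMAX
    scale = 2.0 ** shared_exp
    quantized = []
    for x in block:
        # Clamp to the largest representable magnitude, then snap to the
        # nearest representable value (ties go to the smaller magnitude).
        magnitude = min(abs(x) / scale, E2M1_VALUES[-1])
        nearest = min(E2M1_VALUES, key=lambda v: abs(v - magnitude))
        quantized.append(math.copysign(nearest, x))
    return shared_exp, quantized


def dequantize_block(shared_exp, quantized):
    """Reconstruct approximate values from a quantized block."""
    scale = 2.0 ** shared_exp
    return [q * scale for q in quantized]


def bits_per_weight():
    """Average storage cost: 4 bits per element plus one 8-bit scale per block."""
    return 4 + 8 / BLOCK_SIZE
```

Under these assumptions the average cost is 4.25 bits per weight. Taking the "20b" in the model name as roughly 20 billion parameters, and assuming every weight were stored this way, the weights would occupy about 10.6 GB. That is consistent with the 16GB VRAM guidance, though the estimate ignores the KV cache, activations, and any tensors kept at higher precision.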

This week’s RTX AI Garage shows how AI enthusiasts and developers can start exploring the new OpenAI models on NVIDIA RTX GPUs through several resources:

– **Ollama App**: This app provides an easy way to experiment with the gpt-oss models, offering an interface that is fully optimized for RTX GPUs.

– **llama.cpp**: NVIDIA’s collaboration with the open-source community has produced optimizations such as CUDA Graphs, which reduce overhead. Developers can explore these improvements in the llama.cpp GitHub repository.

– **Microsoft AI Foundry Local**: Windows developers can access the models through Microsoft AI Foundry Local, currently in public preview, by running a few commands in a terminal.
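As one way to try the models programmatically, the sketch below sends a prompt to a locally running Ollama server over its HTTP API and computes generation throughput from the timing fields in the response. It assumes Ollama is listening on its default port and that the model has been pulled under the tag `gpt-oss:20b`; the tag and the helper names are assumptions for illustration, so adjust them to your setup.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "gpt-oss:20b"  # assumed model tag; change if yours differs


def build_request(prompt, model=MODEL_TAG, url=OLLAMA_URL):
    """Build a non-streaming generate request for the Ollama HTTP API."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, timeout=300):
    """Send the prompt and return the parsed JSON response body."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return json.load(resp)


def tokens_per_second(body):
    """Generation throughput; Ollama reports eval_duration in nanoseconds."""
    return body["eval_count"] / (body["eval_duration"] / 1e9)
```

With the server running, `body = generate("Explain CUDA Graphs in one sentence.")` returns a dictionary whose `response` field holds the generated text, and `tokens_per_second(body)` gives a figure to compare against the throughput quoted above. Results will vary with GPU, driver, and prompt length.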

With these resources, NVIDIA and OpenAI aim to make advanced AI more accessible and effective for developers around the world.