NVIDIA’s open-source Nemotron 3 Super is quickly becoming one of the most talked-about large language models in enterprise AI, and new results from the EnterpriseOps-Gym benchmark help explain why. The model has climbed to the top of the open-source leaderboard, underscoring that NVIDIA is pushing hard not only in AI hardware, but also in high-performance AI software designed for real-world business workflows.
Unveiled in March, Nemotron 3 Super is a 120B-parameter model with 12B active parameters, built on a hybrid Mixture-of-Experts (MoE) design. NVIDIA positions it as a major step forward over earlier Nemotron Super releases, with up to 5x higher throughput and support for a native 1 million-token context window. That large context is especially important for agentic AI use cases where a model needs to retain long-term “memory” across complex tasks, maintain alignment, and deliver consistent reasoning over lengthy enterprise data.
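A sparse MoE layer of this kind activates only a small subset of its experts per token, which is how a 120B-parameter model can run with roughly 12B active parameters. The sketch below is a generic top-k router in NumPy; the dimensions, the 8 experts, and k=2 are illustrative stand-ins, not Nemotron's actual configuration:

```python
import numpy as np

def topk_moe_forward(x, expert_weights, router_weights, k=2):
    """Minimal top-k MoE layer: only k of the experts run per token,
    so the active parameters are a small fraction of the total."""
    logits = x @ router_weights              # (n_experts,) routing scores
    topk = np.argsort(logits)[-k:]           # indices of the k best experts
    gates = np.exp(logits[topk])
    gates /= gates.sum()                     # softmax over the selected experts
    # Only the selected experts' weights are touched ("active parameters")
    return sum(g * (x @ expert_weights[e]) for g, e in zip(gates, topk))

rng = np.random.default_rng(0)
d, n_experts = 16, 8
x = rng.standard_normal(d)
experts = rng.standard_normal((n_experts, d, d))
router = rng.standard_normal((d, n_experts))
y = topk_moe_forward(x, experts, router, k=2)
# With k=2 of 8 experts, only ~25% of expert parameters run for this token
```

Scaling the same idea up, a model can grow its total expert pool (and thus capacity) while the per-token compute stays pinned to the few experts the router selects.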
Several technical upgrades are central to Nemotron 3 Super’s performance.
One is Latent MoE, which lets the model route to four times as many expert “specialists” without raising inference cost by compressing tokens before they reach the experts. Another is multi-token prediction (MTP), which predicts several future tokens in a single forward pass, cutting generation time for long responses and providing built-in speculative decoding. The model also uses a hybrid Mamba-Transformer backbone, blending Mamba layers for efficient sequence handling with Transformer layers for more precise reasoning, which improves throughput and delivers roughly 4x better memory and compute efficiency.
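Speculative decoding of the sort MTP enables can be pictured with a toy greedy loop: a cheap drafter proposes a few tokens ahead, and the main model verifies the whole batch at once, keeping the longest matching prefix. The sketch below is a generic illustration of the technique, not NVIDIA's implementation; the `draft_next`/`target_next` callables are stand-ins for real models:

```python
def speculative_decode(draft_next, target_next, prompt, n_tokens, k=4):
    """Greedy speculative decoding sketch: a cheap draft proposes k tokens,
    the target model checks them together and keeps the matching prefix."""
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        # Draft proposes k tokens autoregressively (cheap)
        proposed, ctx = [], list(out)
        for _ in range(k):
            t = draft_next(ctx)
            proposed.append(t)
            ctx.append(t)
        # Target verifies all k proposals in one "forward pass"
        accepted, ctx = [], list(out)
        for t in proposed:
            if target_next(ctx) == t:
                accepted.append(t)
                ctx.append(t)
            else:
                accepted.append(target_next(ctx))  # fall back to target's token
                break
        out.extend(accepted)
    return out[:len(prompt) + n_tokens]

# Toy "models": next token = last token + 1 (mod 10), so drafts always match
draft = lambda ctx: (ctx[-1] + 1) % 10
target = lambda ctx: (ctx[-1] + 1) % 10
result = speculative_decode(draft, target, [0], 5)  # -> [0, 1, 2, 3, 4, 5]
```

When the draft agrees with the target, the target accepts k tokens per verification step instead of one per forward pass, which is where the latency win comes from.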
On top of that, Nemotron 3 Super is pretrained with NVFP4, NVIDIA’s 4-bit floating-point format optimized for the Blackwell architecture. This reduces memory requirements and can accelerate inference significantly on Blackwell-class GPUs, with NVIDIA citing a 4x inference speed-up on the B200 compared to FP8 on the H100, while still preserving accuracy. Post-training leans heavily on reinforcement learning: the model was trained across 21 environment configurations using NVIDIA’s tooling, with more than 1.2 million environment rollouts.
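Block-scaled 4-bit floating point can be simulated in a few lines to show where the memory savings come from: each value snaps to a small E2M1 magnitude grid, with one shared scale per block of values. This is a rough sketch of the general technique, not NVIDIA's NVFP4 recipe; the block size of 16 and the max-to-6.0 scaling rule are simplifying assumptions:

```python
import numpy as np

# E2M1 (FP4) representable magnitudes; block-scaled formats pair this tiny
# grid with one shared per-block scale factor
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4_block(x, block=16):
    """Simulated block-scaled FP4 quantization (an illustrative sketch):
    each block shares one scale so its max magnitude maps to 6.0, then
    every value snaps to the nearest representable FP4 magnitude."""
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)
    for i in range(0, len(x), block):
        chunk = x[i:i + block]
        scale = np.abs(chunk).max() / 6.0 or 1.0   # avoid div-by-zero blocks
        scaled = chunk / scale
        # Snap |value| to the nearest grid point, then restore the sign
        idx = np.abs(np.abs(scaled)[:, None] - FP4_GRID).argmin(axis=1)
        out[i:i + block] = np.sign(scaled) * FP4_GRID[idx] * scale
    return out

q = quantize_fp4_block(np.array([0.1, -0.5, 2.0, 6.0]))
# 0.1 rounds down to 0.0; the other three are exactly representable here
```

Each weight then needs only 4 bits plus a small amortized per-block scale, versus 8 bits for FP8, which is the memory-footprint reduction the format trades against quantization error.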
The latest attention, though, comes from EnterpriseOps-Gym benchmarking. This test suite is built to measure how well AI agents operate inside realistic enterprise settings rather than answering static questions. Models are evaluated across 1,150 tasks in interactive environments and have 512 functional tools at their disposal to complete workflows. In practice, that means an agent may need to coordinate actions across multiple systems and utilities to finish a single end-to-end business process, the kind of scenario enterprises care about when deploying AI assistants for operations, support, and productivity.
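A tool-driven workflow of this kind can be pictured as an agent threading shared state through a sequence of tool calls. The toy below invents a three-tool ticket workflow purely for illustration; the tool names and the fixed plan are fabricated, and a real benchmark agent would also have to decide which tools to call and in what order:

```python
# Invented tool registry: each tool reads the shared state and returns an
# updated copy, standing in for calls into separate enterprise systems
TOOLS = {
    "lookup_ticket": lambda s: {**s, "ticket": {"id": 42, "status": "open"}},
    "draft_reply":   lambda s: {**s, "reply": "We are looking into it."},
    "close_ticket":  lambda s: {**s, "ticket": {**s["ticket"], "status": "closed"}},
}

def run_agent(plan, state=None):
    """Execute a sequence of tool calls, threading state between them --
    one end-to-end workflow may span several systems and utilities."""
    state = state or {}
    for tool_name in plan:
        state = TOOLS[tool_name](state)   # each tool updates the shared state
    return state

final = run_agent(["lookup_ticket", "draft_reply", "close_ticket"])
# final["ticket"]["status"] == "closed", with the drafted reply attached
```

Benchmarks like this score the agent on whether the end state is correct, so errors in tool choice or ordering anywhere in the chain sink the whole task.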
In the open-source leaderboard results shared for EnterpriseOps-Gym, Nemotron 3 Super takes the number one position with an average score of 27.3 points. It leads in TEAMS, Email, and Hybrid workflows, while also remaining highly competitive in CSM, ITSM, and Drive workflows. In the same ranking, Kimi-K2.5 appears in second place and DeepSeek v3.2 in third, with GPT-OSS-120B listed in fifth place.
Nemotron 3 Super is also part of a broader NVIDIA family aimed at different performance and deployment needs. The Nemotron 3 lineup includes Nano, Super, and Ultra variants, and NVIDIA has also introduced Nemotron 3 Nano Omni, which is described as delivering a major throughput boost for agentic AI scenarios.
With Nemotron 3 Super now leading a tool-heavy, workflow-based enterprise benchmark, NVIDIA is making a clear case that it wants to be seen as a full-stack AI provider. The message is straightforward: pairing advanced model architecture, aggressive efficiency optimizations, and enterprise-focused training with its GPU ecosystem is how NVIDIA plans to stay ahead as businesses increasingly demand AI that can actually do work, not just generate text.