NVIDIA Continues To Do What It Does Best - Intros Nemotron 3 Nano Omni Open Model That Promises Up To 9x Faster Agentic AI

NVIDIA Rallying Foxconn, Palantir, and Oracle to Back Nemotron 3 Nano Omni, a New Open AI Model Promising a 9x Performance Leap

NVIDIA is expanding its lineup of open AI models with the launch of Nemotron 3 Nano Omni, a new open multimodal model designed to power faster, more efficient agentic AI. The company says the model can deliver up to 9x higher agentic AI throughput, aiming to make real-time “AI agents” more responsive while keeping compute costs under control.

Nemotron 3 Nano Omni is built to handle multiple input types in one system, including video, audio, images, and text. Instead of relying on separate models for perception (like one model for vision and another for audio), it combines these capabilities into a single production-ready package. The result, according to NVIDIA, is quicker and smarter agent responses with stronger reasoning across mixed media, which is useful for enterprises and developers who want flexibility in how and where they deploy their AI.

A key part of the model’s focus is efficiency. Nemotron 3 Nano Omni uses a 30B-A3B hybrid mixture-of-experts architecture, a label that conventionally denotes about 30 billion total parameters with roughly 3 billion active per token, and includes both vision and audio encoders, removing the overhead of stitching together multiple perception models. NVIDIA positions this as a way to improve inference efficiency at scale without losing responsiveness, one of the biggest challenges when building multimodal AI agents that need to “see,” “hear,” and “read” at the same time.
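To illustrate why a mixture-of-experts design can be cheaper to run than its total size suggests, here is a minimal sketch of top-k expert routing in plain Python. It is not NVIDIA's implementation: the expert count, the value of k, and the function names are hypothetical, and the 30-to-3 ratio simply follows the usual reading of the "30B-A3B" label.

```python
import math
import random


def softmax(scores):
    """Convert raw router scores into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so the selected experts sum to 1.
    """
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    top = ranked[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def active_fraction(total_params_b, active_params_b):
    """Share of parameters used per token (both arguments in billions)."""
    return active_params_b / total_params_b


# Hypothetical settings, chosen only for illustration.
NUM_EXPERTS = 8
TOP_K = 2

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
chosen = route_top_k(logits, TOP_K)

# Only the chosen experts run for this token; the rest stay idle.
print("experts used for this token:", [i for i, _ in chosen])
print("share of parameters active:", active_fraction(30, 3))
```

Under these assumptions, each token exercises only the experts the router selects, which is how a model with a large total parameter count can keep per-token compute closer to that of a much smaller one.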

NVIDIA also highlights accuracy improvements, saying Nemotron 3 Nano Omni sets a new bar for cost-effective multimodal performance. The company notes that it leads six leaderboards in areas like complex document intelligence along with video and audio understanding, signaling that the model is intended for more than just general chat—it’s being framed as a practical engine for enterprise-grade reasoning.

Adoption is already underway across the AI and software ecosystem. Companies reported as adopting Nemotron 3 Nano Omni include Aible, Applied Scientific Intelligence (ASI), Eka Care, Foxconn, H Company, Palantir, and Pyler. NVIDIA also says other major organizations are currently evaluating the model, including Dell Technologies, DocuSign, Infosys, K-Dense, Lila, Oracle, and Zefr—suggesting interest in real-world deployments for productivity, automation, and analytics.

Where this model may stand out most is in agentic workflows—systems where an AI doesn’t just respond, but actively performs tasks using specialized sub-agents. Nemotron 3 Nano Omni is positioned to work alongside other models depending on workload needs. For example, organizations could pair it with other open Nemotron models such as Nemotron 3 Super for frequent execution tasks or Nemotron 3 Ultra for heavier planning, and it can also integrate into stacks that include proprietary models from other providers.
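The pairing described above can be pictured as a simple dispatcher that sends each sub-agent step to a model tier by role. This is only a sketch under stated assumptions: the role names, the model identifier strings, and the `pick_model` and `plan_dispatch` helpers are hypothetical placeholders, not an NVIDIA API, and a real system would call an inference endpoint instead of returning a name.

```python
# Hypothetical role-to-model table, following the division of labor the
# article describes: Nano Omni for perception, Super for frequent
# execution, Ultra for heavier planning. The strings are placeholders.
MODEL_FOR_ROLE = {
    "perception": "nemotron-3-nano-omni",
    "execution": "nemotron-3-super",
    "planning": "nemotron-3-ultra",
}


def pick_model(role):
    """Return the placeholder model name assigned to a sub-agent role."""
    try:
        return MODEL_FOR_ROLE[role]
    except KeyError:
        raise ValueError(f"unknown role: {role!r}") from None


def plan_dispatch(steps):
    """Map (role, description) steps to (description, model) pairs."""
    return [(description, pick_model(role)) for role, description in steps]


workflow = [
    ("planning", "break the audit request into sub-tasks"),
    ("perception", "read the scanned invoice and its charts"),
    ("execution", "fill in the compliance form"),
]

for description, model in plan_dispatch(workflow):
    print(f"{model}: {description}")
```

A table-driven mapping like this keeps the routing decision in one place, so swapping in a proprietary model for one role would be a one-line change.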

NVIDIA outlined several practical use cases where Nemotron 3 Nano Omni is designed to fit:

For computer-use agents, the model can drive the perception loop for agents that navigate graphical user interfaces, interpret what’s happening on-screen, and track UI state over time. NVIDIA points to H Company’s latest computer-use agent as an example, which feeds the model native 1920×1080 input to support high-fidelity visual reasoning. In early OSWorld benchmark evaluations, this integration reportedly improved navigation through complex interfaces, helped by the model’s ability to process very high-resolution images.

For document intelligence, Nemotron 3 Nano Omni is meant to interpret documents that combine text and visuals—charts, tables, screenshots, and mixed media—so agents can reason across structure and content in a coherent way. This is especially relevant for enterprise analysis, auditing, and compliance workflows where understanding layout and context can matter as much as the words themselves.

For audio and video understanding, the model is designed to keep audio-video context together, connecting what was said, what was shown, and what was documented into a single reasoning stream. That can be valuable in customer service, research, and monitoring scenarios where context often gets lost when systems rely on separate summaries for different media types.

With Nemotron 3 Nano Omni, NVIDIA is clearly targeting a growing demand: open, multimodal AI models that are faster to run, cheaper to scale, and capable of powering real AI agents that can see, hear, and reason across the messy mix of content businesses deal with every day.