OpenAI Unveils Three New Real-Time Audio API Models, Headlined by GPT-Realtime-2

OpenAI is making a major push to turn voice AI into something that feels less like a scripted Q&A bot and more like a capable live agent. The company has introduced three new real-time audio models available through its API, and at the same time it has moved the Realtime API out of beta and into general availability, meaning developers can now deploy it in production with more confidence.

The headline model is GPT-Realtime-2, OpenAI’s first real-time voice system built with GPT-5-class reasoning. Instead of the typical “transcribe first, think second, speak last” pipeline that can create noticeable delays, GPT-Realtime-2 processes audio as a continuous stream. That helps it understand what’s being said in the moment and respond more naturally, without the lag that comes from separate transcription and voice synthesis steps.

A big practical upgrade is context length. GPT-Realtime-2 supports a 128K token context window, up from 32K in the previous version. For businesses building voice agents, that matters because it enables longer calls, more detailed back-and-forth, and multi-step workflows without constantly resetting the conversation or relying on complicated external memory systems.

OpenAI is positioning GPT-Realtime-2 around “agentic” voice behavior, meaning it can do more than just talk. It can listen, reason, call tools, and keep the conversation moving while it works. Features include preambles that fill the silence with natural phrases like “One moment” while the system runs a tool call, parallel tool calls so it can handle multiple backend requests at once, and stronger recovery behavior so it can explain issues out loud rather than stalling. There’s also tone adjustment, allowing the voice to shift style depending on the situation, such as calmer and more measured for support interactions or more upbeat for confirmations.
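To make the preamble and parallel tool call ideas concrete, here is a minimal sketch of the general pattern in plain Python. It is not OpenAI's API and does not call the Realtime API. The tool functions (`lookup_order`, `check_inventory`), their return values, and the `speak` callback are hypothetical stand-ins for whatever backend and audio output an application would use.

```python
import asyncio

# Hypothetical backend tools; a real agent would call actual services here.
async def lookup_order(order_id: str) -> dict:
    await asyncio.sleep(0.05)  # simulate network latency
    return {"order_id": order_id, "status": "shipped"}

async def check_inventory(sku: str) -> dict:
    await asyncio.sleep(0.05)  # simulate network latency
    return {"sku": sku, "in_stock": True}

async def handle_turn(speak) -> list:
    # Preamble: say something right away so the caller does not hear silence.
    speak("One moment while I check that.")

    # Parallel tool calls: both requests run concurrently, not back to back.
    results = await asyncio.gather(
        lookup_order("A123"),
        check_inventory("SKU-9"),
        return_exceptions=True,  # collect failures instead of raising
    )

    # Recovery: explain a failure out loud rather than stalling.
    for result in results:
        if isinstance(result, Exception):
            speak("Sorry, I ran into a problem with one of those lookups.")
    return results

if __name__ == "__main__":
    print(asyncio.run(handle_turn(print)))
```

Because the two simulated lookups run concurrently, the turn takes roughly as long as the slower call rather than the sum of both.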

Performance numbers suggest meaningful gains. OpenAI says GPT-Realtime-2 scores 15.2% higher than GPT-Realtime-1.5 on its Big Bench Audio reasoning benchmark, and 13.8% higher on Audio Multichallenger for instruction following. In applied testing, Zillow reported a 26-percentage-point improvement in call success rate on its hardest adversarial benchmark, rising from 69% to 95% after prompt optimization with GPT-Realtime-2.

Pricing for GPT-Realtime-2 is set at $32 per million audio input tokens and $64 per million audio output tokens, with cached input tokens priced at $0.40 per million.
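As a rough illustration of how those per-token rates add up, the sketch below estimates the cost of a single session. The three prices come from the figures above. The function name, the token counts in the example, and the assumption that cached input tokens are billed separately from regular input tokens are illustrative only. The article does not say how many tokens a minute of audio consumes, so the counts are placeholders.

```python
# Prices in USD per million tokens, as quoted in the article.
AUDIO_INPUT_PER_M = 32.00
AUDIO_OUTPUT_PER_M = 64.00
CACHED_INPUT_PER_M = 0.40

def estimate_session_cost(input_tokens: int,
                          output_tokens: int,
                          cached_input_tokens: int = 0) -> float:
    """Estimate the USD cost of one session from its token counts.

    Assumes input_tokens counts only uncached audio input and that
    cached_input_tokens are billed at the cached rate instead.
    """
    total = (
        input_tokens * AUDIO_INPUT_PER_M
        + output_tokens * AUDIO_OUTPUT_PER_M
        + cached_input_tokens * CACHED_INPUT_PER_M
    )
    return total / 1_000_000

if __name__ == "__main__":
    # Hypothetical session: 10k input, 5k output, 20k cached input tokens.
    cost = estimate_session_cost(10_000, 5_000, 20_000)
    print(f"${cost:.3f}")
```

With these placeholder counts the input and output portions each come to $0.32 and the cached portion to under one cent, which shows how much cheaper cached input is than fresh audio input.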

Alongside GPT-Realtime-2, OpenAI released GPT-Realtime-Translate, a model built specifically for live speech translation. It’s designed to translate continuously in real time without requiring the speaker to pause or finish a full sentence, a key detail for customer support, education, live events, and cross-border sales conversations. The model supports more than 70 input languages and 13 output languages. BolnaAI, which focuses on Indian language markets, reported a 12.5% reduction in word error rates for Hindi, Tamil, and Telugu compared to its previous translation approach. GPT-Realtime-Translate is priced at $0.034 per minute of audio processing.

The third release is GPT-Realtime-Whisper, which brings streaming capabilities to OpenAI’s widely used Whisper speech recognition technology. While traditional transcription tools often work best once a recording is complete, this model is aimed at live captions produced as someone is speaking. That opens up uses in live meetings, courtroom documentation, newsroom transcription, and accessibility tools for people who are deaf or hard of hearing. It’s also the most affordable of the three models at $0.017 per minute.
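The two per-minute prices are easy to compare with a little arithmetic. The sketch below uses the rates quoted in the article. The dictionary keys are just the product names as written here, not confirmed API model identifiers, and the function name is hypothetical.

```python
# USD per minute of audio, as quoted in the article.
PER_MINUTE_USD = {
    "GPT-Realtime-Translate": 0.034,
    "GPT-Realtime-Whisper": 0.017,
}

def cost_for_minutes(model: str, minutes: float) -> float:
    """Return the estimated USD cost of processing `minutes` of audio."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return PER_MINUTE_USD[model] * minutes

if __name__ == "__main__":
    for name in PER_MINUTE_USD:
        # Cost of a one-hour session for each model.
        print(f"{name}: ${cost_for_minutes(name, 60):.2f} per hour")
```

At these rates an hour of live translation costs about $2.04 and an hour of streaming transcription about $1.02, so translation is priced at twice the transcription rate.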

All three models are available now through the OpenAI API and the developer playground.

OpenAI is also expanding what developers can build around real-time voice by adding MCP server support, image input capabilities, and SIP phone calling integration to the Realtime API. Together, those additions broaden enterprise telephony options and make it easier to create end-to-end voice agent workflows without leaving the API environment.

One important note for developers and businesses experimenting with new AI tools: high-interest product launches often attract scammers. It’s a good reminder to stick to official sources, verify domains carefully, and avoid downloading unknown “installers” or unofficial clients when evaluating new AI services.
