
GPT Realtime in Your Voice Agents: Semantic VAD, BYOK, and a Cheaper Mini

Thinnest AI Team
May 11, 2026 · 5 min read
GPT Realtime, end-to-end

The realtime model field, in 2026

It's worth admitting the obvious upfront: in 2026, there's more than one excellent realtime voice model. Gemini Live ships fast turn-taking and 30 native voices at a friendly price. OpenAI's GPT Realtime ships best-in-class instruction following, semantic-VAD turn detection, and the new marin / cedar voices on a native-audio pipeline. Builders shouldn't have to pick a platform that locks them into one camp.

Yesterday thinnestAI shipped Speech-to-Speech with Gemini Live. Today we're adding OpenAI Realtime as a first-class option in the same S2S tab — same one-toggle workflow, same BYOK plumbing, same half-cascade escape hatch, and one consolidated cost-breakdown row that reshapes itself based on the model you picked.

Two models, one sidebar

The S2S tab has a left-column model list that mirrors the STT and TTS tabs. Two OpenAI entries sit alongside the two Gemini ones:

  • GPT Realtime (GA) — OpenAI's production realtime model on the new native-audio pipeline. Semantic-VAD turn detection by default, full tool calling, the marin and cedar GA voices tuned for this pipeline.
  • GPT Realtime Mini — roughly 60% cheaper for high-volume or cost-sensitive voice flows. Same shape as the full model.

Click a model and the right-hand settings panel reshapes itself: Gemini-only cards (Vertex AI hosting, Affective Dialog, Proactivity) collapse, the OpenAI Turn Detection card appears, and the voice catalog swaps from 30 Gemini voices to OpenAI's 10. The temperature slider re-clamps to OpenAI's accepted 0.6–1.2 band so you can't accidentally save an out-of-range value.
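That re-clamp is a one-liner. A minimal sketch, where the helper name is ours and the band comes from the text above:

```python
# Clamp a temperature setting into OpenAI Realtime's accepted band (0.6-1.2),
# so switching models can't leave an out-of-range value in the saved config.
OPENAI_TEMP_MIN, OPENAI_TEMP_MAX = 0.6, 1.2

def clamp_temperature(value: float) -> float:
    """Return `value` pulled into [OPENAI_TEMP_MIN, OPENAI_TEMP_MAX]."""
    return max(OPENAI_TEMP_MIN, min(OPENAI_TEMP_MAX, value))
```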

Voices — marin, cedar, and the legacy eight

OpenAI ships two GA voices alongside gpt-realtime's native-audio launch:

  • marin — natural, conversational. Default.
  • cedar — warm, grounded. Good for support / empathy use cases.

Plus the eight legacy voices inherited from the earlier Realtime previews: alloy, ash, ballad, coral, echo, sage, shimmer, verse. The picker is a single searchable dropdown grouped into "GA voices" and "Legacy voices" so the new defaults are easy to find.
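The grouping behind that picker is simple to sketch. The helper and its substring filtering are ours; the voice lists are the ones above:

```python
# Sketch: group OpenAI Realtime voices into the picker's two sections,
# optionally filtered by a search query (case-insensitive substring match).
GA_VOICES = ["marin", "cedar"]
LEGACY_VOICES = ["alloy", "ash", "ballad", "coral", "echo", "sage", "shimmer", "verse"]

def voice_picker_groups(query: str = "") -> dict[str, list[str]]:
    """Return the dropdown's two groups, filtered by `query`."""
    q = query.lower()
    return {
        "GA voices": [v for v in GA_VOICES if q in v],
        "Legacy voices": [v for v in LEGACY_VOICES if q in v],
    }
```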

Turn Detection — semantic, server, or plugin default

OpenAI Realtime ships two turn-detection modes. Our new Turn Detection card surfaces both, plus a "plugin default" pill for when you want to defer to the LiveKit + OpenAI tuned default:

  • Semantic VAD (default) — uses a classifier over the caller's words to decide when they're done speaking. Far less likely to chunk mid-sentence than silence-based VAD. An eagerness pill controls how aggressively the model commits to end-of-turn: auto (the tuned default), low (lets callers take their time), medium, or high (chunks audio as soon as possible). Pick low for elderly or hesitant callers; high for high-throughput dialer-style flows where you need snappy turns.
  • Server VAD — the classic silence-based mode. Three sliders: threshold (energy bar for "this is speech"), prefix padding (ms of audio kept before detected speech, so plosives and soft starts aren't clipped), and silence duration (ms of silence required to mark end-of-turn).
  • Plugin default — we don't send a turn-detection config at all. The LiveKit plugin uses its built-in default. Useful as a baseline when you're A/B testing.

Two more switches apply to both modes: auto-generate reply (the model speaks as soon as turn detection fires) and allow interruption (caller speech during the agent's reply cuts the response). Both default on. A single reset button on the card restores LiveKit + OpenAI's recommended defaults.
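In config terms, the card maps onto a single turn-detection payload. A sketch, assuming OpenAI Realtime's session field names (`semantic_vad`, `server_vad`, `eagerness`, `prefix_padding_ms`, `silence_duration_ms`); the helper and its defaults are ours:

```python
# Sketch: build the turn-detection payload for each mode on the card.
from typing import Optional

def turn_detection_config(
    mode: str,                      # "semantic", "server", or "plugin_default"
    eagerness: str = "auto",        # semantic VAD: auto | low | medium | high
    threshold: float = 0.5,         # server VAD: energy bar for "this is speech"
    prefix_padding_ms: int = 300,   # server VAD: audio kept before detected speech
    silence_duration_ms: int = 500, # server VAD: silence required to end the turn
    auto_generate_reply: bool = True,
    allow_interruption: bool = True,
) -> Optional[dict]:
    common = {
        "create_response": auto_generate_reply,   # speak as soon as detection fires
        "interrupt_response": allow_interruption, # caller speech cuts the reply
    }
    if mode == "semantic":
        return {"type": "semantic_vad", "eagerness": eagerness, **common}
    if mode == "server":
        return {
            "type": "server_vad",
            "threshold": threshold,
            "prefix_padding_ms": prefix_padding_ms,
            "silence_duration_ms": silence_duration_ms,
            **common,
        }
    return None  # plugin default: send no turn-detection config at all
```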

BYOK with your own OpenAI key

As with Gemini, OpenAI gets a Bring-Your-Own-Key path. Paste a key from platform.openai.com/api-keys into the Hosting card and Realtime API usage bills to your OpenAI account — we charge only the platform fee.

The BYO key field validates the sk-… prefix in-browser. If you paste an AI Studio Gemini key (AIza…) by mistake, a warning surfaces immediately. Keys are stored encrypted and never logged.
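The prefix check itself fits in a few lines. A sketch, with the function name and return labels ours:

```python
# Sketch: in-browser-style key validation. Accept OpenAI secret keys,
# warn immediately if a Gemini (AI Studio) key was pasted by mistake.
def classify_api_key(key: str) -> str:
    if key.startswith("sk-"):
        return "ok"
    if key.startswith("AIza"):
        return "warn_gemini_key"
    return "invalid"
```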

Vertex AI hosting is a Gemini-only option, so it's hidden when an OpenAI model is selected. Switch back to Gemini and it returns.

Half-cascade with GPT Realtime

The half-cascade pattern (realtime model listens + reasons, your TTS speaks) works the same on OpenAI as on Gemini. Toggle Use custom TTS in the S2S tab and GPT Realtime runs in ["text"] modality, with the reply played through whatever cascaded TTS plugin you have configured on the agent — Cartesia, ElevenLabs, Sarvam, Aero TTS.

Unlike Gemini's native-audio models, both GPT Realtime variants accept TEXT modality, so the half-cascade toggle stays available on every OpenAI model.
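In config terms, the toggle just flips the session's modality. A minimal sketch, with the helper name ours:

```python
# Sketch: which modality the GPT Realtime session runs in.
def session_modalities(use_custom_tts: bool) -> list[str]:
    # Half-cascade: the realtime model listens and reasons in text,
    # and the cascaded TTS plugin speaks the reply.
    return ["text"] if use_custom_tts else ["audio"]
```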

Cost breakdown — provider-aware row

In S2S mode the cost-breakdown popup collapses STT + TTS + LLM into a single row. The row label tracks the selected provider:

  • GPT Realtime at $32 / $64 per million audio in/out tokens — roughly ₹9.05 per minute on a typical call.
  • GPT Realtime Mini at $13 / $26 per million — roughly ₹3.65 per minute.
  • BYOK (with your own OpenAI key) zeroes that row on our side. You pay OpenAI directly; we only charge the platform fee.
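The per-minute math behind those figures is straightforward. In the sketch below, the audio-token throughput and the USD→INR rate you pass in are illustrative placeholders, not our billing assumptions:

```python
# Sketch: per-minute cost from per-million-token prices. The token throughput
# and USD->INR rate are caller-supplied assumptions, not platform figures.
def cost_per_minute_inr(price_in_usd_per_m: float, price_out_usd_per_m: float,
                        tokens_in_per_min: float, tokens_out_per_min: float,
                        usd_to_inr: float, byok: bool = False) -> float:
    if byok:
        # BYOK: usage bills to your OpenAI account; only the platform fee applies.
        return 0.0
    usd = (tokens_in_per_min * price_in_usd_per_m
           + tokens_out_per_min * price_out_usd_per_m) / 1_000_000
    return usd * usd_to_inr
```

With equal in/out throughput, Mini's $13 / $26 rates land at roughly 40% of the full model's cost, which is where the "roughly 60% cheaper" figure comes from.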

The half-cascade path restores the TTS row alongside the realtime row so the breakdown shows what each component contributes.

Greeting behaviour, by configuration

  • Pure S2S, GPT Realtime — the agent speaks the configured greeting on its first turn. The greeting text rides in on the initial generate_reply trigger (the LiveKit OpenAI plugin doesn't accept instructions at construction, so we inline the opening line in the trigger payload to keep behaviour consistent with Gemini).
  • S2S half-cascade — the greeting plays through your cascaded TTS plugin via session.say(), same as the classic flow.
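The two paths reduce to a small dispatch. A sketch — the payload shape below is illustrative, not the LiveKit plugin's actual API:

```python
# Sketch: how the configured greeting reaches the caller on the first turn.
def initial_turn(greeting: str, half_cascade: bool) -> dict:
    if half_cascade:
        # The cascaded TTS plugin speaks the opening line directly.
        return {"method": "session.say", "text": greeting}
    # Pure S2S: inline the opening line in the generate_reply trigger,
    # since the plugin doesn't accept instructions at construction time.
    return {"method": "generate_reply",
            "instructions": f"Open the call by saying: {greeting}"}
```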

When GPT Realtime is the right call

  • You want OpenAI-grade instruction following on every turn.
  • Your callers benefit from semantic VAD — conversation-heavy support, sales discovery, anything where chunking mid-sentence hurts.
  • You have an OpenAI Enterprise / Pro relationship and want consolidated billing.
  • You want the production native-audio pipeline (marin / cedar) rather than the older preview voices.

If cost-per-minute is the top constraint, look at GPT Realtime Mini (~₹3.65/min uncached) or Gemini Live (~₹6.80/min). If you need agent handoffs and parallel tool calls, every option on the S2S tab supports them (except experimental Gemini 3.1, which we flag explicitly).

Try it in 90 seconds

  1. Open any voice agent in Agent Studio → Voice Configuration.
  2. Click Speech-to-Speech in the top-right of the header.
  3. The S2S tab opens with the model sidebar on the left. Click GPT Realtime under "OpenAI Realtime".
  4. Leave Voice on marin (or pick cedar for a warmer feel).
  5. (Optional) Paste your OpenAI key into the Hosting card so usage bills to your OpenAI account.
  6. (Optional) Tune the Turn Detection card — Semantic VAD / eagerness=low for hesitant callers; Server VAD with a higher threshold for noisy environments.
  7. Save and click Try Voice Call.

Try GPT Realtime S2S Free →

Free trial includes 5 voice minutes · No credit card required · BYOK supported on every plan.

