BYOK (Bring Your Own Key)
BYOK means you bring your own API keys for the LLM, STT and TTS providers, and the voice AI platform routes usage through your accounts instead of bundling the provider costs into its own pricing.
BYOK stands for Bring Your Own Key. In voice AI, the platform uses your API keys when it calls the underlying LLM, STT and TTS providers (e.g. OpenAI, Anthropic, Deepgram, ElevenLabs), so each provider bills usage directly to your own account. You keep control over provider choice, cost and compliance, and the platform charges only a thin orchestration fee on top.
Why BYOK matters
Without BYOK, a voice AI platform bundles provider costs into its own pricing and takes a margin on every minute of audio. With BYOK, you pay providers directly, usually at lower rates than the platform's bundled price, and the platform charges only a predictable orchestration fee. Because the bundled margin applies to every minute, the savings grow with call volume and are largest for high-volume workloads.
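The comparison above comes down to simple per-minute arithmetic. The Python sketch below shows one way to set it up. Every rate in it is a hypothetical placeholder, not a quoted price from any platform or provider, and the names Rates, monthly_cost_bundled, monthly_cost_byok and monthly_savings are illustrative only. Prices are held in whole cents to keep the arithmetic exact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Per-minute prices in US cents. All values passed in are assumptions."""
    bundled_per_min: int        # platform price with provider costs included
    provider_per_min: int       # combined LLM + STT + TTS cost billed by providers
    orchestration_per_min: int  # platform fee charged under BYOK


def monthly_cost_bundled(minutes: int, rates: Rates) -> int:
    """Cost in cents when the platform bundles provider usage into its price."""
    return minutes * rates.bundled_per_min


def monthly_cost_byok(minutes: int, rates: Rates) -> int:
    """Cost in cents when providers bill you directly plus an orchestration fee."""
    return minutes * (rates.provider_per_min + rates.orchestration_per_min)


def monthly_savings(minutes: int, rates: Rates) -> int:
    """Positive when BYOK is cheaper than the bundled price."""
    return monthly_cost_bundled(minutes, rates) - monthly_cost_byok(minutes, rates)


if __name__ == "__main__":
    # Hypothetical figures: 10c bundled, 5c provider cost, 2c orchestration fee.
    example = Rates(bundled_per_min=10, provider_per_min=5, orchestration_per_min=2)
    for minutes in (1_000, 100_000):
        print(minutes, "min:", monthly_savings(minutes, example) / 100, "USD saved")
```

With these placeholder figures the per-minute gap is fixed, so the total saving scales linearly with minutes. Substitute your own provider rates and the platform's published fees to get a real estimate.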
What BYOK is not
BYOK is not self-hosting. You still use the platform's flow editor, phone infrastructure, no-code builder and hosting. The only change is that provider costs appear on your provider invoices rather than on the platform's invoice.
BYOK on ThinnestAI
ThinnestAI supports BYOK for every major LLM provider (OpenAI, Anthropic, Groq, Gemini, Sarvam, Mistral, DeepSeek), STT provider (Deepgram, AssemblyAI, Sarvam Saaras) and TTS provider (ElevenLabs, Cartesia, Sarvam Bulbul). Some providers, including Sarvam, are also available platform-managed for free-tier usage, so you can get started with zero keys and add your own as you scale.
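The "start with zero keys, add your own later" behaviour can be pictured as a small key-resolution step. The Python sketch below is an illustration under stated assumptions, not ThinnestAI's actual implementation or API. The function resolve_key, the PLATFORM_MANAGED set and the returned labels are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical: providers the platform can serve with its own managed key.
PLATFORM_MANAGED = {"sarvam"}


def resolve_key(provider: str, customer_keys: Dict[str, str]) -> Tuple[str, str]:
    """Decide whose credentials a provider request should use.

    Returns (billing_owner, key), where billing_owner is "customer" when
    a BYOK key is configured and "platform" when a managed key is used.
    """
    name = provider.lower()
    key = customer_keys.get(name)
    if key:
        # BYOK path: the provider bills the customer's own account.
        return ("customer", key)
    if name in PLATFORM_MANAGED:
        # Fallback path: usage runs on the platform's managed key.
        return ("platform", "platform-managed")
    raise LookupError(f"No key configured for provider {provider!r}")


if __name__ == "__main__":
    keys = {"openai": "sk-example"}  # placeholder value, not a real key
    print(resolve_key("OpenAI", keys))
    print(resolve_key("sarvam", keys))
```

Checking the customer's key first means that adding a key later moves billing to your own account with no other configuration change, which is the property the paragraph above describes.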
More definitions
A voice AI agent is an AI-powered system that has real-time spoken conversations — over a phone call, a web widget or a SIP trunk — using speech recognition, a language model and speech synthesis.
Voice AI is the umbrella term for AI systems that understand and generate human speech in real time — powering voice assistants, phone agents, voice chatbots and real-time translation.
Conversational AI is the category of AI systems designed to interact with humans in natural language, across chat, voice, email and messaging — using NLU, LLMs and tool-calling to hold multi-turn conversations that actually accomplish work.
IVR is a rigid scripted decision tree (press 1 for sales). Voice AI is a natural-language agent that understands free-form speech, uses LLM reasoning, and calls tools to take real actions.
BYON means you bring your own phone number — via a Twilio, Vobiz or Exotel account — and connect it to the voice AI platform via SIP, instead of renting a number from the platform itself.
SIP trunking is a way of carrying phone calls over the internet using SIP (Session Initiation Protocol). It lets a voice AI platform send and receive phone calls by connecting to the public phone network via a carrier like Twilio or Vobiz.
