Episode
Voice Agent Use Cases
- Podcast: MLOps.community
- Published: May 1, 2026
- Duration seconds: 3064
- Processing state: processed
Actions
POST https://stenobird.com/v1/public/podcasts/mlops-community/episodes/voice-agent-use-cases/transcription-requests
Idempotently request low-priority transcript generation for this episode.
GET https://stenobird.com/podcast/mlops-community/voice-agent-use-cases.md
Read the agent-friendly Markdown representation of this episode resource.
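
As a rough illustration of the two actions listed above, the sketch below builds the corresponding HTTP requests with Python's standard library. Only the URLs and methods come from this page; the function names are hypothetical, and it is an assumption that the POST needs no body or authentication, since the page does not say either way. The requests are constructed but not sent.

```python
import urllib.request

# URLs copied from the Actions list on this page.
TRANSCRIPTION_URL = (
    "https://stenobird.com/v1/public/podcasts/mlops-community"
    "/episodes/voice-agent-use-cases/transcription-requests"
)
MARKDOWN_URL = (
    "https://stenobird.com/podcast/mlops-community/voice-agent-use-cases.md"
)


def build_transcription_request() -> urllib.request.Request:
    """Build the POST that requests transcript generation.

    Assumption: no request body or auth header is required.
    """
    return urllib.request.Request(TRANSCRIPTION_URL, method="POST")


def build_markdown_request() -> urllib.request.Request:
    """Build the GET for the Markdown representation of the episode."""
    return urllib.request.Request(MARKDOWN_URL, method="GET")


if __name__ == "__main__":
    for req in (build_transcription_request(), build_markdown_request()):
        print(req.get_method(), req.full_url)
        # To actually send: urllib.request.urlopen(req, timeout=10)
```

Because the POST is described as idempotent, repeating it for the same episode should be safe, though the response format is not documented here.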
Summary
This episode is brought to you by the MLflow team. Check out more information at MLflow.org.

What does it actually take to build voice AI at a billion-interaction scale? This episode features an ex-Amazon voice AI engineer who built customer support systems handling 2 billion+ interactions and is now working on next-gen voice agent platforms. Anurag digs deep into the real engineering tradeoffs, design patterns, and use cases that separate production-grade voice agents from demos.

Voice Agent Use Cases // MLOps Podcast #372 with Anurag Beniwal, Member of the Technical Staff at ElevenLabs

Topics covered:
- Cascaded vs. speech-to-speech: Why cascaded systems still win in production, and how to make them feel natural without sacrificing control
- Latency masking: Foreground/background model architecture and how to buy yourself time while deep retrieval runs
- Constellation of models: Using Haiku for tool calling, fine-tuned smaller models for response generation, and why "one model for everything" breaks at scale
- Turn-taking & ASR challenges: Why voice is harder than chat: accents, noise, silence detection, and domain-specific fine-tuning
- Level 1 vs Level 2 customer support: Why today's agents max out at Level 1 and what it takes to capture Level 2 expert judgment
- Inbound vs. outbound sales agents: Where voice agents are already winning, and why inbound lead qualification beats cold outbound
- Booking, reservations & concierge: The clearest near-term wins for voice agents across hospitality, home services, and SMBs
- Continual learning from natural language feedback: How to build agents that improve from real operator feedback without ML expertise
- Conversational TTS: Why passing full conversation history to your TTS model changes everythi…
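
The latency-masking topic above (a foreground path that buys time while deep retrieval runs in the background) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the architecture described in the episode: `deep_retrieval`, `respond`, the filler phrase, and the timing thresholds are all hypothetical, and the retrieval call is simulated with a sleep.

```python
import asyncio


async def deep_retrieval(query: str, delay_s: float) -> str:
    """Hypothetical background step: stands in for a slow retrieval call."""
    await asyncio.sleep(delay_s)
    return f"Here is what I found about {query!r}."


async def respond(
    query: str,
    retrieval_delay_s: float = 0.05,
    filler_after_s: float = 0.01,
) -> list[str]:
    """Return the utterances in the order a caller would hear them."""
    utterances: list[str] = []

    # Background: start the slow work immediately.
    background = asyncio.create_task(deep_retrieval(query, retrieval_delay_s))

    # Foreground: give the background a short head start. If it has not
    # finished by then, speak a filler so the caller does not hear silence.
    done, _pending = await asyncio.wait({background}, timeout=filler_after_s)
    if not done:
        utterances.append("Let me check that for you.")

    # Deliver the real answer once the background work completes.
    utterances.append(await background)
    return utterances


if __name__ == "__main__":
    for line in asyncio.run(respond("refund policy")):
        print(line)
```

In this sketch the filler is only spoken when retrieval is actually slow, so fast answers are not padded with an unnecessary acknowledgement.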