# #358 How AI Agents Will Work While You Sleep | Ruslan Salakhutdinov, Professor at Carnegie Mellon

- Page: https://stenobird.com/podcast/dataframed/358-how-ai-agents-will-work-while-you-sleep-ruslan-salakhutdinov-professor-at-carnegie-mellon
- Text version: https://stenobird.com/podcast/dataframed/358-how-ai-agents-will-work-while-you-sleep-ruslan-salakhutdinov-professor-at-carnegie-mellon.md
- Podcast: [DataFramed](https://stenobird.com/podcast/dataframed)
- Published: 2026-05-04T09:00:00+00:00
- Episode link: https://www.datacamp.com/podcast
- Audio file: https://dts.podtrac.com/redirect.mp3/cohst.app/pdcst/6G1A6D/episodes.captivate.fm/episode/355e1843-25b7-48da-95cb-13dec830fe48.mp3
- Processing state: processed
- JSON: https://stenobird.com/v1/public/podcasts/dataframed/episodes/358-how-ai-agents-will-work-while-you-sleep-ruslan-salakhutdinov-professor-at-carnegie-mellon
- Duration seconds: 3498

## Resource

AI agents are moving from impressive demos to autonomous workers, but they face a critical reliability wall. This discussion explores how to bridge the gap between 90% and 100% success rates through multi-agent verification and robust guardrails.

## Highlights

- Main idea: The transition from 90% to 100% reliability is the hardest frontier in agentic workflows
- Failure mode: Agents can exhibit 'confidently incorrect' behavior or bypass security protocols to complete tasks via unintended means
- Practical takeaway: Use multi-agent orchestration where specialized models verify the outputs of primary agents to reduce hallucinations
- Technical insight: The future of autonomy relies on better reasoning, longer-horizon planning, and the use of external tools
- Industry trend: The shift toward 'computer use' agents allows models to navigate interfaces and automate complex, multi-hour workflows

## Topics

AI Agents, Generative AI, Machine Learning, Autonomous Systems, Multi-Agent Systems, AI Safety, Computer Use Agents, Deep Learning

## Chapters

- 1:00 - The 90% Reliability Wall: Discussing the fundamental limitations of current systems and why reaching 100% autonomy is mathematically and technically difficult.
- 5:30 - Long-Horizon Tasks and Planning: How agents are evolving to handle tasks lasting several hours through improved planning and tool use.
- 9:50 - Learning via Feedback Loops: The importance of providing step-by-step correctness feedback to models, similar to how teachers instruct students.
- 18:30 - Automating Overnight Workflows: The potential for agents to manage asynchronous tasks and handle failures that occur while humans are offline.
- 27:20 - Multi-Agent Verification and Guardrails: Strategies for using secondary models to audit outputs and implementing deterministic security controls to prevent destructive actions.
- 36:20 - Risk in Non-Verifiable Domains: Analyzing the dangers of deploying agents in environments where success cannot be easily validated by unit tests or code compilers.
- 40:40 - The Challenges of Physical AI: Why robotic manipulation and tactile sensing present a much higher difficulty bar than digital agentic workflows.

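
The verification and guardrail pattern named in the highlights and in the 27:20 chapter can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `BLOCKED_PREFIXES`, `guardrail_allows`, `run_with_verification`, and the stub agents are hypothetical names, the agents are plain functions rather than model calls, and the episode summary does not prescribe this particular design.

```python
from typing import Callable, Iterable, Optional

# Hypothetical deny-list standing in for a deterministic security control.
BLOCKED_PREFIXES = ("rm -rf", "drop table")


def guardrail_allows(action: str) -> bool:
    """Deterministic check: reject actions that start with a blocked prefix."""
    normalized = action.strip().lower()
    return not normalized.startswith(BLOCKED_PREFIXES)


def run_with_verification(
    task: str,
    primary: Callable[[str], str],
    verifier: Callable[[str, str], bool],
    max_attempts: int = 3,
) -> Optional[str]:
    """Accept a proposal only if it passes both the guardrail and the verifier.

    Returns None when no proposal is accepted within max_attempts, which a
    caller could treat as a signal to escalate to a human.
    """
    for _ in range(max_attempts):
        proposal = primary(task)
        if guardrail_allows(proposal) and verifier(task, proposal):
            return proposal
    return None


def make_stub_primary(outputs: Iterable[str]) -> Callable[[str], str]:
    """Stub primary agent that replays a fixed sequence of proposals."""
    iterator = iter(outputs)
    return lambda task: next(iterator)


def stub_verifier(task: str, proposal: str) -> bool:
    """Stub secondary agent: approves only proposals that archive data."""
    return proposal.startswith("archive")
```

In this sketch the deterministic guardrail runs before the verifier, so a destructive proposal is rejected even if the secondary model would have approved it.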
## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/dataframed/episodes/358-how-ai-agents-will-work-while-you-sleep-ruslan-salakhutdinov-professor-at-carnegie-mellon/transcription-requests` - Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/dataframed/358-how-ai-agents-will-work-while-you-sleep-ruslan-salakhutdinov-professor-at-carnegie-mellon.md` - Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.
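
The `request_transcript` and `read_markdown` actions listed under Actions can be sketched with Python's standard library as follows. Only the two URLs and their HTTP methods come from this page. The empty POST body, the absence of authentication, the UTF-8 response decoding, and the helper names (`build_request_transcript`, `build_read_markdown`, `send`) are assumptions made for illustration.

```python
import urllib.request
from typing import Tuple

BASE = "https://stenobird.com"
EPISODE_SLUG = (
    "358-how-ai-agents-will-work-while-you-sleep-"
    "ruslan-salakhutdinov-professor-at-carnegie-mellon"
)


def build_request_transcript() -> urllib.request.Request:
    """Build the POST for the request_transcript action."""
    url = (
        f"{BASE}/v1/public/podcasts/dataframed/episodes/"
        f"{EPISODE_SLUG}/transcription-requests"
    )
    # Assumption: the endpoint accepts an empty body and needs no auth header.
    return urllib.request.Request(url, data=b"", method="POST")


def build_read_markdown() -> urllib.request.Request:
    """Build the GET for the read_markdown action."""
    url = f"{BASE}/podcast/dataframed/{EPISODE_SLUG}.md"
    return urllib.request.Request(url, method="GET")


def send(request: urllib.request.Request, timeout: float = 10.0) -> Tuple[int, str]:
    """Send a prepared request and return (status code, decoded body)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Assumption: the response body is UTF-8 text.
        return response.status, response.read().decode("utf-8")
```

Calling `send(build_request_transcript())` would issue the transcription request. Because the page describes that action as idempotent, repeating the call should be safe, though the response format is not documented here.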