Episode
#358 How AI Agents Will Work While You Sleep | Ruslan Salakhutdinov, Professor at Carnegie Mellon
- Podcast
- DataFramed
- Published
- May 4, 2026
- Duration seconds
- 3498
- Processing state
- processed
- Canonical source
- https://www.datacamp.com/podcast
Actions
POST https://stenobird.com/v1/public/podcasts/dataframed/episodes/358-how-ai-agents-will-work-while-you-sleep-ruslan-salakhutdinov-professor-at-carnegie-mellon/transcription-requests
Idempotently request low-priority transcript generation for this episode.
GET https://stenobird.com/podcast/dataframed/358-how-ai-agents-will-work-while-you-sleep-ruslan-salakhutdinov-professor-at-carnegie-mellon.md
Read the agent-friendly Markdown representation of this episode resource.
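The two actions above could be prepared from a script roughly as follows. This is a minimal sketch using only the Python standard library: the URLs and HTTP methods come from the listing above, while the helper names are hypothetical. The page does not state whether a request body, headers, or authentication are needed, so the sketch only builds the requests and does not send them.

```python
from urllib.request import Request

# Values copied from the action URLs listed above.
BASE = "https://stenobird.com"
PODCAST = "dataframed"
EPISODE_SLUG = (
    "358-how-ai-agents-will-work-while-you-sleep-"
    "ruslan-salakhutdinov-professor-at-carnegie-mellon"
)


def transcription_request() -> Request:
    """Build (but do not send) the POST that requests a transcript.

    Hypothetical helper; the required body and headers are not documented here.
    """
    url = (
        f"{BASE}/v1/public/podcasts/{PODCAST}"
        f"/episodes/{EPISODE_SLUG}/transcription-requests"
    )
    return Request(url, method="POST")


def markdown_request() -> Request:
    """Build (but do not send) the GET for the Markdown representation."""
    url = f"{BASE}/podcast/{PODCAST}/{EPISODE_SLUG}.md"
    return Request(url, method="GET")
```

Passing either object to `urllib.request.urlopen` would send it. Because the POST is described as idempotent, repeating it should not queue duplicate work, although the response format is not documented on this page.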
Summary
AI agents are moving from impressive demos to autonomous workers, but they face a critical reliability wall. This discussion explores how to bridge the gap between 90% and 100% success rates through multi-agent verification and robust guardrails.
Topics
- AI Agents
- Generative AI
- Machine Learning
- Autonomous Systems
- Multi-Agent Systems
- AI Safety
- Computer Use Agents
- Deep Learning
Highlights
- Main idea: The transition from 90% to 100% reliability is the hardest frontier in agentic workflows
- Failure mode: Agents can exhibit 'confidently incorrect' behavior or bypass security protocols to complete tasks via unintended means
- Practical takeaway: Use multi-agent orchestration where specialized models verify the outputs of primary agents to reduce hallucinations
- Technical insight: The future of autonomy relies on better reasoning, longer-horizon planning, and the use of external tools
- Industry trend: The shift toward 'computer use' agents allows models to navigate interfaces and automate complex, multi-hour workflows
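The practical takeaway above, a second model checking the primary agent's output, can be sketched as a simple retry loop. This is an illustrative sketch only, not the approach described in the episode: `run_with_verification`, the toy primary agent, and the arithmetic verifier are all hypothetical stand-ins for real model calls.

```python
from typing import Callable, Optional


def run_with_verification(
    task: str,
    primary: Callable[[str], str],
    verifier: Callable[[str, str], bool],
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the first primary output the verifier accepts, else None."""
    for _ in range(max_attempts):
        candidate = primary(task)
        if verifier(task, candidate):
            return candidate
    # No candidate passed verification; the caller should escalate to a human.
    return None


def make_flaky_primary() -> Callable[[str], str]:
    """Toy primary agent: wrong on the first call, right on the second."""
    answers = iter(["5", "4"])
    return lambda task: next(answers)


def arithmetic_verifier(task: str, answer: str) -> bool:
    """Toy verifier: recomputes a 'a+b' task independently of the primary."""
    left, right = task.split("+")
    return answer == str(int(left) + int(right))
```

With these stand-ins, `run_with_verification("2+2", make_flaky_primary(), arithmetic_verifier)` rejects the first answer and accepts the second. Returning `None` after the attempt budget is spent keeps an unverified answer from being passed along silently.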
Chapters
- 1:00 The 90% Reliability Wall: Discussing the fundamental limitations of current systems and why reaching 100% autonomy is mathematically and technically difficult.
- 5:30 Long-Horizon Tasks and Planning: How agents are evolving to handle tasks lasting several hours through improved planning and tool use.
- 9:50 Learning via Feedback Loops: The importance of providing step-by-step correctness feedback to models, similar to how teachers instruct students.
- 18:30 Automating Overnight Workflows: The potential for agents to manage asynchronous tasks and handle failures that occur while humans are offline.
- 27:20 Multi-Agent Verification and Guardrails: Strategies for using secondary models to audit outputs and implementing deterministic security controls to prevent destructive actions.
- 36:20 Risk in Non-Verifiable Domains: Analyzing the dangers of deploying agents in environments where success cannot be easily validated by unit tests or code compilers.
- 40:40 The Challenges of Physical AI: Why robotic manipulation and tactile sensing present a much higher difficulty bar than digital agentic workflows.
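The "deterministic security controls" mentioned in the 27:20 chapter could look something like the allowlist check below. This is a hedged sketch, not a method taken from the episode: the allowlist contents and the `is_allowed` helper are hypothetical, and a real deployment would also need sandboxing and argument-level checks.

```python
import shlex

# Hypothetical allowlist of read-only commands an agent may run.
ALLOWED_COMMANDS = {"ls", "cat", "grep"}

# Characters that could chain, redirect, or substitute commands in a shell.
FORBIDDEN_CHARS = set(";&|`$<>\n")


def is_allowed(command: str) -> bool:
    """Deterministically decide whether an agent-proposed command may run."""
    if any(ch in FORBIDDEN_CHARS for ch in command):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes or similar: refuse rather than guess.
        return False
    if not tokens:
        return False
    return tokens[0] in ALLOWED_COMMANDS
```

Because the check is plain code rather than a model call, the same input always gives the same verdict, so a destructive command such as `rm -rf /` is refused no matter how confident the agent is.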