Episode

#358 How AI Agents Will Work While You Sleep | Ruslan Salakhutdinov, Professor at Carnegie Mellon

Podcast: DataFramed
Published: May 4, 2026
Duration: 3498 seconds (58:18)
Processing state: processed
Canonical source: https://www.datacamp.com/podcast
Audio: https://dts.podtrac.com/redirect.mp3/cohst.app/pdcst/6G1A6D/episodes.captivate.fm/episode/355e1843-25b7-48da-95cb-13dec830fe48.mp3
JSON: /v1/public/podcasts/dataframed/episodes/358-how-ai-agents-will-work-while-you-sleep-ruslan-salakhutdinov-professor-at-carnegie-mellon
Markdown: /podcast/dataframed/358-how-ai-agents-will-work-while-you-sleep-ruslan-salakhutdinov-professor-at-carnegie-mellon.md

Actions

POST https://stenobird.com/v1/public/podcasts/dataframed/episodes/358-how-ai-agents-will-work-while-you-sleep-ruslan-salakhutdinov-professor-at-carnegie-mellon/transcription-requests
Idempotently request low-priority transcript generation for this episode.
GET https://stenobird.com/podcast/dataframed/358-how-ai-agents-will-work-while-you-sleep-ruslan-salakhutdinov-professor-at-carnegie-mellon.md
Read the agent-friendly Markdown representation of this episode resource.

Summary

AI agents are moving from impressive demos to autonomous workers, but they face a critical reliability wall. This discussion explores how to bridge the gap between 90% and 100% success rates through multi-agent verification and robust guardrails.
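One way to see why the last 10% is hard: if a long task is a chain of steps that must all succeed, small per-step error rates compound. This is an illustration under a simplifying assumption that the episode does not state. It treats every step as failing independently with the same probability, while real agent failures are often correlated. `task_success_rate` and `required_per_step` are hypothetical helper names.

```python
# Illustrative model only (an assumption, not from the episode): an n-step task
# succeeds only if every step succeeds, and steps fail independently with the
# same per-step success probability.


def task_success_rate(per_step: float, steps: int) -> float:
    """Probability that all `steps` independent steps succeed."""
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be a probability in [0, 1]")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step ** steps


def required_per_step(target: float, steps: int) -> float:
    """Per-step success rate needed for the whole task to reach `target`."""
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must be a probability in [0, 1]")
    if steps <= 0:
        raise ValueError("steps must be positive")
    return target ** (1.0 / steps)


if __name__ == "__main__":
    for p in (0.90, 0.99, 0.999):
        print(f"per-step {p:.3f} -> 20-step task {task_success_rate(p, 20):.3f}")
    print(f"per-step rate needed for 90% over 20 steps: {required_per_step(0.9, 20):.4f}")
```

Under this assumption, a 90% per-step success rate gives only about 12% success on a 20-step task (0.9^20 ≈ 0.122). Small per-step gains therefore matter a great deal for long-horizon agents.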

Topics

AI Agents
Generative AI
Machine Learning
Autonomous Systems
Multi-Agent Systems
AI Safety
Computer Use Agents
Deep Learning

Highlights

Main idea: The transition from 90% to 100% reliability is the hardest frontier in agentic workflows
Failure mode: Agents can exhibit 'confidently incorrect' behavior or bypass security protocols to complete tasks via unintended means
Practical takeaway: Use multi-agent orchestration where specialized models verify the outputs of primary agents to reduce hallucinations
Technical insight: The future of autonomy relies on better reasoning, longer-horizon planning, and the use of external tools
Industry trend: The shift toward 'computer use' agents allows models to navigate interfaces and automate complex, multi-hour workflows
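The orchestration takeaway, where a second model checks the primary agent's output, can be sketched as a propose-and-verify loop. This is a minimal illustration and not the method described in the episode. `propose_and_verify`, `toy_propose`, and `toy_verify` are hypothetical stand-ins for real model calls, and the verifier's rejection reason is assumed to be passed back to the proposer as feedback.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical interfaces standing in for real model calls.
# A proposer maps (task, feedback from the last rejection) to an answer.
Proposer = Callable[[str, Optional[str]], str]
# A verifier maps (task, answer) to (accepted?, reason).
Verifier = Callable[[str, str], Tuple[bool, str]]


@dataclass
class Result:
    answer: Optional[str]  # None if no answer passed verification
    accepted: bool
    attempts: int
    last_reason: str


def propose_and_verify(task: str, propose: Proposer, verify: Verifier,
                       max_attempts: int = 3) -> Result:
    """Let a primary agent propose answers until a separate verifier accepts one."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    feedback: Optional[str] = None
    reason = ""
    for attempt in range(1, max_attempts + 1):
        answer = propose(task, feedback)
        ok, reason = verify(task, answer)
        if ok:
            return Result(answer, True, attempt, reason)
        feedback = reason  # pass the verifier's objection back to the proposer
    # Out of attempts: return no answer so the caller can escalate to a human.
    return Result(None, False, max_attempts, reason)


# Toy stand-ins: the proposer is wrong until it receives feedback.
def toy_propose(task: str, feedback: Optional[str]) -> str:
    return "5" if feedback is None else "4"


def toy_verify(task: str, answer: str) -> Tuple[bool, str]:
    ok = answer == str(2 + 2)
    return ok, ("ok" if ok else f"{answer} is not 2 + 2")


if __name__ == "__main__":
    print(propose_and_verify("What is 2 + 2?", toy_propose, toy_verify))
```

The loop deliberately returns no answer when verification keeps failing instead of returning the last unverified attempt. This lets the caller escalate to a human rather than act on a possibly confidently incorrect output.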

Chapters

1:00 The 90% Reliability Wall: Discussing the fundamental limitations of current systems and why reaching 100% autonomy is mathematically and technically difficult.
5:30 Long-Horizon Tasks and Planning: How agents are evolving to handle tasks lasting several hours through improved planning and tool use.
9:50 Learning via Feedback Loops: The importance of providing step-by-step correctness feedback to models, similar to how teachers instruct students.
18:30 Automating Overnight Workflows: The potential for agents to manage asynchronous tasks and handle failures that occur while humans are offline.
27:20 Multi-Agent Verification and Guardrails: Strategies for using secondary models to audit outputs and implementing deterministic security controls to prevent destructive actions.
36:20 Risk in Non-Verifiable Domains: Analyzing the dangers of deploying agents in environments where success cannot be easily validated by unit tests or code compilers.
40:40 The Challenges of Physical AI: Why robotic manipulation and tactile sensing present a much higher difficulty bar than digital agentic workflows.
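The deterministic security controls mentioned in the 27:20 chapter can be illustrated as a default-deny filter. Plain code applies it before any agent-proposed command runs. The allowlist, the blocked git subcommands, and `check_command` are hypothetical choices made for this sketch. It also assumes that approved commands are executed as an argument list without a shell.

```python
import shlex
from dataclasses import dataclass

# Hypothetical policy for this sketch. Anything not explicitly allowed is denied.
ALLOWED_PROGRAMS = {"ls", "cat", "grep", "git", "pytest"}
BLOCKED_GIT_SUBCOMMANDS = {"push", "reset", "clean", "rebase"}
# Reject shell syntax outright; approved commands are assumed to run as an
# argv list without a shell, so chaining and redirection are never needed.
SHELL_METACHARACTERS = set(";&|`$<>\n")


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def check_command(command: str) -> Decision:
    """Deterministically approve or deny an agent-proposed command."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return Decision(False, "shell metacharacters are not permitted")
    try:
        argv = shlex.split(command)
    except ValueError as exc:  # e.g. an unterminated quote
        return Decision(False, f"unparseable command: {exc}")
    if not argv:
        return Decision(False, "empty command")
    program, args = argv[0], argv[1:]
    if program not in ALLOWED_PROGRAMS:
        return Decision(False, f"{program!r} is not on the allowlist")
    # Conservative: block a destructive git subcommand wherever it appears,
    # so options placed before it (e.g. `git -C repo push`) cannot hide it.
    if program == "git" and BLOCKED_GIT_SUBCOMMANDS.intersection(args):
        return Decision(False, "destructive git subcommand blocked")
    return Decision(True, "allowed")


if __name__ == "__main__":
    for cmd in ("git status", "git -C repo push --force", "ls; rm -rf /"):
        print(cmd, "->", check_command(cmd))
```

The check is ordinary code rather than another model, so it returns the same decision every time for the same command. In this sketch it complements model-based verification rather than replacing it.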