Episode

#358 How AI Agents Will Work While You Sleep | Ruslan Salakhutdinov, Professor at Carnegie Mellon

Podcast
DataFramed
Published
May 4, 2026
Duration (seconds)
3498
Processing state
processed
Canonical source
https://www.datacamp.com/podcast
Audio
https://dts.podtrac.com/redirect.mp3/cohst.app/pdcst/6G1A6D/episodes.captivate.fm/episode/355e1843-25b7-48da-95cb-13dec830fe48.mp3
JSON
/v1/public/podcasts/dataframed/episodes/358-how-ai-agents-will-work-while-you-sleep-ruslan-salakhutdinov-professor-at-carnegie-mellon
Markdown
/podcast/dataframed/358-how-ai-agents-will-work-while-you-sleep-ruslan-salakhutdinov-professor-at-carnegie-mellon.md

Actions

  • POST https://stenobird.com/v1/public/podcasts/dataframed/episodes/358-how-ai-agents-will-work-while-you-sleep-ruslan-salakhutdinov-professor-at-carnegie-mellon/transcription-requests
    Idempotently request low-priority transcript generation for this episode.
  • GET https://stenobird.com/podcast/dataframed/358-how-ai-agents-will-work-while-you-sleep-ruslan-salakhutdinov-professor-at-carnegie-mellon.md
    Read the agent-friendly Markdown representation of this episode resource.

Summary

AI agents are moving from impressive demos to autonomous workers, but they face a critical reliability wall. This discussion explores how to bridge the gap between 90% and 100% success rates through multi-agent verification and robust guardrails.

Topics

  • AI Agents
  • Generative AI
  • Machine Learning
  • Autonomous Systems
  • Multi-Agent Systems
  • AI Safety
  • Computer Use Agents
  • Deep Learning

Highlights

  • Main idea: The transition from 90% to 100% reliability is the hardest frontier in agentic workflows
  • Failure mode: Agents can exhibit 'confidently incorrect' behavior or bypass security protocols to complete tasks via unintended means
  • Practical takeaway: Use multi-agent orchestration where specialized models verify the outputs of primary agents to reduce hallucinations
  • Technical insight: The future of autonomy relies on better reasoning, longer-horizon planning, and the use of external tools
  • Industry trend: The shift toward 'computer use' agents allows models to navigate interfaces and automate complex, multi-hour workflows
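
To make the reliability gap in the highlights concrete, here is a minimal arithmetic sketch. It rests on assumptions that are not stated in the episode: a task is modeled as a chain of independent steps that all share one per-step success rate, and a verifier catches a fixed fraction of failed steps and triggers exactly one retry. The function names and the numbers are illustrative only.

```python
def chain_success(per_step: float, steps: int) -> float:
    """Probability that every step of a task succeeds.

    Assumes steps fail independently with the same success rate (a simplification).
    """
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be a probability in [0, 1]")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step ** steps


def with_verifier(per_step: float, catch_rate: float) -> float:
    """Per-step success rate when a verifier catches some failures and allows one retry.

    A step succeeds outright with probability per_step. Otherwise the verifier
    notices the failure with probability catch_rate, and the single retry then
    succeeds with probability per_step. Uncaught failures stay failures.
    """
    if not 0.0 <= catch_rate <= 1.0:
        raise ValueError("catch_rate must be a probability in [0, 1]")
    return per_step + (1.0 - per_step) * catch_rate * per_step


if __name__ == "__main__":
    # Hypothetical numbers: 90% per-step success over a 10-step task.
    baseline = chain_success(0.9, 10)
    improved = chain_success(with_verifier(0.9, 0.8), 10)
    print(f"no verifier:   {baseline:.3f}")
    print(f"with verifier: {improved:.3f}")
```

Under these assumptions, a 90% per-step rate over ten steps completes the whole task only about 35% of the time, and a verifier that catches 80% of failures lifts that to roughly 75%. Real agent failures are rarely independent, so treat this as intuition rather than a prediction.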

Chapters

  1. 1:00 The 90% Reliability Wall: Discussing the fundamental limitations of current systems and why reaching 100% autonomy is mathematically and technically difficult.
  2. 5:30 Long-Horizon Tasks and Planning: How agents are evolving to handle tasks lasting several hours through improved planning and tool use.
  3. 9:50 Learning via Feedback Loops: The importance of providing step-by-step correctness feedback to models, similar to how teachers instruct students.
  4. 18:30 Automating Overnight Workflows: The potential for agents to manage asynchronous tasks and handle failures that occur while humans are offline.
  5. 27:20 Multi-Agent Verification and Guardrails: Strategies for using secondary models to audit outputs and implementing deterministic security controls to prevent destructive actions.
  6. 36:20 Risk in Non-Verifiable Domains: Analyzing the dangers of deploying agents in environments where success cannot be easily validated by unit tests or code compilers.
  7. 40:40 The Challenges of Physical AI: Why robotic manipulation and tactile sensing present a much higher difficulty bar than digital agentic workflows.
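
The combination described in chapter 5, a secondary check on outputs plus deterministic controls against destructive actions, could be sketched roughly as follows. Everything here is hypothetical and not taken from the episode: the `Action` type, the tool names in the denylist, and the `verifier` and `execute` callables stand in for whatever a real system would use. In practice the verifier would usually be a second model; here it is any function that returns True or False.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Action:
    """A single step proposed by the primary agent (hypothetical shape)."""
    tool: str
    argument: str


# Deterministic guardrail: a fixed denylist, checked before any model-based review.
# The tool names are made up for illustration.
DESTRUCTIVE_TOOLS = frozenset({"delete_file", "drop_table", "send_payment"})


def guardrail_allows(action: Action) -> bool:
    """Return False for any action whose tool is on the denylist."""
    return action.tool not in DESTRUCTIVE_TOOLS


def run_with_checks(
    proposed: Iterable[Action],
    verifier: Callable[[Action], bool],
    execute: Callable[[Action], object],
) -> List[str]:
    """Execute only actions that pass the guardrail and then the verifier.

    Returns a log with one entry per proposed action.
    """
    log: List[str] = []
    for action in proposed:
        if not guardrail_allows(action):
            log.append(f"blocked:{action.tool}")    # never reaches the verifier
        elif not verifier(action):
            log.append(f"rejected:{action.tool}")   # secondary check said no
        else:
            execute(action)
            log.append(f"executed:{action.tool}")
    return log
```

Running the denylist before the verifier is a deliberate choice in this sketch: the hard rule does not depend on any model's judgment, so a confidently incorrect verifier cannot approve a destructive action.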