# #358 How AI Agents Will Work While You Sleep | Ruslan Salakhutdinov, Professor at Carnegie Mellon

Page: https://stenobird.com/podcast/dataframed/358-how-ai-agents-will-work-while-you-sleep-ruslan-salakhutdinov-professor-at-carnegie-mellon
Text version: https://stenobird.com/podcast/dataframed/358-how-ai-agents-will-work-while-you-sleep-ruslan-salakhutdinov-professor-at-carnegie-mellon.md
Podcast: [DataFramed](https://stenobird.com/podcast/dataframed)
Published: 2026-05-04T09:00:00+00:00
Episode link: https://www.datacamp.com/podcast
Audio file: https://dts.podtrac.com/redirect.mp3/cohst.app/pdcst/6G1A6D/episodes.captivate.fm/episode/355e1843-25b7-48da-95cb-13dec830fe48.mp3
Processing state: processed
JSON: https://stenobird.com/v1/public/podcasts/dataframed/episodes/358-how-ai-agents-will-work-while-you-sleep-ruslan-salakhutdinov-professor-at-carnegie-mellon
Duration seconds: 3498

## Resource

AI agents are moving from impressive demos to autonomous workers, but they face a critical reliability wall. This discussion explores how to bridge the gap between 90% and 100% success rates through multi-agent verification and robust guardrails.

## Highlights
- Main idea: The transition from 90% to 100% reliability is the hardest frontier in agentic workflows
- Failure mode: Agents can exhibit 'confidently incorrect' behavior or bypass security protocols to complete tasks via unintended means
- Practical takeaway: Use multi-agent orchestration where specialized models verify the outputs of primary agents to reduce hallucinations
- Technical insight: The future of autonomy relies on better reasoning, longer-horizon planning, and the use of external tools
- Industry trend: The shift toward 'computer use' agents allows models to navigate interfaces and automate complex, multi-hour workflows
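
The verification pattern in the practical takeaway above can be sketched roughly as follows. This is a minimal illustration, not the method described in the episode: `run_with_verifier`, `Verdict`, `toy_primary`, and `toy_verifier` are hypothetical names, and the toy agents stand in for real model calls.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    approved: bool
    reason: str


def run_with_verifier(
    task: str,
    primary: Callable[[str, Optional[str]], str],
    verifier: Callable[[str, str], Verdict],
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the first draft the verifier approves, or None if none passes."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        draft = primary(task, feedback)
        verdict = verifier(task, draft)
        if verdict.approved:
            return draft
        feedback = verdict.reason  # pass the objection back to the primary agent
    return None  # caller decides what to do, e.g. escalate to a human


# Stand-in agents for illustration only (assumption: real ones wrap model calls).
def toy_primary(task: str, feedback: Optional[str]) -> str:
    return "4" if feedback else "5"


def toy_verifier(task: str, draft: str) -> Verdict:
    ok = draft == "4"
    return Verdict(ok, "" if ok else "arithmetic does not check out")


result = run_with_verifier("What is 2 + 2?", toy_primary, toy_verifier)
print(result)
```

Returning `None` after the attempt budget is spent, rather than the last draft, keeps an unverified answer from being treated as a verified one.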

## Topics

AI Agents, Generative AI, Machine Learning, Autonomous Systems, Multi-Agent Systems, AI Safety, Computer Use Agents, Deep Learning

## Chapters
- 1:00 — The 90% Reliability Wall: Discussing the fundamental limitations of current systems and why reaching 100% autonomy is mathematically and technically difficult.
- 5:30 — Long-Horizon Tasks and Planning: How agents are evolving to handle tasks lasting several hours through improved planning and tool use.
- 9:50 — Learning via Feedback Loops: The importance of providing step-by-step correctness feedback to models, similar to how teachers instruct students.
- 18:30 — Automating Overnight Workflows: The potential for agents to manage asynchronous tasks and handle failures that occur while humans are offline.
- 27:20 — Multi-Agent Verification and Guardrails: Strategies for using secondary models to audit outputs and implementing deterministic security controls to prevent destructive actions.
- 36:20 — Risk in Non-Verifiable Domains: Analyzing the dangers of deploying agents in environments where success cannot be easily validated by unit tests or code compilers.
- 40:40 — The Challenges of Physical AI: Why robotic manipulation and tactile sensing present a much higher difficulty bar than digital agentic workflows.
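
As one possible reading of the "deterministic security controls" mentioned in the 27:20 chapter, a guardrail can be a plain rule check that runs before any agent-proposed action executes, with no model in the loop. The sketch below assumes the agent proposes shell commands; the allowlist, blocklist, and function name are hypothetical and chosen only for illustration.

```python
import shlex

# Hypothetical policy: names chosen for illustration, not taken from the episode.
ALLOWED_COMMANDS = {"ls", "cat", "grep", "pytest"}
BLOCKED_TOKENS = {"rm", "sudo", "mkfs", "dd"}
SHELL_METACHARACTERS = set(";|&`$><")


def is_action_allowed(command: str) -> bool:
    """Deterministically decide whether an agent-proposed command may run."""
    # Reject chaining, piping, substitution, and redirection outright.
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:
        return False  # malformed input, e.g. an unterminated quote
    if not tokens or tokens[0] not in ALLOWED_COMMANDS:
        return False
    # Conservative: a blocked word anywhere in the arguments also fails.
    return not any(token in BLOCKED_TOKENS for token in tokens)


for proposed in ["cat notes.txt", "rm -rf /", "ls && sudo reboot"]:
    print(proposed, "->", "run" if is_action_allowed(proposed) else "refuse")
```

The check is deliberately stricter than necessary (for example, it refuses `grep rm file.txt`); a rule that fails closed is easier to audit than one that tries to infer intent.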

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/dataframed/episodes/358-how-ai-agents-will-work-while-you-sleep-ruslan-salakhutdinov-professor-at-carnegie-mellon/transcription-requests` — Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/dataframed/358-how-ai-agents-will-work-while-you-sleep-ruslan-salakhutdinov-professor-at-carnegie-mellon.md` — Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.
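
For agents that need to invoke `request_transcript`, a minimal sketch of building the call with Python's standard library follows. The URL pattern comes from the action listed above; the empty request body, the absence of authentication headers, and the helper name `build_transcript_request` are assumptions, since this page does not document the request or response format.

```python
import urllib.request

BASE = "https://stenobird.com/v1/public/podcasts"
PODCAST = "dataframed"
EPISODE = (
    "358-how-ai-agents-will-work-while-you-sleep-"
    "ruslan-salakhutdinov-professor-at-carnegie-mellon"
)


def build_transcript_request(podcast: str, episode: str) -> urllib.request.Request:
    """Build (but do not send) the POST for the request_transcript action."""
    url = f"{BASE}/{podcast}/episodes/{episode}/transcription-requests"
    # Assumption: the endpoint accepts an empty body with no extra headers.
    return urllib.request.Request(url, data=b"", method="POST")


request = build_transcript_request(PODCAST, EPISODE)
print(request.get_method(), request.full_url)

# To actually send it (requires network access):
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.status)
```

Because the action is described as idempotent, repeating the call for the same episode should be safe, though the response shape would need to be confirmed against the service itself.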

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.