Episode

Proactive Agents for the Web with Devi Parikh - #756

Podcast: The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Published: Nov 19, 2025
Duration seconds: 3364
Processing state: processed
Canonical source: https://twimlai.com/podcast/twimlai/proactive-agents-for-the-web/
Audio: https://pscrb.fm/rss/p/traffic.megaphone.fm/MLN8999995371.mp3?updated=1763502496
JSON: /v1/public/podcasts/twiml-ai-podcast/episodes/proactive-agents-for-the-web-with-devi-parikh-756
Markdown: /podcast/twiml-ai-podcast/proactive-agents-for-the-web-with-devi-parikh-756.md

Actions

POST https://stenobird.com/v1/public/podcasts/twiml-ai-podcast/episodes/proactive-agents-for-the-web-with-devi-parikh-756/transcription-requests
Idempotently request low-priority transcript generation for this episode.
GET https://stenobird.com/podcast/twiml-ai-podcast/proactive-agents-for-the-web-with-devi-parikh-756.md
Read the agent-friendly Markdown representation of this episode resource.
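
The two actions above can be sketched in Python with only the standard library. This is a minimal illustration, not documented client code: the helper names (`transcription_request`, `markdown_request`, `send`) are hypothetical, and it assumes the POST takes an empty body and that neither endpoint requires authentication, neither of which this page states. The response format is likewise not described here, so the sketch just returns the status code and raw text.

```python
from urllib.request import Request, urlopen

BASE = "https://stenobird.com"
# Paths copied from the Actions list above.
EPISODE_PATH = (
    "/v1/public/podcasts/twiml-ai-podcast/episodes/"
    "proactive-agents-for-the-web-with-devi-parikh-756"
)
MARKDOWN_PATH = (
    "/podcast/twiml-ai-podcast/"
    "proactive-agents-for-the-web-with-devi-parikh-756.md"
)


def transcription_request() -> Request:
    """Build the POST that asks for low-priority transcript generation.

    Assumption: an empty body is accepted and no auth header is needed.
    """
    url = BASE + EPISODE_PATH + "/transcription-requests"
    return Request(url, data=b"", method="POST")


def markdown_request() -> Request:
    """Build the GET for the Markdown representation of the episode."""
    return Request(BASE + MARKDOWN_PATH, method="GET")


def send(req: Request, timeout: float = 10.0) -> tuple[int, str]:
    """Send a prepared request and return (status code, body text).

    urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    """
    with urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # The POST is described as idempotent, so repeating it should be safe.
    status, _ = send(transcription_request())
    print("transcription request status:", status)
    status, body = send(markdown_request())
    print("markdown status:", status, "length:", len(body))
```

Building the requests separately from sending them keeps the URLs easy to inspect without touching the network.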

Summary

The future of web interaction lies in moving from manual clicking to a higher level of abstraction, driven by proactive, autonomous agents. Devi Parikh explains how Yutori uses visually grounded models to navigate the web more reliably than traditional DOM-based approaches.

Topics

Proactive Agents
Web Automation
Computer Vision
Multimodal Models
Browser Use Models
Autonomous Agents
Yutori
AI Agents

Highlights

Main idea: Moving from DOM-based parsing to vision-based models makes web agents much more robust to brittle web interfaces
Technical approach: Yutori utilizes a training pipeline involving supervised fine-tuning, rejection sampling, and reinforcement learning
Practical takeaway: 'Scouts' enable ambient background automation that monitors the web and reports findings without active user input
Failure mode: Traditional browser automation often breaks due to edge cases in website structures, necessitating a shift toward visual grounding
Future vision: The goal is to transition from simple information monitoring to complex, multi-step task automation that operates autonomously

Chapters

1:00 The Evolution of Web Interaction: A look back at the progress in AI and the shift toward browser-use agents.
9:15 The Rise of Browser Agents: Discussing the excitement around automating web tasks and the potential for broader platforms.
22:05 Scaling Complex Workflows: How improving foundation models and custom training pipelines pushes the ceiling of agent capabilities.
29:40 Beyond Static Reports: Moving from simple data retrieval to interactive, actionable outputs from web agents.
37:40 The Shift to Vision-Based Navigation: Why relying on screenshots and visual grounding is more reliable than parsing the DOM.
46:25 Adaptive Orchestration: How 'Scouts' use adaptive plans and tool-use to execute complex, multi-step web tasks.
50:30 Ambient Agentic Systems: The concept of background agents that monitor the web 24/7 and notify users of significant events.