# Proactive Agents for the Web with Devi Parikh - #756

- Page: https://stenobird.com/podcast/twiml-ai-podcast/proactive-agents-for-the-web-with-devi-parikh-756
- Text version: https://stenobird.com/podcast/twiml-ai-podcast/proactive-agents-for-the-web-with-devi-parikh-756.md
- Podcast: [The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)](https://stenobird.com/podcast/twiml-ai-podcast)
- Published: 2025-11-19T01:49:00+00:00
- Episode link: https://twimlai.com/podcast/twimlai/proactive-agents-for-the-web/
- Audio file: https://pscrb.fm/rss/p/traffic.megaphone.fm/MLN8999995371.mp3?updated=1763502496
- Processing state: processed
- JSON: https://stenobird.com/v1/public/podcasts/twiml-ai-podcast/episodes/proactive-agents-for-the-web-with-devi-parikh-756
- Duration seconds: 3364

## Resource

The future of web interaction lies in moving from manual clicking to high-level abstraction via proactive, autonomous agents. Devi Parikh explains how Yutori uses visually grounded models to navigate the web more reliably than traditional DOM-based approaches.
## Highlights

- Main idea: Moving from DOM-based parsing to vision-based models makes agents much more robust to brittle web interfaces.
- Technical approach: Yutori uses a training pipeline that combines supervised fine-tuning, rejection sampling, and reinforcement learning.
- Practical takeaway: 'Scouts' enable ambient, background automation that monitors the web and reports findings without active user input.
- Failure mode: Traditional browser automation often breaks on edge cases in website structure, which motivates the shift toward visual grounding.
- Future vision: The goal is to move from simple information monitoring to complex, multi-step task automation that runs autonomously.

## Topics

Proactive Agents, Web Automation, Computer Vision, Multimodal Models, Browser Use Models, Autonomous Agents, Yutori, AI Agents

## Chapters

- 1:00 - The Evolution of Web Interaction: A look back at the progress in AI and the shift toward browser-use agents.
- 9:15 - The Rise of Browser Agents: The excitement around automating web tasks and the potential for broader platforms.
- 22:05 - Scaling Complex Workflows: How improving foundation models and custom training pipelines push the ceiling of agent capabilities.
- 29:40 - Beyond Static Reports: Moving from simple data retrieval to interactive, actionable outputs from web agents.
- 37:40 - The Shift to Vision-Based Navigation: Why relying on screenshots and visual grounding is more reliable than parsing the DOM.
- 46:25 - Adaptive Orchestration: How 'Scouts' use adaptive plans and tool use to execute complex, multi-step web tasks.
- 50:30 - Ambient Agentic Systems: The concept of background agents that monitor the web 24/7 and notify users of significant events.
## Actions

- request_transcript (`POST https://stenobird.com/v1/public/podcasts/twiml-ai-podcast/episodes/proactive-agents-for-the-web-with-devi-parikh-756/transcription-requests`): Idempotently request low-priority transcript generation for this episode.
- read_markdown (`GET https://stenobird.com/podcast/twiml-ai-podcast/proactive-agents-for-the-web-with-devi-parikh-756.md`): Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.
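The two endpoints listed under Actions can be called from a script. Below is a minimal sketch using only Python's standard library. It assumes the `POST` needs no request body, extra headers, or authentication, since the page does not specify any, and the helper names `build_transcript_request`, `build_markdown_request`, and `send` are hypothetical. Requests are built separately from being sent, so the documented idempotency of `request_transcript` means a failed `POST` can simply be retried.

```python
import urllib.request

# Endpoint URLs copied from the Actions section of this page.
EPISODE_JSON = (
    "https://stenobird.com/v1/public/podcasts/twiml-ai-podcast/episodes/"
    "proactive-agents-for-the-web-with-devi-parikh-756"
)
TRANSCRIPT_REQUESTS = EPISODE_JSON + "/transcription-requests"
MARKDOWN_URL = (
    "https://stenobird.com/podcast/twiml-ai-podcast/"
    "proactive-agents-for-the-web-with-devi-parikh-756.md"
)


def build_transcript_request() -> urllib.request.Request:
    """Build the POST for request_transcript (hypothetical helper).

    Assumption: the endpoint accepts an empty body with no auth; the page
    does not document a payload. An empty bytes body makes urllib send
    Content-Length: 0, which some servers require on a bodiless POST.
    """
    return urllib.request.Request(TRANSCRIPT_REQUESTS, data=b"", method="POST")


def build_markdown_request() -> urllib.request.Request:
    """Build the GET for read_markdown (hypothetical helper).

    Reading the Markdown view does not enqueue transcription.
    """
    return urllib.request.Request(MARKDOWN_URL, method="GET")


def send(req: urllib.request.Request, timeout: float = 10.0) -> tuple[int, bytes]:
    """Send a request and return (HTTP status, raw body). Performs network I/O."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.read()
```

For example, `send(build_transcript_request())` would ask for the transcript to be generated, and `send(build_markdown_request())` would fetch the Markdown view. The response formats of both endpoints are not documented here, so inspect them before parsing.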