# Video as a Universal Interface for AI Reasoning with Sherry Yang - #676

Page: https://stenobird.com/podcast/twiml-ai-podcast/video-as-a-universal-interface-for-ai-reasoning-with-sherry-yang-676
Text version: https://stenobird.com/podcast/twiml-ai-podcast/video-as-a-universal-interface-for-ai-reasoning-with-sherry-yang-676.md
Podcast: [The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)](https://stenobird.com/podcast/twiml-ai-podcast)
Published: 2024-03-18T17:09:00+00:00
Episode link: https://twimlai.com/podcast/twimlai/video-as-a-universal-interface-for-ai-reasoning/
Audio file: https://pscrb.fm/rss/p/traffic.megaphone.fm/MLN1270874992.mp3?updated=1710918505
Processing state: failed
JSON: https://stenobird.com/v1/public/podcasts/twiml-ai-podcast/episodes/video-as-a-universal-interface-for-ai-reasoning-with-sherry-yang-676
Duration seconds: 2974

## Resource

Today we’re joined by Sherry Yang, senior research scientist at Google DeepMind and a PhD student at UC Berkeley. In this interview, we discuss her new paper, “Video as the New Language for Real-World Decision Making,” which explores how generative video models can play a role similar to language models as a way to solve tasks in the real world. Sherry draws the analogy between natural language as a unified representation of information and text prediction as a common task interface and demonstrates how video as a medium and generative video as a task exhibit similar properties. This formulation enables video generation models to play a variety of real-world roles as planners, agents, compute engines, and environment simulators. Finally, we explore UniSim, an interactive demo of Sherry’s work and a preview of her vision for interacting with AI-generated environments.

The complete show notes for this episode can be found at twimlai.com/go/676.
## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/twiml-ai-podcast/episodes/video-as-a-universal-interface-for-ai-reasoning-with-sherry-yang-676/transcription-requests` — Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/twiml-ai-podcast/video-as-a-universal-interface-for-ai-reasoning-with-sherry-yang-676.md` — Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.
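
For agents that want to call the two actions listed under Actions, the following is a minimal Python sketch using only the standard library. It only constructs the requests and does not send them. The helper names (`build_read_markdown_request`, `build_transcript_request`) and the constants are hypothetical, and the sketch assumes that neither endpoint needs a request body or authentication header, which this page does not state. The response format of either endpoint is also not documented here, so no parsing is shown.

```python
from urllib.request import Request

# URL pieces copied from the Actions section of this page.
BASE = "https://stenobird.com"
PODCAST = "twiml-ai-podcast"
EPISODE = "video-as-a-universal-interface-for-ai-reasoning-with-sherry-yang-676"


def build_read_markdown_request(podcast: str = PODCAST, episode: str = EPISODE) -> Request:
    """Build the GET request for the read_markdown action (hypothetical helper)."""
    return Request(f"{BASE}/podcast/{podcast}/{episode}.md", method="GET")


def build_transcript_request(podcast: str = PODCAST, episode: str = EPISODE) -> Request:
    """Build the POST request for the request_transcript action (hypothetical helper).

    Assumption: no body and no auth header are required; the page does not say.
    """
    url = f"{BASE}/v1/public/podcasts/{podcast}/episodes/{episode}/transcription-requests"
    return Request(url, method="POST")
```

To actually send either request, pass it to `urllib.request.urlopen`. Because the page describes `request_transcript` as idempotent, repeating the POST for the same episode should be safe, though the exact status codes returned are not specified here.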