{"podcast":{"title":"Latent Space: The AI Engineer Podcast","slug":"latent-space-ai-engineer","podcast_index_feed_id":6058902,"rss_url":"https://api.substack.com/feed/podcast/1084089.rss","website_url":"https://www.latent.space/podcast","image_url":"https://substackcdn.com/feed/podcast/1084089/ca7468da5614a246d2906ee8926f6de7.jpg","author":"Latent.Space","episode_count":204,"summary":"The AI Engineer newsletter + Top technical AI podcast. How leading labs build Agents, Models, Infra, & AI for Science. See https://latent.space/about for highlights from Greg Brockman, Andrej Karpathy, George Hotz, Simon Willison, Soumith Chintala et al!","last_synced_at":null,"page_url":"https://stenobird.com/podcast/latent-space-ai-engineer"},"episode":{"title":"Moonlake: Causal World Models should be Multimodal, Interactive, and Efficient — with Chris Manning and Fan-yun Sun","slug":"moonlake-causal-world-models-should-be-multimodal-interactive-and-efficient-with-chris-manning-and-fan-yun-sun","published_at":"2026-04-02T17:55:29+00:00","page_url":"https://stenobird.com/podcast/latent-space-ai-engineer/moonlake-causal-world-models-should-be-multimodal-interactive-and-efficient-with-chris-manning-and-fan-yun-sun","show_page_url":"https://stenobird.com/podcast/latent-space-ai-engineer","url":"https://www.latent.space/p/moonlake","audio_url":"https://api.substack.com/feed/podcast/192967759/1555edb9d5649c656d2244abc7f5eeff.mp3","summary":"Moonlake AI proposes a shift from pixel-heavy, static world models to efficient, causal, and interactive environments. By bootstrapping from game engines and using structured abstractions, they aim to create infinitely playable, multi-agent worlds for training embodied AI.","meta_description":"Explore Moonlake AI's approach to efficient world models using causal structure, game engine bootstrapping, and multimodal reasoning to achieve AGI.","key_points":["Main idea: Moving beyond blind scaling toward efficient world models that use structural and causal priors rather than high-resolution pixel density","Practical takeaway: Using game engines as a foundation allows for much higher interaction fidelity and longer horizons than current video-generation models","Failure mode: Current SOTA models suffer from physical glitches, such as objects clipping through each other or floating, due to a lack of underlying physics logic","Core thesis: Effective world modeling for planning does not require high-resolution visual input; abstracted, object-level representations are often sufficient","Strategic vision: Leveraging synthetic data from interactive environments to bridge the gap between simulation and real-world embodied intelligence"],"chapters":[{"start_ms":360000,"title":"The Need for Structure","summary":"Discussing the importance of incorporating geometry, physics, and affordances into the distillation of reasoning traces."},{"start_ms":645000,"title":"Abstraction via Language","summary":"Exploring how language serves as a high-level, human-designed abstraction of the physical world."},{"start_ms":950000,"title":"Efficiency through Latent Abstraction","summary":"Analyzing how representing important features in less space can lead to more efficient and scalable models."},{"start_ms":1565000,"title":"Physics Engines and Specialized Models","summary":"The potential for deploying specialized models, such as those focused on fluid dynamics, by leveraging existing physics engines."},{"start_ms":1905000,"title":"The Impact of World Priors on Rendering","summary":"How integrating world priors into the rendering loop enables novel, physically-grounded interactions for artists."},{"start_ms":2215000,"title":"Benchmarking World Models","summary":"The difficulty of evaluating world models across axes like logical reasoning, math, and visual fidelity."},{"start_ms":3415000,"title":"Multimodal Reasoning and Latent Space","summary":"The vision for a unified latent space that integrates audio, text, and video for complex reasoning."}],"topics":["World Models","Embodied AI","Causal Inference","Synthetic Data","Game Engines","Multimodal Learning","Computer Vision","Artificial General Intelligence"],"duration_seconds":4007,"processing_state":"processed","actions":[{"name":"request_transcript","method":"POST","url":"https://stenobird.com/v1/public/podcasts/latent-space-ai-engineer/episodes/moonlake-causal-world-models-should-be-multimodal-interactive-and-efficient-with-chris-manning-and-fan-yun-sun/transcription-requests","description":"Idempotently request low-priority transcript generation for this episode."},{"name":"read_markdown","method":"GET","url":"https://stenobird.com/podcast/latent-space-ai-engineer/moonlake-causal-world-models-should-be-multimodal-interactive-and-efficient-with-chris-manning-and-fan-yun-sun.md","description":"Read the agent-friendly Markdown representation of this episode resource."}]}}