{"podcast":{"title":"Latent Space: The AI Engineer Podcast","slug":"latent-space-ai-engineer","podcast_index_feed_id":6058902,"rss_url":"https://api.substack.com/feed/podcast/1084089.rss","website_url":"https://www.latent.space/podcast","image_url":"https://substackcdn.com/feed/podcast/1084089/ca7468da5614a246d2906ee8926f6de7.jpg","author":"Latent.Space","episode_count":204,"summary":"The AI Engineer newsletter + Top technical AI podcast. How leading labs build Agents, Models, Infra, & AI for Science. See https://latent.space/about for highlights from Greg Brockman, Andrej Karpathy, George Hotz, Simon Willison, Soumith Chintala et al!","last_synced_at":null,"page_url":"https://stenobird.com/podcast/latent-space-ai-engineer"},"episode":{"title":"Scaling Test Time Compute to Multi-Agent Civilizations — Noam Brown, OpenAI","slug":"scaling-test-time-compute-to-multi-agent-civilizations-noam-brown-openai","published_at":"2025-06-19T15:00:00+00:00","page_url":"https://stenobird.com/podcast/latent-space-ai-engineer/scaling-test-time-compute-to-multi-agent-civilizations-noam-brown-openai","show_page_url":"https://stenobird.com/podcast/latent-space-ai-engineer","url":"https://www.latent.space/p/scaling-test-time-compute-to-multi","audio_url":"https://api.substack.com/feed/podcast/186632807/f77612a5c99afad47e4fa732a753d087.mp3","summary":"Solving Poker and Diplomacy, Debating RL+Reasoning with Ilya, what’s *wrong* with the System 1/2 analogy, and where Test-Time Compute hits a wall Full Video Episode Timestamps 00:00 Intro – Diplomacy, Cicero & World Championship 02:00 Reverse Centaur: How AI Improved Noam’s Human Play 05:00 Turing Test Failures in Chat: Hallucinations & Steerability 07:30 Reasoning Models & Fast vs. Slow Thinking Paradigm 11:00 System 1 vs. System 2 in Visual Tasks (GeoGuessr, Tic-Tac-Toe) 14:00 The Deep Research Existence Proof for Unverifiable Domains 17:30 Harnesses, Tool Use, and Fragility in AI Agents 21:00 The Case Against Over-Reliance on Scaffolds and Routers 24:00 Reinforcement Fine-Tuning and Long-Term Model Adaptability 28:00 Ilya’s Bet on Reasoning and the O-Series Breakthrough 34:00 Noam’s Dev Stack: Codex, Windsurf & AGI Moments 38:00 Building Better AI Developers: Memory, Reuse, and PR Reviews 41:00 Multi-Agent Intelligence and the “AI Civilization” Hypothesis 44:30 Implicit World Models and Theory of Mind Through Scaling 48:00 Why Self-Play Breaks Down Beyond Go and Chess 54:00 Designing Better Benchmarks for Fuzzy Tasks 57:30 The Real Limits of Test-Time Compute: Cost vs. Time 1:00:30 Data Efficiency Gaps Between Humans and LLMs 1:03:00 Training Pipeline: Pretraining, Midtraining, Posttraining 1:05:00 Games as Research Proving Grounds: Poker, MTG, Stratego 1:10:00 Closing Thoughts – Five-Year View and Open Research Directions This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe","meta_description":"Solving Poker and Diplomacy, Debating RL+Reasoning with Ilya, what’s *wrong* with the System 1/2 analogy, and where Test-Time Compute hits a wall Full Vid…","key_points":[],"chapters":[],"topics":[],"duration_seconds":4667,"processing_state":"processed","actions":[{"name":"request_transcript","method":"POST","url":"https://stenobird.com/v1/public/podcasts/latent-space-ai-engineer/episodes/scaling-test-time-compute-to-multi-agent-civilizations-noam-brown-openai/transcription-requests","description":"Idempotently request low-priority transcript generation for this episode."},{"name":"read_markdown","method":"GET","url":"https://stenobird.com/podcast/latent-space-ai-engineer/scaling-test-time-compute-to-multi-agent-civilizations-noam-brown-openai.md","description":"Read the agent-friendly Markdown representation of this episode resource."}]}}