{"podcast":{"title":"Latent Space: The AI Engineer Podcast","slug":"latent-space-ai-engineer","podcast_index_feed_id":6058902,"rss_url":"https://api.substack.com/feed/podcast/1084089.rss","website_url":"https://www.latent.space/podcast","image_url":"https://substackcdn.com/feed/podcast/1084089/ca7468da5614a246d2906ee8926f6de7.jpg","author":"Latent.Space","episode_count":214,"summary":"The AI Engineer newsletter + Top technical AI podcast. How leading labs build Agents, Models, Infra, & AI for Science. See https://latent.space/about for highlights from Greg Brockman, Andrej Karpathy, George Hotz, Simon Willison, Soumith Chintala et al!","last_synced_at":"2026-07-17T00:20:53.505905+00:00","page_url":"https://stenobird.com/podcast/latent-space-ai-engineer"},"episode":{"title":"Owning the AI Pareto Frontier — Jeff Dean","slug":"owning-the-ai-pareto-frontier-jeff-dean","published_at":"2026-02-12T22:02:35+00:00","page_url":"https://stenobird.com/podcast/latent-space-ai-engineer/owning-the-ai-pareto-frontier-jeff-dean","show_page_url":"https://stenobird.com/podcast/latent-space-ai-engineer","url":"https://www.latent.space/p/jeffdean","audio_url":"https://api.substack.com/feed/podcast/187741497/443b8df57e77c5522b031c52b1302c0d.mp3","summary":"Jeff Dean explains how Google maintains the AI Pareto frontier by simultaneously optimizing for frontier capabilities and extreme efficiency. He details the critical role of hardware-software co-design, distillation, and energy-centric optimization in driving the next generation of low-latency, high-intelligence models.","meta_description":"Google's Jeff Dean discusses the future of AI: from TPU co-design and energy-efficient computing to the era of 10,000 tokens/sec and personalized models.","key_points":["Main idea: Owning the Pareto frontier requires a dual strategy of pushing top-tier reasoning capabilities while using distillation to create highly efficient 'Flash' models","Practical takeaway: Future breakthroughs in model utility will depend on reducing latency by 20-50x to enable real-time agentic workflows and chain-of-thought reasoning","Failure mode: Focusing solely on FLOPs is a mistake; the true bottleneck is energy consumption (picojoules per bit) and the cost of moving data across chips","Technical insight: Speculative decoding and precision reduction are essential tools for amortizing the energy cost of weight transfers during inference","Future vision: The next leap in UX will come from personalized models that can seamlessly retrieve and reason over a user's entire digital history, from emails to videos"],"chapters":[{"start_ms":60000,"title":"The Strategy of the Pareto Frontier","summary":"Jeff discusses the necessity of balancing high-end frontier models with cost-effective, low-latency models through distillation."},{"start_ms":445000,"title":"The Economy of Flash Models","summary":"An exploration of how inference-time scaling and model compression drive the dominance of efficient, small-scale models."},{"start_ms":815000,"title":"Pushing the Context Window Frontier","summary":"A look at Google's progress in expanding context windows to millions of tokens, enabling reasoning across hours of video."},{"start_ms":1200000,"title":"Multimodal Information Extraction","summary":"Discussing the transition of models from simple text processing to extracting structured data from massive video datasets."},{"start_ms":1575000,"title":"Evolution of Semantic Retrieval","summary":"Reflecting on how early search indexing techniques paved the way for modern semantic understanding in LLMs."},{"start_ms":1960000,"title":"Energy-Centric Computing","summary":"Why the true frontier of AI hardware is measured in picojoules per bit and the challenges of data movement on-chip."},{"start_ms":2330000,"title":"Precision and Sparsity in Training","summary":"How reducing bit precision and leveraging sparsity can significantly reduce the energy footprint of large-scale training."},{"start_ms":2700000,"title":"Solving the Reliability Gap","summary":"Addressing the open research problems in making large models more reliable for complex, multi-stage reasoning tasks."}],"topics":["AI Infrastructure","TPU Co-design","Model Distillation","Inference Optimization","Large Language Models","Energy-Efficient Computing","Speculative Decoding","Multimodal AI"],"duration_seconds":5011,"processing_state":"processed","actions":[{"name":"request_transcript","method":"POST","url":"https://stenobird.com/v1/public/podcasts/latent-space-ai-engineer/episodes/owning-the-ai-pareto-frontier-jeff-dean/transcription-requests","description":"Idempotently request low-priority transcript generation for this episode."},{"name":"read_markdown","method":"GET","url":"https://stenobird.com/podcast/latent-space-ai-engineer/owning-the-ai-pareto-frontier-jeff-dean.md","description":"Read the agent-friendly Markdown representation of this episode resource."}]}}