{"podcast":{"title":"Latent Space: The AI Engineer Podcast","slug":"latent-space-ai-engineer","podcast_index_feed_id":6058902,"rss_url":"https://api.substack.com/feed/podcast/1084089.rss","website_url":"https://www.latent.space/podcast","image_url":"https://substackcdn.com/feed/podcast/1084089/ca7468da5614a246d2906ee8926f6de7.jpg","author":"Latent.Space","episode_count":204,"summary":"The AI Engineer newsletter + Top technical AI podcast. How leading labs build Agents, Models, Infra, & AI for Science. See https://latent.space/about for highlights from Greg Brockman, Andrej Karpathy, George Hotz, Simon Willison, Soumith Chintala et al!","last_synced_at":null,"page_url":"https://stenobird.com/podcast/latent-space-ai-engineer"},"episode":{"title":"[LIVE] Anthropic Distillation & How Models Cheat (SWE-Bench Dead) | Nathan Lambert & Sebastian Raschka","slug":"live-anthropic-distillation-how-models-cheat-swe-bench-dead-nathan-lambert-sebastian-raschka","published_at":"2026-02-26T20:39:42+00:00","page_url":"https://stenobird.com/podcast/latent-space-ai-engineer/live-anthropic-distillation-how-models-cheat-swe-bench-dead-nathan-lambert-sebastian-raschka","show_page_url":"https://stenobird.com/podcast/latent-space-ai-engineer","url":"https://www.latent.space/p/paid-anthropic-distillation-and-how","audio_url":"https://api.substack.com/feed/podcast/189277598/36ab9328e1269f3111b0531cb589dc26.mp3","summary":"An exploration of the competitive landscape in LLM development, focusing on model distillation and the integrity of benchmarks. The discussion examines how labs use API outputs to train smaller models and the rising issue of models 'cheating' via memorization.","meta_description":"Experts discuss Anthropic's distillation concerns, the economic incentives of API models, and why benchmarks like SWE-bench are facing a crisis of integri…","key_points":["Main idea: Model distillation—training smaller models on the outputs of frontier models—is a primary strategy for labs facing GPU shortages","Failure mode: Benchmarks are becoming unreliable as models may simply memorize training data (honeypots) rather than demonstrating true reasoning","Economic tension: The debate between keeping models proprietary versus using APIs to drive ecosystem growth and user acquisition","Practical takeaway: To maintain benchmark integrity, developers must diversify repositories, update dates, and use more complex, non-static tasks","Industry trend: The shift toward agentic benchmarks that evaluate a model's ability to interact with UIs and computer systems rather than just text completion"],"chapters":[{"start_ms":300000,"title":"The Mechanics of Distillation","summary":"Defining distillation as the process of using large model outputs to train smaller, more efficient models."},{"start_ms":785000,"title":"API Access and Lab Competition","summary":"How AI labs use various APIs to run ablations and the strategic importance of high-quality training data."},{"start_ms":1275000,"title":"The Economics of Model Access","summary":"Analyzing the defensibility of API business models and whether labs should lock models behind proprietary interfaces."},{"start_ms":1750000,"title":"The Crisis of Benchmarking","summary":"A deep dive into SWE-bench and the risk of models passing tests through memorization rather than intelligence."},{"start_ms":2200000,"title":"The Future of Evaluation","summary":"Moving beyond text completion toward evaluating agentic capabilities and UI interaction."},{"start_ms":2660000,"title":"Fixing the Benchmark Pipeline","summary":"Concrete strategies to prevent benchmark contamination, including diversifying languages and updating datasets."}],"topics":["Model Distillation","Anthropic","SWE-bench","LLM Benchmarking","AI Agent Evaluation","Machine Learning Training","API Economics","Synthetic Data"],"duration_seconds":3137,"processing_state":"processed","actions":[{"name":"request_transcript","method":"POST","url":"https://stenobird.com/v1/public/podcasts/latent-space-ai-engineer/episodes/live-anthropic-distillation-how-models-cheat-swe-bench-dead-nathan-lambert-sebastian-raschka/transcription-requests","description":"Idempotently request low-priority transcript generation for this episode."},{"name":"read_markdown","method":"GET","url":"https://stenobird.com/podcast/latent-space-ai-engineer/live-anthropic-distillation-how-models-cheat-swe-bench-dead-nathan-lambert-sebastian-raschka.md","description":"Read the agent-friendly Markdown representation of this episode resource."}]}}