{"podcast":{"title":"Latent Space: The AI Engineer Podcast","slug":"latent-space-ai-engineer","podcast_index_feed_id":6058902,"rss_url":"https://api.substack.com/feed/podcast/1084089.rss","website_url":"https://www.latent.space/podcast","image_url":"https://substackcdn.com/feed/podcast/1084089/ca7468da5614a246d2906ee8926f6de7.jpg","author":"Latent.Space","episode_count":204,"summary":"The AI Engineer newsletter + Top technical AI podcast. How leading labs build Agents, Models, Infra, & AI for Science. See https://latent.space/about for highlights from Greg Brockman, Andrej Karpathy, George Hotz, Simon Willison, Soumith Chintala et al!","last_synced_at":null,"page_url":"https://stenobird.com/podcast/latent-space-ai-engineer"},"episode":{"title":"Why RL Won — Kyle Corbitt, OpenPipe (acq. CoreWeave)","slug":"why-rl-won-kyle-corbitt-openpipe-acq-coreweave","published_at":"2025-10-16T15:00:00+00:00","page_url":"https://stenobird.com/podcast/latent-space-ai-engineer/why-rl-won-kyle-corbitt-openpipe-acq-coreweave","show_page_url":"https://stenobird.com/podcast/latent-space-ai-engineer","url":"https://www.latent.space/p/why-rl-won-kyle-corbitt-openpipe","audio_url":"https://api.substack.com/feed/podcast/186621798/24731bd17b981bb97f76bae1e8c78d14.mp3","summary":"In this deep dive with Kyle Corbitt, co-founder and CEO of OpenPipe (recently acquired by CoreWeave), we explore the evolution of fine-tuning in the age of AI agents and the critical shift from supervised fine-tuning to reinforcement learning. Kyle shares his journey from leading YC’s Startup School to building OpenPipe, initially focused on distilling expensive GPT-4 workflows into smaller, cheaper models before pivoting to RL-based agent training as frontier model prices plummeted. 
The conversation reveals why 90% of AI projects remain stuck in proof-of-concept purgatory: not because of capability limitations, but because of reliability issues that Kyle believes can be solved through continuous learning from real-world experience. He discusses the breakthrough of RULER (Relative Universal Reinforcement Learning Elicited Rewards), which uses LLMs as judges to rank agent behaviors relative to one another rather than score them on an absolute scale, making RL training accessible without complex reward engineering. Kyle candidly assesses the challenges of building realistic training environments for agents, explaining why GRPO, despite its advantages, may be a dead end due to its requirement for perfectly reproducible parallel rollouts. He shares insights on why LoRAs remain underrated for production deployments, why GEPA and prompt optimization haven’t lived up to the hype in his testing, and why the hardest part of deploying agents isn’t the AI; it’s sandboxing real-world systems with all their bugs and edge cases intact. The discussion also covers OpenPipe’s acquisition by CoreWeave, the launch of their serverless reinforcement learning platform, and Kyle’s vision for a future where every deployed agent continuously learns from production experience. 
He predicts that solving the reliability problem through conti…","meta_description":"In this deep dive with Kyle Corbitt, co-founder and CEO of OpenPipe (recently acquired by CoreWeave), we explore the evolution of fine-tuning in the age…","key_points":[],"chapters":[],"topics":[],"duration_seconds":4103,"processing_state":"processed","actions":[{"name":"request_transcript","method":"POST","url":"https://stenobird.com/v1/public/podcasts/latent-space-ai-engineer/episodes/why-rl-won-kyle-corbitt-openpipe-acq-coreweave/transcription-requests","description":"Idempotently request low-priority transcript generation for this episode."},{"name":"read_markdown","method":"GET","url":"https://stenobird.com/podcast/latent-space-ai-engineer/why-rl-won-kyle-corbitt-openpipe-acq-coreweave.md","description":"Read the agent-friendly Markdown representation of this episode resource."}]}}