{"podcast":{"title":"Latent Space: The AI Engineer Podcast","slug":"latent-space-ai-engineer","podcast_index_feed_id":6058902,"rss_url":"https://api.substack.com/feed/podcast/1084089.rss","website_url":"https://www.latent.space/podcast","image_url":"https://substackcdn.com/feed/podcast/1084089/ca7468da5614a246d2906ee8926f6de7.jpg","author":"Latent.Space","episode_count":204,"summary":"The AI Engineer newsletter + Top technical AI podcast. How leading labs build Agents, Models, Infra, & AI for Science. See https://latent.space/about for highlights from Greg Brockman, Andrej Karpathy, George Hotz, Simon Willison, Soumith Chintala et al!","last_synced_at":null,"page_url":"https://stenobird.com/podcast/latent-space-ai-engineer"},"episode":{"title":"[State of Post-Training] From GPT-4.1 to 5.1: RLVR, Agent & Token Efficiency — Josh McGrath, OpenAI","slug":"state-of-post-training-from-gpt-4-1-to-5-1-rlvr-agent-token-efficiency-josh-mcgrath-openai","published_at":"2025-12-31T14:00:00+00:00","page_url":"https://stenobird.com/podcast/latent-space-ai-engineer/state-of-post-training-from-gpt-4-1-to-5-1-rlvr-agent-token-efficiency-josh-mcgrath-openai","show_page_url":"https://stenobird.com/podcast/latent-space-ai-engineer","url":"https://www.latent.space/p/state-of-post-training-from-gpt-41","audio_url":"https://api.substack.com/feed/podcast/186610564/4944e1f91a0d0d17e5525fb297469684.mp3","summary":"From pre-training data curation to shipping GPT-4o, o1, o3, and now GPT-5 thinking and the shopping model, Josh McGrath has lived through the full arc of OpenAI’s post-training evolution, from the PPO vs DPO debates of 2023 to today’s RLVR era, where the real innovation isn’t optimization methods but data quality, signal trust, and token efficiency.
We sat down with Josh at NeurIPS 2025 to dig into the state of post-training heading into 2026: why RLHF and RLVR are both just policy gradient methods (the difference is the input data, not the math), how GRPO from DeepSeek Math was underappreciated as a shift toward more trustworthy reward signals (math answers you can verify vs. human preference you can’t), why token efficiency matters more than wall-clock time (GPT-5 to 5.1 bumped evals and slashed tokens), how Codex has changed his workflow so much he feels “trapped” by 40-minute design sessions followed by 15-minute agent sprints, the infrastructure chaos of scaling RL (“way more moving parts than pre-training”), why long context will keep climbing but agents + graph walks might matter more than 10M-token windows, the shopping model as a test bed for interruptability and chain-of-thought transparency, why personality toggles (Anton vs Clippy) are a real differentiator users care about, and his thesis that the education system isn’t producing enough people who can do both distributed systems and ML research, the exact skill set required to push the frontier when the bottleneck moves every few weeks.
We discuss: * Josh’s path: pre-training data curation → post-training researcher at OpenAI, shipping GPT-4o, o1, o3, GPT-5 thinking, and the shopping model * Why he switched from pre-training to post-training: “Do I want to make 3% compute efficiency wins, or change be…","meta_description":"From pre-training data curation to shipping GPT-4o, o1, o3, and now GPT-5 thinking and the shopping model, Josh McGrath has lived through the full arc…","key_points":[],"chapters":[],"topics":[],"duration_seconds":1654,"processing_state":"processed","actions":[{"name":"request_transcript","method":"POST","url":"https://stenobird.com/v1/public/podcasts/latent-space-ai-engineer/episodes/state-of-post-training-from-gpt-4-1-to-5-1-rlvr-agent-token-efficiency-josh-mcgrath-openai/transcription-requests","description":"Idempotently request low-priority transcript generation for this episode."},{"name":"read_markdown","method":"GET","url":"https://stenobird.com/podcast/latent-space-ai-engineer/state-of-post-training-from-gpt-4-1-to-5-1-rlvr-agent-token-efficiency-josh-mcgrath-openai.md","description":"Read the agent-friendly Markdown representation of this episode resource."}]}}