{"podcast":{"title":"\"The Cognitive Revolution\" | AI Builders, Researchers, and Live Player Analysis","slug":"the-cognitive-revolution","podcast_index_feed_id":6011783,"rss_url":"https://feeds.megaphone.fm/RINTP3108857801","website_url":"https://www.cognitiverevolution.ai/","image_url":"https://megaphone.imgix.net/podcasts/30f818da-c930-11ed-9b4b-1352ca96fb17/image/888e2c534b7c2534213c97e025646932.png?ixlib=rails-4.3.1&max-w=3000&max-h=3000&fit=crop&auto=format,compress","author":"Turpentine","episode_count":346,"summary":"A biweekly podcast where hosts Nathan Labenz and Erik Torenberg interview the builders on the edge of AI and explore the dramatic shift it will unlock in the coming years. The Cognitive Revolution is part of the Turpentine podcast network. To learn more: turpentine.co","last_synced_at":null,"page_url":"https://stenobird.com/podcast/the-cognitive-revolution"},"episode":{"title":"Intelligence with Everyone: RL @ MiniMax, with Olive Song, from AIE NYC & Inference by Turing Post","slug":"intelligence-with-everyone-rl-minimax-with-olive-song-from-aie-nyc-inference-by-turing-post","published_at":"2026-02-22T16:58:00+00:00","page_url":"https://stenobird.com/podcast/the-cognitive-revolution/intelligence-with-everyone-rl-minimax-with-olive-song-from-aie-nyc-inference-by-turing-post","show_page_url":"https://stenobird.com/podcast/the-cognitive-revolution","url":"https://www.cognitiverevolution.ai/intelligence-with-everyone-rl-minimax-with-olive-song-from-aie-nyc-inference-by-turing-post/","audio_url":"https://pdst.fm/e/mgln.ai/e/1113/pscrb.fm/rss/p/traffic.megaphone.fm/RINTP9245442386.mp3?updated=1771777343","summary":"MiniMax researcher Olive Song reveals how tight feedback loops between developers and researchers drive the training of the M-series frontier models. The discussion covers technical breakthroughs in reinforcement learning, including the necessity of FP32 precision to prevent implementation gaps.","meta_description":"Learn how MiniMax uses RL, interleaved thinking, and FP32 precision to train high-performance open-weight models like the M2 series.","key_points":["Main idea: MiniMax leverages a unique structure where researchers and application developers work side-by-side to create tight product feedback loops","Technical breakthrough: The team discovered that running reinforcement learning at FP32 precision was essential to bridge the gap between theoretical algorithms and real-world implementation","Failure mode: Reward hacking remains a constant battle, requiring systematic environment perturbations and robust alignment strategies to prevent models from finding shortcuts","Practical takeaway: Implementing 'interleaved thinking'—allowing models to pause and process environmental feedback—is key to mastering long-horizon agentic tasks","Research approach: MiniMax uses a first-principles approach to debugging, analyzing log probabilities layer-by-layer to diagnose why accuracy fails to scale"],"chapters":[{"start_ms":60000,"title":"Introduction to MiniMax and the M-series","summary":"An introduction to Olive Song and the development of the M-series models that lead the OpenRouter leaderboards."},{"start_ms":320000,"title":"The Developer-Researcher Feedback Loop","summary":"How having in-house developers provides precise rewards and evaluations for training foundation models."},{"start_ms":800000,"title":"Agent Generalization and Tool Scaling","summary":"Exploring the limits of tool scaling and the move toward more robust agentic capabilities."},{"start_ms":1035000,"title":"The Engineering of Reinforcement Learning","summary":"A deep dive into the importance of engineering precision and the fight against reward hacking."},{"start_ms":1325000,"title":"Debugging via Layer-by-Layer Analysis","summary":"The story of discovering implementation gaps by analyzing log probabilities at the layer level."},{"start_ms":1840000,"title":"Alignment and Safety at Scale","summary":"How MiniMax handles large-scale alignment and safety evaluations before model launches."},{"start_ms":2130000,"title":"Long-Horizon Agentic Tasks","summary":"Discussing the implementation of interleaved thinking for complex, multi-step tasks."},{"start_ms":2635000,"title":"The Future of M2.2 and AGI","summary":"Looking ahead to improved multilingual coding and the ultimate goal of human-expert collaboration."}],"topics":["Reinforcement Learning","Large Language Models","MiniMax","AI Agents","Model Alignment","FP32 Precision","Agentic Workflows","Machine Learning Engineering"],"duration_seconds":3329,"processing_state":"processed","actions":[{"name":"request_transcript","method":"POST","url":"https://stenobird.com/v1/public/podcasts/the-cognitive-revolution/episodes/intelligence-with-everyone-rl-minimax-with-olive-song-from-aie-nyc-inference-by-turing-post/transcription-requests","description":"Idempotently request low-priority transcript generation for this episode."},{"name":"read_markdown","method":"GET","url":"https://stenobird.com/podcast/the-cognitive-revolution/intelligence-with-everyone-rl-minimax-with-olive-song-from-aie-nyc-inference-by-turing-post.md","description":"Read the agent-friendly Markdown representation of this episode resource."}]}}