# Intelligence with Everyone: RL @ MiniMax, with Olive Song, from AIE NYC & Inference by Turing Post

- Page: https://stenobird.com/podcast/the-cognitive-revolution/intelligence-with-everyone-rl-minimax-with-olive-song-from-aie-nyc-inference-by-turing-post
- Text version: https://stenobird.com/podcast/the-cognitive-revolution/intelligence-with-everyone-rl-minimax-with-olive-song-from-aie-nyc-inference-by-turing-post.md
- Podcast: ["The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis](https://stenobird.com/podcast/the-cognitive-revolution)
- Published: 2026-02-22T16:58:00+00:00
- Episode link: https://www.cognitiverevolution.ai/intelligence-with-everyone-rl-minimax-with-olive-song-from-aie-nyc-inference-by-turing-post/
- Audio file: https://pdst.fm/e/mgln.ai/e/1113/pscrb.fm/rss/p/traffic.megaphone.fm/RINTP9245442386.mp3?updated=1771777343
- Processing state: processed
- JSON: https://stenobird.com/v1/public/podcasts/the-cognitive-revolution/episodes/intelligence-with-everyone-rl-minimax-with-olive-song-from-aie-nyc-inference-by-turing-post
- Duration seconds: 3329

## Resource

MiniMax researcher Olive Song reveals how tight feedback loops between developers and researchers drive the training of the M-series frontier models. The discussion covers technical breakthroughs in reinforcement learning, including the necessity of FP32 precision to prevent implementation gaps.

## Highlights

- Main idea: MiniMax leverages a unique structure where researchers and application developers work side-by-side to create tight product feedback loops
- Technical breakthrough: The team discovered that running reinforcement learning at FP32 precision was essential to bridge the gap between theoretical algorithms and real-world implementation
- Failure mode: Reward hacking remains a constant battle, requiring systematic environment perturbations and robust alignment strategies to prevent models from finding shortcuts
- Practical takeaway: Implementing 'interleaved thinking'—allowing models to pause and process environmental feedback—is key to mastering long-horizon agentic tasks
- Research approach: MiniMax uses a first-principles approach to debugging, analyzing log probabilities layer-by-layer to diagnose why accuracy fails to scale

## Topics

Reinforcement Learning, Large Language Models, MiniMax, AI Agents, Model Alignment, FP32 Precision, Agentic Workflows, Machine Learning Engineering

## Chapters

- 1:00 — Introduction to MiniMax and the M-series: An introduction to Olive Song and the development of the M-series models that lead the OpenRouter leaderboards.
- 5:20 — The Developer-Researcher Feedback Loop: How having in-house developers provides precise rewards and evaluations for training foundation models.
- 13:20 — Agent Generalization and Tool Scaling: Exploring the limits of tool scaling and the move toward more robust agentic capabilities.
- 17:15 — The Engineering of Reinforcement Learning: A deep dive into the importance of engineering precision and the fight against reward hacking.
- 22:05 — Debugging via Layer-by-Layer Analysis: The story of discovering implementation gaps by analyzing log probabilities at the layer level.
- 30:40 — Alignment and Safety at Scale: How MiniMax handles large-scale alignment and safety evaluations before model launches.
- 35:30 — Long-Horizon Agentic Tasks: Discussing the implementation of interleaved thinking for complex, multi-step tasks.
- 43:55 — The Future of M2.2 and AGI: Looking ahead to improved multilingual coding and the ultimate goal of human-expert collaboration.

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/the-cognitive-revolution/episodes/intelligence-with-everyone-rl-minimax-with-olive-song-from-aie-nyc-inference-by-turing-post/transcription-requests` — Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/the-cognitive-revolution/intelligence-with-everyone-rl-minimax-with-olive-song-from-aie-nyc-inference-by-turing-post.md` — Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.
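
## Example: building the action requests

The two entries under Actions can be sketched in Python as follows. This is a minimal illustration, not a documented client: the helper names (`build_read_markdown`, `build_request_transcript`) and the constants are hypothetical, and the sketch assumes the endpoints need no authentication headers and that the `POST` needs no request body, neither of which this page states. The functions only construct the requests; nothing is sent over the network.

```python
from urllib.request import Request

# URL components copied from the Actions list on this page.
BASE = "https://stenobird.com"
PODCAST = "the-cognitive-revolution"
EPISODE = (
    "intelligence-with-everyone-rl-minimax-with-olive-song-"
    "from-aie-nyc-inference-by-turing-post"
)


def build_read_markdown() -> Request:
    """Build the GET request for the Markdown view (hypothetical helper)."""
    url = f"{BASE}/podcast/{PODCAST}/{EPISODE}.md"
    return Request(url, method="GET")


def build_request_transcript() -> Request:
    """Build the POST that asks for transcript generation (hypothetical helper).

    Assumption: no body or auth header is required; the page does not say.
    """
    url = (
        f"{BASE}/v1/public/podcasts/{PODCAST}"
        f"/episodes/{EPISODE}/transcription-requests"
    )
    return Request(url, method="POST")


if __name__ == "__main__":
    # Print the method and URL of each request without sending anything.
    for req in (build_read_markdown(), build_request_transcript()):
        print(req.get_method(), req.full_url)
```

To actually issue a request, pass the returned object to `urllib.request.urlopen`. Because reading the page does not enqueue transcription, an agent that needs the transcript would send the `POST` explicitly; the page describes that action as idempotent, so repeating it should be safe.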