# Intelligence with Everyone: RL @ MiniMax, with Olive Song, from AIE NYC & Inference by Turing Post

Page: https://stenobird.com/podcast/the-cognitive-revolution/intelligence-with-everyone-rl-minimax-with-olive-song-from-aie-nyc-inference-by-turing-post
Text version: https://stenobird.com/podcast/the-cognitive-revolution/intelligence-with-everyone-rl-minimax-with-olive-song-from-aie-nyc-inference-by-turing-post.md
Podcast: ["The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis](https://stenobird.com/podcast/the-cognitive-revolution)
Published: 2026-02-22T16:58:00+00:00
Episode link: https://www.cognitiverevolution.ai/intelligence-with-everyone-rl-minimax-with-olive-song-from-aie-nyc-inference-by-turing-post/
Audio file: https://pdst.fm/e/mgln.ai/e/1113/pscrb.fm/rss/p/traffic.megaphone.fm/RINTP9245442386.mp3?updated=1771777343
Processing state: processed
JSON: https://stenobird.com/v1/public/podcasts/the-cognitive-revolution/episodes/intelligence-with-everyone-rl-minimax-with-olive-song-from-aie-nyc-inference-by-turing-post
Duration seconds: 3329

## Resource

MiniMax researcher Olive Song explains how tight feedback loops between in-house developers and researchers drive the training of the M-series frontier models. The discussion covers technical breakthroughs in reinforcement learning, including why FP32 precision was needed to close the gap between the RL algorithm as designed and as actually implemented.

## Highlights
- Main idea: MiniMax has researchers and application developers working side by side, which creates tight product feedback loops
- Technical breakthrough: The team discovered that running reinforcement learning at FP32 precision was essential to bridge the gap between theoretical algorithms and real-world implementation
- Failure mode: Reward hacking remains a constant battle, requiring systematic environment perturbations and robust alignment strategies to prevent models from finding shortcuts
- Practical takeaway: 'Interleaved thinking', which lets the model pause to reason over environmental feedback between actions, is key to handling long-horizon agentic tasks
- Research approach: MiniMax debugs from first principles, analyzing log probabilities layer by layer to diagnose why accuracy fails to scale
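The summary does not say how the precision gap showed up in MiniMax's pipeline. As a toy illustration only, assume the gap is purely numerical: the same policy's token log-probabilities are computed once at reduced precision and once at higher precision, as can happen when a sampling path and a training path use different kernels. In exact arithmetic the importance ratio between the two would be exactly 1 for every token. The sketch below measures how far it drifts when the logits and the log-softmax are rounded to bfloat16 versus float32. The helper names, the 32,000-entry vocabulary, and the Gaussian logits are hypothetical, and `to_bf16` truncates instead of rounding to nearest even, so its error is somewhat larger than a real bfloat16 cast.

```python
import math
import random
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


def to_bf16(x: float) -> float:
    """Approximate a bfloat16 cast by keeping the top 16 bits of the float32 encoding.
    Hypothetical helper: this truncates toward zero instead of rounding to nearest even."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]


def log_softmax(logits, cast):
    """Numerically stable log-softmax. Each element, exponential, and reduction result
    is passed through `cast` (the running sum itself accumulates in float64)."""
    xs = [cast(v) for v in logits]
    m = max(xs)
    total = cast(sum(cast(math.exp(cast(v - m))) for v in xs))
    log_z = cast(m + cast(math.log(total)))
    return [cast(v - log_z) for v in xs]


def max_ratio_deviation(logits, token_ids, cast):
    """Largest |pi_ref / pi_low - 1| over the sampled tokens, where pi_ref is computed in
    float64 and pi_low uses `cast`. Without rounding this would be exactly 0."""
    ref = log_softmax(logits, lambda v: v)
    low = log_softmax(logits, cast)
    return max(abs(math.exp(ref[t] - low[t]) - 1.0) for t in token_ids)


random.seed(0)
vocab_size = 32_000  # hypothetical vocabulary size
logits = [random.gauss(0.0, 3.0) for _ in range(vocab_size)]
sampled = random.sample(range(vocab_size), 64)  # stand-in for tokens from one rollout

dev_bf16 = max_ratio_deviation(logits, sampled, to_bf16)
dev_fp32 = max_ratio_deviation(logits, sampled, to_fp32)
print(f"bfloat16 max |ratio - 1|: {dev_bf16:.2e}")
print(f"float32  max |ratio - 1|: {dev_fp32:.2e}")
```

On this toy input the bfloat16 deviation is orders of magnitude larger than the float32 one. Repeating the same comparison on intermediate activations, one layer at a time, is one way to localize where a mismatch enters, in the spirit of the layer-by-layer log-probability analysis mentioned above.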

## Topics

Reinforcement Learning, Large Language Models, MiniMax, AI Agents, Model Alignment, FP32 Precision, Agentic Workflows, Machine Learning Engineering

## Chapters
- 1:00 — Introduction to MiniMax and the M-series: An introduction to Olive Song and the development of the M-series models that lead the OpenRouter leaderboards.
- 5:20 — The Developer-Researcher Feedback Loop: How in-house developers supply precise reward signals and evaluations for training foundation models.
- 13:20 — Agent Generalization and Tool Scaling: Exploring the limits of tool scaling and the move toward more robust agentic capabilities.
- 17:15 — The Engineering of Reinforcement Learning: A deep dive into the importance of engineering precision and the fight against reward hacking.
- 22:05 — Debugging via Layer-by-Layer Analysis: The story of discovering implementation gaps by analyzing log probabilities at the layer level.
- 30:40 — Alignment and Safety at Scale: How MiniMax handles large-scale alignment and safety evaluations before model launches.
- 35:30 — Long-Horizon Agentic Tasks: Discussing the implementation of interleaved thinking for complex, multi-step tasks.
- 43:55 — The Future of M2.2 and AGI: Looking ahead to improved multilingual coding and the ultimate goal of human-expert collaboration.

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/the-cognitive-revolution/episodes/intelligence-with-everyone-rl-minimax-with-olive-song-from-aie-nyc-inference-by-turing-post/transcription-requests` — Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/the-cognitive-revolution/intelligence-with-everyone-rl-minimax-with-olive-song-from-aie-nyc-inference-by-turing-post.md` — Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.
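The two actions map onto plain HTTP calls. The sketch below builds them with Python's standard `urllib.request`, using only the methods and URLs listed on this page. It assumes, without confirmation from the page, that the POST needs no request body, authentication, or special headers; the function names are hypothetical.

```python
import urllib.request

# Episode resource URLs copied from this page's metadata and Actions section.
EPISODE_API = (
    "https://stenobird.com/v1/public/podcasts/the-cognitive-revolution/episodes/"
    "intelligence-with-everyone-rl-minimax-with-olive-song-from-aie-nyc-inference-by-turing-post"
)
MARKDOWN_URL = (
    "https://stenobird.com/podcast/the-cognitive-revolution/"
    "intelligence-with-everyone-rl-minimax-with-olive-song-from-aie-nyc-inference-by-turing-post.md"
)


def build_request_transcript() -> urllib.request.Request:
    """request_transcript: POST to the episode's transcription-requests endpoint.
    Assumption: an empty body and no auth headers are acceptable."""
    return urllib.request.Request(
        EPISODE_API + "/transcription-requests", data=b"", method="POST"
    )


def build_read_markdown() -> urllib.request.Request:
    """read_markdown: GET the agent-friendly Markdown representation."""
    return urllib.request.Request(MARKDOWN_URL, method="GET")


def send(req: urllib.request.Request, timeout: float = 10.0) -> int:
    """Perform the HTTP call and return the status code (urlopen raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Building a request does not contact the server; an agent would call `send(build_request_transcript())` only when it actually needs the transcript. Because the page describes the POST as idempotent, retrying it after a timeout should be safe.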

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.