# MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling

Page: https://stenobird.com/podcast/daily-paper-cast-7079649/maxproof-scaling-mathematical-proof-with-generative-verifier-rl-and-population-level-test-time-scaling
Text version: https://stenobird.com/podcast/daily-paper-cast-7079649/maxproof-scaling-mathematical-proof-with-generative-verifier-rl-and-population-level-test-time-scaling.md
Podcast: [Daily Paper Cast](https://stenobird.com/podcast/daily-paper-cast-7079649)
Published: 2026-06-13T04:27:52+00:00
Episode link: https://share.transistor.fm/s/aa61258c
Audio file: https://media.transistor.fm/aa61258c/628ca92e.mp3
Processing state: not_requested
JSON: https://stenobird.com/v1/public/podcasts/daily-paper-cast-7079649/episodes/maxproof-scaling-mathematical-proof-with-generative-verifier-rl-and-population-level-test-time-scaling
Duration seconds: 1430

## Resource

🤗 Upvotes: 69 | cs.LG, cs.AI, cs.CL
Authors: Jiacheng Chen, Xinyu Zhang, Shunkai Zhang, Yanmohan Wang, Lin Li, Tiancheng Qin, Qin Wang, Zhengmao Zhu, Tianle Li, Jingyang Li, Zehan Li, Binyang Jiang, Jin Zhu, Han Ding, Fei Yu, Chenyu Du, Zijian Song, Jiayuan Song, Zhi Zhang, Yunan Huang, Weiyu Cheng, Pengyu Zhao, Yu Cheng
Title: MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling
Arxiv: http://arxiv.org/abs/2606.13473v1

Abstract: We present MaxProof, a population-level test-time scaling framework for competition-level mathematical proof in the MiniMax-M3 series. M3 first trains three proof-oriented capabilities -- proof generation, proof verification, and critique-conditioned proof repair -- using a defense-in-depth generative verifier engineered for low false-positive rate. These capabilities are merged into a single released M3 model. At test time, MaxProof treats the model as a generator, verifier, refiner, and ranker, searches over a population of candidate proofs, and returns one final proof through tournament selection. With MaxProof test-time scaling, the M3 model reaches 35/42 on IMO 2025 and 36/42 on USAMO 2026, exceeding the human gold-medal threshold on both.
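The test-time procedure named in the abstract above (generate, verify, refine, then pick one proof by tournament selection) can be illustrated with a toy sketch. This is not the paper's implementation: the function names, the single-elimination tournament format, the round count, and the integer "proofs" used as stand-ins for model outputs are all assumptions made for illustration, since the abstract does not specify them.

```python
import random
from typing import Callable, List, Tuple

Proof = int  # hypothetical stand-in: a real system would hold proof text


def tournament_select(candidates: List[Proof],
                      better: Callable[[Proof, Proof], bool],
                      rng: random.Random) -> Proof:
    """Single-elimination tournament (assumed format): pair up, keep winners."""
    pool = list(candidates)
    rng.shuffle(pool)
    while len(pool) > 1:
        winners = []
        for i in range(0, len(pool) - 1, 2):
            a, b = pool[i], pool[i + 1]
            winners.append(a if better(a, b) else b)
        if len(pool) % 2 == 1:
            winners.append(pool[-1])  # odd one out advances with a bye
        pool = winners
    return pool[0]


def population_search(problem: str,
                      generate: Callable[[str, random.Random], Proof],
                      verify: Callable[[str, Proof], Tuple[bool, str]],
                      refine: Callable[[str, Proof, str], Proof],
                      better: Callable[[Proof, Proof], bool],
                      population_size: int = 8,
                      rounds: int = 2,
                      seed: int = 0) -> Proof:
    """Generate a population, repair rejected proofs, return one winner."""
    rng = random.Random(seed)
    population = [generate(problem, rng) for _ in range(population_size)]
    for _ in range(rounds):
        next_population = []
        for proof in population:
            accepted, critique = verify(problem, proof)
            # Accepted proofs are kept; rejected ones are repaired
            # conditioned on the verifier's critique.
            next_population.append(
                proof if accepted else refine(problem, proof, critique))
        population = next_population
    return tournament_select(population, better, rng)


# Toy stand-ins (hypothetical): a proof is an integer quality score.
def toy_generate(problem: str, rng: random.Random) -> Proof:
    return rng.randint(0, 5)


def toy_verify(problem: str, proof: Proof) -> Tuple[bool, str]:
    return proof >= 7, f"quality gap of {7 - proof}"


def toy_refine(problem: str, proof: Proof, critique: str) -> Proof:
    return proof + 1


def toy_better(a: Proof, b: Proof) -> bool:
    return a >= b


best = population_search("toy problem", toy_generate, toy_verify,
                         toy_refine, toy_better)
```

In the system the abstract describes, all four roles are played by the same merged model; here they are separate callables only so the control flow is visible.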

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/daily-paper-cast-7079649/episodes/maxproof-scaling-mathematical-proof-with-generative-verifier-rl-and-population-level-test-time-scaling/transcription-requests` — Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/daily-paper-cast-7079649/maxproof-scaling-mathematical-proof-with-generative-verifier-rl-and-population-level-test-time-scaling.md` — Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.
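As a minimal sketch of how an agent might invoke `request_transcript`, the snippet below builds the POST request from the URL pattern shown in the action above, using only the Python standard library. The helper names are hypothetical, and the sketch assumes the endpoint needs no request body, extra headers, or authentication, none of which this page states.

```python
import urllib.request

# URL prefix taken from the request_transcript action on this page.
BASE = "https://stenobird.com/v1/public/podcasts"


def build_transcript_request(podcast_slug: str,
                             episode_slug: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for transcript generation."""
    url = f"{BASE}/{podcast_slug}/episodes/{episode_slug}/transcription-requests"
    # Assumption: no body or custom headers are required.
    return urllib.request.Request(url, method="POST")


def request_transcript(podcast_slug: str, episode_slug: str,
                       timeout: float = 30.0) -> int:
    """Send the request and return the HTTP status code."""
    req = build_transcript_request(podcast_slug, episode_slug)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


req = build_transcript_request(
    "daily-paper-cast-7079649",
    "maxproof-scaling-mathematical-proof-with-generative-verifier-rl-"
    "and-population-level-test-time-scaling",
)
```

Because the action is described as idempotent, calling `request_transcript` again after a timeout or network error should be safe. The shape of the response is not documented here, so the sketch returns only the status code.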

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.