# InterleaveThinker: Reinforcing Agentic Interleaved Generation

Page: https://stenobird.com/podcast/daily-paper-cast-7079649/interleavethinker-reinforcing-agentic-interleaved-generation
Text version: https://stenobird.com/podcast/daily-paper-cast-7079649/interleavethinker-reinforcing-agentic-interleaved-generation.md
Podcast: [Daily Paper Cast](https://stenobird.com/podcast/daily-paper-cast-7079649)
Published: 2026-06-13T04:28:56+00:00
Episode link: https://share.transistor.fm/s/c1b1f49f
Audio file: https://media.transistor.fm/c1b1f49f/2a58ff62.mp3
Processing state: not_requested
JSON: https://stenobird.com/v1/public/podcasts/daily-paper-cast-7079649/episodes/interleavethinker-reinforcing-agentic-interleaved-generation
Duration seconds: 1271

## Resource

🤗 Upvotes: 73 | cs.CV Authors: Dian Zheng, Harry Lee, Manyuan Zhang, Kaituo Feng, Zoey Guo, Ray Zhang, Hongsheng Li Title: InterleaveThinker: Reinforcing Agentic Interleaved Generation Arxiv: http://arxiv.org/abs/2606.13679v1 Abstract: Recent image generators have demonstrated impressive photorealism and instruction-following capabilities in single-image generation and editing. However, constrained by their architectures, they cannot achieve interleaved generation (text-image sequence), which has crucial applications in visual narratives, guidance, and embodied manipulation. Even the latest open-source Unified Multimodal Models (UMMs) exhibit limited performance in this regard. In this paper, we introduce InterleaveThinker, the first multi-agent pipeline designed to endow any existing image generator with interleaved generation capabilities. Specifically, we employ a planner agent to organize the image-text input sequence, instructing the image generator on the required execution at each step. Subsequently, we introduce a critic agent to evaluate the generator's outputs, identify samples that deviate from the planned instructions, and refine the instructions for regeneration. To implement this pipeline, we construct the Interleave-Planner-SFT-80k and Interleave-Critic-SFT-112k to perform a format cold-start. Then we develop Interleave-Critic-RL-13k to reinforce the step-wise instruction correction capability within a generation trajectory using GRPO. Since a single interleaved generation trajectory may involve over 25 generator calls, optimizing the entire trajectory is computationally impractical. Therefore, we propose accuracy reward and step-wise reward, allowing single-step RL to effectively guide the entire generation trajectory. The results show that InterleaveTh…

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/daily-paper-cast-7079649/episodes/interleavethinker-reinforcing-agentic-interleaved-generation/transcription-requests` — Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/daily-paper-cast-7079649/interleavethinker-reinforcing-agentic-interleaved-generation.md` — Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.