Episode
InterleaveThinker: Reinforcing Agentic Interleaved Generation
- Podcast: Daily Paper Cast
- Published: Jun 13, 2026
- Duration seconds: 1271
- Processing state: not_requested
- Canonical source: https://share.transistor.fm/s/c1b1f49f
Actions
POST https://stenobird.com/v1/public/podcasts/daily-paper-cast-7079649/episodes/interleavethinker-reinforcing-agentic-interleaved-generation/transcription-requests
Idempotently request low-priority transcript generation for this episode.
GET https://stenobird.com/podcast/daily-paper-cast-7079649/interleavethinker-reinforcing-agentic-interleaved-generation.md
Read the agent-friendly Markdown representation of this episode resource.
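As a rough illustration, the two actions above could be prepared from a script as follows. This is a minimal sketch using only the Python standard library; the helper names and the PODCAST and EPISODE constants are hypothetical, and the page does not say whether the POST needs a body, headers, or authentication, so none are set here.

```python
import urllib.request

# Slugs copied from the URLs listed under Actions.
BASE = "https://stenobird.com"
PODCAST = "daily-paper-cast-7079649"
EPISODE = "interleavethinker-reinforcing-agentic-interleaved-generation"


def build_transcription_request() -> urllib.request.Request:
    """Prepare the POST that asks for a low-priority transcript.

    Assumption: no request body or extra headers are required.
    """
    url = (
        f"{BASE}/v1/public/podcasts/{PODCAST}"
        f"/episodes/{EPISODE}/transcription-requests"
    )
    return urllib.request.Request(url, method="POST")


def build_markdown_request() -> urllib.request.Request:
    """Prepare the GET for the Markdown representation of the episode."""
    url = f"{BASE}/podcast/{PODCAST}/{EPISODE}.md"
    return urllib.request.Request(url, method="GET")


# Sending is left to the caller, for example:
#   with urllib.request.urlopen(build_markdown_request()) as resp:
#       markdown = resp.read().decode("utf-8")
```

Because the POST is described as idempotent, repeating it for the same episode should be safe, though the response format is not documented on this page.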
Summary
🤗 Upvotes: 73 | cs.CV
Authors: Dian Zheng, Harry Lee, Manyuan Zhang, Kaituo Feng, Zoey Guo, Ray Zhang, Hongsheng Li
Title: InterleaveThinker: Reinforcing Agentic Interleaved Generation
Arxiv: http://arxiv.org/abs/2606.13679v1
Abstract: Recent image generators have demonstrated impressive photorealism and instruction-following capabilities in single-image generation and editing. However, constrained by their architectures, they cannot achieve interleaved generation (text-image sequence), which has crucial applications in visual narratives, guidance, and embodied manipulation. Even the latest open-source Unified Multimodal Models (UMMs) exhibit limited performance in this regard. In this paper, we introduce InterleaveThinker, the first multi-agent pipeline designed to endow any existing image generator with interleaved generation capabilities. Specifically, we employ a planner agent to organize the image-text input sequence, instructing the image generator on the required execution at each step. Subsequently, we introduce a critic agent to evaluate the generator's outputs, identify samples that deviate from the planned instructions, and refine the instructions for regeneration. To implement this pipeline, we construct the Interleave-Planner-SFT-80k and Interleave-Critic-SFT-112k to perform a format cold-start. Then we develop Interleave-Critic-RL-13k to reinforce the step-wise instruction correction capability within a generation trajectory using GRPO. Since a single interleaved generation trajectory may involve over 25 generator calls, optimizing the entire trajectory is computationally impractical. Therefore, we propose accuracy reward and step-wise reward, allowing single-step RL to effectively guide the entire generation trajectory. The results show that InterleaveTh…
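
The planner, generator, and critic loop described in the abstract can be pictured with a small control-flow sketch. This is not the authors' implementation: the function names, the `Step` record, the `max_retries` cap, and the toy stand-ins for the three components are all assumptions made for illustration, and real agents would be model calls rather than string functions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    instruction: str      # instruction currently given to the generator
    output: str = ""      # latest generator output for this step
    attempts: int = 0     # number of generator calls spent on this step


def run_interleaved(
    plan: List[str],
    generate: Callable[[str], str],
    critique: Callable[[str, str], Tuple[bool, str]],
    max_retries: int = 2,  # hypothetical cap on regenerations per step
) -> List[Step]:
    """Run each planned step, letting the critic refine rejected instructions."""
    steps = []
    for instruction in plan:
        step = Step(instruction)
        while True:
            step.output = generate(step.instruction)
            step.attempts += 1
            accepted, refined = critique(step.instruction, step.output)
            if accepted or step.attempts > max_retries:
                break
            step.instruction = refined  # regenerate with the refined instruction
        steps.append(step)
    return steps


# Toy stand-ins (assumptions): a planner that splits on ';', a generator that
# echoes its instruction, and a critic that demands the word "detailed".
def toy_planner(request: str) -> List[str]:
    return [part.strip() for part in request.split(";") if part.strip()]


def toy_generator(instruction: str) -> str:
    return f"<image for: {instruction}>"


def toy_critic(instruction: str, output: str) -> Tuple[bool, str]:
    if "detailed" in instruction:
        return True, instruction
    return False, instruction + " (detailed)"


if __name__ == "__main__":
    plan = toy_planner("draw a cat; draw the cat sleeping")
    for s in run_interleaved(plan, toy_generator, toy_critic):
        print(s.attempts, s.output)
```

Each step can cost several generator calls, which is consistent with the abstract's remark that a full trajectory may involve more than 25 calls and motivates training the critic on single steps rather than whole trajectories.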