Episode

TMAS: Scaling Test-Time Compute via Multi-Agent Synergy

Podcast
Daily Paper Cast
Published
May 13, 2026
Duration (seconds)
1397 (23 min 17 s)
Processing state
not_requested
Canonical source
https://share.transistor.fm/s/3967f7ba
Audio
https://media.transistor.fm/3967f7ba/0fa39c65.mp3
JSON
/v1/public/podcasts/daily-paper-cast-7079649/episodes/tmas-scaling-test-time-compute-via-multi-agent-synergy
Markdown
/podcast/daily-paper-cast-7079649/tmas-scaling-test-time-compute-via-multi-agent-synergy.md

Actions

  • POST https://stenobird.com/v1/public/podcasts/daily-paper-cast-7079649/episodes/tmas-scaling-test-time-compute-via-multi-agent-synergy/transcription-requests
    Idempotently request low-priority transcript generation for this episode.
  • GET https://stenobird.com/podcast/daily-paper-cast-7079649/tmas-scaling-test-time-compute-via-multi-agent-synergy.md
    Read the agent-friendly Markdown representation of this episode resource.
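The two actions above can be exercised with Python's standard library. The sketch below only builds `urllib.request.Request` objects for the listed URLs. Several details are assumptions: the POST is sent with an empty body, no authentication headers are added, and the constant and helper names (`PODCAST`, `EPISODE`, `transcription_request`, `markdown_request`, `send`) are hypothetical. The page describes the POST as idempotent, so repeating it should be safe.

```python
from urllib.request import Request, urlopen

# Base URL and slugs taken from the URLs listed under "Actions".
BASE = "https://stenobird.com"
PODCAST = "daily-paper-cast-7079649"
EPISODE = "tmas-scaling-test-time-compute-via-multi-agent-synergy"


def transcription_request() -> Request:
    """Build the POST that asks for low-priority transcript generation.

    Assumption: the endpoint accepts an empty request body.
    """
    url = f"{BASE}/v1/public/podcasts/{PODCAST}/episodes/{EPISODE}/transcription-requests"
    return Request(url, data=b"", method="POST")


def markdown_request() -> Request:
    """Build the GET for the agent-friendly Markdown view of the episode."""
    return Request(f"{BASE}/podcast/{PODCAST}/{EPISODE}.md", method="GET")


def send(req: Request, timeout: float = 10.0) -> bytes:
    """Perform the network call and return the raw response body."""
    with urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

Keeping request construction separate from `send` lets the URLs and methods be checked offline; call `send(markdown_request())` to actually fetch the Markdown.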

Summary

🤗 Upvotes: 44 | cs.AI
Authors: George Wu, Nan Jing, Qing Yi, Chuan Hao, Ming Yang, Feng Chang, Yuan Wei, Jian Yang, Ran Tao, Bryan Dai
Title: TMAS: Scaling Test-Time Compute via Multi-Agent Synergy
arXiv: http://arxiv.org/abs/2605.10344v1
Abstract: Test-time scaling has become an effective paradigm for improving the reasoning ability of large language models by allocating additional computation during inference. Recent structured approaches have further advanced this paradigm by organizing inference across multiple trajectories, refinement rounds, and verification-based feedback. However, existing structured test-time scaling methods either weakly coordinate parallel reasoning trajectories or rely on noisy historical information without explicitly deciding what should be retained and reused, limiting their ability to balance exploration and exploitation. In this work, we propose TMAS, a framework for scaling test-time compute via multi-agent synergy. TMAS organizes inference as a collaborative process among specialized agents, enabling structured information flow across agents, trajectories, and refinement iterations. To support effective cross-trajectory collaboration, TMAS introduces hierarchical memories: the experience bank reuses low-level reliable intermediate conclusions and local feedback, while the guideline bank records previously explored high-level strategies to steer subsequent rollouts away from redundant reasoning patterns. Furthermore, we design a hybrid reward reinforcement learning scheme tailored to TMAS, which jointly preserves basic reasoning capability, enhances experience utilization, and encourages exploration beyond previously attempted solution strategies. Extensive experiments on challenging reasoning benchmarks demonstrate that TMAS achieves…
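The abstract describes the two memory tiers only at a high level. The following Python sketch is a hypothetical illustration of how an experience bank (reliable low-level conclusions plus local feedback) and a guideline bank (high-level strategies already tried) could be stored and turned into context for the next rollout. The class names, the `verified` flag, the normalized exact-match deduplication, and the prompt wording are all assumptions, not the authors' implementation; the multi-agent roles and the hybrid-reward RL training are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Experience:
    """A low-level intermediate conclusion plus the local feedback it received."""
    conclusion: str
    feedback: str
    verified: bool  # hypothetical flag: did a verifier accept this step?


@dataclass
class ExperienceBank:
    """Keeps only verified conclusions so later rollouts can reuse them."""
    items: list[Experience] = field(default_factory=list)

    def add(self, exp: Experience) -> bool:
        # Filter at insertion time: reject unverified steps and exact duplicates.
        if not exp.verified or any(e.conclusion == exp.conclusion for e in self.items):
            return False
        self.items.append(exp)
        return True

    def retrieve(self, k: int) -> list[Experience]:
        # Most recent first (an assumption; relevance ranking is another option).
        return list(reversed(self.items))[:k]


@dataclass
class GuidelineBank:
    """Records high-level strategies already explored, to steer rollouts elsewhere."""
    strategies: list[str] = field(default_factory=list)

    @staticmethod
    def _norm(s: str) -> str:
        # Case- and whitespace-insensitive key for a crude redundancy check.
        return " ".join(s.lower().split())

    def is_redundant(self, strategy: str) -> bool:
        key = self._norm(strategy)
        return any(self._norm(s) == key for s in self.strategies)

    def record(self, strategy: str) -> None:
        if not self.is_redundant(strategy):
            self.strategies.append(strategy)


def build_context(question: str, exp_bank: ExperienceBank,
                  guide_bank: GuidelineBank, k: int = 3) -> str:
    """Assemble a prompt prefix: reusable facts plus strategies to avoid."""
    lines = [f"Question: {question}"]
    for e in exp_bank.retrieve(k):
        lines.append(f"Reliable fact: {e.conclusion} (feedback: {e.feedback})")
    for s in guide_bank.strategies:
        lines.append(f"Already explored, avoid repeating: {s}")
    return "\n".join(lines)


# Example usage with made-up inputs.
exp_bank = ExperienceBank()
exp_bank.add(Experience("x = 4 satisfies eq. (1)", "checked by substitution", verified=True))
exp_bank.add(Experience("x = 5 satisfies eq. (1)", "substitution failed", verified=False))
guide_bank = GuidelineBank()
guide_bank.record("Casework on parity")
guide_bank.record("casework  on Parity")  # normalizes to a duplicate, so it is ignored
context = build_context("Find all integer x with ...", exp_bank, guide_bank)
```

Filtering at insertion time is one way to avoid the "noisy historical information" the abstract attributes to earlier methods: only verified conclusions are ever reused. The normalized string match in `GuidelineBank` is a crude stand-in for whatever similarity test a real system would use to detect redundant strategies.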