# EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL

Page: https://stenobird.com/podcast/daily-paper-cast-7079649/envfactory-scaling-tool-use-agents-via-executable-environments-synthesis-and-robust-rl
Text version: https://stenobird.com/podcast/daily-paper-cast-7079649/envfactory-scaling-tool-use-agents-via-executable-environments-synthesis-and-robust-rl.md
Podcast: [Daily Paper Cast](https://stenobird.com/podcast/daily-paper-cast-7079649)
Published: 2026-05-21T04:35:34+00:00
Episode link: https://share.transistor.fm/s/9f618c24
Audio file: https://media.transistor.fm/9f618c24/28d9e7c3.mp3
Processing state: not_requested
JSON: https://stenobird.com/v1/public/podcasts/daily-paper-cast-7079649/episodes/envfactory-scaling-tool-use-agents-via-executable-environments-synthesis-and-robust-rl
Duration seconds: 1642

## Resource

🤗 Upvotes: 41 | cs.CL, cs.LG
Authors: Minrui Xu, Zilin Wang, Mengyi DENG, Zhiwei Li, Zhicheng Yang, Xiao Zhu, Yinhong Liu, Boyu Zhu, Baiyu Huang, Chao Chen, Heyuan Deng, Fei Mi, Lifeng Shang, Xingshan Zeng, Zhijiang Guo
Title: EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL
Arxiv: http://arxiv.org/abs/2605.18703v1

Abstract: Equipping LLMs with tool-use capabilities via Agentic Reinforcement Learning (Agentic RL) is bottlenecked by two challenges: the lack of scalable, robust execution environments and the scarcity of realistic training data that captures implicit human reasoning. Existing approaches depend on costly real-world APIs, hallucination-prone LLM simulators, or synthetic environments that are often single-turn or depend on pre-collected documents. Moreover, synthetic trajectories are frequently over-specified, resembling instruction sequences rather than natural human intents, reducing their effectiveness for RL training. We introduce EnvFactory, a fully automated framework that addresses both challenges. EnvFactory autonomously explores and verifies stateful, executable tool environments from authentic resources, and synthesizes natural multi-turn trajectories through topology-aware sampling and calibrated refinement, producing grounded queries with implicit intents. Using only 85 verified environments across 7 domains, EnvFactory generates 2,575 SFT and RL trajectories. Despite using significantly fewer environments than prior work, which are often 5 times more, EnvFactory achieves superior training efficiency and downstream performance, improving Qwen3-series models by up to +15% on BFCLv3, +8.6% on MCP-Atlas, and +6% on conversational benchmarks including $τ^2$-Bench and VitaBench. By fully automating both environme…

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/daily-paper-cast-7079649/episodes/envfactory-scaling-tool-use-agents-via-executable-environments-synthesis-and-robust-rl/transcription-requests` — Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/daily-paper-cast-7079649/envfactory-scaling-tool-use-agents-via-executable-environments-synthesis-and-robust-rl.md` — Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.
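
The two actions above can be sketched as request builders in Python. This is a minimal illustration, not a documented client: the function names `build_request_transcript` and `build_read_markdown` are hypothetical, and the empty POST body and absence of authentication headers are assumptions, since this page does not describe either. The sketch only constructs the requests and does not send them.

```python
from urllib.request import Request

BASE = "https://stenobird.com"
PODCAST = "daily-paper-cast-7079649"
EPISODE = (
    "envfactory-scaling-tool-use-agents-via-executable-"
    "environments-synthesis-and-robust-rl"
)


def build_request_transcript(podcast: str = PODCAST, episode: str = EPISODE) -> Request:
    """Build the POST for the request_transcript action (hypothetical helper)."""
    url = (
        f"{BASE}/v1/public/podcasts/{podcast}"
        f"/episodes/{episode}/transcription-requests"
    )
    # Assumption: the endpoint accepts an empty body and needs no auth header.
    return Request(url, data=b"", method="POST")


def build_read_markdown(podcast: str = PODCAST, episode: str = EPISODE) -> Request:
    """Build the GET for the read_markdown action (hypothetical helper)."""
    url = f"{BASE}/podcast/{podcast}/{episode}.md"
    return Request(url, method="GET")


if __name__ == "__main__":
    # Print the method and URL of each request without touching the network.
    for req in (build_request_transcript(), build_read_markdown()):
        print(req.get_method(), req.full_url)
```

Passing either object to `urllib.request.urlopen` would perform the call. Because the page describes `request_transcript` as idempotent, repeating the POST should not queue duplicate work, though the response format is not specified here.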

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.