# [AIEWF Preview] Multi-Turn RL for Multi-Hour Agents — with Will Brown, Prime Intellect

Page: https://stenobird.com/podcast/latent-space-ai-engineer/aiewf-preview-multi-turn-rl-for-multi-hour-agents-with-will-brown-prime-intellect
Text version: https://stenobird.com/podcast/latent-space-ai-engineer/aiewf-preview-multi-turn-rl-for-multi-hour-agents-with-will-brown-prime-intellect.md
Podcast: [Latent Space: The AI Engineer Podcast](https://stenobird.com/podcast/latent-space-ai-engineer)
Published: 2025-05-23T15:00:00+00:00
Episode link: https://www.latent.space/p/aiewf-preview-multi-turn-rl-for-multi
Audio file: https://api.substack.com/feed/podcast/186632787/ae482cf4d9bab37796013ff9a2c7b3b3.mp3
Processing state: processed
JSON: https://stenobird.com/v1/public/podcasts/latent-space-ai-engineer/episodes/aiewf-preview-multi-turn-rl-for-multi-hour-agents-with-will-brown-prime-intellect
Duration seconds: 2398

## Resource

In an otherwise heavy week packed with Microsoft Build, Google I/O, and OpenAI io, the worst-kept secret in biglab land was the launch of Claude 4, particularly the triumphant return of Opus, which many had been clamoring for. We will leave the specific Claude 4 recap to AINews; however, we think that both Gemini’s progress on Deep Think this week and Claude 4 represent the next frontier of progress on inference-time compute/reasoning (at least until GPT5 ships this summer).

Will Brown’s talk at AIE NYC and open source work on verifiers have made him one of the most prominent voices able to publicly discuss (aka without the vaguepoasting LoRA they put on you when you join a biglab) the current state of the art in reasoning models and where current SOTA research directions lead. We discussed his latest paper on Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Credit Assignment, and he has previewed his AIEWF talk on Agentic RL for those with the temerity to power thru bad meetup audio.

Full Video Episode Timestamps

- 00:00 Introduction to the Podcast and Guests
- 01:00 Discussion on Claude 4 and AI Models
- 03:07 Extended Thinking and Tool Use in AI
- 06:47 Technical Highlights and Model Trustworthiness
- 10:31 Thinking Budgets and Their Implications
- 13:38 Controversy Surrounding Opus and AI Ethics
- 18:49 Reflections on AI Tools and Their Limitations
- 21:58 The Chaos of Predictive Systems
- 22:56 Marketing and Safety in AI Models
- 24:30 Evaluating AI Companies and Their Strategies
- 25:53 The Role of Academia in AI Evaluations
- 27:43 Teaching Taste in Research
- 28:41 Making Educated Bets in AI Research
- 30:12 Recent Developments in Multi-Turn Tool Use
- 32:50 Incentivizing Tool Use in AI Models
- 34:45 The Future of Reward Models in AI
- 39:10 Exploring Flexible Reward Systems

This is a…

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/latent-space-ai-engineer/episodes/aiewf-preview-multi-turn-rl-for-multi-hour-agents-with-will-brown-prime-intellect/transcription-requests` — Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/latent-space-ai-engineer/aiewf-preview-multi-turn-rl-for-multi-hour-agents-with-will-brown-prime-intellect.md` — Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.
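As a rough illustration of the two actions listed above, the following minimal sketch builds the corresponding HTTP requests with Python's standard library. The helper names (`build_transcript_request`, `build_markdown_request`) are hypothetical, and the sketch assumes the endpoints need no request body or authentication headers; this page does not describe the response format, so none is assumed here.

```python
import urllib.request

BASE = "https://stenobird.com"
PODCAST = "latent-space-ai-engineer"
EPISODE = (
    "aiewf-preview-multi-turn-rl-for-multi-hour-agents"
    "-with-will-brown-prime-intellect"
)


def build_transcript_request() -> urllib.request.Request:
    """Build the POST for request_transcript (hypothetical helper name).

    Assumption: no body or auth header is required.
    """
    url = (
        f"{BASE}/v1/public/podcasts/{PODCAST}"
        f"/episodes/{EPISODE}/transcription-requests"
    )
    return urllib.request.Request(url, method="POST")


def build_markdown_request() -> urllib.request.Request:
    """Build the GET for read_markdown (hypothetical helper name)."""
    url = f"{BASE}/podcast/{PODCAST}/{EPISODE}.md"
    return urllib.request.Request(url, method="GET")


if __name__ == "__main__":
    # Only print the requests; sending them is left to the caller,
    # e.g. urllib.request.urlopen(req, timeout=30).
    for req in (build_transcript_request(), build_markdown_request()):
        print(req.get_method(), req.full_url)
```

Because the action is described as idempotent, an agent that is unsure whether an earlier `request_transcript` call succeeded could presumably repeat it without queuing duplicate work.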

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.