# Reward Models | Data Brew | Episode 40

Page: https://stenobird.com/podcast/data-brew-by-databricks/reward-models-data-brew-episode-40
Text version: https://stenobird.com/podcast/data-brew-by-databricks/reward-models-data-brew-episode-40.md
Podcast: [Data Brew by Databricks](https://stenobird.com/podcast/data-brew-by-databricks)
Published: 2025-03-20T16:00:00+00:00
Episode link: https://www.buzzsprout.com/1370119/episodes/16181357-reward-models-data-brew-episode-40.mp3
Audio file: https://www.buzzsprout.com/1370119/episodes/16181357-reward-models-data-brew-episode-40.mp3
Processing state: processed
JSON: https://stenobird.com/v1/public/podcasts/data-brew-by-databricks/episodes/reward-models-data-brew-episode-40
Duration seconds: 2398

## Resource

In this episode, Brandon Cui, Research Scientist at MosaicML and Databricks, dives into cutting-edge advancements in AI model optimization, focusing on Reward Models and Reinforcement Learning from Human Feedback (RLHF). Highlights include:

- How synthetic data and RLHF enable fine-tuning models to generate preferred outcomes.
- Techniques like Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) for enhancing response quality.
- The role of reward models in improving ...

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/data-brew-by-databricks/episodes/reward-models-data-brew-episode-40/transcription-requests`. Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/data-brew-by-databricks/reward-models-data-brew-episode-40.md`. Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.
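
For agents that need to act on this page, the two actions listed under Actions could be prepared roughly as follows. This is a minimal sketch, not a documented client: the helper names `build_request_transcript` and `build_read_markdown` are hypothetical, and sending an empty POST body is an assumption, since the page does not describe a payload, headers, or a response format. The functions only construct the requests; nothing is sent over the network.

```python
from urllib.request import Request

# URLs copied verbatim from the Actions section of this page.
EPISODE_API = (
    "https://stenobird.com/v1/public/podcasts/data-brew-by-databricks"
    "/episodes/reward-models-data-brew-episode-40"
)
EPISODE_MARKDOWN = (
    "https://stenobird.com/podcast/data-brew-by-databricks"
    "/reward-models-data-brew-episode-40.md"
)


def build_request_transcript() -> Request:
    """Build (but do not send) the POST for the request_transcript action."""
    # Assumption: an empty body, because no payload is documented on the page.
    return Request(
        f"{EPISODE_API}/transcription-requests",
        data=b"",
        method="POST",
    )


def build_read_markdown() -> Request:
    """Build (but do not send) the GET for the read_markdown action."""
    return Request(EPISODE_MARKDOWN, method="GET")
```

To actually invoke an action, the returned object can be passed to `urllib.request.urlopen`. Because the page describes `request_transcript` as idempotent, repeating the POST should be safe, but the shape of the response is not stated here and would need to be checked against the service itself.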