Episode

Reward Models | Data Brew | Episode 40

Podcast
Data Brew by Databricks
Published
Mar 20, 2025
Duration seconds
2398
Processing state
processed
Canonical source
https://www.buzzsprout.com/1370119/episodes/16181357-reward-models-data-brew-episode-40.mp3
Audio
https://www.buzzsprout.com/1370119/episodes/16181357-reward-models-data-brew-episode-40.mp3
JSON
/v1/public/podcasts/data-brew-by-databricks/episodes/reward-models-data-brew-episode-40
Markdown
/podcast/data-brew-by-databricks/reward-models-data-brew-episode-40.md

Actions

  • POST https://stenobird.com/v1/public/podcasts/data-brew-by-databricks/episodes/reward-models-data-brew-episode-40/transcription-requests
    Idempotently request low-priority transcript generation for this episode.
  • GET https://stenobird.com/podcast/data-brew-by-databricks/reward-models-data-brew-episode-40.md
    Read the agent-friendly Markdown representation of this episode resource.

Summary

In this episode, Brandon Cui, Research Scientist at MosaicML and Databricks, dives into advancements in AI model optimization, focusing on reward models and Reinforcement Learning from Human Feedback (RLHF).

Highlights include:

  • How synthetic data and RLHF enable fine-tuning models to generate preferred outcomes.
  • Techniques like Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) for enhancing response quality.
  • The role of reward models in improving ...