{"podcast":{"title":"CoRecursive: Coding Stories","slug":"corecursive-coding-stories","podcast_index_feed_id":987298,"rss_url":"https://corecursive.libsyn.com/feed","website_url":"http://corecursive.com","image_url":"https://static.libsyn.com/p/assets/d/7/a/5/d7a5a500931246e3/Coding_Stories.png","author":"Adam Gordon Bell - Software Developer","episode_count":116,"summary":"The stories and people behind the code. Hear stories of software development from interesting people.","last_synced_at":"2026-06-14T18:20:04.608497+00:00","page_url":"https://stenobird.com/podcast/corecursive-coding-stories"},"episode":{"title":"The Bitter Lesson: The history of reinforcement learning","slug":"the-bitter-lesson-the-history-of-reinforcement-learning","published_at":"2026-06-13T19:28:00+00:00","page_url":"https://stenobird.com/podcast/corecursive-coding-stories/the-bitter-lesson-the-history-of-reinforcement-learning","show_page_url":"https://stenobird.com/podcast/corecursive-coding-stories","url":"https://corecursive.com/the-bitter-lesson/","audio_url":"https://traffic.libsyn.com/secure/corecursive/reward-is-enough.mp3?dest-id=628353","summary":"I've been trying to understand how machine learning actually works. Not use it, understand it, down to the ifs and loops. How does a program built out of plain conditionals get better on its own? So late one night I sent Don a paper. Three words in the title: reward is enough. The claim is that all of intelligence, the whole thing, comes down to a system maximizing a reward. Don thought that was far too reductive. I wanted to pull it apart and see if it held up. We backed up through the history to find out how far \"reward is enough\" really goes: B.F. Skinner training pigeons, a backgammon program that taught itself, the Go move no human would have played. It's a story about machine learning, and what that leaves for the rest of us who still do it by hand. 
","meta_description":"I've been trying to understand how machine learning actually works. Not use it, understand it, down to the ifs and loops. How does a program built out of…","key_points":[],"chapters":[],"topics":[],"duration_seconds":3601,"processing_state":"not_requested","actions":[{"name":"request_transcript","method":"POST","url":"https://stenobird.com/v1/public/podcasts/corecursive-coding-stories/episodes/the-bitter-lesson-the-history-of-reinforcement-learning/transcription-requests","description":"Idempotently request low-priority transcript generation for this episode."},{"name":"read_markdown","method":"GET","url":"https://stenobird.com/podcast/corecursive-coding-stories/the-bitter-lesson-the-history-of-reinforcement-learning.md","description":"Read the agent-friendly Markdown representation of this episode resource."}]}}