Episode

New top score on ARC-AGI-2-pub (29.4%) - Jeremy Berman

Podcast
Machine Learning Street Talk (MLST)
Published
Sep 27, 2025
Duration seconds
4107
Processing state
processed
Canonical source
https://podcasters.spotify.com/pod/show/machinelearningstreettalk/episodes/New-top-score-on-ARC-AGI-2-pub-29-4---Jeremy-Berman-e38pj96
Audio
https://traffic.megaphone.fm/APO8526044538.mp3
JSON
/v1/public/podcasts/machine-learning-street-talk/episodes/new-top-score-on-arc-agi-2-pub-29-4-jeremy-berman
Markdown
/podcast/machine-learning-street-talk/new-top-score-on-arc-agi-2-pub-29-4-jeremy-berman.md

Actions

  • POST https://stenobird.com/v1/public/podcasts/machine-learning-street-talk/episodes/new-top-score-on-arc-agi-2-pub-29-4-jeremy-berman/transcription-requests
    Idempotently request low-priority transcript generation for this episode.
  • GET https://stenobird.com/podcast/machine-learning-street-talk/new-top-score-on-arc-agi-2-pub-29-4-jeremy-berman.md
    Read the agent-friendly Markdown representation of this episode resource.
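
The two actions above can be sketched in Python as follows. This is a minimal illustration, not a documented client: the function names are hypothetical, and the empty POST body is an assumption, since this page does not describe a request payload, authentication, or response format. The sketch only builds the request objects and sends nothing.

```python
from urllib.request import Request

# Host and paths copied from the Actions list above.
BASE_URL = "https://stenobird.com"
EPISODE_SLUG = "new-top-score-on-arc-agi-2-pub-29-4-jeremy-berman"
PODCAST_SLUG = "machine-learning-street-talk"


def build_markdown_request() -> Request:
    """Build the GET request for the Markdown representation of the episode."""
    url = f"{BASE_URL}/podcast/{PODCAST_SLUG}/{EPISODE_SLUG}.md"
    return Request(url, method="GET")


def build_transcription_request() -> Request:
    """Build the POST request that asks for transcript generation.

    Assumption: the endpoint accepts an empty body; the page does not
    document a payload or any required headers.
    """
    url = (
        f"{BASE_URL}/v1/public/podcasts/{PODCAST_SLUG}"
        f"/episodes/{EPISODE_SLUG}/transcription-requests"
    )
    return Request(url, data=b"", method="POST")
```

Passing either object to `urllib.request.urlopen` would send it. Because the page describes the POST as idempotent, repeating it should not queue duplicate transcription jobs.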

Summary

Jeremy Berman explains how shifting from Python code generation to natural language instructions allowed his system to reach a new top score of 29.4% on the ARC-AGI-2-pub leaderboard. The discussion explores the transition from pattern memorization to true algorithmic reasoning and the potential for models to synthesize new knowledge.

Topics

  • ARC-AGI
  • Program Synthesis
  • Natural Language Processing
  • Reinforcement Learning
  • Artificial General Intelligence
  • Symbolic Reasoning
  • Evolutionary Algorithms
  • Machine Learning

Highlights

  • Main idea: Natural language provides a more expressive programming medium than Python for solving complex visual reasoning tasks
  • Practical takeaway: In the ARC-AGI-2-pub challenge, a stronger 'checker' model is more critical for success than a stronger 'instruction creator'
  • Failure mode: Relying solely on pre-training can hinder reasoning by encouraging pattern memorization over logical deduction
  • Technical insight: ARC-AGI-2-pub involves a trade-off between the breadth of the search space and the complexity of the instructions
  • Future vision: True AGI requires a meta-skill for reasoning that allows models to learn and synthesize new skills without losing existing knowledge

Chapters

  1. 1:00 The Goal of Knowledge Synthesis: Discussing the need for AI to move beyond data compression toward systems that can integrate and learn new information dynamically.
  2. 6:10 Evolutionary Program Synthesis: A look at the transition from program synthesis to reinforcement learning with verifiable feedback.
  3. 11:40 The Shift to Natural Language: Why moving from Python to English instructions improved accuracy by increasing the degrees of freedom in the solution space.
  4. 17:05 Neural Networks vs. Turing Completeness: Debating whether LLMs possess true intelligence or are simply searching through the space of Turing programs.
  5. 22:05 The Challenge of Continual Learning: Exploring the possibility of freezing expert layers to allow for new learning without catastrophic forgetting.
  6. 27:35 The Power of Expressive Programs: Analyzing how combining neural networks with a Python terminal can bridge the gap between intuition and execution.
  7. 54:10 Pre-training as a Barrier to Reasoning: A provocative take on how massive pre-training might act as a 'consultant' that knows names but lacks deductive capability.