Episode

New top score on ARC-AGI-2-pub (29.4%) - Jeremy Berman

Podcast: Machine Learning Street Talk (MLST)
Published: Sep 27, 2025
Duration: 4107 seconds (1:08:27)
Processing state: processed
Canonical source: https://podcasters.spotify.com/pod/show/machinelearningstreettalk/episodes/New-top-score-on-ARC-AGI-2-pub-29-4---Jeremy-Berman-e38pj96
Audio: https://traffic.megaphone.fm/APO8526044538.mp3
JSON: /v1/public/podcasts/machine-learning-street-talk/episodes/new-top-score-on-arc-agi-2-pub-29-4-jeremy-berman
Markdown: /podcast/machine-learning-street-talk/new-top-score-on-arc-agi-2-pub-29-4-jeremy-berman.md

Actions

POST https://stenobird.com/v1/public/podcasts/machine-learning-street-talk/episodes/new-top-score-on-arc-agi-2-pub-29-4-jeremy-berman/transcription-requests
Idempotently request low-priority transcript generation for this episode.
GET https://stenobird.com/podcast/machine-learning-street-talk/new-top-score-on-arc-agi-2-pub-29-4-jeremy-berman.md
Read the agent-friendly Markdown representation of this episode resource.
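
The two actions above can be exercised with Python's standard library. This is a minimal sketch: the URLs are the ones listed above, but the empty request body, the absence of authentication headers, and the helper names are assumptions, since this page does not document them.

```python
import urllib.request

BASE = "https://stenobird.com"
PODCAST = "machine-learning-street-talk"
SLUG = "new-top-score-on-arc-agi-2-pub-29-4-jeremy-berman"


def build_transcription_request() -> urllib.request.Request:
    """Build (but do not send) the POST listed under Actions."""
    url = f"{BASE}/v1/public/podcasts/{PODCAST}/episodes/{SLUG}/transcription-requests"
    # Assumption: no body fields or auth header are needed; the page does not say.
    return urllib.request.Request(url, data=b"", method="POST")


def build_markdown_request() -> urllib.request.Request:
    """Build (but do not send) the GET for the Markdown representation."""
    url = f"{BASE}/podcast/{PODCAST}/{SLUG}.md"
    return urllib.request.Request(url, method="GET")


def send(request: urllib.request.Request, timeout: float = 10.0) -> tuple[int, str]:
    """Send a prepared request and return (HTTP status, decoded body)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8", errors="replace")
```

Because the POST is described as idempotent, calling `send(build_transcription_request())` more than once should be safe, though the response format is not specified here.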

Summary

Jeremy Berman explains how shifting from Python code generation to natural language instructions allowed his system to achieve a top score on the ARC-AGI-2-pub leaderboard. The discussion explores the transition from pattern memorization to true algorithmic reasoning and the potential for models to synthesize new knowledge.

Topics

ARC-AGI
Program Synthesis
Natural Language Processing
Reinforcement Learning
Artificial General Intelligence
Symbolic Reasoning
Evolutionary Algorithms
Machine Learning

Highlights

Main idea: Natural language provides a more expressive programming medium than Python for solving complex visual reasoning tasks
Practical takeaway: In the ARC-AGI-2-pub challenge, a stronger 'checker' model is more critical for success than a stronger 'instruction creator'
Failure mode: Relying solely on pre-training can hinder reasoning by encouraging pattern memorization over logical deduction
Technical insight: Solving ARC-AGI-2-pub tasks involves a trade-off between the breadth of the search and the complexity of each candidate instruction
Future vision: True AGI requires a meta-skill for reasoning that allows models to learn and synthesize new skills without losing existing knowledge
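
The checker and search trade-off highlights can be illustrated with a toy generate-and-check loop. This is a sketch under stated assumptions, not Berman's actual system: `propose`, `check`, `breadth`, and `depth` are hypothetical stand-ins, and a real system would call language models where the stub callables appear.

```python
from typing import Callable, List, Optional, Tuple

Grid = List[List[int]]
Example = Tuple[Grid, Grid]  # (input grid, expected output grid)


def search(
    examples: List[Example],
    propose: Callable[[Optional[str]], str],       # hypothetical instruction creator
    check: Callable[[str, List[Example]], float],  # hypothetical checker, score in [0, 1]
    breadth: int,
    depth: int,
) -> Tuple[str, float]:
    """Return the best-scoring natural-language instruction found."""
    best_instruction, best_score = "", -1.0
    parent: Optional[str] = None  # the first round proposes from scratch
    for _ in range(depth):
        # Breadth: how many candidate instructions are tried per round.
        candidates = [propose(parent) for _ in range(breadth)]
        for candidate in candidates:
            score = check(candidate, examples)
            if score > best_score:
                best_instruction, best_score = candidate, score
        if best_score == 1.0:
            break  # the checker reports every training example solved
        # Depth: later rounds revise the best instruction found so far.
        parent = best_instruction
    return best_instruction, best_score
```

The loop makes at most `breadth * depth` checker calls, so under a fixed budget raising one means lowering the other. Selection depends entirely on the scores `check` returns, which is one way to read the claim that the checker matters more than the instruction creator.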

Chapters

1:00 The Goal of Knowledge Synthesis: Discussing the need for AI to move beyond data compression toward systems that can integrate and learn new information dynamically.
6:10 Evolutionary Program Synthesis: A look at the transition from program synthesis to reinforcement learning with verifiable feedback.
11:40 The Shift to Natural Language: Why moving from Python to English instructions improved accuracy by increasing the degrees of freedom in the solution space.
17:05 Neural Networks vs. Turing Completeness: Debating whether LLMs possess true intelligence or are simply searching through the space of Turing programs.
22:05 The Challenge of Continual Learning: Exploring the possibility of freezing expert layers to allow for new learning without catastrophic forgetting.
27:35 The Power of Expressive Programs: Analyzing how combining neural networks with a Python terminal can bridge the gap between intuition and execution.
54:10 Pre-training as a Barrier to Reasoning: A provocative take on how massive pre-training might act as a 'consultant' that knows names but lacks deductive capability.