Episode
New top score on ARC-AGI-2-pub (29.4%) - Jeremy Berman
- Published: Sep 27, 2025
- Duration: 4107 seconds (1:08:27)
Summary
Jeremy Berman explains how shifting from Python code generation to natural language instructions allowed his system to achieve a top score on the ARC-AGI-2-pub leaderboard. The discussion explores the transition from pattern memorization to true algorithmic reasoning and the potential for models to synthesize new knowledge.
Topics
- ARC-AGI
- Program Synthesis
- Natural Language Processing
- Reinforcement Learning
- Artificial General Intelligence
- Symbolic Reasoning
- Evolutionary Algorithms
- Machine Learning
Highlights
- Main idea: Natural language provides a more expressive programming medium than Python for solving complex visual reasoning tasks
- Practical takeaway: In the ARC-AGI-2-pub challenge, a stronger 'checker' model is more critical for success than a stronger 'instruction creator'
- Failure mode: Relying solely on pre-training can actually hinder reasoning by encouraging pattern memorization over logical deduction
- Technical insight: The trade-off in ARC-AGI-2-pub involves balancing the breadth of the search space with the depth of the instruction complexity
- Future vision: True AGI requires a meta-skill for reasoning that allows models to learn and synthesize new skills without losing existing knowledge
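The highlights describe candidate instructions in natural language, a checker that judges them, and a trade-off between search breadth and instruction depth. The toy sketch below shows how those pieces could fit together; it is not Berman's pipeline, which this page does not detail. Everything in it is an assumption made for illustration: the `OPERATIONS` table stands in for a model that follows English instructions, `score` stands in for the checker, and a small beam search stands in for whatever search procedure the real system uses.

```python
from typing import Callable, Dict, List, Tuple

Grid = List[List[int]]
Example = Tuple[Grid, Grid]  # (input grid, expected output grid)

# Hypothetical stand-in for a model that follows an English instruction.
# A real system would call a language model; this toy maps fixed phrases
# to grid operations so the sketch runs without any external service.
OPERATIONS: Dict[str, Callable[[Grid], Grid]] = {
    "flip the grid left to right": lambda g: [row[::-1] for row in g],
    "flip the grid top to bottom": lambda g: g[::-1],
    "transpose the grid": lambda g: [list(col) for col in zip(*g)],
}


def follow(instruction: str, grid: Grid) -> Grid:
    """Apply each '; '-separated step of the instruction in order."""
    for step in instruction.split("; "):
        grid = OPERATIONS[step](grid)
    return grid


def score(instruction: str, examples: List[Example]) -> float:
    """Toy checker: mean fraction of matching cells over the training pairs."""
    total = 0.0
    for grid_in, expected in examples:
        got = follow(instruction, grid_in)
        if len(got) != len(expected) or len(got[0]) != len(expected[0]):
            continue  # a shape mismatch contributes zero
        matches = [a == b for r1, r2 in zip(got, expected) for a, b in zip(r1, r2)]
        total += sum(matches) / len(matches)
    return total / len(examples)


def search(examples: List[Example], breadth: int = 3, depth: int = 3) -> Tuple[str, float]:
    """Keep the `breadth` best instructions and lengthen them up to `depth` steps."""
    steps = list(OPERATIONS)

    def rank(candidates: List[str]) -> List[str]:
        return sorted(candidates, key=lambda ins: score(ins, examples), reverse=True)

    beam = rank(steps)[:breadth]
    for _ in range(depth - 1):
        if score(beam[0], examples) == 1.0:
            break  # the checker accepts the best candidate on every pair
        extended = [f"{ins}; {step}" for ins in beam for step in steps]
        beam = rank(beam + extended)[:breadth]
    return beam[0], score(beam[0], examples)


# Two illustrative training pairs; each output is the input rotated by a quarter turn.
EXAMPLES: List[Example] = [
    ([[1, 2], [3, 4]], [[2, 4], [1, 3]]),
    ([[1, 2, 3], [4, 5, 6]], [[3, 6], [2, 5], [1, 4]]),
]

if __name__ == "__main__":
    best, best_score = search(EXAMPLES, breadth=2, depth=3)
    print(best, best_score)
```

In this sketch, `breadth` controls how many candidate instructions survive each round and `depth` caps how many steps an instruction may contain. Those two parameters are one possible reading of the breadth-versus-depth trade-off named in the highlights. Because every candidate is ranked by `score`, the quality of the checker decides which instructions survive, which is consistent with the takeaway that the checker matters more than the instruction creator.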
Chapters
- 1:00 The Goal of Knowledge Synthesis: Discussing the need for AI to move beyond data compression toward systems that can integrate and learn new information dynamically.
- 6:10 Evolutionary Program Synthesis: A look at the transition from program synthesis to reinforcement learning with verifiable feedback.
- 11:40 The Shift to Natural Language: Why moving from Python to English instructions improved accuracy by increasing the degrees of freedom in the solution space.
- 17:05 Neural Networks vs. Turing Completeness: Debating whether LLMs possess true intelligence or are simply searching through the space of Turing programs.
- 22:05 The Challenge of Continual Learning: Exploring the possibility of freezing expert layers to allow for new learning without catastrophic forgetting.
- 27:35 The Power of Expressive Programs: Analyzing how combining neural networks with a Python terminal can bridge the gap between intuition and execution.
- 54:10 Pre-training as a Barrier to Reasoning: A provocative take on how massive pre-training might act as a 'consultant' that knows names but lacks deductive capability.