Episode
Francois Chollet - ARC reflections - NeurIPS 2024
- Published: Jan 9, 2025
- Duration seconds: 5206
- Processing state: processed
Actions
POST https://stenobird.com/v1/public/podcasts/machine-learning-street-talk/episodes/francois-chollet-arc-reflections-neurips-2024/transcription-requests
Idempotently request low-priority transcript generation for this episode.
GET https://stenobird.com/podcast/machine-learning-street-talk/francois-chollet-arc-reflections-neurips-2024.md
Read the agent-friendly Markdown representation of this episode resource.
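As a rough illustration of the two actions listed above, the following sketch builds the POST and GET requests with Python's standard library. Only the two URLs and their HTTP methods come from this page. The empty POST body, the absence of authentication headers, and the helper names (`build_transcription_request`, `build_markdown_request`, `send`) are assumptions made for illustration; the service may require something different.

```python
from __future__ import annotations

import urllib.request

# Both URL patterns are copied from the Actions listed on this page.
BASE = "https://stenobird.com"
PODCAST = "machine-learning-street-talk"
EPISODE = "francois-chollet-arc-reflections-neurips-2024"


def build_transcription_request(
    podcast: str = PODCAST, episode: str = EPISODE
) -> urllib.request.Request:
    """Build the POST that requests low-priority transcript generation."""
    url = (
        f"{BASE}/v1/public/podcasts/{podcast}"
        f"/episodes/{episode}/transcription-requests"
    )
    # Assumption: an empty body and no auth header are sufficient.
    return urllib.request.Request(url, data=b"", method="POST")


def build_markdown_request(
    podcast: str = PODCAST, episode: str = EPISODE
) -> urllib.request.Request:
    """Build the GET that reads the Markdown representation of the episode."""
    url = f"{BASE}/podcast/{podcast}/{episode}.md"
    return urllib.request.Request(url, method="GET")


def send(request: urllib.request.Request, timeout: float = 30.0) -> tuple[int, str]:
    """Send a prepared request and return (status code, decoded body)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
        return response.status, body
```

Because the page describes the POST as idempotent, calling `send(build_transcription_request())` more than once should be harmless, though the response format is not documented here and is left uninterpreted.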
Summary
François Chollet discusses the outcomes of the 2024 ARC-AGI (Abstraction and Reasoning Corpus) Prize competition, in which accuracy on a private evaluation set rose from 33% to 55.5%.

SPONSOR MESSAGES:
***
CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. https://centml.ai/pricing/
Tufa AI Labs is a new research lab in Zurich, started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events? They are hosting an event in Zurich on January 9th with the ARChitects; join if you can. Go to https://tufalabs.ai/
***

Read about the recent o3 result on ARC here (Chollet knew about it at the time of the interview but wasn't allowed to say): https://arcprize.org/blog/oai-o3-pub-breakthrough

TOC:
1. Introduction and Opening
[00:00:00] 1.1 Deep Learning vs. Symbolic Reasoning: François’s Long-Standing Hybrid View
[00:00:48] 1.2 “Why Do They Call You a Symbolist?” – Addressing Misconceptions
[00:01:31] 1.3 Defining Reasoning
3. ARC Competition 2024 Results and Evolution
[00:07:26] 3.1 ARC Prize 2024: Reflecting on the Narrative Shift Toward System 2
[00:10:29] 3.2 Comparing Private Leaderboard vs. Public Leaderboard Solutions
[00:13:17] 3.3 Two Winning Approaches: Deep Learning–Guided Program Synthesis and Test-Time Training
4. Transduction vs. Induction in ARC
[00:16:04] 4.1 Test-Time Training, Overfitting Concerns, and Developer-Aware Generalization
[00:19:35] 4.2 Gradient Descent Adaptation vs. Discrete Program Search
5. ARC-2 Development and Future Directions
[00:23:51] 5.1 Ensemble Methods, Benchmark Flaws, and the Need for ARC-2
[00:25:35] 5.2 Human-Level Performance Metrics a…