Episode

He Co-Invented the Transformer. Now: Continuous Thought Machines - Llion Jones and Luke Darlow [Sakana AI]

Podcast: Machine Learning Street Talk (MLST)
Published: Nov 23, 2025
Duration seconds: 4359
Processing state: processed
Canonical source: https://podcasters.spotify.com/pod/show/machinelearningstreettalk/episodes/He-Co-Invented-the-Transformer--Now-Continuous-Thought-Machines---Llion-Jones-and-Luke-Darlow-Sakana-AI-e3bbt96
Audio: https://traffic.megaphone.fm/APO6903071163.mp3
JSON: /v1/public/podcasts/machine-learning-street-talk/episodes/he-co-invented-the-transformer-now-continuous-thought-machines-llion-jones-and-luke-darlow-sakana-ai
Markdown: /podcast/machine-learning-street-talk/he-co-invented-the-transformer-now-continuous-thought-machines-llion-jones-and-luke-darlow-sakana-ai.md

Actions

POST https://stenobird.com/v1/public/podcasts/machine-learning-street-talk/episodes/he-co-invented-the-transformer-now-continuous-thought-machines-llion-jones-and-luke-darlow-sakana-ai/transcription-requests
Idempotently request low-priority transcript generation for this episode.
GET https://stenobird.com/podcast/machine-learning-street-talk/he-co-invented-the-transformer-now-continuous-thought-machines-llion-jones-and-luke-darlow-sakana-ai.md
Read the agent-friendly Markdown representation of this episode resource.

Summary

Llion Jones, co-inventor of the Transformer, argues that the current focus on scaling is trapping AI in a local minimum of pattern matching rather than true reasoning. He and Luke Darlow introduce Continuous Thought Machines (CTM) as a biologically inspired alternative that allows models to 'ponder' and process information step by step.

Topics

Transformer Architecture
Continuous Thought Machines
Sakana AI
Adaptive Computation
Neural Network Architecture
Machine Learning Research
Artificial General Intelligence
Pattern Recognition

Highlights

Main idea: The Transformer architecture excels at pattern recognition but lacks the ability to genuinely 'think' through complex, multi-step problems
Failure mode: Current LLMs rely on 'brute force' scaling to mimic complex shapes or logic, approximating them with many straight-line pieces in high dimensions rather than representing the underlying structure
Practical takeaway: Continuous Thought Machines (CTM) enable adaptive computation, allowing a model to spend more time on harder tasks by 'walking' through a problem
Technical insight: CTM uses a self-bootstrapping mechanism where the model is trained to predict only the next step in a sequence it has already partially mastered
Research philosophy: Moving away from the 'architecture lottery' and fixed-compute models toward systems that can naturally backtrack and correct errors
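
The adaptive computation idea in the practical takeaway above can be illustrated with a toy loop. This is a minimal sketch, not the CTM implementation: the names ponder and solve_corridor, the hand-written confidence function, and the threshold and max_ticks values are all hypothetical choices made for illustration. It only shows the general pattern of spending more internal steps ("ticks") on a harder input and stopping once the model is confident enough.

```python
from typing import Callable, Tuple


def ponder(
    state: int,
    step: Callable[[int], int],
    confidence: Callable[[int], float],
    threshold: float = 0.9,
    max_ticks: int = 50,
) -> Tuple[int, int]:
    """Run internal ticks until confidence clears the threshold or the budget runs out."""
    ticks = 0
    while ticks < max_ticks and confidence(state) < threshold:
        state = step(state)  # one internal "thought" step
        ticks += 1
    return state, ticks


def solve_corridor(length: int) -> int:
    """Toy 'maze': walk one cell per tick along a straight corridor of the given length."""
    _, ticks = ponder(
        state=0,
        step=lambda pos: pos + 1,
        # Hypothetical confidence signal: certain only once the goal cell is reached.
        confidence=lambda pos: 1.0 if pos == length else 0.0,
    )
    return ticks


easy_ticks = solve_corridor(3)    # short corridor, few ticks
hard_ticks = solve_corridor(12)   # longer corridor, more ticks
capped_ticks = solve_corridor(100)  # longer than the budget, stops at max_ticks
```

In this sketch the stopping rule is hand-written; in a learned system the confidence signal would come from the model itself, and the tick budget only acts as a safety cap.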

Chapters

1:05 Stepping Back from Transformers: Llion Jones discusses the shift in AI research from the open-ended exploration of the Transformer era to the current era of reduced research freedom.
6:40 The Era of Technology Capture: An exploration of how the ubiquity of the Transformer architecture may be creating a 'local minimum' in AI development.
17:15 The Limits of Scaling: A critique of how current models can produce clearly incorrect outputs despite massive scale, signaling a fundamental architectural flaw.
28:40 Introducing Continuous Thought Machines: A deep dive into the CTM architecture and how it differs from the 'instantaneous' processing of standard Transformers.
34:00 Adaptive Computation & Maze Solving: Using the maze analogy to explain how CTM can use attention to retrieve information and 'think' through steps sequentially.
39:40 Technical Deep Dive: CTM Architecture: A technical look at neuron synchronization and measuring activations over time within the CTM framework.
55:45 The Future of AI Research: Advice for young researchers on navigating the 'maze' of AI and the importance of pursuing passion-driven, bottom-up research.
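
The 39:40 chapter mentions neuron synchronization and measuring activations over time. One simple way to quantify synchronization, assumed here for illustration and not taken from the episode, is the inner product between two neurons' activation histories across ticks. The function name synchronization and the example values are hypothetical.

```python
from typing import List


def synchronization(history: List[List[float]]) -> List[List[float]]:
    """Pairwise inner products of activation histories.

    history[i][t] is the activation of neuron i at internal tick t.
    Entry [i][j] of the result is large and positive when neurons i and j
    rise and fall together, and negative when they move in opposition.
    """
    n = len(history)
    return [
        [sum(a * b for a, b in zip(history[i], history[j])) for j in range(n)]
        for i in range(n)
    ]


# Three neurons observed over four ticks (made-up values).
history = [
    [1.0, -1.0, 1.0, -1.0],  # neuron 0
    [1.0, -1.0, 1.0, -1.0],  # neuron 1: in phase with neuron 0
    [-1.0, 1.0, -1.0, 1.0],  # neuron 2: opposite phase to neuron 0
]
sync = synchronization(history)
```

Here sync[0][1] is positive because neurons 0 and 1 move together, while sync[0][2] is negative because neuron 2 moves in opposition. The resulting matrix is symmetric, so only the upper triangle carries distinct information.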