Episode

He Co-Invented the Transformer. Now: Continuous Thought Machines - Llion Jones and Luke Darlow [Sakana AI]

Podcast
Machine Learning Street Talk (MLST)
Published
Nov 23, 2025
Duration
4359 seconds (1:12:39)
Canonical source
https://podcasters.spotify.com/pod/show/machinelearningstreettalk/episodes/He-Co-Invented-the-Transformer--Now-Continuous-Thought-Machines---Llion-Jones-and-Luke-Darlow-Sakana-AI-e3bbt96
Audio
https://traffic.megaphone.fm/APO6903071163.mp3
JSON
/v1/public/podcasts/machine-learning-street-talk/episodes/he-co-invented-the-transformer-now-continuous-thought-machines-llion-jones-and-luke-darlow-sakana-ai
Markdown
/podcast/machine-learning-street-talk/he-co-invented-the-transformer-now-continuous-thought-machines-llion-jones-and-luke-darlow-sakana-ai.md

Summary

Llion Jones, co-inventor of the Transformer, argues that the field's reliance on scaling is trapping AI in a local minimum of pattern matching rather than genuine reasoning. He and Luke Darlow introduce Continuous Thought Machines (CTM), a biologically inspired alternative that lets models 'ponder' and process information step by step.
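
The 'pondering' idea can be illustrated with a minimal sketch of adaptive computation: the model runs internal ticks and stops once its output is confident enough, so harder inputs consume more ticks. The sketch assumes a certainty score of 1 minus normalized entropy and a fixed stopping threshold. `think`, `certainty`, `sharpen`, `max_ticks` and `threshold` are hypothetical names, and `sharpen` is a toy stand-in for a real model's internal update, not Sakana AI's CTM code.

```python
import math
from typing import Callable, List, Tuple


def certainty(probs: List[float]) -> float:
    """1 minus the entropy of `probs` normalized by log(n); assumes len(probs) >= 2.
    0.0 means uniform (maximally unsure), 1.0 means fully confident."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(len(probs))


def think(
    step_fn: Callable[[List[float]], List[float]],
    probs: List[float],
    max_ticks: int = 50,
    threshold: float = 0.9,
) -> Tuple[List[float], int]:
    """Apply `step_fn` one internal tick at a time until certainty reaches
    `threshold` or the tick budget runs out. Returns (final probs, ticks used)."""
    for tick in range(1, max_ticks + 1):
        probs = step_fn(probs)
        if certainty(probs) >= threshold:
            return probs, tick
    return probs, max_ticks


def sharpen(probs: List[float]) -> List[float]:
    """Toy stand-in for one tick of internal processing: square and renormalize."""
    squared = [p * p for p in probs]
    total = sum(squared)
    return [s / total for s in squared]


if __name__ == "__main__":
    _, easy_ticks = think(sharpen, [0.6, 0.4])
    _, hard_ticks = think(sharpen, [0.51, 0.49])
    print(easy_ticks, hard_ticks)  # 4 7
```

With this toy update, the near-uniform input [0.51, 0.49] needs 7 ticks to clear the threshold while [0.6, 0.4] needs only 4: more deliberation is spent where the answer is less obvious.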

Topics

  • Transformer Architecture
  • Continuous Thought Machines
  • Sakana AI
  • Adaptive Computation
  • Neural Network Architecture
  • Machine Learning Research
  • Artificial General Intelligence
  • Pattern Recognition

Highlights

  • Main idea: The Transformer architecture excels at pattern recognition but lacks the ability to genuinely 'think' through complex, multi-step problems
  • Failure mode: Current LLMs rely on 'brute force' scaling to mimic complex shapes or logic, effectively faking understanding by stitching together many straight-line (piecewise-linear) pieces in high-dimensional space
  • Practical takeaway: Continuous Thought Machines (CTM) enable adaptive computation, allowing a model to spend more time on harder tasks by 'walking' through a problem
  • Technical insight: CTM uses a self-bootstrapping mechanism where the model is trained to predict only the next step in a sequence it has already partially mastered
  • Research philosophy: Moving away from the 'architecture lottery' and fixed-compute models toward systems that can naturally backtrack and correct errors
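
One possible reading of the self-bootstrapping highlight, sketched under stated assumptions: the loss covers the prefix of a maze route the model already predicts correctly plus the first step it still gets wrong, so the supervised horizon grows as the model improves. `supervised_steps` and `masked_step_loss` are hypothetical helpers, routes are assumed to be sequences of integer move labels, and the actual CTM training recipe may differ.

```python
from typing import List, Sequence


def supervised_steps(predicted: Sequence[int], target: Sequence[int]) -> List[int]:
    """Return the step indices that contribute to the loss: every step in the
    correctly predicted prefix, plus the first step not yet mastered (if any)."""
    k = 0
    while k < min(len(predicted), len(target)) and predicted[k] == target[k]:
        k += 1
    # k is the length of the correct prefix; also include step k itself.
    return list(range(min(k + 1, len(target))))


def masked_step_loss(
    step_losses: Sequence[float], predicted: Sequence[int], target: Sequence[int]
) -> float:
    """Average the per-step losses over the supervised steps only."""
    steps = supervised_steps(predicted, target)
    if not steps:  # empty route: nothing to supervise
        return 0.0
    return sum(step_losses[i] for i in steps) / len(steps)


if __name__ == "__main__":
    target = [0, 1, 2, 3, 0]     # hypothetical move labels for a maze route
    predicted = [0, 1, 3, 3, 0]  # first mistake at index 2
    print(supervised_steps(predicted, target))  # [0, 1, 2]
    print(masked_step_loss([1.0, 2.0, 3.0, 4.0, 5.0], predicted, target))  # 2.0
```

Steps 3 and 4 happen to match the target but are excluded because they come after the first mistake; they only start contributing once the model gets step 2 right.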

Chapters

  1. 1:05 Stepping Back from Transformers: Llion Jones discusses the shift in AI research from the open-ended exploration of the Transformer era to the current era of reduced research freedom.
  2. 6:40 The Era of Technology Capture: An exploration of how the ubiquity of the Transformer architecture may be creating a 'local minimum' in AI development.
  3. 17:15 The Limits of Scaling: A critique of how current models can produce clearly incorrect outputs despite massive scale, signaling a fundamental architectural flaw.
  4. 28:40 Introducing Continuous Thought Machines: A deep dive into the CTM architecture and how it differs from the 'instantaneous' processing of standard Transformers.
  5. 34:00 Adaptive Computation & Maze Solving: Using the maze analogy to explain how CTM can use attention to retrieve information and 'think' through steps sequentially.
  6. 39:40 Technical Deep Dive: CTM Architecture: A technical look at neuron synchronization and measuring activations over time within the CTM framework.
  7. 55:45 The Future of AI Research: Advice for young researchers on navigating the 'maze' of AI and the importance of pursuing passion-driven, bottom-up research.
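
The neuron-synchronization idea from the technical deep dive can be sketched as pairwise products of neuron activation histories across internal ticks: neurons that rise and fall together score high, neurons in antiphase score negative. This is a minimal illustration assuming a list-of-lists history and an exponential recency weighting. `synchronization` and `decay` are hypothetical names, and the weighting and normalization are illustrative choices rather than the CTM paper's exact formulation.

```python
import math
from typing import List, Sequence


def synchronization(history: Sequence[Sequence[float]], decay: float = 0.0) -> List[List[float]]:
    """Pairwise synchronization matrix for a set of neurons.

    history[i][t] is neuron i's activation at internal tick t (all rows equal length).
    Entry [i][j] is a weighted average of history[i][t] * history[j][t] over ticks,
    where tick t gets weight exp(-decay * (T - 1 - t)), so with decay > 0 the
    most recent ticks count most; decay = 0 gives a plain average.
    """
    n_ticks = len(history[0])
    weights = [math.exp(-decay * (n_ticks - 1 - t)) for t in range(n_ticks)]
    total = sum(weights)
    return [
        [sum(w * a * b for w, a, b in zip(weights, zi, zj)) / total for zj in history]
        for zi in history
    ]


if __name__ == "__main__":
    history = [
        [1.0, -1.0, 1.0, -1.0],  # neuron 0
        [1.0, -1.0, 1.0, -1.0],  # neuron 1: in phase with neuron 0
        [-1.0, 1.0, -1.0, 1.0],  # neuron 2: in antiphase with neuron 0
    ]
    sync = synchronization(history)
    print(sync[0][1], sync[0][2])  # 1.0 -1.0
```

Because the matrix is built from activations over time rather than a single snapshot, it captures the "measuring activations over time" aspect: two neurons with identical values at the last tick but different trajectories would get different synchronization scores.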