Episode

Is It Time to Rethink LLM Pre-Training? with Aditi Raghunathan - #747

Podcast
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Published
Sep 16, 2025
Duration
3506 seconds (58:26)
Processing state
processed
Canonical source
https://twimlai.com/podcast/twimlai/is-it-time-to-rethink-llm-pre-training/
Audio
https://pscrb.fm/rss/p/traffic.megaphone.fm/MLN5916308473.mp3?updated=1758046985
JSON
/v1/public/podcasts/twiml-ai-podcast/episodes/is-it-time-to-rethink-llm-pre-training-with-aditi-raghunathan-747
Markdown
/podcast/twiml-ai-podcast/is-it-time-to-rethink-llm-pre-training-with-aditi-raghunathan-747.md

Actions

  • POST https://stenobird.com/v1/public/podcasts/twiml-ai-podcast/episodes/is-it-time-to-rethink-llm-pre-training-with-aditi-raghunathan-747/transcription-requests
    Idempotently request low-priority transcript generation for this episode.
  • GET https://stenobird.com/podcast/twiml-ai-podcast/is-it-time-to-rethink-llm-pre-training-with-aditi-raghunathan-747.md
    Read the agent-friendly Markdown representation of this episode resource.

Summary

Next-token prediction limits the creative and reasoning potential of LLMs, and benchmark gains under current training recipes do not always translate into real-world utility. Aditi Raghunathan discusses new training objectives and architectural interventions aimed at enabling structured exploration and making model updates more reliable.

Topics

  • Large Language Models
  • Machine Learning
  • Next-token prediction
  • Model Fine-tuning
  • Artificial Intelligence Research
  • Neural Network Architecture
  • Algorithmic Creativity
  • Information Unlearning

Highlights

  • Main idea: Next-token prediction struggles with 'leaps of thought' and novel idea generation because it lacks structured exploration
  • Failure mode: 'Catastrophic overtraining' occurs when pre-training on more tokens improves base-model benchmarks but degrades the model's ability to be fine-tuned for new tasks
  • Practical takeaway: Injecting randomness at the start of generation (Roll the Dice) can help models move beyond predictable, repetitive outputs
  • Main idea: 'Memorization sinks' offer a way to isolate specific information within MLP layers to enable targeted unlearning and better privacy control
  • Practical takeaway: Future architectures should aim to disentangle factual memory from reasoning capabilities to make models easier to update

Chapters

  1. 1:05 Beyond Next-Token Prediction: An introduction to Aditi Raghunathan's award-winning research on overcoming the creative limits of current LLM training paradigms.
  2. 5:35 The Benchmark-Utility Gap: Discussing why high performance on static benchmarks does not necessarily translate to a better user experience or model reliability.
  3. 10:05 Rethinking Pre-training Dynamics: Examining how the number of training tokens relates to parameter scale, and why this calls for a fundamental rethink of how we approach pre-training.
  4. 14:30 Catastrophic Overtraining: Exploring the phenomenon where pre-training on too much data can reduce a model's plasticity, making it harder to fine-tune.
  5. 18:35 Safety and Alignment via Post-training: Analyzing how post-training methods teach models safety boundaries and desirable behaviors.
  6. 23:00 Isolating Knowledge in MLP Layers: A deep dive into using architectural separation to manage memorization and enable the targeted removal of specific information.
  7. 31:45 The Future of Structured Exploration: Looking toward the next frontier of AI: building models capable of complex, open-ended tasks and scientific discovery.