# Is It Time to Rethink LLM Pre-Training? with Aditi Raghunathan - #747

Page: https://stenobird.com/podcast/twiml-ai-podcast/is-it-time-to-rethink-llm-pre-training-with-aditi-raghunathan-747
Text version: https://stenobird.com/podcast/twiml-ai-podcast/is-it-time-to-rethink-llm-pre-training-with-aditi-raghunathan-747.md
Podcast: [The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)](https://stenobird.com/podcast/twiml-ai-podcast)
Published: 2025-09-16T18:08:00+00:00
Episode link: https://twimlai.com/podcast/twimlai/is-it-time-to-rethink-llm-pre-training/
Audio file: https://pscrb.fm/rss/p/traffic.megaphone.fm/MLN5916308473.mp3?updated=1758046985
Processing state: processed
JSON: https://stenobird.com/v1/public/podcasts/twiml-ai-podcast/episodes/is-it-time-to-rethink-llm-pre-training-with-aditi-raghunathan-747
Duration seconds: 3506

## Resource

Next-token prediction limits the creative and reasoning potential of LLMs, often leading to a gap between benchmark performance and real-world utility. This discussion explores new training objectives and architectural interventions to enable structured exploration and more reliable model updates.

## Highlights
- Main idea: Next-token prediction struggles with 'leaps of thought' and novel idea generation because it lacks structured exploration
- Failure mode: 'Catastrophic overtraining' occurs when training on more data improves benchmark scores but makes the model harder to fine-tune for new tasks
- Practical takeaway: Injecting randomness at the start of generation ('Roll the Dice') can help models move beyond predictable, repetitive outputs
- Main idea: 'Memorization sinks' offer a way to isolate specific information within MLP layers to enable targeted unlearning and better privacy control
- Practical takeaway: Future architectures should aim to disentangle factual memory from reasoning capabilities to make models easier to update

## Topics

Large Language Models, Machine Learning, Next-token prediction, Model Fine-tuning, Artificial Intelligence Research, Neural Network Architecture, Algorithmic Creativity, Information Unlearning

## Chapters
- 1:05 — Beyond Next-Token Prediction: An introduction to Aditi Raghunathan's award-winning research on overcoming the creative limits of current LLM training paradigms.
- 5:35 — The Benchmark-Utility Gap: Discussing why high performance on static benchmarks does not necessarily translate to a better user experience or model reliability.
- 10:05 — Rethinking Pre-training Dynamics: Examining the relationship between token counts, parameter scale, and the fundamental need to rethink how we approach pre-training.
- 14:30 — Catastrophic Overtraining: Exploring the phenomenon where excessive training data can actually reduce a model's plasticity and fine-tuning potential.
- 18:35 — Safety and Alignment via Post-training: Analyzing how post-hoc training methods are used to teach models safety boundaries and desirable behaviors.
- 23:00 — Isolating Knowledge in MLP Layers: A deep dive into using architectural separation to manage memorization and enable the targeted removal of specific information.
- 31:45 — The Future of Structured Exploration: Looking toward the next frontier of AI: building models capable of complex, open-ended tasks and scientific discovery.

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/twiml-ai-podcast/episodes/is-it-time-to-rethink-llm-pre-training-with-aditi-raghunathan-747/transcription-requests` — Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/twiml-ai-podcast/is-it-time-to-rethink-llm-pre-training-with-aditi-raghunathan-747.md` — Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.
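
The two actions above could be called from an agent roughly as follows. This is a minimal sketch using only the Python standard library: the URLs and HTTP methods come from the action list, while the helper names (`build_request`, `invoke`), the empty POST body, and the absence of authentication headers are assumptions, since this page does not document the request or response format.

```python
import urllib.request

# URLs taken from the "Actions" list on this page.
BASE = "https://stenobird.com"
PODCAST = "twiml-ai-podcast"
EPISODE = "is-it-time-to-rethink-llm-pre-training-with-aditi-raghunathan-747"


def markdown_url(podcast=PODCAST, episode=EPISODE):
    """URL of the agent-friendly Markdown representation (read_markdown)."""
    return f"{BASE}/podcast/{podcast}/{episode}.md"


def transcript_request_url(podcast=PODCAST, episode=EPISODE):
    """URL that accepts transcript generation requests (request_transcript)."""
    return (
        f"{BASE}/v1/public/podcasts/{podcast}/episodes/{episode}"
        "/transcription-requests"
    )


def build_request(action):
    """Build, but do not send, the HTTP request for a named action."""
    if action == "read_markdown":
        return urllib.request.Request(markdown_url(), method="GET")
    if action == "request_transcript":
        # Assumption: the endpoint accepts an empty body; the page does not
        # specify a payload or any required headers.
        return urllib.request.Request(
            transcript_request_url(), data=b"", method="POST"
        )
    raise ValueError(f"unknown action: {action!r}")


def invoke(action, timeout=10.0):
    """Send the request and return (HTTP status, response body as text)."""
    with urllib.request.urlopen(build_request(action), timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")
```

Calling `invoke("request_transcript")` sends the explicit POST that a page view does not trigger. Because the action is described as idempotent, repeating the call should be safe, although the response body shape is not specified here.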

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.