# Is It Time to Rethink LLM Pre-Training? with Aditi Raghunathan - #747

- Page: https://stenobird.com/podcast/twiml-ai-podcast/is-it-time-to-rethink-llm-pre-training-with-aditi-raghunathan-747
- Text version: https://stenobird.com/podcast/twiml-ai-podcast/is-it-time-to-rethink-llm-pre-training-with-aditi-raghunathan-747.md
- Podcast: [The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)](https://stenobird.com/podcast/twiml-ai-podcast)
- Published: 2025-09-16T18:08:00+00:00
- Episode link: https://twimlai.com/podcast/twimlai/is-it-time-to-rethink-llm-pre-training/
- Audio file: https://pscrb.fm/rss/p/traffic.megaphone.fm/MLN5916308473.mp3?updated=1758046985
- Processing state: processed
- JSON: https://stenobird.com/v1/public/podcasts/twiml-ai-podcast/episodes/is-it-time-to-rethink-llm-pre-training-with-aditi-raghunathan-747
- Duration seconds: 3506

## Resource

Next-token prediction limits the creative and reasoning potential of LLMs, often leading to a gap between benchmark performance and real-world utility. This discussion explores new training objectives and architectural interventions that enable structured exploration and more reliable model updates.
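The "Roll the Dice" idea from this episode's highlights (injecting randomness at the start of generation) can be illustrated with a toy contrast between plain greedy decoding and seed-conditioned decoding. In seed-conditioned decoding, randomness enters once as a random prefix, and every later step stays deterministic. This is a minimal sketch under loud assumptions. The vocabulary is made up. `toy_greedy_next` is a hypothetical hash-based stand-in for a model's argmax prediction. The choice to represent the "dice roll" as a random prefix string is our assumption, not a confirmed detail of the method discussed. A real model would presumably also have to be trained with such prefixes to make use of them. The sketch covers only the inference-side mechanics.

```python
import hashlib
import random
import string

# Hypothetical toy vocabulary standing in for a real model's tokens.
VOCAB = ["red", "blue", "green", "cat", "dog", "runs", "sleeps", "fast", "slowly", "."]


def toy_greedy_next(context: list[str]) -> str:
    """Hypothetical stand-in for argmax next-token prediction.

    A real LLM would compute logits; here a SHA-256 hash of the context picks
    the token, so the same context always yields the same "most likely" token.
    """
    digest = hashlib.sha256(" ".join(context).encode("utf-8")).digest()
    return VOCAB[digest[0] % len(VOCAB)]


def greedy_decode(context: list[str], max_new_tokens: int = 5) -> list[str]:
    """Greedy decoding: no randomness is injected at any step."""
    out = list(context)
    for _ in range(max_new_tokens):
        out.append(toy_greedy_next(out))
    return out[len(context):]  # return only the newly generated tokens


def random_seed_prefix(rng: random.Random, length: int = 8) -> str:
    """Assumed form of the "dice roll": a random string drawn once, up front."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def seed_conditioned_decode(
    prompt: list[str], rng: random.Random, max_new_tokens: int = 5
) -> list[str]:
    """Prefix a random seed, then decode greedily.

    All diversity comes from the single random choice made before generation.
    """
    seed = random_seed_prefix(rng)
    return greedy_decode([seed] + prompt, max_new_tokens)


prompt = ["the", "cat"]
rng = random.Random(0)
varied = {tuple(seed_conditioned_decode(prompt, rng)) for _ in range(20)}
```

Calling `greedy_decode` twice on the same prompt always returns the same continuation. `seed_conditioned_decode` with different seeds yields varied continuations. This illustrates the contrast in the takeaway: diversity is decided once, up front, rather than through token-by-token sampling noise.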
## Highlights

- Main idea: Next-token prediction struggles with "leaps of thought" and novel idea generation because it lacks structured exploration.
- Failure mode: "Catastrophic overtraining" occurs when increasing training data improves benchmarks but degrades the model's ability to be fine-tuned for new tasks.
- Practical takeaway: Injecting randomness at the start of generation ("Roll the Dice") can help models move beyond predictable, repetitive outputs.
- Main idea: "Memorization sinks" offer a way to isolate specific information within MLP layers to enable targeted unlearning and better privacy control.
- Practical takeaway: Future architectures should aim to disentangle factual memory from reasoning capabilities to make models easier to update.

## Topics

Large Language Models, Machine Learning, Next-Token Prediction, Model Fine-Tuning, Artificial Intelligence Research, Neural Network Architecture, Algorithmic Creativity, Information Unlearning

## Chapters

- 1:05 - Beyond Next-Token Prediction: An introduction to Aditi Raghunathan's award-winning research on overcoming the creative limits of current LLM training paradigms.
- 5:35 - The Benchmark-Utility Gap: Discussing why high performance on static benchmarks does not necessarily translate to a better user experience or model reliability.
- 10:05 - Rethinking Pre-training Dynamics: Examining the relationship between token counts, parameter scale, and the fundamental need to rethink how we approach pre-training.
- 14:30 - Catastrophic Overtraining: Exploring the phenomenon where excessive training data can actually reduce a model's plasticity and fine-tuning potential.
- 18:35 - Safety and Alignment via Post-training: Analyzing how post-hoc training methods are used to teach models safety boundaries and desirable behaviors.
- 23:00 - Isolating Knowledge in MLP Layers: A deep dive into using architectural separation to manage memorization and enable the targeted removal of specific information.
- 31:45 - The Future of Structured Exploration: Looking toward the next frontier of AI: building models capable of complex, open-ended tasks and scientific discovery.

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/twiml-ai-podcast/episodes/is-it-time-to-rethink-llm-pre-training-with-aditi-raghunathan-747/transcription-requests`. Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/twiml-ai-podcast/is-it-time-to-rethink-llm-pre-training-with-aditi-raghunathan-747.md`. Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.
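The `request_transcript` action listed under Actions is a plain HTTP `POST` to the URL given there. The following minimal Python sketch uses only the standard library to build that request without sending it. `build_transcript_request` is a hypothetical helper. The empty request body is an assumption, because the page does not document required headers, a body format, or authentication.

```python
import urllib.request

# Base path taken from the request_transcript URL listed under Actions.
API_BASE = "https://stenobird.com/v1/public/podcasts"


def build_transcript_request(podcast_slug: str, episode_slug: str) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a request_transcript call.

    Assumption: an empty POST body is accepted; any required headers or
    authentication are not documented on this page.
    """
    url = f"{API_BASE}/{podcast_slug}/episodes/{episode_slug}/transcription-requests"
    return urllib.request.Request(url, data=b"", method="POST")


req = build_transcript_request(
    "twiml-ai-podcast",
    "is-it-time-to-rethink-llm-pre-training-with-aditi-raghunathan-747",
)
# To actually send it (network access required): urllib.request.urlopen(req)
```

The action is described as idempotent, so repeating the request should not enqueue duplicate work. The sketch does not depend on that, however.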