{"podcast":{"title":"The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)","slug":"twiml-ai-podcast","podcast_index_feed_id":1045879,"rss_url":"https://feeds.megaphone.fm/MLN2155636147","website_url":"https://twimlai.com","image_url":"https://megaphone.imgix.net/podcasts/35230150-ee98-11eb-ad1a-b38cbabcd053/image/TWIML_AI_Podcast_Official_Cover_Art_1400px.png?ixlib=rails-4.3.1&max-w=3000&max-h=3000&fit=crop&auto=format,compress","author":"TWIML","episode_count":785,"summary":"Machine learning and artificial intelligence are dramatically changing the way businesses operate and people live. The TWIML AI Podcast brings the top minds and ideas from the world of ML and AI to a broad and influential community of ML/AI researchers, data scientists, engineers and tech-savvy business and IT leaders. Hosted by Sam Charrington, a sought after industry analyst, speaker, commentator and thought leader. Technologies covered include machine learning, artificial intelligence, deep learning, natural language processing, neural networks, analytics, computer science, data science and more.","last_synced_at":null,"page_url":"https://stenobird.com/podcast/twiml-ai-podcast"},"episode":{"title":"Is It Time to Rethink LLM Pre-Training? with Aditi Raghunathan - #747","slug":"is-it-time-to-rethink-llm-pre-training-with-aditi-raghunathan-747","published_at":"2025-09-16T18:08:00+00:00","page_url":"https://stenobird.com/podcast/twiml-ai-podcast/is-it-time-to-rethink-llm-pre-training-with-aditi-raghunathan-747","show_page_url":"https://stenobird.com/podcast/twiml-ai-podcast","url":"https://twimlai.com/podcast/twimlai/is-it-time-to-rethink-llm-pre-training/","audio_url":"https://pscrb.fm/rss/p/traffic.megaphone.fm/MLN5916308473.mp3?updated=1758046985","summary":"Next-token prediction limits the creative and reasoning potential of LLMs, often leading to a gap between benchmark performance and real-world utility. This discussion explores new training objectives and architectural interventions to enable structured exploration and more reliable model updates.","meta_description":"Explore the limits of next-token prediction and new methods for LLM creativity, catastrophic overtraining, and controlled information unlearning.","key_points":["Main idea: Next-token prediction struggles with 'leaps of thought' and novel idea generation because it lacks structured exploration","Failure mode: 'Catastrophic overtraining' occurs when increasing training data improves benchmarks but degrades the model's ability to be fine-tuned for new tasks","Practical takeaway: Injecting randomness at the start of generation (Roll the Dice) can help models move beyond predictable, repetitive outputs","Main idea: 'Memorization sinks' offer a way to isolate specific information within MLP layers to enable targeted unlearning and better privacy control","Practical takeaway: Future architectures should aim to disentangle factual memory from reasoning capabilities to make models easier to update"],"chapters":[{"start_ms":65000,"title":"Beyond Next-Token Prediction","summary":"An introduction to Aditi Raghunathan's award-winning research on overcoming the creative limits of current LLM training paradigms."},{"start_ms":335000,"title":"The Benchmark-Utility Gap","summary":"Discussing why high performance on static benchmarks does not necessarily translate to a better user experience or model reliability."},{"start_ms":605000,"title":"Rethinking Pre-training Dynamics","summary":"Examining the relationship between token counts, parameter scale, and the fundamental need to rethink how we approach pre-training."},{"start_ms":870000,"title":"Catastrophic Overtraining","summary":"Exploring the phenomenon where excessive training data can actually reduce a model's plasticity and fine-tuning potential."},{"start_ms":1115000,"title":"Safety and Alignment via Post-training","summary":"Analyzing how post-hoc training methods are used to teach models safety boundaries and desirable behaviors."},{"start_ms":1380000,"title":"Isolating Knowledge in MLP Layers","summary":"A deep dive into using architectural separation to manage memorization and enable the targeted removal of specific information."},{"start_ms":1905000,"title":"The Future of Structured Exploration","summary":"Looking toward the next frontier of AI: building models capable of complex, open-ended tasks and scientific discovery."}],"topics":["Large Language Models","Machine Learning","Next-token prediction","Model Fine-tuning","Artificial Intelligence Research","Neural Network Architecture","Algorithmic Creativity","Information Unlearning"],"duration_seconds":3506,"processing_state":"processed","actions":[{"name":"request_transcript","method":"POST","url":"https://stenobird.com/v1/public/podcasts/twiml-ai-podcast/episodes/is-it-time-to-rethink-llm-pre-training-with-aditi-raghunathan-747/transcription-requests","description":"Idempotently request low-priority transcript generation for this episode."},{"name":"read_markdown","method":"GET","url":"https://stenobird.com/podcast/twiml-ai-podcast/is-it-time-to-rethink-llm-pre-training-with-aditi-raghunathan-747.md","description":"Read the agent-friendly Markdown representation of this episode resource."}]}}