# Grokking, Generalization Collapse, and the Dynamics of Training Deep Neural Networks with Charles Martin - #734

Page: https://stenobird.com/podcast/twiml-ai-podcast/grokking-generalization-collapse-and-the-dynamics-of-training-deep-neural-networks-with-charles-martin-734
Text version: https://stenobird.com/podcast/twiml-ai-podcast/grokking-generalization-collapse-and-the-dynamics-of-training-deep-neural-networks-with-charles-martin-734.md
Podcast: [The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)](https://stenobird.com/podcast/twiml-ai-podcast)
Published: 2025-06-05T00:10:00+00:00
Episode link: https://twimlai.com/podcast/twimlai/grokking-generalization-collapse-and-the-dynamics-of-training-deep-neural-networks/
Audio file: https://pscrb.fm/rss/p/traffic.megaphone.fm/MLN4861884526.mp3?updated=1749083459
Processing state: failed
JSON: https://stenobird.com/v1/public/podcasts/twiml-ai-podcast/episodes/grokking-generalization-collapse-and-the-dynamics-of-training-deep-neural-networks-with-charles-martin-734
Duration seconds: 5121

## Resource

Today, we're joined by Charles Martin, founder of Calculation Consulting, to discuss WeightWatcher, an open-source tool for analyzing and improving Deep Neural Networks (DNNs) based on principles from theoretical physics. We explore the foundations of the Heavy-Tailed Self-Regularization (HTSR) theory that underpins it, which combines random matrix theory and renormalization group ideas to uncover deep insights about model training dynamics. Charles walks us through WeightWatcher’s ability to detect three distinct learning phases (underfitting, grokking, and generalization collapse) and how its signature “layer quality” metric reveals whether individual layers are underfit, overfit, or optimally tuned. Additionally, we dig into the complexities involved in fine-tuning models, the surprising correlation between model optimality and hallucination, the often-underestimated challenges of search relevance, and their implications for RAG. Finally, Charles shares his insights into real-world applications of generative AI and his lessons learned from working in the field. The complete show notes for this episode can be found at https://twimlai.com/go/734.

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/twiml-ai-podcast/episodes/grokking-generalization-collapse-and-the-dynamics-of-training-deep-neural-networks-with-charles-martin-734/transcription-requests` — Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/twiml-ai-podcast/grokking-generalization-collapse-and-the-dynamics-of-training-deep-neural-networks-with-charles-martin-734.md` — Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.
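The two actions above can be sketched as plain HTTP requests. This is a minimal, non-authoritative example using only the Python standard library. The helper names (`transcript_request`, `markdown_request`, `send`) are hypothetical, and the empty POST body is an assumption, since this page does not document a request body, authentication, or a response format.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "https://stenobird.com"
PODCAST = "twiml-ai-podcast"
EPISODE = (
    "grokking-generalization-collapse-and-the-dynamics-of-training-"
    "deep-neural-networks-with-charles-martin-734"
)


def transcript_request(podcast: str = PODCAST, episode: str = EPISODE) -> Request:
    """Build the POST for the request_transcript action (nothing is sent yet)."""
    url = (
        f"{BASE_URL}/v1/public/podcasts/{quote(podcast)}"
        f"/episodes/{quote(episode)}/transcription-requests"
    )
    # Assumption: an empty body is accepted; the page does not specify one.
    return Request(url, data=b"", method="POST")


def markdown_request(podcast: str = PODCAST, episode: str = EPISODE) -> Request:
    """Build the GET for the read_markdown action (nothing is sent yet)."""
    url = f"{BASE_URL}/podcast/{quote(podcast)}/{quote(episode)}.md"
    return Request(url, method="GET")


def send(request: Request, timeout: float = 30.0) -> tuple[int, str]:
    """Send a prepared request and return (HTTP status, decoded body)."""
    with urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
        return response.status, body
```

An agent that needs the transcript would call `send(transcript_request())` first and `send(markdown_request())` afterwards. Because the action is described as idempotent, repeating the POST should be safe, but the status codes and the time until a transcript appears are not stated here, so any retry or polling logic is left to the caller.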

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.