# #235 - Sonnet 4.6, Deep-thinking tokens, Anthropic vs Pentagon

- Page: https://stenobird.com/podcast/last-week-in-ai/235-sonnet-4-6-deep-thinking-tokens-anthropic-vs-pentagon
- Text version: https://stenobird.com/podcast/last-week-in-ai/235-sonnet-4-6-deep-thinking-tokens-anthropic-vs-pentagon.md
- Podcast: [Last Week in AI](https://stenobird.com/podcast/last-week-in-ai)
- Published: 2026-03-03T08:00:00+00:00
- Episode link: https://rss.art19.com/episodes/2ffc750f-2f06-4af6-8fa5-439366064b2f.mp3?rss_browser=BAhJIg90cmFuc2NyaWJyBjoGRVQ%3D--952c5701c84ad333c69d5faa668f8177091704f0
- Audio file: https://rss.art19.com/episodes/2ffc750f-2f06-4af6-8fa5-439366064b2f.mp3?rss_browser=BAhJIg90cmFuc2NyaWJyBjoGRVQ%3D--952c5701c84ad333c69d5faa668f8177091704f0
- Processing state: processed
- JSON: https://stenobird.com/v1/public/podcasts/last-week-in-ai/episodes/235-sonnet-4-6-deep-thinking-tokens-anthropic-vs-pentagon
- Duration seconds: 6108

## Resource

A deep dive into the next generation of frontier models, focusing on Anthropic's Sonnet 4.6 and Google's Gemini 3.1 Pro performance on ARC-AGI-2. The episode also explores the technical mechanics of 'deep-thinking tokens' and the geopolitical tensions surrounding AI infrastructure and defense contracts.

## Highlights

- Main idea: 'Deep-thinking tokens' serve as a measurable signal for model reasoning, where high fluctuation in intermediate layers correlates with increased accuracy
- Practical takeaway: The rise of multi-agent coordinators, like Perplexity's 'Computer,' marks a shift from single-model usage to agentic orchestration
- Failure mode: Distillation attacks pose a significant security risk, potentially allowing adversaries to rapidly replicate frontier model capabilities
- Main idea: China is bypassing GPU constraints by using advanced packaging and networking techniques to scale 7nm/5nm wafer output
- Geopolitical tension: The debate intensifies over AI labs' responsibilities regarding government contracts, specifically Anthropic's relationship with the Pentagon

## Topics

Anthropic Sonnet, Gemini 3.1 Pro, ARC-AGI-2, Deep-thinking tokens, AI Agents, Machine Learning Interpretability, AI Geopolitics, Model Distillation

## Chapters

- 9:20 - Frontier Model Benchmarks: Analysis of Sonnet 4.6 and Gemini 3.1 Pro performance on the ARC-AGI-2 reasoning benchmark.
- 17:00 - The Rise of AI Agents: Discussion on xAI's Grok 4.2 beta and the emergence of multi-agent systems like Perplexity's 'Computer'.
- 32:40 - Mechanics of Deep Thinking: A technical breakdown of how token fluctuation and Jensen-Shannon divergence can signal active model reasoning.
- 40:40 - Global Compute & Infrastructure: Examining China's chip packaging strategies and the massive capital investments in specialized AI hardware.
- 1:20:05 - AI Security & Geopolitics: The impact of distillation attacks and the ethical dilemmas of AI labs fulfilling defense-related contracts.

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/last-week-in-ai/episodes/235-sonnet-4-6-deep-thinking-tokens-anthropic-vs-pentagon/transcription-requests` - Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/last-week-in-ai/235-sonnet-4-6-deep-thinking-tokens-anthropic-vs-pentagon.md` - Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.
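
For agents that need a transcript, the two entries under Actions could be invoked roughly as follows. This is a minimal sketch using only Python's standard library, not a documented client. The helper names (`build_read_markdown`, `build_request_transcript`, `send`) are hypothetical, and the empty POST body and the absence of authentication headers are assumptions, since the page does not specify a request body or credentials.

```python
import urllib.request

BASE = "https://stenobird.com"
PODCAST = "last-week-in-ai"
EPISODE = "235-sonnet-4-6-deep-thinking-tokens-anthropic-vs-pentagon"


def build_read_markdown() -> urllib.request.Request:
    """Build the GET request for the episode's Markdown representation."""
    url = f"{BASE}/podcast/{PODCAST}/{EPISODE}.md"
    return urllib.request.Request(url, method="GET")


def build_request_transcript() -> urllib.request.Request:
    """Build the POST request that asks for transcript generation.

    Assumption: the endpoint accepts an empty body and needs no auth header.
    The page describes the call as idempotent, so repeating it should be safe.
    """
    url = (
        f"{BASE}/v1/public/podcasts/{PODCAST}"
        f"/episodes/{EPISODE}/transcription-requests"
    )
    return urllib.request.Request(url, data=b"", method="POST")


def send(req: urllib.request.Request, timeout: float = 10.0):
    """Send a prepared request and return (status code, decoded body)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Print the prepared requests without touching the network.
    for req in (build_read_markdown(), build_request_transcript()):
        print(req.get_method(), req.full_url)
```

Building the requests separately from sending them keeps the explicit `request_transcript` step visible: reading the Markdown with `send(build_read_markdown())` does not enqueue transcription, so an agent would call `send(build_request_transcript())` only when it actually needs the episode processed.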